pystanで時変係数モデル

実践Data Scienceシリーズ RとStanではじめるベイズ統計モデリングによるデータ分析入門 (KS情報科学専門書)

　「RとStanで始めるベイズ統計モデリングによるデータ分析入門」「実践編第5部第4章時変係数モデル」を対象に，公開されているR，Stanのコードをpython，pystanのコードへと書き直した一例です。Stanの代わりにpystanを利用しています。

　本ページでは公開されていない書籍の内容については一切触れません。理論や詳しい説明は書籍を参照してください。

　なお，こちらで紹介しているコードには誤りが含まれる可能性があります。内容やコードについてお気づきの点等ございましたら，ご指摘いただけると幸いです。

（本ページにて紹介しているコードはgithubにて公開しています。）

DataAnalOji.hatena.sample/python_samples/stan/5-4-時変係数モデル.ipynb at master · Data-Anal-Ojisan/DataAnalOji.hatena.sample

samples for my own blog. Contribute to Data-Anal-Ojisan/DataAnalOji.hatena.sample development by creating an account on GitHub.

分析の準備

パッケージの読み込み

plotSSM関数についてはこちらを参照してください。

import arviz
import pystan
import numpy as np
import pandas as pd

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams['font.family'] = 'Meiryo'
import seaborn as sns

# 自作のplotSSM関数を読み込み
from plotSSM import plotSSM

データの読み込み

sales_df_2 = pd.read_csv('5-4-1-sales-ts-2.csv')
sales_df_2['date'] = pd.to_datetime(sales_df_2['date'])
sales_df_2.head(n=3)

図示

# サブプロットで2つの描画領域の作成
fig = plt.figure(figsize=(15, 10))
ax1 = fig.add_subplot(2, 1, 1)
ax2 = fig.add_subplot(2, 1, 2)

# publicityのグラフを作成
ax1.plot(sales_df_2['publicity'])
ax1.set_title('publicity')

# salesのグラフを作成
ax2.plot(sales_df_2['sales'])
ax2.set_title('sales')

# グラフを描画
plt.show()

普通の単回帰モデルの適用

補足：データの作成

data_list = dict(N=len(sales_df_2),
                 sales=sales_df_2['sales'],
                 d_publicity=sales_df_2['publicity'])

display(data_list)

{'N': 100,
 'sales': 0      95.8
 1      83.6
 2      94.1
 3      98.1
 4     122.8
       ...  
 95    119.3
 96    131.9
 97    123.3
 98    109.4
 99    146.6
 Name: sales, Length: 100, dtype: float64,
 'd_publicity': 0     0
 1     0
 2     0
 3     0
 4     1
      ..
 95    1
 96    4
 97    3
 98    0
 99    4
 Name: publicity, Length: 100, dtype: int64}

補足：推定の実行

# stanコードの記述
stan_code = '''
data {
  int N;                  
  vector[N] sales;       
  vector[N] d_publicity; 
}

parameters {
  real Intercept;        
  real publicity;             
  real<lower=0> sigma;    
}

model {
  for (i in 1:N) {
    sales[i] ~ normal(Intercept + publicity*d_publicity[i], sigma);
  }
}

'''

# モデルのコンパイル
stan_model = pystan.StanModel(model_code=stan_code)

# サンプリング
linear_reg_stan = stan_model.sampling(data=data_list, seed=1, n_jobs=1)

補足：固定効果の係数のみ出力

print(linear_reg_stan.stansummary(probs=[0.025, 0.5, 0.975],
                                  pars=['Intercept', 'publicity']))

Inference for Stan model: anon_model_5adbed5f32da08d3526402b7b2c39e09.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean     sd   2.5%    50%  97.5%  n_eff   Rhat
Intercept 115.94    0.04   2.05 111.99 115.96 119.96   2214    1.0
publicity   7.38    0.02   0.74    5.9   7.37   8.81   2135    1.0

Samples were drawn using NUTS at Tue Sep  8 20:30:50 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

時点を分けた、2つの単回帰モデルの適用

データの分割

sales_df_2_head = sales_df_2.head(n=50)
sales_df_2_tail = sales_df_2.tail(n=50)

前半のモデル化

# データの作成
data_list_head = dict(N=len(sales_df_2_head),
                      sales=sales_df_2_head['sales'],
                      d_publicity=sales_df_2_head['publicity'])

# stanコードの記述
stan_code = '''
data {
  int N;                  
  vector[N] sales;       
  vector[N] d_publicity; 
}

parameters {
  real Intercept;        
  real publicity;             
  real<lower=0> sigma;    
}

model {
  for (i in 1:N) {
    sales[i] ~ normal(Intercept + publicity*d_publicity[i], sigma);
  }
}

'''

# モデルのコンパイル
stan_model = pystan.StanModel(model_code=stan_code)

# サンプリング
mod_lm_head = stan_model.sampling(data=data_list_head,
                                  seed=1, 
                                  n_jobs=1)

後半のモデル化

# データの作成
data_list_tail = dict(N=len(sales_df_2_tail),
                      sales=sales_df_2_tail['sales'],
                      d_publicity=sales_df_2_tail['publicity'])

# stanコードの記述
stan_code = '''
data {
  int N;                  
  vector[N] sales;       
  vector[N] d_publicity; 
}

parameters {
  real Intercept;        
  real publicity;             
  real<lower=0> sigma;    
}

model {
  for (i in 1:N) {
    sales[i] ~ normal(Intercept + publicity*d_publicity[i], sigma);
  }
}

'''

# モデルのコンパイル
stan_model = pystan.StanModel(model_code=stan_code)

# サンプリング
mod_lm_tail = stan_model.sampling(data=data_list_tail,
                                  seed=1, 
                                  n_jobs=1)

結果の表示

前半

print(mod_lm_head.stansummary(probs=[0.025, 0.5, 0.975],
                              pars=['Intercept', 'publicity']))

Inference for Stan model: anon_model_5adbed5f32da08d3526402b7b2c39e09.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean     sd   2.5%    50%  97.5%  n_eff   Rhat
Intercept 110.73    0.05   2.49 105.78 110.74  115.7   2416    1.0
publicity  11.62    0.03   1.31   9.03   11.6  14.23   2394    1.0

Samples were drawn using NUTS at Tue Sep  8 20:31:29 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

後半

print(mod_lm_tail.stansummary(probs=[0.025, 0.5, 0.975],
                              pars=['Intercept', 'publicity']))

Inference for Stan model: anon_model_5adbed5f32da08d3526402b7b2c39e09.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean     sd   2.5%    50%  97.5%  n_eff   Rhat
Intercept 122.29    0.09   3.57 115.16  122.3 129.06   1621    1.0
publicity   5.07    0.03   1.04   3.05   5.07   7.25   1665    1.0

Samples were drawn using NUTS at Tue Sep  8 20:32:06 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

時変係数モデルの推定

データの準備

data_list = dict(y=sales_df_2['sales'],
                 ex=sales_df_2['publicity'],
                 T=len(sales_df_2))

モデルの推定

# stanコードの記述（）
stan_code = '''
data {
  int T;        // データ取得期間の長さ
  vector[T] ex; // 説明変数
  vector[T] y;  // 観測値
}

parameters {
  vector[T] mu;       // 水準成分の推定値
  vector[T] b;        // 時変係数の推定値
  real<lower=0> s_w;  // 水準成分の過程誤差の標準偏差
  real<lower=0> s_t;  // 時変係数の変動の大きさを表す標準偏差
  real<lower=0> s_v;  // 観測誤差の標準偏差
}

transformed parameters {
  vector[T] alpha;        // 各成分の和として得られる状態推定値
  
  for(i in 1:T) {
    alpha[i] = mu[i] + b[i] * ex[i];
  }

}

model {
  // 状態方程式に従い、状態が遷移する
  for(i in 2:T) {
    mu[i] ~ normal(mu[i-1], s_w);
    b[i] ~ normal(b[i-1], s_t);
  }
  
  // 観測方程式に従い、観測値が得られる
  for(i in 1:T) {
    y[i] ~ normal(alpha[i], s_v);
  }

}

'''

# モデルのコンパイル
stan_model = pystan.StanModel(model_code=stan_code)

# サンプリング
time_varying_coef_stan = stan_model.sampling(data=data_list,
                                             seed=1, 
                                             iter=8000,
                                             warmup=2000,
                                             thin=6,
                                             n_jobs=1)

結果の表示

print(time_varying_coef_stan.stansummary(probs=[0.025, 0.5, 0.975],
                                         pars=['s_w', 's_t', 's_v', 'b[100]']))

Inference for Stan model: anon_model_8b20732ce01b4a8a224224c9130169c3.
4 chains, each with iter=8000; warmup=2000; thin=6; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

         mean se_mean     sd   2.5%    50%  97.5%  n_eff   Rhat
s_w      3.74    0.05   1.25   1.54   3.65   6.59    611    1.0
s_t      1.01    0.02   0.36   0.47   0.96   1.88    562   1.01
s_v      9.58    0.03   1.08   7.62    9.5  11.86   1331    1.0
b[100]   6.33    0.04    2.2    2.1   6.27  10.76   3123    1.0

Samples were drawn using NUTS at Tue Sep  8 20:34:42 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

参考：収束の確認など

# 収束確認用のRhatのプロット関数
def mcmc_rhat(dataframe, column='Rhat', figsize=(5, 10)):
    plt.figure(figsize=figsize)
    plt.hlines(y=dataframe[column].sort_values().index,
               xmin=1,
               xmax=dataframe[column].sort_values(),
               color='b',
               alpha=0.5)
    plt.vlines(x=1.05, ymin=0, ymax=len(dataframe[column]), linestyles='--')
    plt.plot(dataframe[column].sort_values().values,
             dataframe[column].sort_values().index,
             marker='.',
             linestyle='None',
             color='b',
             alpha=0.5)
    plt.yticks(color='None')
    plt.tick_params(length=0)
    plt.xlabel(column)
    plt.show()


# 各推定結果のデータフレームを作成
summary = pd.DataFrame(
    time_varying_coef_stan.summary()['summary'],
    columns=time_varying_coef_stan.summary()['summary_colnames'],
    index=time_varying_coef_stan.summary()['summary_rownames'])

# プロット
mcmc_rhat(summary)

print('hmc_diagnostics of local_level:\n',
      pystan.diagnostics.check_hmc_diagnostics(time_varying_coef_stan))

hmc_diagnostics of local_level:
 {'n_eff': True, 'Rhat': True, 'divergence': False, 'treedepth': True, 'energy': True}

参考：トレースなどのチェック

すべての推定値を出力

print(time_varying_coef_stan.stansummary(probs=[0.025, 0.5, 0.975]))

Inference for Stan model: anon_model_8b20732ce01b4a8a224224c9130169c3.
4 chains, each with iter=8000; warmup=2000; thin=6; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

             mean se_mean     sd   2.5%    50%  97.5%  n_eff   Rhat
mu[1]        96.6    0.13   5.85  85.12  96.63 108.42   1936    1.0
…省略…
lp__       -488.8    2.22  45.08 -569.8 -491.8 -395.2    413   1.01

Samples were drawn using NUTS at Tue Sep  8 20:34:42 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

推定された状態の図示

MCMCサンプルの取得

mcmc_sample = time_varying_coef_stan.extract()

図示

plotSSM関数についてはこちらを参照してください。

fig, ax = plt.subplots(3, 1, figsize=(15, 15))

p_all = plotSSM(mcmc_sample=mcmc_sample,
                time_vec=sales_df_2['date'],
                obs_vec=sales_df_2['sales'],
                state_name='alpha',
                graph_title='推定結果：状態',
                y_label='sales',
                axes=ax[0])

p_mu = plotSSM(mcmc_sample=mcmc_sample,
                time_vec=sales_df_2['date'],
                obs_vec=sales_df_2['sales'],
                state_name='mu',
                graph_title='推定結果:集客効果を除いた',
                y_label='sales',
                axes=ax[1])

p_b = plotSSM(mcmc_sample=mcmc_sample,
                time_vec=sales_df_2['date'],
                state_name='b',
                graph_title='推定結果:集客効果の変遷',
                y_label='sales',
                axes=ax[2])
plt.show()