Introduction to Stan, Bayesian Inference Software Based on Hamiltonian Monte Carlo

Hiroki Ito (伊東宏樹), Department of Forest Vegetation, Forestry and Forest Products Research Institute

2014-07-28, Four-Department Seminar

Today's topics

• Hamiltonian Monte Carlo (HMC)

• Stan

Hamiltonian Monte Carlo

• How it differs from ordinary MCMC (Markov chain Monte Carlo)

• Uses a "momentum" parameter to update the estimates.

• Also called Hybrid Monte Carlo.

• MCMC + deterministic simulation method

• Convergence is faster.

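Ordinary MCMC (Metropolis-Hastings algorithm)

[Figure: posterior probability (vertical axis) plotted against the estimate (horizontal axis)]

Random walk (+ accept/reject)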
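The random-walk update with an accept/reject step can be sketched in a few lines. This is only an illustration, not code from the talk: the target (a standard normal "posterior") and the names `log_post`, `metropolis`, and `step` are assumptions chosen for the example.

```python
import math
import random


def log_post(x):
    """Unnormalised log posterior (assumed: standard normal, for illustration)."""
    return -0.5 * x * x


def metropolis(n_iter, step=1.0, x0=0.0, seed=1):
    """Random-walk Metropolis: propose x + noise, then accept or reject."""
    rng = random.Random(seed)
    x = x0
    draws = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)           # random-walk proposal
        log_ratio = log_post(proposal) - log_post(x)
        # accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        draws.append(x)                               # a rejected step repeats x
    return draws


draws = metropolis(20000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"mean={mean:.2f}, var={var:.2f}")
```

Because each proposal ignores the shape of the posterior, consecutive draws tend to be strongly correlated, which is the behaviour HMC is meant to improve on.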

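HMC

[Figure: posterior probability (vertical axis) plotted against the estimate (horizontal axis)]

Moves with "momentum": faster on upward slopes, slower on downward slopes (+ accept/reject)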
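A minimal sketch of one way to implement the "momentum" update is the leapfrog integrator followed by a Metropolis accept/reject step. The target (a standard normal), the fixed step size `eps`, and the fixed number of steps `n_leapfrog` are assumptions made for this illustration. Stan itself uses the No-U-Turn Sampler, which chooses the trajectory length automatically, so this is not Stan's implementation.

```python
import math
import random


def log_post(x):
    """Unnormalised log posterior (assumed: standard normal, for illustration)."""
    return -0.5 * x * x


def grad_log_post(x):
    """Gradient of log_post with respect to x."""
    return -x


def hmc(n_iter, eps=0.2, n_leapfrog=10, x0=0.0, seed=1):
    """Basic HMC with a fixed step size and a fixed number of leapfrog steps."""
    rng = random.Random(seed)
    x = x0
    draws = []
    n_accept = 0
    for _ in range(n_iter):
        p0 = rng.gauss(0.0, 1.0)                  # draw a fresh momentum
        x_new, p = x, p0
        p += 0.5 * eps * grad_log_post(x_new)     # initial half step for momentum
        for i in range(n_leapfrog):
            x_new += eps * p                      # full step for position
            if i < n_leapfrog - 1:
                p += eps * grad_log_post(x_new)   # full step for momentum
        p += 0.5 * eps * grad_log_post(x_new)     # final half step for momentum
        # Hamiltonian = potential energy (-log posterior) + kinetic energy
        h_old = -log_post(x) + 0.5 * p0 * p0
        h_new = -log_post(x_new) + 0.5 * p * p
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
            n_accept += 1
        draws.append(x)
    return draws, n_accept / n_iter


draws, accept_rate = hmc(5000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"accept={accept_rate:.2f}, mean={mean:.2f}, var={var:.2f}")
```

The gradient steers each trajectory, so a proposal can land far from the current value and still be accepted with high probability. This is why the method needs the gradient of the log posterior, and why it is restricted to continuous parameters.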

http://mc-stan.org/

Sampling through adaptive neighborhoods (Gelman et al. 2013)

Stanisław Ulam (1909–1984)

http://en.wikipedia.org/wiki/Stanislaw_Ulam#mediaviewer/File:STAN_ULAM_HOLDING_THE_FERMIAC.jpg

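Stan development history

• August 2012: version 1.0 released
• December 2012: version 1.1 released
• March 2013: version 1.2 released
• October 2013: version 2.0 released
• December 2013: version 2.1 released
• February 2014: version 2.2 released
• June 2014: version 2.3 released
• July 2014: version 2.4 released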

Stan: three implementations

• RStan

  • From R

• PyStan

  • From Python

• CmdStan

  • From the shell command line

Features of Stan

• Samples from the posterior distribution using Hamiltonian Monte Carlo and the No-U-Turn Sampler

• Translates the model code to C++, then compiles it to a native binary and runs it

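Differences from BUGS

• Data and parameters are declared with explicit types

• Distribution names and some other details differ

• Statements are executed in the order in which they are written

• Discrete parameters cannot be estimated (for now); this is a limitation of HMC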

Variable types

Integer      int
Real         real
Vector       vector
Row vector   row_vector
Matrix       matrix

and so on

Comparison using the model from Section 10.3, Chapter 10 of Kubo's "green book"*

BUGS

model
{
  for (i in 1:N) {
    Y[i] ~ dbin(q[i], 8)
    logit(q[i]) <- beta + r[i]
  }
  beta ~ dnorm(0, 1.0E-4)
  for (i in 1:N) {
    r[i] ~ dnorm(0, tau)
  }
  tau <- 1 / (s * s)
  s ~ dunif(0, 1.0E+4)
}

http://hosho.ees.hokudai.ac.jp/~kubo/stat/iwanamibook/fig/hbm/model.bug.txt

Stan

data {
  int<lower=0> N;
  int<lower=0> Y[N];
}
parameters {
  real beta;
  real r[N];
  real<lower=0> s;
}
transformed parameters {
  real q[N];

  for (i in 1:N) {
    q[i] <- inv_logit(beta + r[i]);
  }
}
model {
  for (i in 1:N) {
    Y[i] ~ binomial(8, q[i]);
  }
  beta ~ normal(0, 1.0e+2);
  r ~ normal(0, s);
  s ~ uniform(0, 1.0e+4);
}

* Kubo, T. (2012) データ解析のための統計モデリング入門 [Introduction to Statistical Modeling for Data Analysis], Iwanami Shoten

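Examples of differences in distribution names

Distribution    BUGS           Stan
Normal          dnorm(μ, τ)    normal(μ, σ)
Binomial        dbin(p, N)     binomial(N, p)
Poisson         dpois(λ)       poisson(λ)
Uniform         dunif(α, β)    uniform(α, β)

(τ is the precision, 1/σ²; σ is the standard deviation.)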
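One difference is easy to miss when porting a model. BUGS's dnorm takes a precision as its second argument, whereas Stan's normal takes a standard deviation. The conversion is shown below as a small sketch; the helper names `precision_to_sd` and `sd_to_precision` are made up for this illustration.

```python
import math


def precision_to_sd(tau):
    """Convert a BUGS-style precision (1 / variance) to a standard deviation."""
    return 1.0 / math.sqrt(tau)


def sd_to_precision(sigma):
    """Convert a standard deviation to a BUGS-style precision."""
    return 1.0 / (sigma * sigma)


bugs_tau = 1.0e-4                        # as in: beta ~ dnorm(0, 1.0E-4)
stan_sigma = precision_to_sd(bugs_tau)   # about 100, as in: beta ~ normal(0, 1.0e+2)
print(stan_sigma)
```

This is why the BUGS model needs the extra line `tau <- 1 / (s * s)` while the Stan model can write `r ~ normal(0, s)` directly.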

Example run

> library(rstan)
Loading required package: Rcpp
Loading required package: inline

Attaching package: 'inline'

The following object is masked from 'package:Rcpp':

    registerPlugin

rstan (Version 2.2.0, packaged: 2014-02-14 04:29:17 UTC, GitRev: 52d7b230aaa0)
> 
> # read data
> d <- read.csv(url("http://hosho.ees.hokudai.ac.jp/~kubo/stat/iwanamibook/fig/hbm/data7a.csv"))
> 

Using the model from Section 10.3, Chapter 10 of Kubo's "green book" as the example

> model <- '
+ data {
+   int<lower=0> N;     // sample size
+   int<lower=0> Y[N];  // response variable
+ }
+ parameters {
+   real beta;
+   real r[N];
+   real<lower=0> s;
+ }
+ transformed parameters {
+   real q[N];
+ 
+   for (i in 1:N) {
+     q[i] <- inv_logit(beta + r[i]);  // survival probability
+   }
+ }
+ model {
+   for (i in 1:N) {
+     Y[i] ~ binomial(8, q[i]);  // binomial distribution
+   }
+   beta ~ normal(0, 1.0e+2);    // noninformative prior
+   r ~ normal(0, s);            // hierarchical prior
+   s ~ uniform(0, 1.0e+4);      // noninformative prior
+ }'
>

> data <- list(N = nrow(d), Y = d$y)
> 
> fit <- stan(model_code = model, data = data, pars = c("beta", "s"),
+             warmup = 100, iter = 10100, thin = 10, chains = 3)

TRANSLATING MODEL 'model' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'model' NOW.
SAMPLING FOR MODEL 'model' NOW (CHAIN 1).
Iteration: 10100 / 10100 [100%]  (Sampling)
Elapsed Time: 0.222008 seconds (Warm-up)
              18.2454 seconds (Sampling)
              18.4674 seconds (Total)

[…]

SAMPLING FOR MODEL 'model' NOW (CHAIN 3).
Iteration: 10100 / 10100 [100%]  (Sampling)
Elapsed Time: 0.249295 seconds (Warm-up)
              18.7859 seconds (Sampling)
              19.0352 seconds (Total)

Results

> print(fit, digits = 2)
Inference for Stan model: model.
3 chains, each with iter=10100; warmup=100; thin=10; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

         mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
beta     0.04    0.01 0.33   -0.60   -0.18    0.04    0.27    0.69  3000    1
s        3.04    0.01 0.37    2.41    2.78    3.02    3.27    3.80  2807    1
lp__  -443.66    0.18 9.50 -463.66 -449.84 -443.33 -437.14 -425.83  2805    1

Samples were drawn using NUTS(diag_e) at Tue Jun 17 15:45:43 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
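Two of the numbers in this printout can be checked by hand. The sketch below assumes that thinning divides the post-warmup iterations evenly and that se_mean is sd divided by the square root of n_eff. The function names are made up for this illustration.

```python
import math


def post_warmup_draws(iter_total, warmup, thin, chains):
    """Draws kept per chain and in total (assumes thin divides evenly)."""
    per_chain = (iter_total - warmup) // thin
    return per_chain, per_chain * chains


def mc_standard_error(sd, n_eff):
    """Monte Carlo standard error of the posterior mean."""
    return sd / math.sqrt(n_eff)


# Settings from the stan() call: iter = 10100, warmup = 100, thin = 10, chains = 3
per_chain, total = post_warmup_draws(10100, 100, 10, 3)
print(per_chain, total)

# lp__ row of the printout: sd = 9.50, n_eff = 2805
se_lp = mc_standard_error(9.50, 2805)
print(round(se_lp, 2))
```

The draw counts match the "post-warmup draws per chain=1000, total post-warmup draws=3000" line, and the rounded standard error matches the se_mean column for lp__.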

> traceplot(fit)

Speed comparison with OpenBUGS

• Model from Section 10.3, Chapter 10 of Kubo's "green book"

• chain: 3

• iteration: 10000

• burn-in (warmup): 100

• thin: 10

• Mac Pro (2.8 GHz Quad-Core Intel Xeon)

• OS X 10.9.3

• OpenBUGS 3.2.2

• RStan/CmdStan 2.2.0

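Speed comparison with OpenBUGS

[Figure: bar chart of run time in seconds (axis from 0 to 300), split into compile and run; the surviving bar labels are OpenBUGS and CmdStan (parallel execution)]

* The OpenBUGS run time includes the overhead of Wine. However, comparing the Linux version of OpenBUGS with Stan on Linux gives almost the same result.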

Example run (2)

Zero-Inflated Negative Binomial Model

[Figure: histogram of the count data (horizontal axis from 0 to 17); data with many zeros]

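Model

θ: probability of presence; α, β: parameters of the negative binomial distribution

$$
p(y \mid \theta, \alpha, \beta) =
\begin{cases}
(1 - \theta) + \theta \cdot \mathrm{NegBin}(0 \mid \alpha, \beta) & (y = 0) \\
\theta \cdot \mathrm{NegBin}(y \mid \alpha, \beta) & (y > 0)
\end{cases}
$$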

data {
  int<lower=1> N;
  int<lower=0> y[N];
}
parameters {
  real<lower=0, upper=1> theta;
  real<lower=0> alpha;
  real<lower=0> beta;
}
model {
  // priors
  theta ~ uniform(0, 1);
  alpha ~ uniform(0, 100);
  beta ~ uniform(0, 100);
  for (i in 1:N) {
    if (y[i] == 0) {
      increment_log_prob(log((1 - theta)
                             + theta * (beta / (beta + 1))^alpha));
    } else {
      increment_log_prob(bernoulli_log(1, theta)
                         + neg_binomial_log(y[i], alpha, beta));
    }
  }
}
generated quantities {
  real mu;
  mu <- alpha / beta;
}
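The zero-inflated likelihood can be checked outside Stan with a short sketch. It assumes the (α, β) parameterisation of the negative binomial used by Stan's neg_binomial, in which NegBin(0 | α, β) = (β / (β + 1))^α and the mean is α / β. The function names and the parameter values are chosen only for this illustration.

```python
import math


def neg_binomial_lpmf(y, alpha, beta):
    """Log pmf of NegBin(y | alpha, beta), shape/inverse-scale parameterisation."""
    return (math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
            + alpha * math.log(beta / (beta + 1.0))
            - y * math.log(beta + 1.0))


def zinb_lpmf(y, theta, alpha, beta):
    """Log pmf of the zero-inflated model: theta is the probability of presence."""
    if y == 0:
        # absent, or present but with a count of zero
        return math.log((1.0 - theta) + theta * (beta / (beta + 1.0)) ** alpha)
    # present with a positive count
    return math.log(theta) + neg_binomial_lpmf(y, alpha, beta)


theta, alpha, beta = 0.68, 1.24, 0.27   # illustrative values, not fitted here

probs = [math.exp(zinb_lpmf(y, theta, alpha, beta)) for y in range(2000)]
total = sum(probs)                               # should be very close to 1
mean = sum(y * p for y, p in enumerate(probs))   # should be close to theta * alpha / beta
print(f"total={total:.6f}, mean={mean:.4f}")
```

Summing the per-observation values of `zinb_lpmf` over the data gives the same log-likelihood contribution that the loop of increment_log_prob calls accumulates in the Stan model.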

increment_log_prob()

• increment_log_prob(x) adds x to the log posterior probability

• Useful, for example, when the likelihood cannot be written with the built-in distributions

R code

library(rstan)
model <- stan_model("zinb.stan")
fit <- sampling(model, data = list(N = n, y = y),
                chains = 4,
                iter = 2000, warmup = 1000, thin = 1)

Compilation and sampling can also be run as separate steps.

Results

> print(fit, digits = 2)
Inference for Stan model: zinb.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

          mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
theta     0.68    0.00 0.09    0.53    0.62    0.67    0.73    0.88   601 1.00
alpha     1.24    0.02 0.50    0.49    0.87    1.16    1.53    2.39   581 1.00
beta      0.27    0.00 0.10    0.13    0.20    0.26    0.33    0.48   639 1.00
mu        4.53    0.02 0.73    3.09    4.05    4.52    4.99    6.00  1797 1.00
lp__   -218.77    0.05 1.34 -222.19 -219.37 -218.42 -217.79 -217.26   693 1.01

Samples were drawn using NUTS(diag_e) at Sun Jul 13 10:54:12 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

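Summary

• Strengths of Stan

  • (Generally) fast execution.

  • (In principle) better convergence than ordinary MCMC.

• Weaknesses of Stan

  • Discrete parameters cannot be estimated.

  • Reference material is still scarce.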

References

• Gelman A. et al. (2013) Bayesian Data Analysis (3rd ed.) Chapman & Hall/CRC

• Stan Development Team (2014) Stan Modeling Language User’s Guide and Reference Manual version 2.3 http://mc-stan.org/manual.html

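Online resources

• Teito Nakagawa: Stanチュートリアル (Stan tutorial)

  • http://www.slideshare.net/teitonakagawa/stantutorialj

• Takashi J. Ozaki: MCMCの計算にStanを使ってみた（超基礎・導入編） (Trying out Stan for MCMC computation: the very basics and getting started)

  • http://tjo.hatenablog.com/entry/2013/11/06/201735

• berobero11: Stanのマニュアルの8章～12章の私的メモ (Personal notes on Chapters 8 to 12 of the Stan manual)

  • http://heartruptcy.blog.fc2.com/blog-entry-88.html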
