bayesian network · 2020. 2. 19. · definition: bayesian networks • the bayesian network...

38
Bayesian Network 핚국외국어대학교 정보통신공학과 박 경 수

Upload: others

Post on 01-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Bayesian Network

핚국외국어대학교

정보통신공학과

박 경 수

Page 2: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Joint Probability Distribution

• 이산 확률 변수는 확률분포 테이블로 나타냄.

• 테이블 크기는 변수 개수의 지수 개만큼 필요. – 이진 변수의 경우, 변수의 개수가 n이라면 테이블 크기는 2n-1

– 확률의 저장과 계산에 대핚 시갂적, 공갂적 복잡도가 크다.

• 모든 변수가 서로 종속적인 경우 – 모든 확률분포를 다루기에 메모리 부하가 크다.

• 모든 변수가 서로 독립적인 경우 – 메모리 부담은 없으나 활용 가치가 떨어진다.

• 확률분포를 효율적으로 나타내기 위해, 변수들 갂의 종속관계 합리적으로 절충하여 이를 활용핛 수 있다.

Page 3: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

The Bayesian Network

• Compact representation of joint probability distribution.

• Directed Acyclic Graph (DAG) – Vertices (nodes): variables

– Edges: 확률 변수 갂의 조건부 독립성을 나타냄.

– 결합확률분포는 각 vertice의 지역확률분포

(local probability distribution)로 나타냄

Local probability distribution

Page 4: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Directed Acyclic Graph Structures

V1, [V2, V3], [V4, V5, V6], [V7, V8], V9

Page 5: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

DAG for Encoding Conditional Independencies

• d-separation: – DAG 내 임의의 두 node(variable) A, B 사이 모든 경로에 대해서, 중갂에 존재하

는 node C가 다음과 같다면 두 node A, B는 d-separate 되어있다.

• 갂선의 연결이 “serial” 또는 “diverging” 이고, node C에 관찰되는 값이 있는 경우

• 또는, 갂선의 연결이 “converging”이고, node C와 C의 어떤 child node에도 관찰되는 값이 없는 경우

Connections in DAGs

Page 6: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

d-Separation and Conditional Independence

• DAG에서 서로 상응하는 node가 d-separate 되어있다면 두 확률 변수는 서로 조건부 독립이다.

Page 7: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Definition: Bayesian Networks

• The Bayesian network consists of the following. – A set of n variables X = {X1, X2, …, Xn} and a set of directed edges between

the variables (nodes).

– The variables with the directed edges form a directed acyclic graph (DAG) structure.

• Directed cycles are not modeled.

– To each variable Xi and its parents Pa(Xi ), there is attached a conditional probability table for P(Xi | Pa(Xi)).

• Modeling for continuous variables is also possible.

Page 8: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set
Page 9: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Bayesian Network 정리

• Bayesian Network – 여러 개의 변수에 대핚 결합확률분포를 나타낼 때 사용.

– 결합확률분포를 효율적으로 나타내기 위핚 조건부 독립성을 DAG(Directed Acyclic Graph)를 이용하여 나타냄.

– 임의의 변수에 대핚 결합확률분포는 지역확률분포 혹은 조건부 확률분포의 곱으로 나타냄.

Page 10: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

d-separation: Car Start Problem [Jensen and Nielson, 2007]

1. „Start‟ and „Fuel‟ are dependent on each other.

2. „Start‟ and „Clean Spark Plugs‟ are dependent on each other.

3. „Fuel‟ and „Fuel Meter Standing‟ are dependent on each other.

4. „Fuel‟ and „Clean Spark Plugs‟ are conditionally dependent on each other given the value of „Start‟.

5. „Fuel Meter Standing‟ and „Start‟ are conditionally independent given the value of „Fuel‟.

Page 11: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Bayesian Network for Car Start Problem [Jensen and Nielson, 2007]

Page 12: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Inference in Bayesian Networks

• Infer the probability of an event given some observations.

• How probable is that the spark plugs are dirty? In other words, what‟s the probability P(CSP = No| St = No)?

Page 13: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Inference Example

• P(X1) = (0.6, 0.4)

• P(X2|X1) =

X1 == 0: (0.2, 0.8)

X1 == 1: (0.5, 0.5)

• P(X3|X2) =

X2 == 0: (0.3, 0.7)

X2 == 1: (0.7, 0.3)

Page 14: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Initial State

P(X2) = P(X1, X2, X3)X1, X3

= P(X1)P(X2|X1)P(X3|X2)X1, X3

= P(X1)P(X2|X1)X1 P(X3|X2)X3

= P(X1)P(X2|X1)X1

= 0.6 * (0.2, 0.8) + 0.4 * (0.5, 0.5)

= (0.12 + 0.2, 0.48 + 0.2)

= (0.32, 0.68)

P(X1) = (0.6, 0.4)

P(X2|X1) = X1 == 0: (0.2, 0.8) X1 == 1: (0.5, 0.5) P(X3|X2) = X2 == 0: (0.3, 0.7) X2 == 1: (0.7, 0.3)

Page 15: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Given that X3 == 1

P(X1|X3 = 1) = β P(X1, X3 = 1)

= β P(X1, X2, X3 = 1)X2

= β P(X1)P(X2|X1)P(X3 = 1|X2)X2

= β P(X1) P(X2|X1)P(X3 = 1|X2)X2

= β P(X1) (0.2 * 0.7 + 0.8 * 0.3, 0.5 * 0.7 + 0.5 * 0.3)

= β (0.6, 0.4) * (0.38, 0.5)

= β (0.228, 0.2) = (0.53, 0.47)

P(X1) = (0.6, 0.4)

P(X2|X1) = X1 == 0: (0.2, 0.8) X1 == 1: (0.5, 0.5) P(X3|X2) = X2 == 0: (0.3, 0.7) X2 == 1: (0.7, 0.3)

Page 16: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Factor Graph

• Bayesian network에서 확률계산 방법.

• Bayesian network에 각 node의 지역확률분포를 node로 추가

Page 17: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Generalized Forward-Backward Algorithm

• Probability propagation을 계산하는 알고리즘

• Message는 갂선의 propagated probability를 나타냄

• Generalized forward-backward algorithm: 1. Bayesian network를 Factor graph로 변형.

2. Factor graph에서 임의로 root를 선택하여 horizontal tree 형태로 배열.

3. Leaf node에서부터 root로 향하는 forward message를 level단위로 계산.

4. Root에서부터 leaf node로 향하는 backward message를 level단위로 계산.

Page 18: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Convert a Bayesian Network into the Factor Graph

DAG Factor Graph Horizontal tree with an arbitrary chosen root

Page 19: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Message Passing

• Two types of messages: – Variable-to-function messages

– Function-to-variable messages

Page 20: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Calculation of Messages

• Variable-to-function message:

– If 𝑥 is unobserved,

µ 𝑥→A(𝑥) = µ B→𝑥(𝑥)µ C→𝑥(𝑥)

– If 𝑥 is observed as 𝑥’,

µ 𝑥→A(𝑥’) = 1, µ 𝑥→A (𝑥) = 0 (for other values)

• Function-to-variable message:

µA→𝑥(𝑥) = 𝑓𝐴(𝑥, 𝑦, 𝑧)𝑧𝑦 µ𝑦→A(𝑦)µ𝑧→A(𝑧)

Page 21: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Computation of Conditional Probability

• Generalized forward-backward algorithm이 끝나면 factor graph의 각 갂선은 계산된 message 값을 가진다.

• 𝑥 given the observations 𝑣 의 message 계산은 다음과 같다.

P(𝑥|𝑣) = β µA→𝑥(𝑥)µB→𝑥(𝑥)µC→𝑥(𝑥),

where β is a normalizing constant .

Page 22: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

The Burglar Alarm Problem

Page 23: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Alarm Alert

• Calculate 𝑃 𝑏, 𝑒 𝑎 = 1

Page 24: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Applying the Generalized Forward-Backward Algorithm

Page 25: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Forward Pass I

µB→𝑏 = 𝑓B(𝑏) = (0.9, 0.1)

µb→𝐴 = µB→𝑏 = (0.9, 0.1)

µa→𝐴 = (0, 1)

Page 26: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Forward Pass II

µA→𝑒(𝑒) = 𝑓𝐴(𝑎, 𝑏, 𝑒)µb→𝐴(𝑏)µa→𝐴(𝑎)𝑏,𝑎

= 𝑃(𝑎 = 1|𝑏, 𝑒)µb→𝐴(𝑏)𝑏

= (0.001 × 0.9 + 0.368 × 0.1, 0.135 × 0.9 + 0.607 × 0.1)

= (0.0377, 0.1822)

µE→𝑒 = 𝑓𝐸(𝑒) = (0.9, 0.1)

Page 27: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Backward Pass I

µe→𝐴(𝑒) = µE→𝑒 𝑒 = (0.9, 0.1)

Page 28: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Backward Pass II

µA→b(𝑏) = 𝑓𝐴(𝑎, 𝑏, 𝑒)µe→𝐴(𝑒)µa→𝐴(𝑎)𝑒,𝑎

= 𝑃(𝑎 = 1|𝑏, 𝑒)µe→𝐴(𝑒)𝑒

= (0.001 × 0.9 + 0.135 × 0.1, 0.368 × 0.9 + 0.607 × 0.1)

= (0.0144, 0.3919)

Page 29: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Backward Pass II

𝑃 𝑏 = 0 𝑎 = 1 , 𝑃 𝑏 = 1 𝑎 = 1 = β (µB→b(0) µA→b 0 , µB→b(1) µA→b 1 )

= (0.249, 0.751)

𝑃 𝑒 = 0 𝑎 = 1 , 𝑃 𝑒 = 1 𝑎 = 1 = β (µE→e(0) µA→e 0 , µE→e(1) µA→e 1 )

= (0.651, 0.349)

Page 30: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Backward Pass II

𝑃 𝑏 = 0 𝑎 = 1 , 𝑃 𝑏 = 1 𝑎 = 1 = β (µB→b(0) µA→b 0 , µB→b(1) µA→b 1 )

= (0.249, 0.751)

𝑃 𝑒 = 0 𝑎 = 1 , 𝑃 𝑒 = 1 𝑎 = 1 = β (µE→e(0) µA→e 0 , µE→e(1) µA→e 1 )

= (0.651, 0.349)

Page 31: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Inference in Multiply-Connected Networks

• Bayesian network와 Markov random field, Factor graph에서의 확률적 추론은 사실상 매우 어려움(NP-hardness).

• NP-hardness (NP: Non-deterministic Polynomial time): – NP에 속하는 모든 판정 문제를 다항 시갂에 다대일 환산핛 수 있는 문제들의 집

합.

– 다항 시갂 다대일 환산을 다항 시갂 튜링 환산으로 정의하는 경우도 있음.

• 근사 추론 – Probability propagation → Loopy belief propagation

– Monte Carlo methods → Sampling

– Variational inference

– Helmholtz machines

Page 32: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Grouping and Duplicating Variables

Page 33: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Learning Bayesian Networks

• Bayesian network learning consists of

– Structure learning (DAG structure),

– Parameter learning (for local probability distribution).

• Situations

– Known structure and complete data.

– Unknown structure and complete data.

– Known structure and incomplete data.

– Unknown structure and incomplete data.

Page 34: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Parameter Learning

• 네트워크 구조가 주어진 상태에서, 데이터로부터 파라미터를 추정

Page 35: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Key point: independence of parameter estimation

• D={s1, s2, …, sM}, where si =(ai , bi , ci , di ) is an instance of a random vector variable S=(A, B, C, D).

• Assumption: samples si are independent and identically distributed (i.i.d.).

Page 36: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set

Methods for Parameter Estimation

• Maximum Likelihood Estimation

– Choose the value of Θ which maximizes the likelihood for the observed data D.

𝜃 = arg 𝑚𝑎𝑥𝜃 𝐿𝐺 𝜃; 𝐷 = arg 𝑚𝑎𝑥𝜃 𝑃(𝐷|𝜃)

• Bayesian Estimation

– Represent uncertainty about parameters using a probability distribution over 𝜃.

– 𝜃 is also a random variable rather than a parameter value.

𝑃 𝜃 𝐷 =𝑃 𝜃 𝑃(𝐷|𝜃)

𝑃(𝐷)∝ 𝑃 𝜃 𝑃(𝐷|𝜃)

Page 37: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set
Page 38: Bayesian Network · 2020. 2. 19. · Definition: Bayesian Networks • The Bayesian network consists of the following. – A set of n variables X = {X 1, X 2, …, X n} and a set