bayesian network · 2020. 2. 19. · definition: bayesian networks • the bayesian network...

Bayesian Network

핚국외국어대학교

정보통신공학과

박 경 수

Joint Probability Distribution

• 이산 확률 변수는 확률분포 테이블로 나타냄.

• 테이블 크기는 변수 개수의 지수 개만큼 필요. – 이진 변수의 경우, 변수의 개수가 n이라면 테이블 크기는 2n-1

– 확률의 저장과 계산에 대핚 시갂적, 공갂적 복잡도가 크다.

• 모든 변수가 서로 종속적인 경우 – 모든 확률분포를 다루기에 메모리 부하가 크다.

• 모든 변수가 서로 독립적인 경우 – 메모리 부담은 없으나 활용 가치가 떨어진다.

• 확률분포를 효율적으로 나타내기 위해, 변수들 갂의 종속관계 합리적으로 절충하여 이를 활용핛 수 있다.

The Bayesian Network

• Compact representation of joint probability distribution.

• Directed Acyclic Graph (DAG) – Vertices (nodes): variables

– Edges: 확률 변수 갂의 조건부 독립성을 나타냄.

– 결합확률분포는 각 vertice의 지역확률분포

(local probability distribution)로 나타냄

Local probability distribution

Directed Acyclic Graph Structures

V1, [V2, V3], [V4, V5, V6], [V7, V8], V9

DAG for Encoding Conditional Independencies

• d-separation: – DAG 내 임의의 두 node(variable) A, B 사이 모든 경로에 대해서, 중갂에 존재하

는 node C가 다음과 같다면 두 node A, B는 d-separate 되어있다.

• 갂선의 연결이 “serial” 또는 “diverging” 이고, node C에 관찰되는 값이 있는 경우

• 또는, 갂선의 연결이 “converging”이고, node C와 C의 어떤 child node에도 관찰되는 값이 없는 경우

Connections in DAGs

d-Separation and Conditional Independence

• DAG에서 서로 상응하는 node가 d-separate 되어있다면 두 확률 변수는 서로 조건부 독립이다.

Definition: Bayesian Networks

• The Bayesian network consists of the following. – A set of n variables X = {X1, X2, …, Xn} and a set of directed edges between

the variables (nodes).

– The variables with the directed edges form a directed acyclic graph (DAG) structure.

• Directed cycles are not modeled.

– To each variable Xi and its parents Pa(Xi ), there is attached a conditional probability table for P(Xi | Pa(Xi)).

• Modeling for continuous variables is also possible.

Bayesian Network 정리

• Bayesian Network – 여러 개의 변수에 대핚 결합확률분포를 나타낼 때 사용.

– 결합확률분포를 효율적으로 나타내기 위핚 조건부 독립성을 DAG(Directed Acyclic Graph)를 이용하여 나타냄.

– 임의의 변수에 대핚 결합확률분포는 지역확률분포 혹은 조건부 확률분포의 곱으로 나타냄.

d-separation: Car Start Problem [Jensen and Nielson, 2007]

1. „Start‟ and „Fuel‟ are dependent on each other.

2. „Start‟ and „Clean Spark Plugs‟ are dependent on each other.

3. „Fuel‟ and „Fuel Meter Standing‟ are dependent on each other.

4. „Fuel‟ and „Clean Spark Plugs‟ are conditionally dependent on each other given the value of „Start‟.

5. „Fuel Meter Standing‟ and „Start‟ are conditionally independent given the value of „Fuel‟.

Bayesian Network for Car Start Problem [Jensen and Nielson, 2007]

Inference in Bayesian Networks

• Infer the probability of an event given some observations.

• How probable is that the spark plugs are dirty? In other words, what‟s the probability P(CSP = No| St = No)?

Inference Example

• P(X1) = (0.6, 0.4)

• P(X2|X1) =

X1 == 0: (0.2, 0.8)

X1 == 1: (0.5, 0.5)

• P(X3|X2) =

X2 == 0: (0.3, 0.7)

X2 == 1: (0.7, 0.3)

Given that X3 == 1

P(X1|X3 = 1) = β P(X1, X3 = 1)

= β P(X1, X2, X3 = 1)X2

= β P(X1)P(X2|X1)P(X3 = 1|X2)X2

= β P(X1) P(X2|X1)P(X3 = 1|X2)X2

= β P(X1) (0.2 * 0.7 + 0.8 * 0.3, 0.5 * 0.7 + 0.5 * 0.3)

= β (0.6, 0.4) * (0.38, 0.5)

= β (0.228, 0.2) = (0.53, 0.47)

P(X1) = (0.6, 0.4)

P(X2|X1) = X1 == 0: (0.2, 0.8) X1 == 1: (0.5, 0.5) P(X3|X2) = X2 == 0: (0.3, 0.7) X2 == 1: (0.7, 0.3)

Factor Graph

• Bayesian network에서 확률계산 방법.

• Bayesian network에 각 node의 지역확률분포를 node로 추가

Generalized Forward-Backward Algorithm

• Probability propagation을 계산하는 알고리즘

• Message는 갂선의 propagated probability를 나타냄

• Generalized forward-backward algorithm: 1. Bayesian network를 Factor graph로 변형.

2. Factor graph에서 임의로 root를 선택하여 horizontal tree 형태로 배열.

3. Leaf node에서부터 root로 향하는 forward message를 level단위로 계산.

4. Root에서부터 leaf node로 향하는 backward message를 level단위로 계산.

Convert a Bayesian Network into the Factor Graph

DAG Factor Graph Horizontal tree with an arbitrary chosen root

Message Passing

• Two types of messages: – Variable-to-function messages

– Function-to-variable messages

Calculation of Messages

• Variable-to-function message:

– If 𝑥 is unobserved,

µ 𝑥→A(𝑥) = µ B→𝑥(𝑥)µ C→𝑥(𝑥)

– If 𝑥 is observed as 𝑥’,

µ 𝑥→A(𝑥’) = 1, µ 𝑥→A (𝑥) = 0 (for other values)

• Function-to-variable message:

µA→𝑥(𝑥) = 𝑓𝐴(𝑥, 𝑦, 𝑧)𝑧𝑦 µ𝑦→A(𝑦)µ𝑧→A(𝑧)

Computation of Conditional Probability

• Generalized forward-backward algorithm이 끝나면 factor graph의 각 갂선은 계산된 message 값을 가진다.

• 𝑥 given the observations 𝑣 의 message 계산은 다음과 같다.

P(𝑥|𝑣) = β µA→𝑥(𝑥)µB→𝑥(𝑥)µC→𝑥(𝑥),

where β is a normalizing constant .

The Burglar Alarm Problem

Alarm Alert

• Calculate 𝑃 𝑏, 𝑒 𝑎 = 1

Applying the Generalized Forward-Backward Algorithm

Forward Pass I

µB→𝑏 = 𝑓B(𝑏) = (0.9, 0.1)

µb→𝐴 = µB→𝑏 = (0.9, 0.1)

µa→𝐴 = (0, 1)

Forward Pass II

µA→𝑒(𝑒) = 𝑓𝐴(𝑎, 𝑏, 𝑒)µb→𝐴(𝑏)µa→𝐴(𝑎)𝑏,𝑎

= 𝑃(𝑎 = 1|𝑏, 𝑒)µb→𝐴(𝑏)𝑏

= (0.001 × 0.9 + 0.368 × 0.1, 0.135 × 0.9 + 0.607 × 0.1)

= (0.0377, 0.1822)

µE→𝑒 = 𝑓𝐸(𝑒) = (0.9, 0.1)

Backward Pass I

µe→𝐴(𝑒) = µE→𝑒 𝑒 = (0.9, 0.1)

Backward Pass II

µA→b(𝑏) = 𝑓𝐴(𝑎, 𝑏, 𝑒)µe→𝐴(𝑒)µa→𝐴(𝑎)𝑒,𝑎

= 𝑃(𝑎 = 1|𝑏, 𝑒)µe→𝐴(𝑒)𝑒

= (0.001 × 0.9 + 0.135 × 0.1, 0.368 × 0.9 + 0.607 × 0.1)

= (0.0144, 0.3919)

Backward Pass II

𝑃 𝑏 = 0 𝑎 = 1 , 𝑃 𝑏 = 1 𝑎 = 1 = β (µB→b(0) µA→b 0 , µB→b(1) µA→b 1 )

= (0.249, 0.751)

𝑃 𝑒 = 0 𝑎 = 1 , 𝑃 𝑒 = 1 𝑎 = 1 = β (µE→e(0) µA→e 0 , µE→e(1) µA→e 1 )

= (0.651, 0.349)

Inference in Multiply-Connected Networks

• Bayesian network와 Markov random field, Factor graph에서의 확률적 추론은 사실상 매우 어려움(NP-hardness).

• NP-hardness (NP: Non-deterministic Polynomial time): – NP에 속하는 모든 판정 문제를 다항 시갂에 다대일 환산핛 수 있는 문제들의 집

합.

– 다항 시갂 다대일 환산을 다항 시갂 튜링 환산으로 정의하는 경우도 있음.

• 근사 추론 – Probability propagation → Loopy belief propagation

– Monte Carlo methods → Sampling

– Variational inference

– Helmholtz machines

Grouping and Duplicating Variables

Learning Bayesian Networks

• Bayesian network learning consists of

– Structure learning (DAG structure),

– Parameter learning (for local probability distribution).

• Situations

– Known structure and complete data.

– Unknown structure and complete data.

– Known structure and incomplete data.

– Unknown structure and incomplete data.

Parameter Learning

• 네트워크 구조가 주어진 상태에서, 데이터로부터 파라미터를 추정

Key point: independence of parameter estimation

• D={s1, s2, …, sM}, where si =(ai , bi , ci , di ) is an instance of a random vector variable S=(A, B, C, D).

• Assumption: samples si are independent and identically distributed (i.i.d.).

Methods for Parameter Estimation

• Maximum Likelihood Estimation

– Choose the value of Θ which maximizes the likelihood for the observed data D.

𝜃 = arg 𝑚𝑎𝑥𝜃 𝐿𝐺 𝜃; 𝐷 = arg 𝑚𝑎𝑥𝜃 𝑃(𝐷|𝜃)

• Bayesian Estimation

– Represent uncertainty about parameters using a probability distribution over 𝜃.

– 𝜃 is also a random variable rather than a parameter value.

𝑃 𝜃 𝐷 =𝑃 𝜃 𝑃(𝐷|𝜃)

𝑃(𝐷)∝ 𝑃 𝜃 𝑃(𝐷|𝜃)

bayesian network · 2020. 2. 19. · definition: bayesian networks • the bayesian network...

Documents