CCF Academic Salon on the Application and Development of Bayesian Networks in China
BN Theory Research and Applications at the Hong Kong University of Science and Technology (HKUST)
2012-05-22
Overview
Early Work (1992-2002)
Inference: Variable Elimination
Inference: Local Structures
Others: Learning, Decision Making, Book
Latent Tree Models (2000 - )
Theory and Algorithms
Applications Multidimensional Clustering, Density Estimation, Latent Structure
Survey Data, Documents, Business Data
Traditional Chinese Medicine (TCM)
Extensions
Bayesian Networks
Variable Elimination
Papers: N. L. Zhang and D. Poole (1994),
A simple approach to Bayesian network computations, in Proc. of the 10th
Canadian Conference on Artificial Intelligence, Banff, Alberta, Canada, May 16-22.
N. L. Zhang and D. Poole (1996),
Exploiting causal independence in Bayesian network inference, Journal of Artificial
Intelligence Research, 5: 301-328.
Idea
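The idea: answer a query by summing variables out of a product of factors one at a time, instead of ever building the full joint table. A minimal sketch (the two-node network and the factor encoding are illustrative, not from the talk):

```python
from itertools import product

# A factor is a pair: (tuple of variable names, table mapping value tuples to probs).
def multiply(f, g):
    (fv, ft), (gv, gt) = f, g
    vs = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for vals in product([0, 1], repeat=len(vs)):   # binary variables for simplicity
        a = dict(zip(vs, vals))
        table[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return (vs, table)

def sum_out(f, var):
    # Eliminate `var` by summing it out of factor f.
    fv, ft = f
    vs = tuple(v for v in fv if v != var)
    table = {}
    for vals, p in ft.items():
        key = tuple(val for v, val in zip(fv, vals) if v != var)
        table[key] = table.get(key, 0.0) + p
    return (vs, table)

# Toy chain A -> B with P(A) and P(B|A); query P(B) by eliminating A.
pA = (("A",), {(0,): 0.6, (1,): 0.4})
pBgA = (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
pB = sum_out(multiply(pA, pBgA), "A")
print(pB)  # P(B=0) = 0.6*0.9 + 0.4*0.2 = 0.62
```

Each elimination keeps the intermediate factors small when the network is sparse, which is the point of the algorithm.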
Variable Elimination
First BN inference algorithm in
Russell & Norvig wrote on page 529: “The algorithm we describe is closest to that developed by Zhang and
Poole (1994, 1996)”
Koller and Friedman wrote on page: “… the variable elimination algorithm, as presented here, first described
by Zhang and Poole (1994), …”
The K&F book cites 7 of our papers
Local StructurePage 8
Local Structures: Causal Independence
Papers: N. L. Zhang and D. Poole (1996),
Exploiting causal independence in Bayesian network inference, Journal of Artificial
Intelligence Research, 5: 301-328.
N. L. Zhang and D. Poole (1994),
Intercausal independence and heterogeneous factorization, in Proc. of the 10th
Conference on Uncertainty in Artificial Intelligence, Seattle, USA, July 29-31.
Local Structures: Causal Independence
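A standard example of causal independence is the noisy-OR gate, where each cause can trigger the effect independently of the others; the conditional distribution over n causes then needs only n parameters instead of a table of size 2^n. A minimal sketch (the parameter values are made up for illustration):

```python
def noisy_or(causes, q, leak=0.0):
    # P(E=1 | causes): each active cause i independently triggers E with prob q[i];
    # `leak` is the probability E fires with no active cause.
    p_off = 1.0 - leak
    for c, qi in zip(causes, q):
        if c:
            p_off *= 1.0 - qi
    return 1.0 - p_off

# Causes 1 and 2 active: P(E=0) = (1-0.8)(1-0.5) = 0.1, so P(E=1) = 0.9.
p = noisy_or([1, 1, 0], [0.8, 0.5, 0.9])
print(p)
```

Inference algorithms that exploit this structure factor the noisy-OR table instead of treating it as an opaque 2^n-entry CPT.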
Local Structure: Context Specific Independence
Papers: N. L. Zhang and D. Poole (1999),
On the role of context-specific independence in probabilistic reasoning. IJCAI-99,
1288-1293.
D. Poole and N. L. Zhang (2003).
Exploiting contextual independence in probabilistic inference. Journal of Artificial
Intelligence Research, 18: 263-313.
Other Works
Parameter Learning: N. L. Zhang (1996),
Irrelevance and parameter learning in Bayesian networks, Artificial
Intelligence, 88: 359-373.
Decision Making: N. L. Zhang (1998), Probabilistic inference in influence diagrams,
Computational Intelligence, 14(4): 475-497.
N. L. Zhang, R. Qi and D. Poole (1994), A computational theory of decision
networks, International Journal of Approximate Reasoning, 11(2):
83-158. (PhD thesis)
Other Works
Overview
Early Work (1992-2002)
Inference: Variable Elimination
Inference: Local Structures
Others: Learning, Decision Making
Latent Tree Models (2000 - )
Theory and Algorithms
Applications Multidimensional Clustering, Density Estimation, Latent Structure
Survey Data, Documents, Business Data
Traditional Chinese Medicine (TCM)
Extensions
Latent Tree Models: Overview
Concept first mentioned by Pearl 1988
We were the first to conduct systematic research on LTMs. N. L. Zhang (2002). Hierarchical latent class models for cluster analysis.
AAAI-02, 230-237.
N. L. Zhang (2004). Hierarchical latent class models for cluster analysis.
Journal of Machine Learning Research, 5(6): 697-723.
Early followers:
Aalborg University (Denmark), Norwegian University of Science and
Technology
Recent papers from:
MIT, CMU, USC, Georgia Tech, Edinburgh
Latent Tree Models
Recent survey by French researchers:
Latent Tree Models (LTM)
Bayesian networks with:
Rooted tree structure
Discrete random variables
Leaves observed (manifest variables)
Internal nodes latent (latent variables)
Also known as hierarchical latent class (HLC) models
P(Y1),
P(Y2|Y1),
P(X1|Y2), P(X2|Y2), …
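Spelled out, these parameters define the joint distribution via the usual Bayesian-network factorization along the tree (notation assumes a root Y1, as in the parameter list above):

```latex
P(Y_1, \dots, Y_m, X_1, \dots, X_n)
  = P(Y_1)\,\prod_{j=2}^{m} P\!\left(Y_j \mid \mathrm{pa}(Y_j)\right)
    \prod_{i=1}^{n} P\!\left(X_i \mid \mathrm{pa}(X_i)\right)
```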
Example
Manifest variables
Math Grade, Science Grade, Literature Grade, History Grade
Latent variables
Analytic Skill, Literal Skill, Intelligence
Theory: Root Walking and Model Equivalence
M1: root walks to X2; M2: root walks to X3
Root walking leads to equivalent models on manifest variables
Implications:
Cannot determine edge orientation from data
Can only learn unrooted models
Regularity
Regular latent tree models: for any latent node Z with neighbors X1, X2, …
Can focus on regular models only:
Irregular models can be made regular
Regularized models are better than irregular models
The set of all regular models is finite.
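For completeness, the condition truncated on the slide is, as I recall from the JMLR 2004 paper (treat this as an assumption and verify against the paper): for any latent node Z with neighbors X_1, …, X_k,

```latex
|Z| \;\le\; \frac{\prod_{i=1}^{k} |X_i|}{\max_{i} |X_i|},
\qquad \text{with strict inequality when } k = 2,
```

where |·| denotes cardinality. A model violating the bound can be transformed into a regular one with no more parameters.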
Effective Dimension
Standard dimension:
Number of free parameters
Effective dimension
X1, X2, …, Xn: observed variables
P(X1, X2, …, Xn) is a point in a high-D space for each value of the
parameter
Spans a manifold as parameter value varies.
Effective dimension: dimension of the manifold.
Parsimonious model:
Standard dimension = effective dimension
Open question: How to test parsimony?
Effective Dimension
Paper: N. L. Zhang and T. Kocka (2004). Effective dimensions of hierarchical
latent class models. Journal of Artificial Intelligence Research, 21: 1-17.
Open question: effective dimension of an LTM with one latent variable
Learning Latent Tree Models
Determine:
Number of latent variables
Cardinality of each latent variable
Model structure
Conditional probability distributions
Search-Based Learning: Model Selection
Bayesian score: posterior probability P(m|D)
P(m|D) = P(m) ∫ P(D|m, θ) dθ / P(D)
BIC score: large-sample approximation
BIC(m|D) = log P(D|m, θ*) - d log(N)/2
BICe score: uses the effective dimension de
BICe(m|D) = log P(D|m, θ*) - de log(N)/2
Effective dimensions are difficult to compute, so BICe is not practical.
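The BIC trade-off above is easy to compute directly; a minimal sketch (the log-likelihood and dimension values are made-up numbers for illustration):

```python
import math

def bic(loglik, d, n):
    # BIC(m|D) = log P(D|m, theta*) - d * log(N) / 2
    # loglik: maximized log-likelihood; d: number of free parameters; n: sample size.
    return loglik - d * math.log(n) / 2.0

# A richer model fits slightly better but pays a larger complexity penalty:
simple = bic(loglik=-1200.0, d=10, n=500)
rich = bic(loglik=-1190.0, d=40, n=500)
print(simple > rich)  # here the penalty outweighs the likelihood gain
```

The same computation with de in place of d gives BICe, which is why the difficulty of computing effective dimensions makes BICe impractical.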
Search Algorithms
Papers:
T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1), 2246-2269.
N. L. Zhang and T. Kocka (2004). Efficient Learning of Hierarchical Latent Class Models. ICTAI-2004
Double hill climbing (DHC), 2002: 7 manifest variables
Single hill climbing (SHC), 2004: 12 manifest variables
Heuristic SHC (HSHC), 2004: 50 manifest variables
EAST, 2011: 100+ manifest variables
Recent fast algorithm for specific applications.
Illustration of the search process
Algorithms by Others
Variable clustering method: S. Harmeling and C. K. I. Williams (2011). Greedy learning of binary latent trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(6): 1087-1097.
R. Mourad, C. Sinoquet and P. Leray (2011). A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies. BMC Bioinformatics, 12: 16. doi:10.1186/1471-2105-12-16.
Fast, but model quality may be poor.
Adaptation of evolution tree algorithms: M. J. Choi, V. Y. F. Tan, A. Anandkumar and A. S. Willsky (2011). Learning latent tree graphical models. Journal of Machine Learning Research, 12: 1771-1812.
Fast, with a consistency proof, but for special LTMs only.
Overview
Early Work (1992-2002)
Inference: Variable Elimination
Inference: Local Structures
Others: Learning, Decision Making
Latent Tree Models (2000 - )
Theory and Algorithms
Applications Multidimensional Clustering, Density Estimation, Latent Structure
Survey Data, Documents, Business Data
Traditional Chinese Medicine (TCM)
Extensions
Density Estimation
Characteristics of LTMs:
Computationally very simple to work with
Can represent complex relationships among manifest variables
A useful tool for density estimation
Density Estimation
New approximate inference algorithm for Bayesian networks (Wang, Zhang and Chen, AAAI-08, Exceptional Paper)
Multidimensional Clustering
Paper: T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional
clustering of categorical data. Artificial Intelligence, 176(1), 2246-2269.
Cluster Analysis
Grouping of objects into clusters so that objects in the same
cluster are similar in some sense
How to Cluster These?
By style of picture?
By type of object in picture?
Multidimensional clustering / Multi-Clustering
How to partition data in multiple ways?
Latent tree models
Latent Tree Models & Multidimensional Clustering
Model the relationships among observed / manifest variables:
Math Grade, Science Grade, Literature Grade, History Grade
Latent variables: Analytic Skill, Literal Skill, Intelligence
Each latent variable gives a partition:
Intelligence: low, medium, high
Analytic skill: low, medium, high
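The grade example can be mimicked by clustering different attribute subsets separately: each facet of the data yields its own partition, which is what the latent variables of an LTM deliver within a single model. A toy sketch (the data values and the tiny 2-means routine are mine, for illustration only):

```python
import numpy as np

def two_means(X, iters=20, seed=0):
    # Minimal 2-means (Lloyd's algorithm); returns a binary partition of the rows.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in (0, 1):                 # keep the old center if a cluster empties
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

# Four students: (Math, Science, Literature, History) grades.
grades = np.array([
    [90.0, 92.0, 90.0, 88.0],   # strong in both facets
    [88.0, 85.0, 45.0, 50.0],   # analytic only
    [45.0, 40.0, 92.0, 90.0],   # literal only
    [50.0, 48.0, 48.0, 45.0],   # weak in both facets
])
analytic = two_means(grades[:, :2])   # partition along the analytic facet
literal = two_means(grades[:, 2:])    # partition along the literal facet
print(analytic, literal)              # two different partitions of the same people
```

The two partitions group the same four students differently, one per facet; an LTM discovers the facets and the partitions jointly instead of requiring the attribute subsets up front.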
ICAC Data
// 31 variables, 1200 samples
C_City: s0 s1 s2 s3 // very common, quite common, uncommon, ..
C_Gov: s0 s1 s2 s3
C_Bus: s0 s1 s2 s3
Tolerance_C_Gov: s0 s1 s2 s3 //totally intolerable, intolerable, tolerable,...
Tolerance_C_Bus: s0 s1 s2 s3
WillingReport_C: s0 s1 s2 // yes, no, depends
LeaveContactInfo: s0 s1 // yes, no
I_EncourageReport: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
I_Effectiveness: s0 s1 s2 s3 s4 // very effective, effective, average, ineffective, very ineffective
I_Deterrence: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
…..
-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
-1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0
-1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0
….
Latent Structure Discovery
Y2: Demographic info; Y3: Tolerance toward corruption
Y4: ICAC performance; Y7: ICAC accountability
Y5: Change in level of corruption; Y6: Level of corruption
Multidimensional Clustering
Y2=s0: Low income youngsters; Y2=s1: Women with no/low income
Y2=s2: people with good education and good income;
Y2=s3: people with poor education and average income
Multidimensional Clustering
Y3=s0: people who find corruption totally intolerable; 57%
Y3=s1: people who find corruption intolerable; 27%
Y3=s2: people who find corruption tolerable; 15%
Interesting finding:
Y3=s2: 29+19=48% find C-Gov totally intolerable or intolerable; 5% for C-Bus
Y3=s1: 54% find C-Gov totally intolerable; 2% for C-Bus
Y3=s0: Same attitude toward C-Gov and C-Bus
People who are tough on corruption are equally tough toward C-Gov and C-Bus.
People who are relaxed about corruption are more relaxed toward C-Bus than C-Gov.
Multidimensional Clustering
Interesting finding: relationship between background and tolerance toward corruption
Y2=s2 (good education and good income): the least tolerant; 4% find corruption tolerable
Y2=s3 (poor education and average income): the most tolerant; 32% find corruption tolerable
The other two classes are in between.
Marketing Data
Latent Tree Analysis of Text Data
The WebKB Data Set 1041 web pages collected from 4 CS departments in 1997
336 words
Latent Tree Model for WebKB Data by the BI Algorithm
89 latent variables
Latent Tree Models for WebKB Data
LTM for Topic Detection
Topic:
A latent state
A collection of documents
A document can belong to multiple topics (with 100% membership in each)
LTM vs LDA for Topic Detection
LTM
Topic: a latent state; a collection of documents
A document can belong to multiple topics (with 100% membership in each)
LDA
Topic: Distribution over the entire vocabulary.
The probabilities of the words add to one.
Document:
Distribution over topics.
If a document contains more of one topic, then it contains less of other
topics.
Latent Tree Analysis Summary
Finds meaningful facets of data
Identifies natural clusters along each facet
Gives a clear picture of what is in the data
LTM for Spectral Clustering
Original Data Set
Eigenvectors of Laplacian Matrix
Rounding: Eigenvectors to final partition
LTM for Spectral Clustering
Rounding:
Determine number of clusters
Determine the final partition
No good method available
LTM Method:
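For context, the standard pipeline whose rounding step the LTM method replaces looks like this; a minimal sketch with sign-based rounding on a toy two-block graph (the affinity matrix is made up, and the talk's LTM-based rounding itself is not shown here):

```python
import numpy as np

def spectral_embed(W, k):
    # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}; the eigenvectors for the
    # k smallest eigenvalues give the spectral embedding that rounding operates on.
    d = W.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - Dinv @ W @ Dinv
    vals, vecs = np.linalg.eigh(L)      # eigh returns eigenvalues in ascending order
    return vecs[:, :k]

# Two obvious blocks: {0,1} tightly connected, {2,3} tightly connected.
W = np.array([[0.00, 1.00, 0.01, 0.01],
              [1.00, 0.00, 0.01, 0.01],
              [0.01, 0.01, 0.00, 1.00],
              [0.01, 0.01, 1.00, 0.00]])
U = spectral_embed(W, 2)
# Rounding: here a simple sign split on the second eigenvector; k-means on the
# rows of U is the common baseline, and the talk proposes an LTM instead.
labels = (U[:, 1] > 0).astype(int)
print(labels)
```

On this toy graph the sign split recovers the two blocks; the hard part in general, which the slide raises, is choosing the number of clusters and the final partition from the eigenvectors.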
Overview
Early Work (1992-2002)
Inference: Variable Elimination
Inference: Local Structures
Others: Learning, Decision Making
Latent Tree Models (2000 - )
Theory and Algorithms
Applications Multidimensional Clustering, Density Estimation, Latent Structure
Survey Data, Documents, Business Data
Traditional Chinese Medicine (TCM)
Extensions
LTM and TCM
Papers: N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang (2008). Latent tree models and
diagnosis in traditional Chinese medicine. Artificial Intelligence in Medicine. 42:
229-245. Took 8 years
N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang (2008). Statistical Validation of
TCM Theories. Journal of Alternative and Complementary Medicine, 14(5):583-7.
(Featured at TCM Wiki).
N. L. Zhang, S. H. Yuan, T. F. Wang, Y. Zhao et al. Latent structure analysis and syndrome differentiation of Western-medicine diseases (I): basic principles. World Science and Technology - Modernization of Traditional Chinese Medicine, 13(3): 498-502, 2011.
N. L. Zhang, Z. X. Xu, Y. Q. Wang, T. F. Liu et al. Latent structure analysis and syndrome differentiation of Western-medicine diseases (II): comprehensive clustering. World Science and Technology - Modernization of Traditional Chinese Medicine, 14(2), 2012.
LTM and TCM: Objectives
Statistical validation of TCM postulates
[Review of a recent paper:]
I am very interested in what these authors are trying to do. They are dealing
with an important epistemological problem.
To go from the many symptoms and signs that patients present, to construct a
consistent and other-observer identifiable constellation, is a core task of the medical
practitioner. A kind of feedback occurs between what a practitioner is taught/finds listed
in books, and what that practitioner encounters in the clinic. The better the constellation
is understood, the more accurate the clustering of symptoms, the more consistent is the
identification of syndromes among practitioners and through time. While these
constellations have been worked into widely-accepted ‘disease constructs’ for
biomedicine for some time which are widely accepted as ‘real,’ this is not quite as true
for TCM constellations. This latent variable study is interesting not only in itself, but also
as providing evidence that what TCM ‘says’ is so, shows up during analysis as
demonstrably so.
LTM and TCM: Objectives
TCM postulates explain the occurrence of symptoms:
When KIDNEY YANG is in deficiency, it cannot warm the body and the patient feels cold, resulting in intolerance to cold, cold limbs, …
Manifest variables (directly observed): feeling cold, cold limbs
Latent variable (not directly observed): kidney yang deficiency
Latent structure: relationships between latent variables and manifest variables
Statistical validation of TCM postulates
Latent Tree Analysis of Symptom Data
Similar to WebKB data: web page containing words ~ patient having symptoms
What will be the result of latent tree analysis?
Different facets of the data revealed
Natural clusters along each facet identified
Each facet involves a few symptoms
May correspond to a syndrome
Providing validation for TCM postulates
Providing evidence for syndrome differentiation
Latent Tree Model for Kidney Data
Latent structure matches the relevant TCM postulate
Providing validation for the TCM postulate
Latent Tree Model for Kidney Data
Work reported in N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang (2008). Latent tree models and diagnosis in
traditional Chinese medicine. Artificial Intelligence in Medicine. 42: 229-245.
Email from: Bridie Andrews: Bentley University, Boston
Dominique Haughton: ditto, Fellow of the American Statistical Association
Lisa Conboy: Harvard Medical School
“We are very interested in your paper on “Latent tree models and
diagnosis in traditional Chinese medicine”, and are planning to repeat
your method using some data we have here on about 270 cases of
“irritable bowel syndrome” and their differing TCM diagnoses.”
Results on Many Data Sets from the 973 Project
Providing Evidence of Syndrome Differentiation
Providing Evidence of Syndrome Differentiation
How to produce evidence for TCM syndrome diagnosis using latent
structure analysis?
Providing Evidence of Syndrome Differentiation
Imagine sub-typing Western-medicine (WM) disease D from the TCM perspective
Expected conclusion: several syndromes among D patients
Also provides a basis for distinguishing syndrome Z patients from other D patients
Example: Model for Depression Data
Evidence provided by Y8 for syndrome classification:
Two classes: with and without qi stagnation in the chest and diaphragm (胸膈气机不畅)
Sizes of the classes: 48% and 52%
Symptoms important for distinguishing the two classes (in descending order of importance): breath-holding sensation (憋气), shortness of breath (气短), chest oppression (胸闷), and sighing (太息). Others play little role.
Latent Tree Analysis of Prescription Data
Data: Guang'anmen Hospital; 1287 formulae prescribed for patients with Disharmony between Liver and Spleen
Latent Tree Model
Some Partitions Obtained
Overview
Early Work (1992-2002)
Inference: Variable Elimination
Inference: Local Structures
Others: Learning, Decision Making
Latent Tree Models (2000 - )
Theory and Algorithms
Applications Multidimensional Clustering, Density Estimation, Latent Structure
Survey Data, Documents, Business Data
Traditional Chinese Medicine (TCM)
Extensions
Pouch Latent Tree Models (PLTMs)
Probabilistic graphical models with continuous observed variables (X's) and discrete latent variables (Y's).
Tree-structured Bayesian network, except that several observed variables may appear in the same node, called a pouch.
Pouch Latent Tree Models (PLTM)
One possible PLTM for the transcript data
PLTM generalizes the Gaussian Mixture Model (GMM): a GMM is a PLTM with a single pouch and a single latent variable.
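To make the special case concrete: with one pouch holding all the X's and a single latent Y, the PLTM density is exactly a Gaussian mixture. A minimal one-dimensional sketch (the mixture parameters are made up):

```python
import math

def gmm_density(x, weights, means, sds):
    # PLTM with a single latent Y and one pouch holding all X's reduces to
    # P(x) = sum_y P(Y=y) * N(x; mu_y, sigma_y).
    return sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in zip(weights, means, sds))

# Two symmetric components; at x=0 each contributes equally.
p = gmm_density(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
print(p)
```

A general PLTM adds more pouches (each with its own multivariate Gaussian, conditioned on its parent) and a tree of latent variables above them.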
The UCI Image Data
Each instance represents a 3×3 pixel region of an image, described by 18 attributes, and is labeled.
Class labels were first removed and the remaining data analyzed using PLTM.
Pouches capture natural facets well: From left to right: Line-Density, Edge, Color, Coordinates
Latent variables represent clusterings
The UCI Image Data
Y1 matches true class partition well.
Y3: partition based on edge and color
Y4: partition based on centroid.col
Feature curve: normalized MI between a latent variable and the attributes
Y2 represents a partition along the line-density facet
Y1 represents a partition along the color facet
Interesting finding: Y1 is strongly correlated with centroid.row
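The y-axis of such a feature curve can be computed directly from the two discrete sequences; a minimal sketch using one common normalization, I(X;Y)/sqrt(H(X)H(Y)) (the normalization used in the talk may differ):

```python
import math
from collections import Counter

def nmi(xs, ys):
    # Normalized mutual information between two discrete sequences:
    # I(X;Y) / sqrt(H(X) * H(Y)), in [0, 1].
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = sum(c / n * math.log((c / n) / (px[a] / n * py[b] / n))
             for (a, b), c in pxy.items())
    hx = -sum(c / n * math.log(c / n) for c in px.values())
    hy = -sum(c / n * math.log(c / n) for c in py.values())
    return mi / math.sqrt(hx * hy)

print(nmi([0, 0, 1, 1], [0, 0, 1, 1]))  # identical partitions -> 1.0
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions -> 0.0
```

Plotting this value for one latent variable against each attribute in turn yields the feature curve described above.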
Overview
Early Work (1992-2002)
Inference: Variable Elimination
Inference: Local Structures
Others: Learning, Decision Making
Latent Tree Models (2000 - )
Theory and Algorithms
Applications Multidimensional Clustering, Density Estimation, Latent Structure
Survey Data, Documents, Business Data
Traditional Chinese Medicine (TCM)
Extensions
Thank you!