7/29/2019 Bayesian Tuto
Haimonti Dutta, Department of Computer and Information Science
David Heckerman
A Tutorial On Learning With
Bayesian Networks
Outline
Introduction
Bayesian interpretation of probability and review of methods
Bayesian networks and construction from prior knowledge
Algorithms for probabilistic inference
Learning probabilities and structure in a Bayesian network
Relationships between Bayesian network techniques and methods for supervised and unsupervised learning
Conclusion
What do Bayesian Networks and Bayesian Methods have to offer?
Handling of incomplete data sets
Learning about causal networks
Facilitating the combination of domain knowledge and data
An efficient and principled approach for avoiding the overfitting of data
The Bayesian Approach to Probability and Statistics
Bayesian probability: the degree of belief in an event
Classical probability: the true or physical probability of an event
Some Criticisms of Bayesian Probability
Why should degrees of belief satisfy the rules of probability?
On what scale should probabilities be measured?
What probabilities should be assigned to beliefs that are not at the extremes?
Some Answers
Researchers have suggested different sets of properties that are satisfied by degrees of belief
Scaling Problem
The probability wheel: a tool for assessing probabilities
What is the probability that the fortune wheel stops in the shaded region?
Probability assessment
An evident problem: SENSITIVITY
How can we say that the probability of an event is 0.601 and not 0.599?
Another problem: ACCURACY
Methods for improving accuracy are available in decision analysis techniques
Learning with Data
Thumbtack problem
When tossed, the thumbtack can come to rest on either heads or tails
(figure: the two outcomes, heads and tails)
Problem
From N observations we want to determine the probability of heads on the (N+1)st toss.
Two Approaches
Classical approach:
Assert some physical probability of heads (unknown)
Estimate this physical probability from N observations
Use this estimate as the probability of heads on the (N+1)st toss
The other approach
Bayesian approach:
Assert some physical probability
Encode the uncertainty about this physical probability using Bayesian probabilities
Use the rules of probability to compute the required probability
Some basic probability formulas
Bayes' theorem: the posterior probability for θ given D and background knowledge ξ:

p(θ | D, ξ) = p(θ | ξ) p(D | θ, ξ) / p(D | ξ)

where p(D | ξ) = ∫ p(D | θ, ξ) p(θ | ξ) dθ

Note: θ is an uncertain variable whose value corresponds to the possible true values of the physical probability
Likelihood function
How good is a particular value of θ? It depends on how likely it is to generate the observed data:

L(θ : D) = p(D | θ)

Hence the likelihood of the sequence H, T, H, T, T is
L(θ : D) = θ · (1 − θ) · θ · (1 − θ) · (1 − θ) = θ²(1 − θ)³
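The likelihood above can be sketched in a few lines; the function name and encoding of tosses as "H"/"T" strings are illustrative choices, not from the tutorial:

```python
def likelihood(theta, tosses):
    """L(theta : D) = p(D | theta): the product of per-toss probabilities."""
    like = 1.0
    for toss in tosses:
        like *= theta if toss == "H" else 1.0 - theta
    return like

# The sequence H, T, H, T, T has likelihood theta^2 * (1 - theta)^3:
print(likelihood(0.4, ["H", "T", "H", "T", "T"]))  # 0.4**2 * 0.6**3 = 0.03456
```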
Sufficient statistics
To compute the likelihood in the thumbtack problem we only require h and t (the number of heads and the number of tails)
h and t are called sufficient statistics for the binomial distribution
A sufficient statistic is a function that summarizes, from the data, the information relevant for the likelihood
To remember
We need a method to assess the prior distribution for θ.
A common approach is to assume that the distribution is a beta distribution.
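The appeal of the beta prior is conjugacy: the posterior is again a beta distribution, and the predictive probability of heads is its mean. A minimal sketch (function names are illustrative):

```python
def beta_update(alpha_h, alpha_t, h, t):
    """Conjugacy: a Beta(a_h, a_t) prior plus h heads / t tails observed
    yields a Beta(a_h + h, a_t + t) posterior."""
    return alpha_h + h, alpha_t + t

def prob_heads_next(alpha_h, alpha_t, h, t):
    """Posterior-predictive probability of heads on the (N+1)st toss:
    the mean of the Beta posterior."""
    a_h, a_t = beta_update(alpha_h, alpha_t, h, t)
    return a_h / (a_h + a_t)

# Uniform Beta(1, 1) prior, then 3 heads and 7 tails in 10 tosses:
print(prob_heads_next(1, 1, 3, 7))  # (1 + 3) / (2 + 10) = 1/3
```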
Maximum Likelihood Estimation
MLE principle:
We try to learn the parameters that maximize the likelihood function
It is one of the most commonly used estimators in statistics and is intuitively appealing
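For the thumbtack, the MLE has the familiar closed form h / (h + t); the sketch below checks it against a brute-force grid search (the grid check is our illustration, not part of the tutorial):

```python
def binom_likelihood(theta, h, t):
    """Binomial likelihood of h heads and t tails."""
    return theta ** h * (1 - theta) ** t

def mle_heads(h, t):
    """The value of theta maximizing the likelihood is h / (h + t)."""
    return h / (h + t)

# Numerical check: the closed form agrees with a grid search.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda th: binom_likelihood(th, 3, 7))
print(mle_heads(3, 7), best)  # 0.3 0.3
```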
What is a Bayesian Network?
A graphical model that efficiently encodes the joint probability distribution for a large set of variables
Definition
A Bayesian network for a set of variables X = {X1, ..., Xn} contains
a network structure S encoding conditional independence assertions about X
a set P of local probability distributions
The network structure S is a directed acyclic graph whose nodes are in one-to-one correspondence with the variables X. Lack of an arc denotes a conditional independence.
Some conventions.
Variables depicted as nodes
Arcs represent probabilistic dependence between variables
Conditional probabilities encode the strength of the dependencies
An Example
Detecting credit-card fraud
(figure: a network with arcs Fraud → Gas and Fraud, Age, Sex → Jewelry)
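The joint distribution of the fraud network factors by the chain rule along the arcs. A minimal sketch; only the structure follows the example, and all CPT numbers here are invented for illustration:

```python
from itertools import product

# Hypothetical CPTs; the numbers are made up for illustration.
p_fraud_true = 0.001
p_age_old = 0.4                                # p(Age = old)
p_sex_male = 0.5                               # p(Sex = male)
p_gas_given_fraud = {True: 0.2, False: 0.01}   # p(Gas = yes | Fraud)

def p_jewelry_given(fraud, old, male):
    """p(Jewelry = yes | Fraud, Age, Sex); invented numbers."""
    base = 0.05 if fraud else 0.001
    return base * (1.5 if (old and not male) else 1.0)

def joint(fraud, old, male, gas, jewelry):
    """Chain-rule factorization implied by the DAG:
    p(f,a,s,g,j) = p(f) p(a) p(s) p(g|f) p(j|f,a,s)."""
    p = p_fraud_true if fraud else 1 - p_fraud_true
    p *= p_age_old if old else 1 - p_age_old
    p *= p_sex_male if male else 1 - p_sex_male
    pg = p_gas_given_fraud[fraud]
    p *= pg if gas else 1 - pg
    pj = p_jewelry_given(fraud, old, male)
    p *= pj if jewelry else 1 - pj
    return p

# A proper factorization sums to 1 over all 2**5 joint configurations:
total = sum(joint(*bits) for bits in product([True, False], repeat=5))
print(round(total, 10))  # 1.0
```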
Tasks
Correctly identify the goals of modeling
Identify many possible observations that may be relevant to a problem
Determine what subset of those observations is worthwhile to model
Organize the observations into variables having mutually exclusive and collectively exhaustive states
Finally, build a directed acyclic graph that encodes the assertions of conditional independence
A technique for constructing a Bayesian Network
The approach is based on the following observations:
People can often readily assert causal relationships among the variables
Causal relations typically correspond to assertions of conditional dependence
To construct a Bayesian network we simply draw arcs, for a given set of variables, from the cause variables to their immediate effects. In the final step we determine the local probability distributions.
Bayesian inference
On construction of a Bayesian network, we need to determine the various probabilities of interest from the model
Computation of a probability of interest given a model is probabilistic inference
(figure: observed data x[1], ..., x[m] and a query x[m+1])
Learning Probabilities in a Bayesian Network
Problem: using data to update the probabilities of a given network structure
Thumbtack problem: we do not learn the probability of heads; we update the posterior distribution for the variable that represents the physical probability of heads
The problem restated: given a random sample D, compute the posterior probability
Assumptions to compute the posterior probability
There is no missing data in the random sample D.
Parameters are independent.
But
Data may be missing. How do we then proceed?
Obvious concerns
Why was the data missing?
Missing values
Hidden variables
Is the absence of an observation dependent on the actual states of the variables?
We deal with missing data that are independent of the state
Incomplete data (contd.)
Observations reveal that for any interesting set of local likelihoods and priors, exact computation of the posterior distribution will be intractable.
We require approximations for incomplete data
The various methods of approximation for incomplete data
Monte Carlo sampling methods
Gaussian approximation
MAP and ML approximations and the EM algorithm
Gibbs Sampling
The steps involved:
Start: choose an initial state for each of the variables in X at random
Iterate: unassign the current state of X1. Compute the probability of this state given that of the other n−1 variables. Repeat this procedure for all X, creating a new sample of X
After the burn-in phase, the possible configurations of X will be sampled with probability p(x).
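The steps above can be sketched for a toy pair of binary variables; the joint table and parameter choices are invented for illustration, not from the tutorial:

```python
import random

# Toy joint distribution over two binary variables (made-up numbers):
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

def conditional(i, other):
    """p(X_i = 1 | X_other = other), computed from the joint table."""
    if i == 0:
        num = joint[(1, other)]
        den = joint[(0, other)] + joint[(1, other)]
    else:
        num = joint[(other, 1)]
        den = joint[(other, 0)] + joint[(other, 1)]
    return num / den

def gibbs(n_samples, burn_in=1000, seed=0):
    random.seed(seed)
    x = [random.randint(0, 1), random.randint(0, 1)]  # random initial state
    samples = []
    for step in range(burn_in + n_samples):
        for i in (0, 1):                   # resample each variable in turn,
            p1 = conditional(i, x[1 - i])  # conditioned on the other one
            x[i] = 1 if random.random() < p1 else 0
        if step >= burn_in:
            samples.append(tuple(x))
    return samples

samples = gibbs(20000)
freq_x1 = sum(s[0] for s in samples) / len(samples)
print(freq_x1)  # close to the true marginal p(X1 = 1) = 0.5
```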
Problem with the Monte Carlo method
Intractable when the sample size is large
Gaussian approximation
Idea: large amounts of data can be approximated by a multivariate Gaussian distribution.
Criteria for Model Selection
Some criterion must be used to determine the degree to which a network structure fits the prior knowledge and data
Such criteria include
Relative posterior probability
Local criteria
Relative posterior probability
A criterion for model selection is the logarithm of the relative posterior probability, given as follows:

log p(D, S^h) = log p(S^h) + log p(D | S^h)
                (log prior)   (log marginal likelihood)
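For a single binary variable with a beta prior, the marginal likelihood has a closed form, so the score is easy to compute; this sketch covers only that one-variable case (for a full network the marginal likelihood factors per node), and the function names are ours:

```python
import math

def log_marginal_likelihood(h, t, alpha_h=1.0, alpha_t=1.0):
    """log p(D | S^h) for one binary variable with a Beta prior:
    the closed form B(alpha_h + h, alpha_t + t) / B(alpha_h, alpha_t)."""
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_beta(alpha_h + h, alpha_t + t) - log_beta(alpha_h, alpha_t)

def log_score(log_prior, h, t):
    """log p(S^h, D) = log p(S^h) + log p(D | S^h)."""
    return log_prior + log_marginal_likelihood(h, t)

# Under a uniform Beta(1, 1) prior, the marginal likelihood of a sequence
# with h = 2 heads and t = 3 tails is h! t! / (h + t + 1)! = 1/60:
print(math.exp(log_marginal_likelihood(2, 3)))  # ≈ 0.016666...
```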
Local Criteria
An example:
A Bayesian network structure for medical diagnosis
(figure: an Ailment node with arcs to Finding 1, Finding 2, ..., Finding n)
Priors
To compute the relative posterior probability we assess
the structure prior p(S^h)
the parameter priors p(θs | S^h)
Priors on network parameters
Key concepts :
Independence Equivalence
Distribution Equivalence
Illustration of independence equivalence
Independence assertion: X and Z are conditionally independent given Y
(figure: three equivalent structures, X → Y → Z, X ← Y → Z, and X ← Y ← Z)
Priors on structures
Various methods:
Assume that every hypothesis is equally likely (usually for convenience)
Order the variables, with the presence or absence of arcs mutually independent
Use of prior networks
Imaginary data from domain experts
Benefits of learning structures
Efficient learning: more accurate models with less data
Compare P(A) and P(B) versus P(A,B): the former requires less data
Discover structural properties of the domain
Helps to order events that occur sequentially, and in sensitivity analysis and inference
Predict the effects of actions
Search Methods
Problem: find the best network from the set of all networks in which each node has no more than k parents
Search techniques:
Greedy search
Greedy search with restarts
Best-first search
Monte Carlo methods
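A minimal sketch of greedy search for one node's parent set, assuming binary data and a BIC-style score as a stand-in for the Bayesian score (the data, score choice, and function names are our illustrative assumptions):

```python
import math
import random
from collections import Counter

def bic_score(data, child, parents):
    """BIC-style score of one node given a candidate parent set;
    data is a list of dicts of binary values."""
    n = len(data)
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in counts.items())
    n_params = 2 ** len(parents)   # one Bernoulli parameter per parent configuration
    return loglik - 0.5 * n_params * math.log(n)

def greedy_parents(data, child, candidates, k):
    """Greedily add the best-scoring parent (at most k) while the score improves."""
    parents = []
    best = bic_score(data, child, parents)
    while len(parents) < k:
        gains = [(bic_score(data, child, parents + [c]), c)
                 for c in candidates if c not in parents]
        if not gains:
            break
        score, c = max(gains)
        if score <= best:
            break
        best, parents = score, parents + [c]
    return parents

# Synthetic data: X2 is a noiseless copy of X1, X3 is independent noise.
random.seed(1)
data = []
for _ in range(200):
    x1 = random.randint(0, 1)
    data.append({"X1": x1, "X2": x1, "X3": random.randint(0, 1)})

print(greedy_parents(data, "X2", ["X1", "X3"], k=2))  # ['X1']
```

The BIC penalty is what keeps the irrelevant X3 out: adding it doubles the parameter count without improving the likelihood.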
What is all this good for, anyway?
Implementations in real life:
Microsoft products (Microsoft Office)
Medical applications and biostatistics (BUGS)
NASA's AutoClass project for data analysis
Collaborative filtering (Microsoft MSBN)
Fraud detection (AT&T)
Speech recognition (UC Berkeley)
Limitations Of Bayesian Networks
Typically require initial knowledge of many probabilities; the quality and extent of prior knowledge play an important role
Significant computational cost (an NP-hard task)
The unanticipated probability of an event is not taken care of
Conclusion
(figure: data + prior knowledge feed an inducer, which outputs a Bayesian network)