prml을위한기초확률이론 (basic probability theory for prml) · pdf fileprml...
Post on 11-Feb-2018
216 Views
Preview:
TRANSCRIPT
Seoul National University
PRML을위한기초확률이론(Basic Probability Theory for PRML)
Yung-Kyun Noh (노영균)Seoul National University
패턴인식및기계학습겨울학교(Pattern Recognition and Machine Learning Winter School)
2016. 1. 20 (Wed.)
Seoul National University
Contents• Probability / Probability density
• Conditional probability (density)
• Marginal probability (density)• Joint probability (density)• Inference and classification• Gaussian Processes
2
Seoul National University
• Mapping from a random variable to a number
Probability
3
…0 1
Event 1 Event 2 Event 3 Event N…
Seoul National University
Probability
4
if
: random variable : set of outputs of random variables
Seoul National University
Probability and Probability Density
5
RD
Probability =
Seoul National University
Probability and Probability Density
6
Event is defined infinitesimally:
R1
DR2
: set of infinitesimal events
Seoul National University
Can you explain the meaning of these functions?
7
Compare with
Seoul National University
Supervised Learning (Prediction)
8
• Method: – Learning from
examples and can classify an unseen data
label
classifierNew unseen data (a horse)
[Based on the assumption of regularity]
car car
planeship
horse
horse horsecar
Learn
Seoul National University
=[1, 2, 5, 10, …]T
Representation of Data
9
• Each datum is one point in a data space
Data space
elements
Seoul National University
Classification
10
Seoul National University
Bayes Optimal Classifier• Our ultimate goal is not a zero error.
11
(Optimal) Bayes error
Figure credit: Masashi Sugiyama
Seoul National University
Model on Each Class
• Model: Class-conditional density as a Gaussian
12
Seoul National University
Optimal Regression• Minimizing mean square error
13
Minimize
Minimized when
Seoul National University
Model for Regression• Obtain regression function
from data
• Choose a model where the following expectation is minimized:
– Minimized for
• Bias-Variance tradeoff
14
Variance Bias^2
Seoul National University
Several Rules
15
Seoul National University
For More Than Two Random Variables• For three disjoint sets for a
random variable and another three disjoint sets for a random variable :
16
1
Seoul National University
Conditional Probability
17
1
Seoul National University
Conditional Probability Density
18
D
z = c
New domain
y = 1
y = 2
Seoul National University
Conditional Probability Density
19
Seoul National University
Marginal Probability Density
20
sum
Seoul National University
Marginal Probability Density and Conditional Probability Density in Machine Learning
21
Conditional PDF Conditional PDF
Joint PDF
2
1
Seoul National University
Marginal Probability Density and Conditional Probability Density in Machine Learning
22
Joint PDF
sum
Seoul National University
Benefits of Using High Dimensionalities• Feature 1 and Feature 2 have correlation
23
Feature 1
Feature 2
Seoul National University
Curse of Dimensionality– To achieve same density as N = 100 for 1-
variable– We need N = 100D for D variables
– Conversely, when we have 60,000 data for 10-dimensional space, the density is the same as 3 data in 1-dimensional space.
24
Seoul National University
GAUSSIAN DENSITY FUNCTION
25
Seoul National University
Gaussian Random Variable
26
Principal axes are the eigenvector directions of
p¸1p
¸2
: eigenvalues
Seoul National University
Gaussian Random Variable - Projection
27
Projection to any direction is Gaussian.
Seoul National University
Gaussian Random Variable – Marginal
28
Seoul National University
Gaussian Random Variable – Conditional
29
Seoul National University
PARAMETER ESTIMATION
30
Seoul National University
Motivation – Parameter Estimation• Parameter estimation is an optimization
problem
31
: estimated probability density function,in other words, density function that fits data the most
Seoul National University
Maximum Likelihood Estimation• Parameter estimation is an optimization
problem
32
Seoul National University
Maximum Likelihood for Gaussian
• With optimal parameters satisfying
33
Empirical mean and empirical covariance are the maximum likelihood solutions.
Seoul National University
Maximum Likelihood for Gaussian
34
rμ ln p(X jμ) = ~0 μ = ¹; §
Seoul National University
Maximum A Posteriori (MAP) Estimation
• MAP estimation
• Likelihood (Model): • Prior:• Bayes rule:
35
cf)
Seoul National University
Maximum A Posteriori (MAP) Estimation for Gaussian
• Let the prior
• The posterior can be calculated using
36
Seoul National University
Maximum A Posteriori (MAP) Estimation for Gaussian
37
Seoul National University
Maximum A Posteriori (MAP) Estimation for Gaussian• Posterior density
– Caution: Posterior of , not the density function of
• MAP of = Mean of =
38
Seoul National University
MLE vs. MAP• For Gaussian
– When N is just a few (say N = 5),
39
Dominant
Seoul National University
MLE vs. MAP• For Gaussian
– When we have a few outliers
40
Dominant (learn from )
Seoul National University
MLE vs. MAP
41
Seoul National University
Bayesian Integration• The final standard method of prediction is to
use Bayesian inference instead of estimating the parameter point.– Do not insert directly, but
marginalize.
42
Uncertainty of = N (¹n; ¾2 + ¾2n)
Seoul National University
Conjugate Priors• Given a likelihood pdf, , posterior
has the same form as the prior .
43
Prior PosteriorLikelihood
Two distributionsHave the same form
Seoul National University
Conjugate Priors
44
Gaussian GaussianGaussian
GammaGammaGaussian
Dirichlet DirichletMultinomial
Beta BetaBinomial
Gamma GammaPoisson
Seoul National University
Kullback-Leibler Divergence
45
: Empirical density function: Model density function
Seoul National University
Kullback-Leibler Divergence
46
Likelihood:
KL Divergence:
Seoul National University
Kullback-Leibler Divergence
47
Model with complex function will capture the noise.
Seoul National University
Consistency and Bayes Error• Objective: minimizing expected error
48
Bayes error
0
Error
Smaller Larger
Error for training data (train error)
(For example, a linear classifier with regularization)
: loss function: A realization of N data
Seoul National University
Consistency and Bayes Error• Consistent learner with many data
49
Bayes error
0
Error
Smaller Larger
train error
Minimum error of consistent learner
Consistency does not necessarily imply achieving the Bayes error!!!
(For example, a linear classifier with regularization)
Seoul National University
GAUSSIAN PROCESS REGRESSION
50
Seoul National University
Prediction Using Correlation
51
Seoul National University
Gaussian Process
52
Figure credit: Christopher Bishop
Seoul National University
Gaussian Process as an Infinite Dimensional Gaussian
53
Seoul National University
Gaussian Process Regression
54
Figure credit: C. E. Rasmussen & C. K. I. Williams
For given
y(x;D) = k>K¡1y
Seoul National University
Summary• What we did:
– Probability and probability density– Conditional density, marginalized density– Model construction– Gaussian model– Parameter estimation– Gaussian process
• What we didn’t do:– Multinomial distribution and Dirichlet distribution– Convergence of estimation– Generative model vs. Discriminative model
55
Seoul National University
THANK YOUYung-Kyun Nohnohyung@snu.ac.kr
56
top related