Lecture 2: Statistical learning primer for biologists Alan Qi Purdue Statistics and CS Jan. 15, 2009

Lecture 2: Statistical learning primer for biologists

Alan Qi
Purdue Statistics and CS

Jan. 15, 2009

Outline

• Basics of probability
• Regression
• Graphical models: Bayesian networks and Markov random fields
• Unsupervised learning: K-means and expectation maximization

Probability Theory

• Sum Rule

• Product Rule

The Rules of Probability

• Sum Rule

• Product Rule
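The slide's formulas did not survive extraction. As a minimal sketch, the two rules can be checked numerically on a small joint table; the table values and the helper names `marginal_x` and `conditional_y_given_x` are hypothetical and chosen only for illustration.

```python
# Hypothetical joint distribution p(X, Y) with X in {0, 1} and Y in {0, 1, 2}.
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.30, (1, 1): 0.05, (1, 2): 0.25,
}


def marginal_x(joint, x):
    """Sum rule: p(X = x) is the sum over y of p(X = x, Y = y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def conditional_y_given_x(joint, y, x):
    """Conditional p(Y = y | X = x) = p(X = x, Y = y) / p(X = x)."""
    return joint[(x, y)] / marginal_x(joint, x)


p_x0 = marginal_x(joint, 0)
# Product rule: p(X, Y) = p(Y | X) p(X) should recover the joint entry.
rebuilt = conditional_y_given_x(joint, 1, 0) * p_x0
print(p_x0, rebuilt, joint[(0, 1)])
```

The product of the conditional and the marginal reproduces the original table entry, which is all the product rule claims.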

Bayes’ Theorem

posterior ∝ likelihood × prior
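Written out for two random variables, the theorem takes the standard form below. The symbols X and Y are assumptions made here, since the slide's own formula was lost in extraction.

```latex
p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)},
\qquad
p(X) = \sum_{Y} p(X \mid Y)\, p(Y)
```

The denominator follows from the sum and product rules and only normalizes the posterior, which is why the posterior is proportional to likelihood times prior.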

Probability Density & Cumulative Distribution Functions

Expectations

Conditional Expectation (discrete)

Approximate Expectation (discrete and continuous)
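One common reading of the approximate expectation is the sample average E[f] ≈ (1/N) Σ f(x_n) over draws x_n from the distribution. The sketch below assumes that reading; the function name `approximate_expectation`, the test function f(x) = x², and the standard normal sampler are illustrative choices, not taken from the slide.

```python
import random


def approximate_expectation(f, sampler, n_samples):
    """Estimate E[f(x)] by averaging f over n_samples draws from sampler()."""
    total = 0.0
    for _ in range(n_samples):
        total += f(sampler())
    return total / n_samples


rng = random.Random(0)  # fixed seed so the run is repeatable
# For a standard normal variable, E[x^2] equals the variance, which is 1.
estimate = approximate_expectation(
    lambda x: x * x,
    lambda: rng.gauss(0.0, 1.0),
    100_000,
)
print(estimate)
```

The estimate approaches the exact value as the number of samples grows, with an error that shrinks roughly like one over the square root of N.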

Variances and Covariances

The Gaussian Distribution

Gaussian Mean and Variance
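For reference, the standard univariate Gaussian density and its first two moments are given below. The notation is an assumption, since the slide's formulas were not preserved.

```latex
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\},
\qquad
\mathbb{E}[x] = \mu,
\qquad
\operatorname{var}[x] = \sigma^2
```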

The Multivariate Gaussian

Gaussian Parameter Estimation

Likelihood function

Maximum (Log) Likelihood

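Properties of $\mu_{\mathrm{ML}}$ and $\sigma^2_{\mathrm{ML}}$

Unbiased: $\mathbb{E}[\mu_{\mathrm{ML}}] = \mu$

Biased: $\mathbb{E}[\sigma^2_{\mathrm{ML}}] = \frac{N-1}{N}\,\sigma^2$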
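A minimal sketch of the maximum likelihood estimates for a univariate Gaussian, together with the usual N/(N-1) correction that removes the bias of the variance estimate. The function names and the four data points are hypothetical.

```python
def gaussian_ml(xs):
    """Maximum likelihood mean and variance of a univariate Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    var_ml = sum((x - mu) ** 2 for x in xs) / n  # divides by N, hence biased
    return mu, var_ml


def unbiased_variance(xs):
    """Rescale the ML variance by N / (N - 1) to remove its bias."""
    n = len(xs)
    _, var_ml = gaussian_ml(xs)
    return var_ml * n / (n - 1)


data = [2.0, 4.0, 4.0, 6.0]  # hypothetical observations
mu_ml, var_ml = gaussian_ml(data)
var_unbiased = unbiased_variance(data)
print(mu_ml, var_ml, var_unbiased)
```

The ML variance is systematically smaller because it measures spread around the sample mean rather than the true mean; the gap vanishes as N grows.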

Curve Fitting Re-visited

Maximum Likelihood

Determine $\mathbf{w}_{\mathrm{ML}}$ by minimizing the sum-of-squares error, $E(\mathbf{w})$.

Predictive Distribution

MAP: A Step towards Bayes

Determine $\mathbf{w}_{\mathrm{MAP}}$ by minimizing the regularized sum-of-squares error, $\widetilde{E}(\mathbf{w})$.
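As a rough sketch, assume a polynomial model y(x, w) = Σ_j w_j x^j and the error ½ Σ_n (y(x_n, w) − t_n)² + (λ/2)‖w‖², a common choice that the slide's lost formula may or may not match exactly. The minimizer then solves the normal equations (ΦᵀΦ + λI) w = Φᵀt. Setting λ = 0 gives the plain sum-of-squares (maximum likelihood) fit. The function names, the data, and the value λ = 1 are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_polynomial(xs, ts, degree, lam=0.0):
    """Weights minimizing the (optionally regularized) sum-of-squares error."""
    m = degree + 1
    # Normal equations: (Phi^T Phi + lam * I) w = Phi^T t, Phi[n][j] = x_n ** j.
    a = [[sum(x ** (i + j) for x in xs) + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(t * x ** i for x, t in zip(xs, ts)) for i in range(m)]
    return solve_linear_system(a, b)


xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]  # hypothetical targets lying exactly on t = 1 + 2x
w_ml = fit_polynomial(xs, ts, degree=1)            # lam = 0: least squares
w_map = fit_polynomial(xs, ts, degree=1, lam=1.0)  # lam > 0: shrunk weights
print(w_ml, w_map)
```

The regularized weights have a smaller norm than the unregularized ones, which is the shrinkage effect a Gaussian prior on w produces.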

Bayesian Curve Fitting

Bayesian Networks

• Directed Acyclic Graph (DAG)

Bayesian Networks

General Factorization
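The factorization for a directed acyclic graph over K variables is usually written as below, where pa_k denotes the parents of node x_k. The notation is assumed, since the slide's formula was lost.

```latex
p(\mathbf{x}) = \prod_{k=1}^{K} p\left(x_k \mid \mathrm{pa}_k\right)
```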

Generative Models

• Causal process for generating images

Discrete Variables (1)

• General joint distribution: $K^2 - 1$ parameters

• Independent joint distribution: $2(K - 1)$ parameters

Discrete Variables (2)

General joint distribution over M variables: $K^M - 1$ parameters

M-node Markov chain: $K - 1 + (M - 1)K(K - 1)$ parameters
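To get a feel for how quickly these counts diverge, the formulas can be evaluated directly. This is a small sketch; the function names and the choice K = 3, M = 10 are hypothetical.

```python
def full_joint_params(k, m):
    """General joint distribution over m K-state variables: K^M - 1."""
    return k ** m - 1


def independent_params(k, m):
    """Fully independent variables: each needs K - 1 parameters."""
    return m * (k - 1)


def markov_chain_params(k, m):
    """First node needs K - 1; each later node needs K(K - 1)."""
    return (k - 1) + (m - 1) * k * (k - 1)


k, m = 3, 10
print(full_joint_params(k, m), markov_chain_params(k, m), independent_params(k, m))
```

For two variables the first two functions reduce to the K² − 1 and 2(K − 1) counts of the previous slide.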

Discrete Variables: Bayesian Parameters (1)

Discrete Variables: Bayesian Parameters (2)

Shared prior

Parameterized Conditional Distributions

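If $x_1, \ldots, x_M$ are discrete, K-state variables, $p(y = 1 \mid x_1, \ldots, x_M)$ in general has $O(K^M)$ parameters.

The parameterized form

$p(y = 1 \mid x_1, \ldots, x_M) = \sigma\left(w_0 + \sum_{i=1}^{M} w_i x_i\right)$

requires only M + 1 parameters.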

Conditional Independence

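• a is independent of b given c: $p(a \mid b, c) = p(a \mid c)$

• Equivalently: $p(a, b \mid c) = p(a \mid c)\, p(b \mid c)$

• Notation: $a \perp\!\!\!\perp b \mid c$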

Conditional Independence: Example 1

Conditional Independence: Example 1

Conditional Independence: Example 2

Conditional Independence: Example 2

Conditional Independence: Example 3

• Note: this is the opposite of Example 1, with c unobserved.

Conditional Independence: Example 3

• Note: this is the opposite of Example 1, with c observed.

“Am I out of fuel?”

B = Battery (0=flat, 1=fully charged)
F = Fuel Tank (0=empty, 1=full)
G = Fuel Gauge Reading (0=empty, 1=full)

And hence

“Am I out of fuel?”

Probability of an empty tank increased by observing G = 0.

“Am I out of fuel?”

Probability of an empty tank reduced by observing B = 0. This is referred to as “explaining away”.
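The effect can be reproduced with a few lines of arithmetic. The probability tables on the slides were not preserved, so every number below is an assumed illustrative value (chosen to follow a common textbook version of this example), and the helper names are hypothetical. The model assumes B and F are independent a priori and G depends on both.

```python
# Assumed illustrative values; the slide's own tables did not survive.
p_b = {1: 0.9, 0: 0.1}  # prior on battery state
p_f = {1: 0.9, 0: 0.1}  # prior on fuel tank state
# p(G = 1 | B = b, F = f), keyed by (b, f)
p_g1_given_bf = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}


def p_g_given_bf(g, b, f):
    """Gauge reading probability given battery and fuel states."""
    p1 = p_g1_given_bf[(b, f)]
    return p1 if g == 1 else 1.0 - p1


def joint(b, f, g):
    """p(B, F, G) = p(B) p(F) p(G | B, F)."""
    return p_b[b] * p_f[f] * p_g_given_bf(g, b, f)


def posterior_f_empty(g, b=None):
    """p(F = 0 | G = g), or p(F = 0 | G = g, B = b) when b is given."""
    bs = [0, 1] if b is None else [b]
    numerator = sum(joint(bi, 0, g) for bi in bs)
    denominator = sum(joint(bi, fi, g) for bi in bs for fi in (0, 1))
    return numerator / denominator


prior = p_f[0]
after_gauge = posterior_f_empty(g=0)
after_gauge_and_battery = posterior_f_empty(g=0, b=0)
print(prior, after_gauge, after_gauge_and_battery)
```

Under these assumed numbers, seeing an empty gauge raises the probability of an empty tank above its prior, and additionally learning that the battery is flat lowers it again, although it stays above the prior: the flat battery partly explains the gauge reading.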

The Markov Blanket

Factors independent of $x_i$ cancel between numerator and denominator.

Cliques and Maximal Cliques

Clique

Maximal Clique

Joint Distribution

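• $p(\mathbf{x}) = \frac{1}{Z} \prod_{C} \psi_C(\mathbf{x}_C)$, where $\psi_C(\mathbf{x}_C)$ is the potential over clique C and

• $Z = \sum_{\mathbf{x}} \prod_{C} \psi_C(\mathbf{x}_C)$ is the normalization coefficient; note: M K-state variables → $K^M$ terms in Z.

• Energies and the Boltzmann distribution: $\psi_C(\mathbf{x}_C) = \exp\{-E(\mathbf{x}_C)\}$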

Illustration: Image De-Noising (1)

Original Image Noisy Image

Illustration: Image De-Noising (2)

Illustration: Image De-Noising (3)

Noisy Image Restored Image (ICM)
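ICM (iterated conditional modes) can be sketched as coordinate-wise energy minimization. The sketch below assumes binary pixels in {−1, +1} and an Ising-style energy E(x, y) = h Σ x_i − β Σ x_i x_j − η Σ x_i y_i over 4-connected neighbours, which is a common setup for this illustration but is not confirmed by the surviving slide text. The parameter values, the function name, and the 5×5 test image are hypothetical.

```python
def icm_denoise(noisy, beta=1.0, eta=2.1, h=0.0, max_sweeps=10):
    """Iterated conditional modes on a binary image with pixels in {-1, +1}."""
    rows, cols = len(noisy), len(noisy[0])
    x = [row[:] for row in noisy]  # start the estimate at the noisy image
    for _ in range(max_sweeps):
        changed = False
        for i in range(rows):
            for j in range(cols):
                neighbour_sum = 0
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        neighbour_sum += x[ni][nj]
                # Energy terms involving pixel (i, j) in state s are
                #   s * (h - beta * neighbour_sum - eta * noisy[i][j]),
                # so the lower-energy state has the sign of `field`.
                field = beta * neighbour_sum + eta * noisy[i][j] - h
                new_value = x[i][j] if field == 0 else (1 if field > 0 else -1)
                if new_value != x[i][j]:
                    x[i][j] = new_value
                    changed = True
        if not changed:  # a full sweep with no flips: local minimum reached
            break
    return x


clean = [[1] * 5 for _ in range(5)]  # hypothetical all-white image
noisy = [row[:] for row in clean]
noisy[2][2] = -1                     # flip one pixel as noise
restored = icm_denoise(noisy)
print(restored)
```

ICM only finds a local minimum of the energy, so on real images the restored result depends on the initialization and the sweep order.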

Converting Directed to Undirected Graphs (1)

Converting Directed to Undirected Graphs (2)

• Additional links: “marrying parents”, i.e., moralization

Directed vs. Undirected Graphs (2)

Inference on a Chain

With naive summation over all joint configurations, computational time increases exponentially with N.

Inference on a Chain
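A minimal sketch of the usual remedy, assuming an undirected chain with pairwise potentials ψ(x_i, x_{i+1}): push the sums inside the product and pass forward and backward messages, which costs O(N K²) instead of O(K^N). The function names and the potential values are hypothetical; the brute-force routine is included only to check the result.

```python
from itertools import product


def chain_marginal(potentials, n, k):
    """Marginal p(x_n) on a chain via forward and backward messages.

    potentials[i][a][b] is psi(x_i = a, x_{i+1} = b).
    """
    num_nodes = len(potentials) + 1
    alpha = [1.0] * k  # forward message arriving at node n
    for i in range(n):
        alpha = [sum(alpha[a] * potentials[i][a][b] for a in range(k))
                 for b in range(k)]
    beta = [1.0] * k   # backward message arriving at node n
    for i in range(num_nodes - 2, n - 1, -1):
        beta = [sum(potentials[i][a][b] * beta[b] for b in range(k))
                for a in range(k)]
    unnorm = [alpha[s] * beta[s] for s in range(k)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def brute_force_marginal(potentials, n, k):
    """Same marginal by summing over all K^N joint configurations."""
    num_nodes = len(potentials) + 1
    unnorm = [0.0] * k
    for xs in product(range(k), repeat=num_nodes):
        weight = 1.0
        for i in range(num_nodes - 1):
            weight *= potentials[i][xs[i]][xs[i + 1]]
        unnorm[xs[n]] += weight
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Hypothetical potentials for a 4-node chain of binary variables.
potentials = [
    [[1.0, 0.5], [0.5, 2.0]],
    [[1.5, 1.0], [0.2, 1.0]],
    [[1.0, 3.0], [2.0, 1.0]],
]
fast = chain_marginal(potentials, 1, 2)
slow = brute_force_marginal(potentials, 1, 2)
print(fast, slow)
```

Both routines return the same marginal; only the message-passing version stays tractable as the chain grows.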

Supervised Learning

• Supervised learning: learning with examples or labels, e.g., classification and regression

• Examples: linear regression (the example just given), generalized linear models (e.g., probit classification), support vector machines, Gaussian process classification, etc.

• To learn more, take CS590M (Machine Learning) in fall 2009.

Unsupervised Learning

• Supervised learning: learning with examples or labels, e.g., classification and regression

• Unsupervised learning: learning without examples or labels, e.g., clustering, mixture models, PCA, non-negative matrix factorization

K-means Clustering: Goal

Cost Function

Two Stage Updates

Optimizing Cluster Assignment

Optimizing Cluster Centers

Convergence of Iterative Updates

Example of K-Means Clustering
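The figure for this example was not preserved. As a stand-in, here is a minimal sketch of the two-stage updates, assuming the usual cost J = Σ_n Σ_k r_nk ‖x_n − μ_k‖² with hard assignments r_nk. The function names, the six 2-D points, and the initial centers are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def distortion(points, centers, assignments):
    """Cost J: total squared distance of each point to its assigned center."""
    return sum(sq_dist(p, centers[a]) for p, a in zip(points, assignments))


def kmeans(points, centers, max_iters=100):
    """Alternate assignment and center updates until assignments stop changing."""
    centers = [tuple(c) for c in centers]
    assignments = None
    for _ in range(max_iters):
        # Stage 1: assign every point to its nearest center.
        new_assignments = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Stage 2: move every center to the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # leave a center in place if it has no members
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignments


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, assignments = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers, assignments, distortion(points, centers, assignments))
```

Each stage can only lower J or leave it unchanged, which is why the iteration converges, though possibly to a local rather than global minimum.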

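Mixture of Gaussians

• Mixture of Gaussians: $p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$

• Introduce latent variables: a 1-of-K binary vector $\mathbf{z}$ with $p(z_k = 1) = \pi_k$

• Marginal distribution: $p(\mathbf{x}) = \sum_{\mathbf{z}} p(\mathbf{z})\, p(\mathbf{x} \mid \mathbf{z})$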

Conditional Probability

• Responsibility that component k takes for explaining the observation.
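In the standard mixture-of-Gaussians notation (assumed here, since the slide's formula was lost), the responsibility is the posterior probability of component k given the observation, obtained from Bayes' theorem:

```latex
\gamma(z_k) \equiv p(z_k = 1 \mid \mathbf{x})
  = \frac{\pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}
         {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}
```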

Maximum Likelihood

• Maximize the log likelihood function

Maximum Likelihood Conditions (1)

• Setting the derivatives of the log likelihood with respect to the means $\boldsymbol{\mu}_k$ to zero:

Maximum Likelihood Conditions (2)

• Setting the derivative of the log likelihood with respect to the covariances $\boldsymbol{\Sigma}_k$ to zero:

Maximum Likelihood Conditions (3)

• Lagrange function:

• Setting its derivative to zero and using the normalization constraint, we obtain:

Expectation Maximization for Mixtures of Gaussians

• Although the previous conditions do not provide closed-form solutions, we can use them to construct iterative updates:

• E step: Compute responsibilities $\gamma(z_{nk})$.

• M step: Compute new mean $\boldsymbol{\mu}_k$, variance $\boldsymbol{\Sigma}_k$, and mixing coefficients $\pi_k$.

• Loop over E and M steps until the log likelihood stops increasing.
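The loop above can be sketched for the one-dimensional case, where each covariance reduces to a scalar variance. This is a minimal illustration under that simplifying assumption, with no safeguard against a variance collapsing onto a single point; the function names, the ten data points, and the initial parameters are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(xs, pis, mus, variances):
    """Log likelihood of the data under the mixture."""
    return sum(
        math.log(sum(p * normal_pdf(x, m, v)
                     for p, m, v in zip(pis, mus, variances)))
        for x in xs
    )


def em_gmm_1d(xs, pis, mus, variances, max_iters=200, tol=1e-8):
    """EM for a 1-D mixture of Gaussians; returns parameters and log-likelihood trace."""
    k = len(pis)
    previous = log_likelihood(xs, pis, mus, variances)
    history = [previous]
    for _ in range(max_iters):
        # E step: responsibilities of each component for each point.
        resp = []
        for x in xs:
            weights = [p * normal_pdf(x, m, v)
                       for p, m, v in zip(pis, mus, variances)]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M step: re-estimate means, variances and mixing coefficients.
        n_k = [sum(r[j] for r in resp) for j in range(k)]
        mus = [sum(r[j] * x for r, x in zip(resp, xs)) / n_k[j]
               for j in range(k)]
        variances = [sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / n_k[j]
                     for j in range(k)]
        pis = [n / len(xs) for n in n_k]
        current = log_likelihood(xs, pis, mus, variances)
        history.append(current)
        if current - previous < tol:  # log likelihood has stopped increasing
            break
        previous = current
    return pis, mus, variances, history


xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]  # hypothetical data
pis, mus, variances, history = em_gmm_1d(xs, [0.5, 0.5], [0.0, 6.0], [1.0, 1.0])
print(pis, mus, variances, history[-1])
```

The recorded log likelihood never decreases from one iteration to the next, which is the property the stopping rule relies on.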

Example

• EM on the Old Faithful data set.

General EM Algorithm

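EM as a Lower Bounding Method

• Goal: maximize $\ln p(\mathbf{X} \mid \theta) = \ln \sum_{\mathbf{Z}} p(\mathbf{X}, \mathbf{Z} \mid \theta)$

• Define: $\mathcal{L}(q, \theta) = \sum_{\mathbf{Z}} q(\mathbf{Z}) \ln \frac{p(\mathbf{X}, \mathbf{Z} \mid \theta)}{q(\mathbf{Z})}$ and $\mathrm{KL}(q \,\|\, p) = -\sum_{\mathbf{Z}} q(\mathbf{Z}) \ln \frac{p(\mathbf{Z} \mid \mathbf{X}, \theta)}{q(\mathbf{Z})}$

• We have $\ln p(\mathbf{X} \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p)$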

Lower Bound

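• $\mathcal{L}(q, \theta)$ is a functional of the distribution $q(\mathbf{Z})$.

• Since $\mathrm{KL}(q \,\|\, p) \ge 0$ and $\ln p(\mathbf{X} \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p)$, $\mathcal{L}(q, \theta)$ is a lower bound of the log likelihood function $\ln p(\mathbf{X} \mid \theta)$.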

Illustration of Lower Bound

Lower Bound Perspective of EM

• Expectation Step:
  • Maximizing the functional lower bound $\mathcal{L}(q, \theta)$ over the distribution $q(\mathbf{Z})$.

• Maximization Step:
  • Maximizing the lower bound $\mathcal{L}(q, \theta)$ over the parameters $\theta$.
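The bound can be checked numerically on a toy model with one binary latent variable. The sketch assumes the standard decomposition ln p(x) = L(q) + KL(q ‖ p(z | x)); the joint values p(x, z) for the single observed x, the trial distribution q, and the function names are hypothetical.

```python
import math


def lower_bound(q, joint_xz):
    """L(q) = sum over z of q(z) * ln(p(x, z) / q(z))."""
    return sum(qz * math.log(pxz / qz)
               for qz, pxz in zip(q, joint_xz) if qz > 0)


def kl_to_posterior(q, joint_xz):
    """KL(q || p(z | x)) = -sum over z of q(z) * ln(p(z | x) / q(z))."""
    px = sum(joint_xz)
    return -sum(qz * math.log((pxz / px) / qz)
                for qz, pxz in zip(q, joint_xz) if qz > 0)


joint_xz = [0.12, 0.28]  # hypothetical p(x, z = 0) and p(x, z = 1)
log_px = math.log(sum(joint_xz))

q = [0.5, 0.5]  # an arbitrary trial distribution over z
bound = lower_bound(q, joint_xz)
gap = kl_to_posterior(q, joint_xz)

# E step: setting q to the posterior p(z | x) closes the gap.
posterior = [p / sum(joint_xz) for p in joint_xz]
tight_bound = lower_bound(posterior, joint_xz)
print(log_px, bound, gap, tight_bound)
```

For any q the bound plus the gap equals the log likelihood, and choosing q equal to the posterior makes the bound tight; the M step then raises the bound, and with it the log likelihood, by changing the parameters.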

Illustration of EM Updates