
Lecture 7: DNA Chips and Clustering (4, 6/12/12)

Computational Genomics
Prof. Ron Shamir & Prof. Roded Sharan
School of Computer Science, Tel Aviv University

CG 1

Clustering gene expression data

CG 2

How Gene Expression Data Looks

[Figure: the “Raw Data” matrix of expression levels — rows are genes, columns are conditions.]

Entries of the Raw Data matrix:
• Ratio values
• Absolute values
• …

• Row = gene’s expression pattern / fingerprint vector
• Column = experiment/condition’s profile

CG 3

Data Preprocessing

[Figure: the “Raw Data” matrix of expression levels (genes × conditions) and the resulting similarity-matrix heat map.]

• Input: real-valued raw data matrix.
• Compute the similarity matrix (cosine angle / correlation / …).
• Alternatively – distances.

From the Raw Data matrix we compute the similarity matrix S. $S_{ij}$ reflects the similarity of the expression patterns of gene i and gene j.
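A minimal Python/NumPy sketch of this step, computing a gene-by-gene similarity matrix from a raw expression matrix with either of the two options mentioned above (Pearson correlation or cosine); the function and variable names are illustrative, not from the slides.

import numpy as np

def similarity_matrix(raw, method="correlation"):
    """Compute a gene-by-gene similarity matrix S from a raw expression
    matrix (rows = genes, columns = conditions). S[i, j] reflects how
    similar the expression patterns of genes i and j are."""
    X = np.asarray(raw, dtype=float)
    if method == "correlation":
        # Pearson correlation between every pair of rows (genes).
        return np.corrcoef(X)
    elif method == "cosine":
        # Cosine of the angle between the two expression vectors.
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        Xn = X / norms
        return Xn @ Xn.T
    raise ValueError(f"unknown method: {method}")

# Example: 5 genes measured under 13 conditions.
raw = np.random.rand(5, 13)
S = similarity_matrix(raw, method="cosine")

# Similarities can be turned into distances, e.g. D = 1 - S.
D = 1.0 - S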

CG 4

DNA chips: Applications

• Deducing functions of unknown genes (similar expression pattern ⇒ similar function)
• Deciphering regulatory mechanisms (co-expression ⇒ co-regulation)
• Identifying disease profiles
• Drug development
• …

Analysis requires clustering of genes/conditions.

CG 5

Clustering: Objective

Group elements (genes) into clusters satisfying:

• Homogeneity: Elements inside a cluster are highly similar to each other.

• Separation: Elements from different clusters have low similarity to each other.

• Unsupervised.
• Most formulations are NP-hard.

CG 6

The Clustering Bazaar

CG 7

Hierarchical clustering

CG 8

An Alternative View

Instead of a partition into clusters, form a tree hierarchy of the input elements satisfying:

• More similar elements are placed closer along the tree.
• Or: tree distances reflect the distances between elements.

CG 9

Hierarchical Representation

[Figure: a dendrogram over elements 1, 3, 4, 2, with internal nodes at heights 2.8, 4.5 and 5.0.]

Dendrogram: a rooted tree, usually binary; all leaf-to-root distances are equal. Ordinates reflect the (average) distances between the corresponding subtrees.

CG 10

Hierarchical Clustering: Average Linkage (Sokal & Michener '58; Lance & Williams '67)

• Input: distance matrix $(D_{ij})$.
• Iterative algorithm. Initially each element is a cluster; $n_r$ denotes the size of cluster r.
  – Find the minimum element $D_{rs}$ in D; merge clusters r, s.
  – Delete elements r, s; add a new element t with
    $D_{it} = D_{ti} = \frac{n_r}{n_r+n_s}\,D_{ir} + \frac{n_s}{n_r+n_s}\,D_{is}$
  – Repeat (see the sketch below).
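A minimal Python/NumPy sketch of this average-linkage loop, operating directly on a dense distance matrix; it is illustrative rather than optimized (scipy.cluster.hierarchy.linkage(..., method='average') does the same job efficiently).

import numpy as np

def average_linkage(D):
    """Agglomerative average-linkage clustering.
    D: symmetric (n x n) distance matrix. Returns the list of merges
    as (cluster_a, cluster_b, distance), clusters given as frozensets."""
    D = np.asarray(D, dtype=float).copy()
    n = D.shape[0]
    clusters = {i: frozenset([i]) for i in range(n)}   # active clusters
    sizes = {i: 1 for i in range(n)}
    np.fill_diagonal(D, np.inf)                        # ignore self-distances
    merges = []
    active = list(range(n))
    while len(active) > 1:
        # find the closest pair (r, s) among active clusters
        sub = D[np.ix_(active, active)]
        idx = np.unravel_index(np.argmin(sub), sub.shape)
        r, s = active[idx[0]], active[idx[1]]
        merges.append((clusters[r], clusters[s], D[r, s]))
        # average-linkage update: D[t, i] = nr/(nr+ns)*D[r, i] + ns/(nr+ns)*D[s, i]
        nr, ns = sizes[r], sizes[s]
        for i in active:
            if i not in (r, s):
                D[r, i] = D[i, r] = (nr * D[r, i] + ns * D[s, i]) / (nr + ns)
        # reuse index r for the merged cluster t; retire s
        clusters[r] = clusters[r] | clusters[s]
        sizes[r] = nr + ns
        active.remove(s)
    return merges

# Example on 4 elements:
D = np.array([[0, 2, 6, 10],
              [2, 0, 5,  9],
              [6, 5, 0,  4],
              [10, 9, 4, 0]], dtype=float)
print(average_linkage(D))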

CG 11

Average Linkage (cont.)

• Claim: $D_{rs}$ is the average distance between the elements in r and the elements in s.

• Proof by induction…

• Claim: the merging distance $D_{rs}$ can only increase from one iteration to the next.

CG 12

A General Framework (Lance & Williams '67)

• Find the minimum element $D_{rs}$; merge clusters r, s.
• Delete elements r, s; add a new element t with
  $D_{it} = D_{ti} = \alpha_r D_{ir} + \alpha_s D_{is} + \gamma\,|D_{ir} - D_{is}|$
• Single linkage: $D_{it} = \min\{D_{ir}, D_{is}\}$
• Complete linkage: $D_{it} = \max\{D_{ir}, D_{is}\}$
• Note: there is an analogous formulation in terms of a similarity matrix (rather than distances).
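To make the framework concrete, here is a small illustrative sketch (the helper names are mine, not from the slides) of the three rules as update functions; a generic version of the merge loop above would take such a function as a parameter instead of hard-coding the average rule.

# Linkage rules as update functions D_it = f(D_ir, D_is, n_r, n_s).
def single_linkage_update(d_ir, d_is, n_r, n_s):
    return min(d_ir, d_is)            # alpha_r = alpha_s = 1/2, gamma = -1/2

def complete_linkage_update(d_ir, d_is, n_r, n_s):
    return max(d_ir, d_is)            # alpha_r = alpha_s = 1/2, gamma = +1/2

def average_linkage_update(d_ir, d_is, n_r, n_s):
    return (n_r * d_ir + n_s * d_is) / (n_r + n_s)   # gamma = 0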

CG 13

Hierarchical clustering of GE data (Eisen et al., PNAS 1998)

• Growth response: starved human fibroblast cells, then added serum.
• Monitored 8,600 genes over 13 time points.
• $t_{ij}$ – fluorescence level of gene i in condition j; $r_{ij}$ – same for the reference (time = 0).
• $s_{ij} = \log(t_{ij}/r_{ij})$
• $S_{kl} = \frac{\sum_j s_{kj}\, s_{lj}}{\|s_k\|\,\|s_l\|}$ (cosine of the angle)
• Applied the average-linkage method.
• Ordered the leaves by increasing element weight: average expression level, time of maximal induction, or other criteria. (A short sketch of the preprocessing appears below.)
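A minimal sketch of the preprocessing just described, assuming t and r are positive NumPy arrays of fluorescence levels (genes × time points); the similarity is the uncentered correlation (cosine) defined above. Names are illustrative.

import numpy as np

def eisen_similarity(t, r):
    """Log-ratio transform followed by cosine similarity between genes.
    t: measured fluorescence (genes x time points); r: reference levels."""
    s = np.log(t / r)                              # s_ij = log(t_ij / r_ij)
    norms = np.linalg.norm(s, axis=1, keepdims=True)
    sn = s / norms
    return sn @ sn.T                               # S_kl = cosine of angle between s_k, s_l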

CG 14

CG 15

“Eisengrams” for the same data randomly permuted within rows (1), columns (2), and both (3)

CG 16

Comments

• Distinct measurements of same genes cluster together

• Genes of similar function cluster together

• Many cluster-function specific insights

• Interpretation is a REAL biological challenge

CG 17

More on hierarchical methods

• Agglomerative (bottom-up) vs. the “more natural” divisive (top-down).
• Advantages:
  – gives a single coherent global picture
  – intuitive for biologists (familiar from phylogeny)
• Disadvantages:
  – no single partition; no specific clusters
  – forces all elements to fit a tree hierarchy

CG 18

Non-Hierarchical Clustering

CG 19

K-means (Lloyd '57, MacQueen '67)

• Input: a vector $v_i$ for each element i; number of clusters = k.

• Define the centroid $c_p$ of a cluster $C_p$ as its average vector.

• Goal: minimize $\sum_{p}\sum_{i \in C_p} d(v_i, c_p)$

• Objective = homogeneity only (k is fixed).

• NP-hard already for k = 2.

CG 20

K-means: the algorithm

• Initialize an arbitrary partition P into k clusters.
• Repeat the following until convergence:
  – Update the centroids (optimize c, with P fixed).
  – Assign each point to its closest centroid (optimize P, with c fixed).
• Can be shown to have polynomial expected running time under various assumptions on the data distribution.
• A variant: perform the single best modification (the one that decreases the score the most).
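A minimal Python/NumPy sketch of this Lloyd-style alternation; initialization by a random partition and Euclidean distance are illustrative assumptions, not prescribed by the slides.

import numpy as np

def kmeans(V, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate centroid updates and reassignment.
    V: (n x d) array of element vectors; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    n = V.shape[0]
    labels = rng.integers(0, k, size=n)          # arbitrary initial partition
    for _ in range(n_iter):
        # Update centroids: the average vector of each cluster (P fixed).
        centroids = np.array([V[labels == p].mean(axis=0) if np.any(labels == p)
                              else V[rng.integers(0, n)] for p in range(k)])
        # Reassign: each point goes to its closest centroid (c fixed).
        dists = np.linalg.norm(V[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # converged
            break
        labels = new_labels
    return labels, centroids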

CG 21

CG 22

CG 23

A Soft Version

• Based on a probabilistic model of the data as coming from a mixture of Gaussians:

  $P(z_i = j) = \pi_j, \qquad P(x_i \mid z_i = j) \sim N(\mu_j, \sigma^2 I)$

• Goal: evaluate the parameters $\theta$ (assume $\sigma$ is known).

• Method: apply EM to maximize the likelihood of the data:

  $L(\theta) \propto \prod_i \sum_j \pi_j \exp\!\left(-\frac{d^2(x_i, \mu_j)}{2\sigma^2}\right)$

CG 24

EM, soft version

• Iteratively, compute the soft assignments and use them to derive the expectations of $\pi$ and $\mu$:

  $w_{ij}^t = P(z_i = j \mid x_i, \theta^t), \qquad \pi_j = \frac{1}{n}\sum_i w_{ij}^t, \qquad \mu_j = \frac{\sum_i w_{ij}^t\, x_i}{\sum_i w_{ij}^t}$

CG 25

Soft vs. hard k-means

• Soft EM optimizes the likelihood $\sum_i \log P(x_i \mid \theta)$.

• Hard EM optimizes $\sum_i \log P(x_i, z(i) \mid \theta)$ over both $\theta$ and the assignment z.

• If we use uniform mixture probabilities, then k-means is an application of hard EM, since

  $\sum_i \log P(x_i, z(i) \mid \theta) \;\propto\; -\sum_i d^2(x_i, \mu_{z(i)})$  (up to additive constants).
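A minimal Python/NumPy sketch of the soft version for a mixture of k spherical Gaussians with a known, shared σ, following the updates above; the initialization, names and fixed iteration count are illustrative assumptions.

import numpy as np

def soft_kmeans_em(X, k, sigma=1.0, n_iter=100, seed=0):
    """EM for a mixture of k spherical Gaussians N(mu_j, sigma^2 I),
    sigma known. Returns (pi, mu, W) where W[i, j] is the soft
    assignment P(z_i = j | x_i, theta)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=k, replace=False)]   # init centers at data points
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: w_ij proportional to pi_j * exp(-d^2(x_i, mu_j) / (2 sigma^2))
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logw = np.log(pi)[None, :] - d2 / (2.0 * sigma ** 2)
        logw -= logw.max(axis=1, keepdims=True)    # stabilize before exponentiating
        W = np.exp(logw)
        W /= W.sum(axis=1, keepdims=True)
        # M-step: expectations of pi and mu under the soft assignment
        pi = W.mean(axis=0)
        mu = (W.T @ X) / W.sum(axis=0)[:, None]
    return pi, mu, W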

CG 26

Expectation-Maximization & Baum-Welch

CG

The probabilistic setting

Input: data x coming from a probabilistic model with hidden information y.

Goal: learn the model's parameters so that the likelihood of the data is maximized.

Example: a mixture of two Gaussians:

$P(y_i = 1) = p_1; \qquad P(y_i = 2) = 1 - p_1 = p_2$

$P(x_i \mid y_i = j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i - \mu_j)^2}{2\sigma^2}\right)$

CG

The likelihood function

$P(y_i = 1) = p_1; \qquad P(y_i = 2) = 1 - p_1 = p_2$

$P(x_i \mid y_i = j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i - \mu_j)^2}{2\sigma^2}\right)$

$L(\theta) = P(x \mid \theta) = \prod_i P(x_i \mid \theta) = \prod_i \sum_j P(x_i, y_i = j \mid \theta)$

$\log L(\theta) = \sum_i \log P(x_i \mid \theta) = \sum_i \log \sum_j p_j \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i - \mu_j)^2}{2\sigma^2}\right)$
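As a concrete reading of the last formula, here is a small sketch (illustrative names) that evaluates the log-likelihood of one-dimensional data under a two-Gaussian mixture with a shared σ.

import numpy as np

def mixture_loglik(x, p, mu, sigma):
    """log L(theta) = sum_i log sum_j p_j * N(x_i; mu_j, sigma^2)
    x: (n,) data; p: (2,) mixture probabilities; mu: (2,) means."""
    x = np.asarray(x, dtype=float)[:, None]                    # (n, 1)
    comp = p[None, :] * np.exp(-(x - mu[None, :]) ** 2 / (2 * sigma ** 2)) \
           / np.sqrt(2 * np.pi * sigma ** 2)                   # (n, 2)
    return np.log(comp.sum(axis=1)).sum()

# Example: two components with p = (0.3, 0.7).
x = np.array([-1.2, 0.1, 2.5, 3.0])
print(mixture_loglik(x, p=np.array([0.3, 0.7]), mu=np.array([0.0, 2.0]), sigma=1.0))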

CG

The EM algorithm

Goal: maximize $\log P(x \mid \theta) = \log \big(\sum_y P(x, y \mid \theta)\big)$.

Assume we have a model $\theta^t$ which we wish to improve.

Note: $P(x \mid \theta) = P(x, y \mid \theta) / P(y \mid x, \theta)$, so $\log P(x \mid \theta) = \log P(x, y \mid \theta) - \log P(y \mid x, \theta)$.

Multiplying by $P(y \mid x, \theta^t)$ and summing over y:

$\sum_y P(y \mid x, \theta^t) \log P(x \mid \theta) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta) - \sum_y P(y \mid x, \theta^t) \log P(y \mid x, \theta)$

$\log P(x \mid \theta) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta) - \sum_y P(y \mid x, \theta^t) \log P(y \mid x, \theta)$

Therefore

$\log P(x \mid \theta) - \log P(x \mid \theta^t) = Q(\theta \mid \theta^t) - Q(\theta^t \mid \theta^t) + \sum_y P(y \mid x, \theta^t) \log \frac{P(y \mid x, \theta^t)}{P(y \mid x, \theta)}$

Here $Q(\theta^t \mid \theta^t)$ is a constant (it does not depend on $\theta$), and the last sum is a relative entropy, hence ≥ 0.

CG

The EM algorithm (cont.)

Main component:

$Q(\theta \mid \theta^t) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta)$

is the expectation of $\log P(x, y \mid \theta)$ over the distribution of y given by the current parameters $\theta^t$.

The algorithm:

• E-step: calculate the Q function.
• M-step: maximize $Q(\theta \mid \theta^t)$ with respect to $\theta$.
• Stopping criterion: improvement in log-likelihood ≤ ε.
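A minimal skeleton of this E-step / M-step loop with the stated stopping rule; e_step, m_step and loglik are placeholders the caller supplies for the model at hand (they are not defined on the slides).

def em(theta0, e_step, m_step, loglik, eps=1e-6, max_iter=1000):
    """Generic EM loop.
    e_step(theta)  -> sufficient statistics defining Q(. | theta)
    m_step(stats)  -> theta maximizing Q given those statistics
    loglik(theta)  -> log P(x | theta), used for the stopping criterion."""
    theta, ll = theta0, loglik(theta0)
    for _ in range(max_iter):
        stats = e_step(theta)          # E-step: expectations under current theta
        theta = m_step(stats)          # M-step: maximize Q(theta | theta_t)
        new_ll = loglik(theta)
        if new_ll - ll <= eps:         # stop when the improvement is <= epsilon
            break
        ll = new_ll
    return theta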

CG 31

Application to the mixture model

$Q(\theta \mid \theta^t) = \sum_y P(y \mid x, \theta^t) \log P(x, y \mid \theta)$

$P(x, y \mid \theta) = \prod_i P(x_i, y_i \mid \theta) = \prod_i \prod_j P(x_i, y_i = j \mid \theta)^{y_{ij}}, \qquad y_{ij} = \begin{cases} 1 & y_i = j \\ 0 & \text{otherwise} \end{cases}$

$\log P(x, y \mid \theta) = \sum_i \sum_j y_{ij} \log P(x_i, y_i = j \mid \theta)$

$Q(\theta \mid \theta^t) = \sum_y P(y \mid x, \theta^t) \sum_i \sum_j y_{ij} \log P(x_i, y_i = j \mid \theta) = \sum_i \sum_j P(y_i = j \mid x, \theta^t) \log P(x_i, y_i = j \mid \theta)$

CG 32

Application (cont.)

$Q(\theta \mid \theta^t) = \sum_i \sum_j P(y_i = j \mid x, \theta^t) \log P(x_i, y_i = j \mid \theta)$

$w_{ij}^t := P(y_i = j \mid x, \theta^t) = \frac{P(x_i, y_i = j \mid \theta^t)}{\sum_{j'} P(x_i, y_i = j' \mid \theta^t)}$

$Q(\theta \mid \theta^t) = \sum_i \sum_j w_{ij}^t \left[ \log p_j + \log \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{(x_i - \mu_j)^2}{2\sigma^2} \right]$

CG

Baum-Welch: EM for HMM

Here the hidden information is the path, y = π, i.e. the log-likelihood is

$\log P(x \mid \theta) = \log \sum_\pi P(x, \pi \mid \theta)$

and the Q function is:

$Q(\theta \mid \theta^t) = \sum_\pi P(\pi \mid x, \theta^t) \log P(x, \pi \mid \theta)$

CG

Baum-Welch (cont.)

$P(x, \pi \mid \theta) = \prod_{k=1}^{M} \prod_{b} e_k(b)^{E_k(b,\pi)} \; \prod_{k=1}^{M} \prod_{l=1}^{M} a_{kl}^{A_{kl}(\pi)}$

where $e_k(b)$ is the emission probability of character b in state k, $a_{kl}$ is the transition probability from state k to state l, $E_k(b,\pi)$ is the number of times we saw b emitted from state k along $\pi$, and $A_{kl}(\pi)$ is the number of k→l transitions along $\pi$.

CG

Baum-Welch (cont.)

$Q(\theta \mid \theta^t) = \sum_\pi P(\pi \mid x, \theta^t) \left[ \sum_{k=1}^{M} \sum_{b} E_k(b,\pi) \log e_k(b) + \sum_{k=1}^{M} \sum_{l=1}^{M} A_{kl}(\pi) \log a_{kl} \right]$

$= \sum_{k=1}^{M} \sum_{b} E_k(b) \log e_k(b) + \sum_{k=1}^{M} \sum_{l=1}^{M} A_{kl} \log a_{kl}$

where (value × probability, summed, gives the expectation)

$A_{kl} = \sum_\pi P(\pi \mid x, \theta^t)\, A_{kl}(\pi), \qquad E_k(b) = \sum_\pi P(\pi \mid x, \theta^t)\, E_k(b,\pi)$

CG

Baum-Welch (cont.)

• So we want to find a set of parameters $\theta^{t+1}$ that maximizes:

  $\sum_{k=1}^{M} \sum_{b} E_k(b) \log e_k(b) + \sum_{k=1}^{M} \sum_{l=1}^{M} A_{kl} \log a_{kl}$

• $E_k(b)$, $A_{kl}$ can be computed using forward/backward:

  $P(\pi_i = k, \pi_{i+1} = l \mid x, \theta^t) = \frac{1}{P(x)}\, f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$

  $A_{kl} = \frac{1}{P(x)} \sum_i f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1), \qquad E_k(b) = \frac{1}{P(x)} \sum_{\{i \,:\, x_i = b\}} f_k(i)\, b_k(i)$

• For the maximization, select:

  $a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$

(A minimal sketch of one Baum-Welch iteration appears below.)
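A minimal Python/NumPy sketch of one such iteration (forward, backward, expected counts, then the normalization above). It assumes a given initial-state distribution pi0, integer-coded observation symbols, no special start/end states and no numerical scaling; those simplifications are mine, not part of the slides.

import numpy as np

def baum_welch_step(x, pi0, a, e):
    """One Baum-Welch (EM) iteration for a discrete HMM.
    x: observation sequence as symbol indices, length n
    pi0: (M,) initial-state probabilities
    a: (M, M) transition matrix, e: (M, S) emission matrix.
    Returns updated (a, e) built from the expected counts A_kl, E_k(b)."""
    n, M = len(x), a.shape[0]
    # Forward: f[i, k] = P(x_1..x_i, pi_i = k)
    f = np.zeros((n, M))
    f[0] = pi0 * e[:, x[0]]
    for i in range(1, n):
        f[i] = (f[i - 1] @ a) * e[:, x[i]]
    Px = f[-1].sum()                     # P(x | theta)
    # Backward: b[i, k] = P(x_{i+1}..x_n | pi_i = k)
    b = np.zeros((n, M))
    b[-1] = 1.0
    for i in range(n - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
    # Expected transition counts:
    # A_kl = (1/P(x)) * sum_i f_k(i) * a_kl * e_l(x_{i+1}) * b_l(i+1)
    A = np.zeros((M, M))
    for i in range(n - 1):
        A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a
    A /= Px
    # Expected emission counts: E_k(b) = (1/P(x)) * sum_{i: x_i = b} f_k(i) * b_k(i)
    E = np.zeros((M, e.shape[1]))
    for i in range(n):
        E[:, x[i]] += f[i] * b[i]
    E /= Px
    # M-step: a_kl = A_kl / sum_l' A_kl',  e_k(b) = E_k(b) / sum_b' E_k(b')
    a_new = A / A.sum(axis=1, keepdims=True)
    e_new = E / E.sum(axis=1, keepdims=True)
    return a_new, e_new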

CG

Relative entropy is positive

Using $\log x \le x - 1$:

$\sum_i P(x_i) \log \frac{Q(x_i)}{P(x_i)} \;\le\; \sum_i P(x_i) \left( \frac{Q(x_i)}{P(x_i)} - 1 \right) \;=\; \sum_i Q(x_i) - \sum_i P(x_i) \;=\; 1 - 1 \;=\; 0$

Hence the relative entropy $\sum_i P(x_i) \log \frac{P(x_i)}{Q(x_i)} \ge 0$.
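A tiny numeric sanity check of this inequality (the two distributions are illustrative):

import numpy as np

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.2, 0.5, 0.3])
rel_entropy = np.sum(P * np.log(P / Q))   # sum_i P(x_i) log(P(x_i)/Q(x_i))
print(rel_entropy, rel_entropy >= 0)       # non-negative, as proved above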

CG

Baum-Welch: EM for HMM

Maximize:

$\sum_{k=1}^{M} \sum_{b} E_k(b) \log e_k(b) + \sum_{k=1}^{M} \sum_{l=1}^{M} A_{kl} \log a_{kl}$

Chosen parameters:

$a_{kl}^{chosen} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}$

Difference between the chosen set and some other set (shown for the transition part; multiply and divide by the same factor $\sum_{l'} A_{kl'}$):

$\sum_{k=1}^{M} \sum_{l=1}^{M} A_{kl} \log a_{kl}^{chosen} - \sum_{k=1}^{M} \sum_{l=1}^{M} A_{kl} \log a_{kl}^{other} = \sum_{k=1}^{M} \sum_{l=1}^{M} A_{kl} \log \frac{a_{kl}^{chosen}}{a_{kl}^{other}} = \sum_{k=1}^{M} \left( \sum_{l'} A_{kl'} \right) \sum_{l=1}^{M} a_{kl}^{chosen} \log \frac{a_{kl}^{chosen}}{a_{kl}^{other}} \;\ge\; 0$

since each inner sum is a relative entropy (always positive).
