
Lecture 8

Random walks & macromolecules

Zhanchun Tu (涂展春)

Department of Physics, BNU

Email: [email protected]

Homepage: www.tuzc.org


Main contents

● Deterministic vs statistical descriptions of macromolecular structures

● Macromolecules as random walks

● Single-molecule mechanics

● Proteins as random walks

§8.1 Deterministic vs statistical descriptions of structures

Deterministic description: structure ↔ atomic coordinates $(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)$

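Statistical descriptions: structure ↔ average size and shape of macromolecules

[Figure: a polymer coil in $x$, $y$, $z$ coordinates, with end-to-end vector $\mathbf{R}_N$ and overall size $2R_G$]

$R_G = ?$; $\langle \mathbf{R}_N \rangle = ?$; $\langle R_N^2 \rangle = ?$; $p(\mathbf{R}_N) = ?$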

Simple polymer

● Example: PE, polyethylene (聚乙烯)

[Figure: a long-chain molecule, with the bond length and bond angle labeled]

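Thermal motion does NOT excite the degrees of freedom (DOFs) of bond length and bond angle: $l_0$ and $\theta$ are fixed! Only the rotational DOF about each C–C bond remains.

Rotational DOF: [Figure: rotation about the central bond of a C–C–C–C unit with bond length $l_0$; dihedral angles 0°, 120° and −120° correspond to the trans (反式) and the two gauche (旁式) conformations]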

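● Flexibility (柔性)

Locally static flexibility

Let $\Delta\varepsilon$ denote the energy difference between the gauche and trans conformations.

$P_{\text{gauche}}/P_{\text{trans}} = e^{-\Delta\varepsilon/k_BT} \simeq 1$ for $\Delta\varepsilon \ll k_BT$

Gauche and trans conformations are found with similar frequency in any local part of the polymer, so locally the polymer appears as a random coil.

$P_{\text{gauche}}/P_{\text{trans}} = e^{-\Delta\varepsilon/k_BT} \simeq 0$ for $\Delta\varepsilon \gg k_BT$

Gauche conformations are seldom found locally; almost all conformations are trans, so locally the polymer looks like a rigid rod.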

Persistence length (驻留长度) and globally static flexibility

$\xi_p = l_0\, e^{\Delta\varepsilon/k_BT}$. Question: what is the physical meaning of $\xi_p$?

$e^{-\Delta\varepsilon/k_BT}$: approximately the probability of a gauche conformation between nearest-neighbor bonds (when it is small).

On average, how many bonds are there per gauche conformation?

$\frac{1}{e^{-\Delta\varepsilon/k_BT}} = e^{\Delta\varepsilon/k_BT} = \xi_p/l_0$

So about one gauche conformation occurs per persistence length.

(1) If the total length is < $\xi_p$, the polymer behaves like a rigid rod.

(2) If the total length is ≫ $\xi_p$, many gauche conformations occur along the polymer, and the whole chain is a random coil: locally rigid but globally flexible.
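The two formulas above, $P_{\text{gauche}}/P_{\text{trans}} = e^{-\Delta\varepsilon/k_BT}$ and $\xi_p = l_0 e^{\Delta\varepsilon/k_BT}$, can be read off numerically. This is a minimal Python sketch, not part of the original lecture: the function names are hypothetical, energies are in units of $k_BT$, and $l_0 = 0.154$ nm (a typical C–C bond length) is an illustrative assumption.

```python
import math

def gauche_trans_ratio(delta_eps_over_kT: float) -> float:
    """P_gauche / P_trans = exp(-Δε / k_B T), with Δε measured in units of k_B T."""
    return math.exp(-delta_eps_over_kT)

def persistence_length(l0: float, delta_eps_over_kT: float) -> float:
    """ξ_p = l0 * exp(Δε / k_B T): roughly one gauche conformation every ξ_p / l0 bonds."""
    return l0 * math.exp(delta_eps_over_kT)

if __name__ == "__main__":
    l0 = 0.154  # nm; assumed C–C bond length, for illustration only
    for x in (0.1, 1.0, 3.0, 5.0):  # Δε / k_B T
        xi = persistence_length(l0, x)
        print(f"Δε/kBT = {x:3.1f}: P_g/P_t = {gauche_trans_ratio(x):.3f}, "
              f"ξ_p ≈ {xi:.2f} nm ≈ {xi / l0:.0f} bonds")
```

For $\Delta\varepsilon = k_BT$ the ratio is about 0.37, so gauche and trans bonds are comparably common; for $\Delta\varepsilon = 5k_BT$ it drops below 0.01 and $\xi_p$ spans roughly 150 bonds.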

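Persistence time (驻留时间) & dynamic flexibility (动态柔性)

[Figure: gauche, trans and gauche conformations, and the transitions between them]

$\tau_p = \tau_0\, e^{\Delta E/k_BT}$, with $\tau_0 \sim 10$ ps: the transition time from trans to gauche.

(1) $t_{\text{observe}} < \tau_p$: the polymer is frozen in one configuration. Dynamically rigid.

(2) $t_{\text{observe}} \gg \tau_p$: the polymer transits between different configurations. Dynamically flexible.

From now on, we only consider the case length ≫ $\xi_p$ and $t_{\text{observe}} \gg \tau_p$.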
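A sketch of the estimate $\tau_p = \tau_0 e^{\Delta E/k_BT}$ with $\tau_0 \sim 10$ ps. The helper `is_dynamically_flexible` and its factor `margin = 100` are hypothetical stand-ins for the loose condition $t_{\text{observe}} \gg \tau_p$; barrier heights are in units of $k_BT$.

```python
import math

TAU0_PS = 10.0  # τ0 ~ 10 ps, as quoted in the lecture

def persistence_time_ps(barrier_over_kT: float, tau0_ps: float = TAU0_PS) -> float:
    """τ_p = τ0 * exp(ΔE / k_B T) in picoseconds, with ΔE measured in units of k_B T."""
    return tau0_ps * math.exp(barrier_over_kT)

def is_dynamically_flexible(t_observe_ps: float, barrier_over_kT: float, margin: float = 100.0) -> bool:
    """Hypothetical criterion for t_observe >> τ_p: require t_observe > margin * τ_p.
    The factor `margin` is an arbitrary illustrative choice."""
    return t_observe_ps > margin * persistence_time_ps(barrier_over_kT)

if __name__ == "__main__":
    for barrier in (2.0, 5.0, 10.0, 20.0):
        tau = persistence_time_ps(barrier)
        print(f"ΔE/kBT = {barrier:4.1f}: τ_p ≈ {tau:.3g} ps; "
              f"flexible on a 1 μs window: {is_dynamically_flexible(1e6, barrier)}")
```

With a 1 μs observation window, a $5k_BT$ barrier ($\tau_p \approx 1.5$ ns) leaves the chain dynamically flexible under this criterion, whereas a $10k_BT$ barrier ($\tau_p \approx 0.2$ μs) does not pass it.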

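● End-to-end distance (首末端距)

[Figure: a polymer chain in $x$, $y$, $z$ coordinates with end-to-end vector $\mathbf{R}_N$]

$\mathbf{R}_N$: end-to-end vector

Because length ≫ $\xi_p$ and $t_{\text{observe}} \gg \tau_p$, the polymer transits between a large number of configurations. Thus $\mathbf{R}_N$ is a stochastic variable!

$\langle \mathbf{R}_N \rangle = ?$; $\langle R_N^2 \rangle = ?$; $p(\mathbf{R}_N) = ?$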

§8.2 Macromolecules as random walks

Basic idea

● Macromolecules are regarded as rigid segments (链节) connected by hinges (铰).

[Figure: DNA on a surface (AFM image) and its representation as a random walk; 1D and 3D random walk models]

Condition: $L \gg \xi_p$

Mathematical treatments

● Drunkard's walk (醉汉走路 )

Are there any rules governing the position $\mathbf{R}$ of the drunkard?

$\langle \mathbf{R} \rangle = 0$

$\langle \mathbf{R}^2 \rangle = 0$?

Note: $\mathbf{R}$ is a vector.

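● Mean square end-to-end distance

[Figure: a 1D lattice along $x$ with step length $a$, starting at 0]

$a$: length of each step

$x_n$: position after the $n$-th step

$x_0 = 0$: start point

$k_n a$: displacement of the $n$-th step, with $P(k_n = 1) = P(k_n = -1) = 1/2$

$x_n = x_{n-1} + k_n a$

Problem: prove that $\langle x_n \rangle = 0$.

Proof: $\langle k_n \rangle = 1 \times \frac{1}{2} - 1 \times \frac{1}{2} = 0 \Rightarrow \langle x_n \rangle = \langle x_{n-1} + k_n a \rangle = \langle x_{n-1} \rangle + \langle k_n \rangle a = \langle x_{n-1} \rangle = \langle x_{n-2} \rangle = \dots = \langle x_1 \rangle = \langle x_0 \rangle = 0$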

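1D random walk: $\langle x_n^2 \rangle = ?$

$x_n^2 = (x_{n-1} + k_n a)^2 = x_{n-1}^2 + 2a\,k_n x_{n-1} + k_n^2 a^2$

$k_n^2 = (\pm 1)^2 = 1$

$\langle k_n x_{n-1} \rangle = (+1)\, x_{n-1} P(+1) + (-1)\, x_{n-1} P(-1) = 0$, since $P(\pm 1) = 1/2$ and the $n$-th step is independent of $x_{n-1}$

$\langle x_n^2 \rangle = \langle x_{n-1}^2 + 2a\,k_n x_{n-1} + k_n^2 a^2 \rangle = \langle x_{n-1}^2 \rangle + a^2$

$\Rightarrow \langle x_N^2 \rangle = N a^2$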
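The results $\langle x_N\rangle = 0$ and $\langle x_N^2\rangle = Na^2$ can be checked by direct simulation. This is a Monte Carlo sketch using Python's standard `random` module; the function names, seed, and sample sizes are illustrative choices.

```python
import random
from typing import Optional, Tuple

def walk_1d(n_steps: int, a: float = 1.0, rng: Optional[random.Random] = None) -> float:
    """Final position x_N of x_n = x_{n-1} + k_n a, with k_n = +1 or -1 each with probability 1/2."""
    rng = rng or random.Random()
    x = 0.0  # x_0 = 0
    for _ in range(n_steps):
        k = 1 if rng.random() < 0.5 else -1
        x += k * a
    return x

def moments_1d(n_steps: int, n_walks: int, a: float = 1.0, seed: int = 0) -> Tuple[float, float]:
    """Sample estimates of <x_N> and <x_N^2> from n_walks independent walks."""
    rng = random.Random(seed)
    xs = [walk_1d(n_steps, a, rng) for _ in range(n_walks)]
    mean = sum(xs) / n_walks
    mean_sq = sum(x * x for x in xs) / n_walks
    return mean, mean_sq

if __name__ == "__main__":
    N, a = 100, 1.0
    mean, mean_sq = moments_1d(N, 10000, a)
    print(f"<x_N> ≈ {mean:.3f} (theory 0); <x_N^2> ≈ {mean_sq:.1f} (theory N a^2 = {N * a * a:.0f})")
```

The sample mean fluctuates around 0 with a spread of about $\sqrt{Na^2/n_{\text{walks}}}$, so larger samples tighten the agreement.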

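2D random walk

$\mathbf{r}_n = (x_n, y_n) = (x_{n-1}, y_{n-1}) + (k_{xn}, k_{yn})\,a$

Step probabilities $P(k_{xn}, k_{yn})$ with $k_{xn}, k_{yn} \in \{1, 0, -1\}$: $P(\pm 1, 0) = P(0, \pm 1) = 1/4$; 0 for others (no diagonal steps and no zero step)

$r_n^2 = x_n^2 + y_n^2$

Problem: prove that $\langle r_N^2 \rangle = N a^2$.

[Figure: a 2D lattice walk in the $x$–$y$ plane with steps of length $a$]

3D random walk

$\mathbf{r}_n = (x_n, y_n, z_n) = (x_{n-1}, y_{n-1}, z_{n-1}) + (k_{xn}, k_{yn}, k_{zn})\,a$

$P(\pm 1, 0, 0) = P(0, \pm 1, 0) = P(0, 0, \pm 1) = 1/6$; 0 for others.

Problem: prove that $\langle r_N^2 \rangle = N a^2$.

Summary: $\langle r_N^2 \rangle = N a^2$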
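The same check extends to lattice walks in $d = 1, 2, 3$, with each of the $2d$ nearest-neighbor moves taken with probability $1/(2d)$ (1/4 in 2D, 1/6 in 3D, matching the step probabilities of the 2D and 3D walks). A minimal sketch; the names and sample sizes are assumptions for illustration.

```python
import random

def lattice_walk_r2(n_steps: int, dim: int, a: float, rng: random.Random) -> float:
    """Squared end-to-end distance r_N^2 of a walk on a dim-dimensional (hyper)cubic lattice.

    Each step moves +a or -a along one randomly chosen axis, so each of the 2*dim
    nearest-neighbour moves has probability 1/(2*dim).
    """
    pos = [0.0] * dim
    for _ in range(n_steps):
        axis = rng.randrange(dim)
        pos[axis] += a if rng.random() < 0.5 else -a
    return sum(c * c for c in pos)

def mean_r2(n_steps: int, dim: int, n_walks: int = 10000, a: float = 1.0, seed: int = 1) -> float:
    """Monte Carlo estimate of <r_N^2>."""
    rng = random.Random(seed)
    return sum(lattice_walk_r2(n_steps, dim, a, rng) for _ in range(n_walks)) / n_walks

if __name__ == "__main__":
    N = 50
    for d in (1, 2, 3):
        print(f"d = {d}: <r_N^2> ≈ {mean_r2(N, d):.1f}, theory N a^2 = {N}")
```

All three dimensions give $\langle r_N^2\rangle \approx Na^2$: on each step exactly one coordinate changes by $\pm a$, so the cross term averages to zero and $r^2$ grows by $a^2$ on average regardless of $d$.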

● Total configurations of an N-step 1D random walk

(1) The probabilities of right and left steps are the same.

(2) Each step is taken with no regard to the orientation of the previous segment.

(3) Each step has two possible choices.

Total configurations of N steps $= 2^N$: there are $2^N$ different permissible configurations for an N-segment macromolecule.

$P(\text{each configuration}) = 1/2^N$

● Distribution of end-to-end distance

Question: for N-step walks, what is the probability of $n_r$ rightward steps?

$W(n_r, N)$: the number of realizations with $n_r$ rightward steps in N-step walks. For example, for $N = 3$:

$n_r = 0$, $W = 1$; $n_r = 1$, $W = 3$; $n_r = 2$, $W = 3$; $n_r = 3$, $W = 1$

$p(n_r, N) = \frac{W(n_r, N)}{2^N} = \frac{N!}{n_r!\,(N - n_r)!}\,\frac{1}{2^N}$ (binomial distribution)

Problem: verify that this probability distribution is normalized.
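For small $N$ the counts $W(n_r, N)$ can be enumerated by brute force and compared with the binomial coefficient; summing $p(n_r, N)$ over $n_r$ also checks the normalization numerically (analytically it follows from $\sum_{n_r}\binom{N}{n_r} = 2^N$). A short sketch with hypothetical function names.

```python
from itertools import product
from math import comb
from typing import Dict

def count_realizations(n_steps: int) -> Dict[int, int]:
    """Brute-force W(n_r, N): enumerate all 2^N step sequences (+1 = right, -1 = left)."""
    counts: Dict[int, int] = {}
    for steps in product((+1, -1), repeat=n_steps):
        n_r = steps.count(+1)
        counts[n_r] = counts.get(n_r, 0) + 1
    return counts

def p_binomial(n_r: int, n_steps: int) -> float:
    """p(n_r, N) = N! / (n_r! (N - n_r)!) * 1 / 2^N."""
    return comb(n_steps, n_r) / 2 ** n_steps

if __name__ == "__main__":
    print(sorted(count_realizations(3).items()))  # → [(0, 1), (1, 3), (2, 3), (3, 1)]
    N = 20
    print(sum(p_binomial(k, N) for k in range(N + 1)))  # → 1.0
```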

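Relation between the end-to-end distance $R$ and $n_r$:

$R = (n_r - n_l)\,a$, $N = n_r + n_l$ $\Rightarrow R = (2n_r - N)\,a$

$P(R, N)\,dR = p(n_r, N)\,dn_r$, with $n_r$ and $R$ related as above

$P(R, N) = p(n_r, N)\,\frac{dn_r}{dR} = \frac{p(n_r, N)}{2a}$

Probability distribution function for the end-to-end distance (Gaussian distribution, for large N):

$P(R; N) = \frac{1}{\sqrt{2\pi N a^2}}\, e^{-R^2/(2Na^2)}$

[Figure: $P(R; N)$ for $N = 100$, $a = 1/2$. Line: Gaussian distribution; dots: binomial distribution]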
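The comparison in the figure ($N = 100$, $a = 1/2$) can be reproduced in a few lines. The sketch evaluates the exact $P(R, N) = p(n_r, N)/(2a)$ at the lattice points $R = (2n_r - N)a$ next to the Gaussian $P(R; N) = e^{-R^2/(2Na^2)}/\sqrt{2\pi Na^2}$; the function names are illustrative.

```python
import math

def p_binomial_R(R: float, N: int, a: float) -> float:
    """Exact P(R, N) = p(n_r, N) / (2a), evaluated at lattice points R = (2 n_r - N) a."""
    n_r = round((R / a + N) / 2)
    return math.comb(N, n_r) / 2 ** N / (2 * a)

def p_gaussian_R(R: float, N: int, a: float) -> float:
    """Large-N limit: P(R; N) = exp(-R^2 / (2 N a^2)) / sqrt(2 pi N a^2)."""
    var = N * a * a
    return math.exp(-R * R / (2 * var)) / math.sqrt(2 * math.pi * var)

if __name__ == "__main__":
    N, a = 100, 0.5  # parameters of the figure
    for n_r in range(40, 61, 5):
        R = (2 * n_r - N) * a
        print(f"R = {R:5.1f}: binomial {p_binomial_R(R, N, a):.5f}, "
              f"Gaussian {p_gaussian_R(R, N, a):.5f}")
```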

Central limit theorem: the probability distribution of $x_1 + x_2 + \dots + x_N$ (a sum of identically distributed independent random variables) is Gaussian in the limit of large N.

Problem: prove that $\langle R \rangle = 0$ and $\langle R^2 \rangle = N a^2$.

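3D case

Central limit theorem, applied to each Cartesian component:

$P(\mathbf{R}; N) = \left(\frac{3}{2\pi N a^2}\right)^{3/2} e^{-3R^2/(2Na^2)}$

Normalization: $\int P(\mathbf{R}; N)\, d^3\mathbf{R} = 1$

Variance: $\langle R^2 \rangle = \int R^2\, P(\mathbf{R}; N)\, d^3\mathbf{R} = N a^2$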
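Normalization and variance of the 3D Gaussian $P(\mathbf{R}; N) = \left(\frac{3}{2\pi Na^2}\right)^{3/2} e^{-3R^2/(2Na^2)}$ can be checked by a radial integral, using $d^3\mathbf{R} = 4\pi R^2\, dR$. This is a simple midpoint-rule sketch; the cutoff at $10\sqrt{N}a$ and the grid size are arbitrary but generous choices.

```python
import math
from typing import Tuple

def p3d(R: float, N: int, a: float) -> float:
    """3D Gaussian P(R; N) = (3 / (2 pi N a^2))^(3/2) * exp(-3 R^2 / (2 N a^2))."""
    c = 3.0 / (2.0 * math.pi * N * a * a)
    return c ** 1.5 * math.exp(-3.0 * R * R / (2.0 * N * a * a))

def radial_moments(N: int, a: float, n_grid: int = 20000) -> Tuple[float, float]:
    """Midpoint-rule integrals of 4 pi R^2 P(R) dR (normalization) and 4 pi R^4 P(R) dR (<R^2>)."""
    r_max = 10.0 * math.sqrt(N) * a  # deep in the Gaussian tail
    h = r_max / n_grid
    norm = second = 0.0
    for i in range(n_grid):
        R = (i + 0.5) * h
        shell = 4.0 * math.pi * R * R * p3d(R, N, a) * h  # probability in the shell [R, R + dR]
        norm += shell
        second += R * R * shell
    return norm, second

if __name__ == "__main__":
    N, a = 100, 1.0
    norm, second = radial_moments(N, a)
    print(f"normalization ≈ {norm:.8f}; <R^2> ≈ {second:.6f} (theory N a^2 = {N * a * a:.0f})")
```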

● Entropic elasticity

$P(\mathbf{R}; N)$ has a sharp peak at $\mathbf{R} = 0$.

Stretch a polymer so that $R$ is nonzero; after release, it will quickly find itself back in the $R \approx 0$ state, because the $R \approx 0$ state is much more likely.

[Figure: a polymer stretched by a force $F$]

This restoring tendency is not the result of a physical force (e.g., an electric force) but purely a result of statistics.

Another example: pressure.

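● Persistence length: the length scale over which the tangent-tangent correlation function decays along the chain

[Figure: chain contour with positions $\mathbf{r}(s)$, $\mathbf{r}(u)$ and unit tangents $\mathbf{t}(s)$, $\mathbf{t}(u)$]

$\langle \mathbf{t}(s)\cdot\mathbf{t}(u) \rangle = e^{-|s-u|/\xi_p}$, so $\langle R^2 \rangle = \int_0^L \int_0^L \langle \mathbf{t}(s)\cdot\mathbf{t}(u) \rangle\, ds\, du \approx 2\xi_p L$ for $L \gg \xi_p$

On the other hand, for an $(N = L/a \gg 1)$-step random walk, $\langle R^2 \rangle = N a^2 = L a$.

$\Rightarrow a = 2\xi_p$: Kuhn length = 2 × persistence length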
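The step from the tangent correlation to $\langle R^2\rangle \approx 2\xi_p L$ can also be checked numerically. Assuming $\langle \mathbf{t}(s)\cdot\mathbf{t}(u)\rangle = e^{-|s-u|/\xi_p}$, the double integral has the closed form $\langle R^2\rangle = 2\xi_p L - 2\xi_p^2\left(1 - e^{-L/\xi_p}\right)$, which reduces to $L^2$ (a rigid rod) for $L \ll \xi_p$ and to $2\xi_p L$ for $L \gg \xi_p$. The sketch compares this closed form with a brute-force midpoint sum; names and grid size are illustrative.

```python
import math

def r2_wormlike(L: float, xi_p: float) -> float:
    """Closed form of ∫_0^L ∫_0^L exp(-|s - u| / ξ_p) ds du = 2 ξ_p L - 2 ξ_p^2 (1 - exp(-L / ξ_p))."""
    return 2.0 * xi_p * L - 2.0 * xi_p ** 2 * (1.0 - math.exp(-L / xi_p))

def r2_double_integral(L: float, xi_p: float, n: int = 400) -> float:
    """Brute-force midpoint sum of the same double integral on an n x n grid."""
    h = L / n
    pts = [(i + 0.5) * h for i in range(n)]
    return sum(math.exp(-abs(s - u) / xi_p) for s in pts for u in pts) * h * h

if __name__ == "__main__":
    xi_p = 1.0
    for L in (0.1, 1.0, 10.0, 100.0):
        r2 = r2_wormlike(L, xi_p)
        print(f"L/ξ_p = {L:6.1f}: <R^2>/(2 ξ_p L) = {r2 / (2 * xi_p * L):.3f}, <R^2>/L^2 = {r2 / L ** 2:.3f}")
    print(r2_double_integral(10.0, xi_p), r2_wormlike(10.0, xi_p))
```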

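The size of genomes

● Radius of gyration (回转半径)

It measures the average distance between the monomers and the center of mass of the polymer: $R_G^2 = \frac{1}{N}\sum_{i=1}^{N} \langle (\mathbf{R}_i - \mathbf{R}_{\text{cm}})^2 \rangle$, with $\mathbf{R}_{\text{cm}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{R}_i$

[Figure: a random walk starting at the origin 0, with monomer positions $\mathbf{R}_i$, $\mathbf{R}_{i+1}$]

For a random walk, $\langle (\mathbf{R}_{k+l} - \mathbf{R}_k)^2 \rangle = l a^2 = 4 l \xi_p^2$ (using $a = 2\xi_p$, e.g. for DNA or RNA), which gives $\langle R_G^2 \rangle = \frac{N a^2}{6} = \frac{L \xi_p}{3}$ for large N.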
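The random-walk result $\langle R_G^2\rangle \approx Na^2/6$ can be checked on simulated lattice chains. In this sketch the walk is taken on a simple cubic lattice and $R_G^2$ is averaged over the $N + 1$ bead positions; for finite $N$ the exact lattice value is $a^2 N(N+2)/[6(N+1)]$, slightly above $Na^2/6$. Names, seed, and sample size are illustrative.

```python
import random
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def radius_of_gyration_sq(positions: List[Vec3]) -> float:
    """R_G^2 = (1/M) * sum_i |R_i - R_cm|^2 over the M bead positions."""
    m = len(positions)
    cm = [sum(p[k] for p in positions) / m for k in range(3)]
    return sum(sum((p[k] - cm[k]) ** 2 for k in range(3)) for p in positions) / m

def lattice_walk_3d(n_steps: int, a: float, rng: random.Random) -> List[Vec3]:
    """Bead positions R_0, ..., R_N of a simple-cubic-lattice random walk (6 moves, probability 1/6 each)."""
    moves = [(a, 0.0, 0.0), (-a, 0.0, 0.0), (0.0, a, 0.0), (0.0, -a, 0.0), (0.0, 0.0, a), (0.0, 0.0, -a)]
    pos = [(0.0, 0.0, 0.0)]
    for _ in range(n_steps):
        dx, dy, dz = rng.choice(moves)
        x, y, z = pos[-1]
        pos.append((x + dx, y + dy, z + dz))
    return pos

def mean_rg_sq(n_steps: int, n_walks: int = 2000, a: float = 1.0, seed: int = 2) -> float:
    """Monte Carlo average of R_G^2 over n_walks independent chains."""
    rng = random.Random(seed)
    return sum(radius_of_gyration_sq(lattice_walk_3d(n_steps, a, rng)) for _ in range(n_walks)) / n_walks

if __name__ == "__main__":
    N = 100
    print(f"<R_G^2> ≈ {mean_rg_sq(N):.2f}; N a^2 / 6 = {N / 6:.2f}; "
          f"exact lattice value = {N * (N + 2) / (6 * (N + 1)):.2f}")
```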

● Estimate: size of viral and bacterial genomes

Bacteriophage T2 and T4 genomes: $N_{bp} \approx 150$ kb

Bacterium:

Persistence length

The observed result is slightly smaller than the estimated value.

[Figure: DNA from a bacterium]
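The estimate can be put into numbers with $\langle R_G^2\rangle = L\xi_p/3$ and $L = N_{bp} \times$ (rise per base pair). The sketch assumes two standard values that do not appear in the surviving slide text: a rise of 0.34 nm per base pair and a DNA persistence length $\xi_p \approx 50$ nm. For $N_{bp} \approx 150$ kb this gives a contour length of about 51 μm and $R_G$ of roughly 0.9 μm, an estimate rather than a measured size.

```python
import math

NM_PER_BP = 0.34  # assumed rise per base pair of B-DNA, in nm
XI_P_NM = 50.0    # assumed DNA persistence length, in nm

def genome_rg_nm(n_bp: float, xi_p_nm: float = XI_P_NM, nm_per_bp: float = NM_PER_BP) -> float:
    """Random-walk estimate R_G = sqrt(L * xi_p / 3), with contour length L = n_bp * nm_per_bp."""
    contour_nm = n_bp * nm_per_bp
    return math.sqrt(contour_nm * xi_p_nm / 3.0)

if __name__ == "__main__":
    n_bp = 150e3  # bacteriophage T2/T4, N_bp ≈ 150 kb
    print(f"contour length ≈ {n_bp * NM_PER_BP / 1000:.0f} μm; "
          f"R_G ≈ {genome_rg_nm(n_bp) / 1000:.2f} μm")
```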

Geography of chromosomes

● Chromosomes have separate territories (领地) within the nucleus.

[Figure: chromosome territories in a human cell nucleus]

● Chromosomes are tethered at different locations in the nucleus.

Two possible tethering modes: (A) at the centromere (着丝粒) and the two telomeres (端粒); (B) at discrete chromosomal loci that interact with the nuclear envelope.

● Simple tether model

Without tethers

With tethers: $\mathbf{R}$ is fixed, and $P(\mathbf{r}) = P(\mathbf{r} - \mathbf{R})$

$N$: number of segments between markers

[Figure: data from an experiment on Chr. III of E. coli, compared with the tether model and the non-tether model]

DNA looping

● Examples of looping

Long-distance DNA looping of a chromosome before genetic recombination

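● Probability of looping for long DNA fragments

Based on the 1D random walk:

$p(n_r, N) = \frac{N!}{n_r!\,(N - n_r)!}\,\frac{1}{2^N}$, $R = (n_r - n_l)\,a$, $N = n_r + n_l$

Let $R = 0$, i.e. $n_r = n_l = N/2$. Using the Stirling formula,

$p_o = \frac{N!}{[(N/2)!]^2}\,\frac{1}{2^N} \approx \sqrt{\frac{2}{\pi N}} \propto N^{-1/2}$

Based on the 1D Gaussian distribution of the end-to-end distance, with a small capture distance $\delta$ within which the two ends count as looped:

$p_o = \int_{-\delta}^{\delta} P(R; N)\, dR \approx \frac{2\delta}{\sqrt{2\pi N a^2}} \propto N^{-1/2}$, for $\delta \ll \sqrt{N}\,a$

Based on the 3D Gaussian distribution of the end-to-end distance:

$p_o = \int_{|\mathbf{R}| < \delta} P(\mathbf{R}; N)\, d^3\mathbf{R} \approx \frac{4\pi\delta^3}{3}\left(\frac{3}{2\pi N a^2}\right)^{3/2} \propto N^{-3/2}$

Thus $p_o$ depends on the dimension of space.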
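The $N^{-1/2}$ law and its Stirling approximation $p_o \approx \sqrt{2/(\pi N)}$ can be checked exactly in 1D, and the scaling exponents read off as log-log slopes. In the 3D function, the capture radius $\delta$ (here $\delta = a/2$) is an illustrative assumption; only the exponent, not the prefactor, is meaningful. Function names are hypothetical.

```python
import math
from typing import Callable

def p_loop_exact_1d(N: int) -> float:
    """Exact 1D return probability for even N: C(N, N/2) / 2^N (n_r = n_l = N/2)."""
    return math.comb(N, N // 2) / 2 ** N

def p_loop_stirling_1d(N: int) -> float:
    """Stirling approximation of the same quantity: sqrt(2 / (pi N))."""
    return math.sqrt(2.0 / (math.pi * N))

def p_loop_gaussian_3d(N: int, a: float, delta: float) -> float:
    """3D estimate (4 pi delta^3 / 3) * (3 / (2 pi N a^2))^(3/2), valid for delta << sqrt(N) a."""
    return 4.0 * math.pi * delta ** 3 / 3.0 * (3.0 / (2.0 * math.pi * N * a * a)) ** 1.5

def fitted_exponent(p: Callable[[int], float], n1: int, n2: int) -> float:
    """Slope of log p_o versus log N between n1 and n2."""
    return (math.log(p(n2)) - math.log(p(n1))) / (math.log(n2) - math.log(n1))

if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(f"N = {N:4d}: exact {p_loop_exact_1d(N):.5f}, Stirling {p_loop_stirling_1d(N):.5f}")
    print("1D exponent ≈", round(fitted_exponent(p_loop_exact_1d, 100, 1000), 3))
    print("3D exponent ≈", round(fitted_exponent(lambda n: p_loop_gaussian_3d(n, 1.0, 0.5), 100, 1000), 3))
```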

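PCR, DNA melting & DNA bubbles

● PCR

● DNA melting

[Figure: double-stranded DNA, with min energy and min entropy; melted single strands, with max entropy and max energy]

$F = E - TS$: when $T$ increases, the decrease of $-TS$ overcomes the increase of $E$.

DNA melting follows from minimizing $F$: a competition between energy and entropy.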

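● Single bubble model

Note: ssDNA is more flexible than dsDNA.

Bubble length: $n$ bp; total DNA length: $N$ bp

Free energy of forming one bubble:

$G_1(n) = \varepsilon_{\text{in}} + n\,\varepsilon_{\text{el}} - k_BT \ln\left[\Omega_n\,(N - n + 1)\right]$

where $\varepsilon_{\text{in}}$ is the energy for initiating a bubble by one base pair, $n\,\varepsilon_{\text{el}}$ is the energy for elongating a bubble with $n$ base pairs, $\Omega_n$ is the number of ways of making a bubble, and $N - n + 1$ is the number of ways of choosing the position of the bubble along the DNA chain.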

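Recall the probability for an $N$-step random walk: $p(n_r, N) = \frac{N!}{n_r!\,(N - n_r)!}\,\frac{1}{2^N}$

Number of loops for a $2n$-step random walk (returning to the start, $n_r = n_l = n$):

$\Omega_n = \frac{(2n)!}{n!\,n!}$, so $\ln \Omega_n \approx 2n \ln 2 - \frac{1}{2}\ln n + \text{const.}$

Minimum free energy: $\frac{d}{dn}\frac{G_1(n)}{k_BT} = 0 \Rightarrow 2\ln 2 - \frac{1}{2n} - \frac{1}{N - n + 1} = \tilde{\varepsilon}_{\text{el}} \equiv \frac{\varepsilon_{\text{el}}}{k_BT}$

[Figure: $y = 2\ln 2 - \frac{1}{2n} - \frac{1}{N - n + 1}$ versus $n$ for $N = 100$ bp, lying below the level $2\ln 2$]

(1) Low enough temperature: $\tilde{\varepsilon}_{\text{el}} > 2\ln 2 \approx 1.39$. No solution! The free energy increases monotonically with $n$, so the minimum free energy is at $n = 1$: dsDNA is stable.

(2) High enough temperature: $\tilde{\varepsilon}_{\text{el}}$ sufficiently below $2\ln 2 \approx 1.39$. Two solutions! The larger one is the stable minimum (the smaller one is a maximum), giving a large bubble: dsDNA melting.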
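A numerical sketch of the single bubble model, in units of $k_BT$. Assumptions: the initiation energy $\varepsilon_{\text{in}}$ is set to 0 (it only shifts $G_1$ and does not move the minimum), and $\ln\Omega_n$ is evaluated exactly with `math.lgamma` rather than through the Stirling form. The hypothetical helper `most_probable_bubble` simply scans $n = 1, \dots, N$ for the minimum of $G_1(n)$.

```python
import math

def bubble_free_energy(n: int, N: int, eps_el: float, eps_in: float = 0.0) -> float:
    """G_1(n) / k_B T = eps_in + n * eps_el - ln[Omega_n * (N - n + 1)], energies in units of k_B T.

    Omega_n = (2n)! / (n! n!) is evaluated exactly through lgamma."""
    ln_omega = math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1)
    return eps_in + n * eps_el - ln_omega - math.log(N - n + 1)

def most_probable_bubble(N: int, eps_el: float) -> int:
    """Bubble size n in 1..N that minimizes G_1(n)."""
    return min(range(1, N + 1), key=lambda n: bubble_free_energy(n, N, eps_el))

def stationarity_lhs(n: float, N: int) -> float:
    """y(n) = 2 ln 2 - 1/(2n) - 1/(N - n + 1); stationary points of G_1 satisfy y(n) = eps_el."""
    return 2.0 * math.log(2.0) - 1.0 / (2.0 * n) - 1.0 / (N - n + 1)

if __name__ == "__main__":
    N = 100
    y_max = max(stationarity_lhs(n, N) for n in range(1, N + 1))
    print(f"max of y(n) for N = {N}: {y_max:.3f} (2 ln 2 = {2 * math.log(2):.3f})")
    for eps in (1.5, 1.3, 1.2):
        print(f"eps_el = {eps}: most probable bubble size n = {most_probable_bubble(N, eps)}")
```

With $N = 100$, $\tilde\varepsilon_{\text{el}} = 1.5$ (above $2\ln 2$) gives $n = 1$, i.e. stable dsDNA, while $\tilde\varepsilon_{\text{el}} = 1.2$ gives a bubble spanning most of the chain.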

§8.3 Single-molecule mechanics

Single-molecule techniques

● Atomic-force microscopy

Measures tension force and extension. Accuracy: 1 nN, 0.01 nm

● Optical tweezers

E.g., measure the rate of transcription. Tension force: 1–50 pN; extension accuracy: < 1 nm

● Magnetic tweezers

Apply a tension force and a twist moment; measure the torsional properties of DNA

● Pipette-based force apparatus

Measures ligand–receptor adhesion forces

Force-extension curves: force spectroscopy

● Different macromolecules have different force signatures when subjected to loading

[Figure: force-extension curves of dsDNA, RNA, and a protein made of repeats of the Ig module]

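Random walk models for force-extension curves

● 1D model

[Figure: a 1D chain loaded by a hanging weight, $mg = f$]

Total length: $L_{\text{tot}} = N a$; extension: $L = (n_r - n_l)\,a$

Minimize $G = E - TS$, with $E = -fL = -f(n_r - n_l)\,a$ and $S = k_B \ln \frac{N!}{n_r!\,n_l!}$

The most probable ratio: $n_r/n_l = e^{2fa/k_BT}$

Relative extension: $z \equiv \frac{\langle L \rangle}{L_{\text{tot}}} = \tanh\frac{fa}{k_BT}$

For $fa \ll k_BT$: $z \approx fa/k_BT$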
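Because the steps are independent, the thermal average $\langle L\rangle$ can also be computed exactly by summing over $n_r$ with Boltzmann weights $\binom{N}{n_r} e^{fL/k_BT}$, and it reproduces $\tanh(fa/k_BT)$ for any $N$. This sketch checks that; the name `z_exact_1d` is hypothetical, and for very large $N \cdot fa/k_BT$ the unnormalized weights would overflow, so a log-sum-exp form would be needed there.

```python
import math

def z_exact_1d(N: int, fa_over_kT: float) -> float:
    """<L> / L_tot for the 1D chain, summing over n_r with weights C(N, n_r) * exp(f L / k_B T),
    where L = (2 n_r - N) a."""
    x = fa_over_kT
    weights = [math.comb(N, n_r) * math.exp(x * (2 * n_r - N)) for n_r in range(N + 1)]
    mean_steps = sum(w * (2 * n_r - N) for n_r, w in enumerate(weights)) / sum(weights)
    return mean_steps / N

if __name__ == "__main__":
    for x in (0.05, 0.5, 2.0):
        print(f"fa/kBT = {x}: exact {z_exact_1d(50, x):.6f}, tanh {math.tanh(x):.6f}, small-force {x:.6f}")
```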

● Results of 3D random walks and others

For small $f$: $z = fa/(3k_BT)$, both on a lattice and off a lattice
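To see that the small-force slope is $1/3$ in 3D both on and off a lattice, the sketch uses two standard closed forms that are not written out on the slide: for a simple cubic lattice with the force along one axis, the per-step partition sum is $2\cosh x + 4$, giving $z = \sinh x/(\cosh x + 2)$; for the off-lattice case it swaps in the freely jointed chain, whose $z$ is the Langevin function $\coth x - 1/x$. Here $x = fa/k_BT$.

```python
import math

def z_1d(x: float) -> float:
    """1D random walk: z = tanh(x), with x = f a / k_B T."""
    return math.tanh(x)

def z_cubic_lattice(x: float) -> float:
    """3D simple-cubic lattice walk, force along +x: per-step partition sum 2 cosh x + 4,
    so z = <k_x> = sinh x / (cosh x + 2)."""
    return math.sinh(x) / (math.cosh(x) + 2.0)

def z_off_lattice(x: float) -> float:
    """Freely jointed chain (steps in any 3D direction): Langevin function coth x - 1/x."""
    return 1.0 / math.tanh(x) - 1.0 / x

if __name__ == "__main__":
    for x in (0.01, 0.1, 1.0, 5.0):
        print(f"x = {x}: 1D {z_1d(x):.4f}, lattice {z_cubic_lattice(x):.4f}, "
              f"off-lattice {z_off_lattice(x):.4f}, x/3 = {x / 3:.4f}")
```

All three curves saturate toward $z = 1$ at large forces; at small forces the 1D chain responds three times more strongly than either 3D model.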

● Homework

Figure 8.37(B)

§8.4 Proteins as random walks

Compact random walk

● The native states of proteins are usually compact.

[Figure: mapping of a compact protein structure onto a compact lattice walk]

Protein folding

● HP model [Science 273 (1996) 666]

A protein is represented by a self-avoiding chain of beads placed on a discrete lattice, with two types of beads used to mimic polar (P) and hydrophobic (H) amino acids.

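$E = \sum_{i=4}^{N} \sum_{j=1}^{i-3} J_{\sigma_i \sigma_j}\, \Delta(|\mathbf{r}_i - \mathbf{r}_j| - 1)$, with $\Delta(0) = 1$ and $\Delta(x) = 0$ for $x \neq 0$, so only non-bonded nearest neighbors on the lattice contribute

$\sigma_i = P$ if site $i$ has a P monomer; $\sigma_i = H$ if site $i$ has an H monomer

$J_{PP} = 0$, $J_{HH} = -2.3$, $J_{HP} = J_{PH} = -1$

The lattice spacing is taken as the unit of length.

Native configurations of proteins might minimize $E$!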
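A direct transcription of the HP energy for a given conformation on the cubic lattice, as a sketch. It uses 0-based indices, so the slide's $i = 4, \dots, N$ and $j \le i - 3$ become `i >= 3` and `j <= i - 3`; `hp_energy` is a hypothetical name.

```python
from typing import List, Tuple

# Contact energies of the HP model, as given in the lecture
J = {("P", "P"): 0.0, ("H", "H"): -2.3, ("H", "P"): -1.0, ("P", "H"): -1.0}

def hp_energy(sequence: str, coords: List[Tuple[int, int, int]]) -> float:
    """HP-model energy of one conformation on the cubic lattice (lattice spacing = 1).

    Sums J[s_i, s_j] over pairs with j <= i - 3 (0-based) whose sites are nearest
    neighbours, |r_i - r_j| = 1. Pairs j = i - 1 and j = i - 2 are skipped; the
    latter can never touch on a cubic lattice anyway."""
    if len(sequence) != len(coords):
        raise ValueError("need one lattice site per monomer")
    if len(set(coords)) != len(coords):
        raise ValueError("chain must be self-avoiding")
    energy = 0.0
    for i in range(3, len(coords)):
        for j in range(i - 2):  # j = 0, ..., i - 3
            dist_sq = sum((p - q) ** 2 for p, q in zip(coords[i], coords[j]))
            if dist_sq == 1:  # non-bonded contact at unit lattice distance
                energy += J[sequence[i], sequence[j]]
    return energy

if __name__ == "__main__":
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    for seq in ("HPPH", "HPPP", "PPPP"):
        print(seq, hp_energy(seq, square))
```

For example, four beads on the corners of a unit square have one contact, between the first and fourth beads, so `hp_energy("HPPH", square)` equals $J_{HH} = -2.3$.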

You may ask: why do the parameters take these values?

Consider a chain of 27 beads that fills a 3×3×3 lattice: simulations tell us there are 51704 structures unrelated by rotational, reflection, or reverse-labeling symmetries.

Sequence space → structure space

Among the $2^{27}$ possible sequences, simulations show that 4.75% (= 6039797) of the sequences have unique ground states.

Intuitive reasons:

(1) H monomers are buried as much as possible (note: buried inside ⇒ more contact neighbors), which is expressed by the relation $J_{PP} > J_{HP} > J_{HH}$; this lowers the energy of configurations in which H residues are hidden from water.

(2) Different types of monomers tend to segregate, which is expressed by $2J_{HP} > J_{PP} + J_{HH}$.

● Designability (可设计性)

Each structure can correspond to more than one sequence.

$N_S$: the number of sequences that have structure $S$ as their unique ground state

A larger $N_S$ implies that structure $S$ has higher designability.
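Designability can be illustrated by brute force on a much smaller toy problem than the 27-bead, 3×3×3 study. The simplifications are made for illustration only and are not those of Li et al.: short chains on a 2D square lattice, all self-avoiding conformations instead of only compact ones, and structures identified by their contact sets (conformations with the same contacts have the same energy for every sequence). For each of the $2^N$ sequences the sketch finds the ground state and, when it is unique, adds one to that structure's $N_S$. All function names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, Tuple

# HP contact energies from the lecture
J = {("P", "P"): 0.0, ("H", "H"): -2.3, ("H", "P"): -1.0, ("P", "H"): -1.0}
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]
Site = Tuple[int, int]

def self_avoiding_walks(n_beads: int) -> List[List[Site]]:
    """All self-avoiding walks of n_beads (>= 2) on the square lattice, first bond fixed along +x."""
    walks: List[List[Site]] = []

    def grow(path: List[Site], occupied: set) -> None:
        if len(path) == n_beads:
            walks.append(list(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            site = (x + dx, y + dy)
            if site not in occupied:
                path.append(site)
                occupied.add(site)
                grow(path, occupied)
                path.pop()
                occupied.remove(site)

    start = [(0, 0), (1, 0)]
    grow(start, set(start))
    return walks

def contact_set(walk: List[Site]) -> FrozenSet[Tuple[int, int]]:
    """Non-bonded nearest-neighbour pairs (i, j) with j <= i - 3 (0-based indices)."""
    index = {site: k for k, site in enumerate(walk)}
    pairs = set()
    for i, (x, y) in enumerate(walk):
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j <= i - 3:
                pairs.add((i, j))
    return frozenset(pairs)

def designability(n_beads: int):
    """Return (structures, N_S per structure, number of sequences with a unique ground state)."""
    structures = list({contact_set(w) for w in self_avoiding_walks(n_beads)})
    n_s = [0] * len(structures)
    unique = 0
    for seq in product("HP", repeat=n_beads):
        energies = [sum(J[seq[i], seq[j]] for i, j in s) for s in structures]
        e_min = min(energies)
        ground = [k for k, e in enumerate(energies) if e - e_min < 1e-9]
        if len(ground) == 1:  # unique ground state: count toward that structure's N_S
            unique += 1
            n_s[ground[0]] += 1
    return structures, n_s, unique

if __name__ == "__main__":
    n = 10
    structures, n_s, unique = designability(n)
    print(f"{len(structures)} distinct contact sets; {unique} of {2 ** n} sequences have a unique ground state")
    print("largest N_S values:", sorted(n_s, reverse=True)[:5])
```

Printing the sorted $N_S$ values shows how the sequences with unique ground states are distributed over structures at this toy size.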

Structures differ markedly in their designability. Highly designable structures are thermodynamically more stable than other structures and exhibit certain secondary structures. Among the 10 structures with the largest $N_S$ values, all have parallel running lines (like β-sheets) folded in a regular manner.

[Figure: a highly designable structure]

Conclusion: Protein structures are selected in nature because they are readily designed, and such a selection simultaneously leads to thermodynamic stability.

Suggestion: The protein structures found in nature should have high designability.

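§ Summary & further reading

Summary

● Random walk model of macromolecules

– End-to-end distance: $\langle r_N^2 \rangle = N a^2$

– Probability distribution function (1D & 3D)

– Radius of gyration

– Probability of looping: $p_o \propto N^{-1/2}$ (1D), $p_o \propto N^{-3/2}$ (3D)

– DNA melting: controlled by $\tilde{\varepsilon}_{\text{el}} \equiv \varepsilon_{\text{el}}/k_BT$, which decreases as $T$ increases

● Single-molecule mechanics

– $z = \langle L \rangle / L_{\text{tot}} = \tanh(fa/k_BT)$ for the 1D random walk

– 3D random walks on and off a lattice: $z \approx fa/(3k_BT)$ for small $f$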

Further reading

● Phillips et al., Physical Biology of the Cell, Ch. 8

● de Gennes, Scaling Concepts in Polymer Physics

● Doi & Edwards, The Theory of Polymer Dynamics

● Li et al., Emergence of Preferred Structures in a Simple Model of Protein Folding, Science 273 (1996) 666
