Lecture 8
Random walks & macromolecules
Zhanchun Tu (涂展春 )
Department of Physics, BNU
Email: [email protected]
Homepage: www.tuzc.org
Main contents
● Deterministic vs statistical descriptions of macromolecular structures
● Macromolecules as random walks
● Single-molecule mechanics
● Proteins as random walks
§8.1 Deterministic vs statistical descriptions of structures
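Deterministic description
Structure <=> atomic coordinates $(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)$
Statistical descriptions
Structure <=> average size & shape of macromolecules
[Figure: polymer coil of diameter $2R_G$ in x, y, z axes, with end-to-end vector $\mathbf{R}_N$]
$R_G = ?$
$\langle \mathbf{R}_N \rangle = ?$; $\langle \mathbf{R}_N^2 \rangle = ?$; $p(\mathbf{R}_N) = ?$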
Simple polymer
● Example
PE: polyethylene (聚乙烯), a long-chain molecule
[Figure: carbon backbone C-C-C-C with bond length $l_0$, bond angle $\theta$, and rotation about each bond]
Thermal motion does NOT excite the DOFs of bond length and bond angle: $l_0$, $\theta$ fixed!
Rotational DOF: trans (反式) at 0°, gauche (旁式) at 120° and -120°
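● Flexibility (柔性)
Locally static flexibility
$P_{gauche}/P_{trans} = e^{-\Delta\varepsilon/k_B T}$, where $\Delta\varepsilon$ is the energy difference between the gauche and trans states
(1) $P_{gauche}/P_{trans} \simeq 1$ for $\Delta\varepsilon \ll k_B T$
Gauche and trans conformations are found with similar frequency in a local part of the polymer. Thus the local part of the polymer appears as a random coil.
(2) $P_{gauche}/P_{trans} \simeq 0$ for $\Delta\varepsilon \gg k_B T$
Gauche conformations are seldom found in a local part of the polymer; almost all bonds are trans. Locally the polymer looks like a rigid rod.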
Persistence length (驻留长度) and Globally static flexibility
$\xi_p = l_0\, e^{\Delta\varepsilon/k_B T}$, with $\Delta\varepsilon$ the gauche-trans energy difference
Question: what is the physical meaning of $\xi_p$?
$e^{-\Delta\varepsilon/k_B T}$: probability of a gauche conformation between nearest-neighbor bonds.
How many bonds does it take, on average, for 1 gauche conformation to occur?
$\frac{1}{e^{-\Delta\varepsilon/k_B T}} = e^{\Delta\varepsilon/k_B T} = \xi_p / l_0$
On average, 1 gauche conformation occurs within one persistence length.
(1) If total length $< \xi_p$, the polymer looks like a rigid rod.
(2) If total length $\gg \xi_p$, many gauche conformations occur in the polymer. The whole chain is a random coil: locally rigid while globally flexible.
Persistence time (驻留时间) & Dynamic flexibility (动态柔性)
$\tau_p = \tau_0\, e^{\Delta E/k_B T}$, $\tau_0 \sim 10$ ps
$\tau_p$: transition time from trans to gauche
[Figure: gauche, trans, gauche states]
(1) $t_{observe} < \tau_p$: the polymer is frozen in one configuration. Dynamically rigid.
(2) $t_{observe} \gg \tau_p$: the polymer transits between different configurations. Dynamically flexible.
Now, we only consider the case length $\gg \xi_p$ and $t_{observe} \gg \tau_p$
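The two Boltzmann-factor formulas above can be tabulated numerically. This is a minimal sketch, not part of the lecture: the function names are hypothetical, the bond length of 0.154 nm is an assumed illustrative value for a C-C bond, and the energy ratios are arbitrary sample inputs; only $\tau_0 \sim 10$ ps comes from the slide.

```python
import math

def persistence_length(l0, delta_eps_over_kT):
    """xi_p = l0 * exp(delta_eps / k_B T); same length unit as l0."""
    return l0 * math.exp(delta_eps_over_kT)

def persistence_time(tau0, barrier_over_kT):
    """tau_p = tau0 * exp(Delta E / k_B T); same time unit as tau0."""
    return tau0 * math.exp(barrier_over_kT)

L0_NM = 0.154    # assumed C-C bond length in nm (illustrative, not from the slides)
TAU0_PS = 10.0   # tau_0 ~ 10 ps, as quoted on the slide

# Sample energy ratios (arbitrary, for illustration only).
for ratio in (0.5, 1.0, 3.0, 5.0):
    xi_p = persistence_length(L0_NM, ratio)
    tau_p = persistence_time(TAU0_PS, ratio)
    print(f"energy/kT = {ratio:3.1f}: xi_p = {xi_p:7.3f} nm, tau_p = {tau_p:8.1f} ps")
```

Both quantities grow exponentially with the ratio of the relevant energy to $k_B T$, so a modest change in temperature can move a chain between the rigid and flexible regimes.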
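● End-to-end distance (首末端距)
[Figure: polymer chain in x, y, z axes with end-to-end vector $\mathbf{R}_N$]
$\mathbf{R}_N$: end-to-end vector
Length $\gg \xi_p$ and $t_{observe} \gg \tau_p$: the polymer transits between a large number of configurations
Thus, $\mathbf{R}_N$ is a stochastic variable!
$\langle \mathbf{R}_N \rangle = ?$; $\langle \mathbf{R}_N^2 \rangle = ?$; $p(\mathbf{R}_N) = ?$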
§8.2 Macromolecules as random walks
Basic idea
● Macromolecules are regarded as rigid segments (链节) connected by hinges (铰)
[Figure: DNA on a surface (AFM image); representation of DNA as a random walk]
[Figure: 1D random walk model; 3D random walk model]
Condition: $L \gg \xi_p$
Mathematical treatments
● Drunkard's walk (醉汉走路 )
Are there any rules on the position $\mathbf{R}$ of the drunkard?
$\langle \mathbf{R} \rangle = 0$
$\langle \mathbf{R}^2 \rangle = 0$?
Note: $\mathbf{R}$ is a vector
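● Mean square end-to-end distance
1D random walk
[Figure: x axis starting at 0, divided into steps of length a]
$a$: length of each step
$x_n$: position after the n-th step
$x_0 = 0$: start point
$k_n a$: displacement of the n-th step, with $P(k_n = 1) = P(k_n = -1) = 1/2$
$x_n = x_{n-1} + k_n a$
Problem: prove that $\langle x_n \rangle = 0$.
Proof: $\langle k_n \rangle = 1 \times 1/2 - 1 \times 1/2 = 0 \Rightarrow$
$\langle x_n \rangle = \langle x_{n-1} + k_n a \rangle = \langle x_{n-1} \rangle + \langle k_n \rangle a = \langle x_{n-1} \rangle \Rightarrow$
$\langle x_n \rangle = \langle x_{n-1} \rangle = \langle x_{n-2} \rangle = \ldots = \langle x_1 \rangle = \langle x_0 \rangle = 0$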
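$\langle x_n^2 \rangle = ?$
$x_n^2 = (x_{n-1} + k_n a)^2 = x_{n-1}^2 + 2a\, k_n x_{n-1} + k_n^2 a^2$
$k_n^2 = (\pm 1)^2 = 1$
$\langle k_n x_{n-1} \rangle = (+1)\langle x_{n-1} \rangle P(+1) + (-1)\langle x_{n-1} \rangle P(-1) = 0$, with $P(\pm 1) = 1/2$
$\langle x_n^2 \rangle = \langle x_{n-1}^2 + 2a\, k_n x_{n-1} + k_n^2 a^2 \rangle = \langle x_{n-1}^2 \rangle + a^2$
$\Rightarrow \langle x_N^2 \rangle = N a^2$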
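The two 1D results can be checked by brute force for small N. The sketch below is an illustration, not the lecture's method: it enumerates all $2^N$ equally likely step sequences and averages exactly using rational arithmetic. The function name is hypothetical, and `a` is assumed to be an integer or a `Fraction` so that the averages stay exact.

```python
from fractions import Fraction
from itertools import product

def exact_moments_1d(n_steps, a=1):
    """Return (<x_N>, <x_N^2>) averaged over all 2^N step sequences.

    `a` should be an int or Fraction so the result is exact.
    """
    total_x = 0
    total_x2 = 0
    count = 0
    for steps in product((-1, 1), repeat=n_steps):  # each k_n is +1 or -1
        x = a * sum(steps)                          # x_N = a * (k_1 + ... + k_N)
        total_x += x
        total_x2 += x * x
        count += 1
    return Fraction(total_x, count), Fraction(total_x2, count)

for n in (1, 2, 5, 10):
    mean_x, mean_x2 = exact_moments_1d(n)
    print(f"N = {n:2d}: <x_N> = {mean_x}, <x_N^2> = {mean_x2}")
```

Enumeration costs $2^N$ operations, so it is only practical for small N; it is meant as a check on the recursion, not as a replacement for it.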
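2D random walk
[Figure: square lattice in the x-y plane with step length a]
$\mathbf{r}_n = (x_n, y_n) = (x_{n-1}, y_{n-1}) + (k_{xn}, k_{yn})\, a$
$P(k_{xn}, k_{yn})$: $P(\pm 1, 0) = P(0, \pm 1) = 1/4$; 0 for others.
$r_n^2 = x_n^2 + y_n^2$
Problem: prove that $\langle r_N^2 \rangle = N a^2$
3D random walk
$\mathbf{r}_n = (x_n, y_n, z_n) = (x_{n-1}, y_{n-1}, z_{n-1}) + (k_{xn}, k_{yn}, k_{zn})\, a$
$P(\pm 1, 0, 0) = P(0, \pm 1, 0) = P(0, 0, \pm 1) = 1/6$; 0 for others.
Problem: prove that $\langle r_N^2 \rangle = N a^2$
Summary: $\langle r_N^2 \rangle = N a^2$, i.e. $\sqrt{\langle r_N^2 \rangle} = \sqrt{N}\, a$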
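The 3D statement can be checked the same way for short walks. This is a sketch under the lattice model stated above (six equally likely unit steps); the names `STEPS_3D` and `mean_r2_3d` are hypothetical, and the enumeration visits all $6^N$ walks, so N must stay small.

```python
from itertools import product

# The six allowed lattice steps, each taken with probability 1/6.
STEPS_3D = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def mean_r2_3d(n_steps, a=1.0):
    """Exact <r_N^2> for an N-step walk on the cubic lattice, by enumeration."""
    total_r2 = 0
    count = 0
    for walk in product(STEPS_3D, repeat=n_steps):
        x = sum(step[0] for step in walk)
        y = sum(step[1] for step in walk)
        z = sum(step[2] for step in walk)
        total_r2 += x * x + y * y + z * z   # squared distance in units of a^2
        count += 1
    return a * a * total_r2 / count

for n in (1, 2, 3, 4):
    print(f"N = {n}: <r_N^2> = {mean_r2_3d(n)}  (N a^2 = {n * 1.0})")
```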
● Total configurations of N-step 1D random walk
(1) The probabilities of right and left steps are the same
(2) Each step is taken with no regard for the orientation of the previous segment
(3) Each step has two choices
Total configurations of an N-step walk = $2^N$
$2^N$ different permissible configurations for an N-segment macromolecule
P(each configuration) = $1/2^N$
● Distribution of end-to-end distance
Question: in an N-step walk, what is the probability of $n_r$ rightward steps?
$W(n_r, N)$: the number of realizations of $n_r$ rightward steps in an N-step walk
Example (N = 3): $n_r = 0$, W = 1; $n_r = 1$, W = 3; $n_r = 2$, W = 3; $n_r = 3$, W = 1
$p(n_r, N) = \frac{W(n_r, N)}{2^N} = \frac{N!}{n_r!\,(N - n_r)!}\, \frac{1}{2^N}$ (Binomial distribution)
Problem: verify that this probability distribution is normalized
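The counting argument can be reproduced with the standard-library binomial coefficient. A minimal sketch, with hypothetical function names, that recovers the W values of the 3-step example and checks normalization numerically for one value of N:

```python
from math import comb

def multiplicity(n_r, n_steps):
    """W(n_r, N): number of N-step walks with exactly n_r rightward steps."""
    return comb(n_steps, n_r)

def probability(n_r, n_steps):
    """p(n_r, N) = W(n_r, N) / 2^N."""
    return comb(n_steps, n_r) / 2**n_steps

# W for the 3-step example on the slide.
print([multiplicity(n_r, 3) for n_r in range(4)])

# Normalization check for N = 20: the probabilities should sum to 1.
print(sum(probability(n_r, 20) for n_r in range(21)))
```

The numerical sum only illustrates normalization; the proof asked for in the problem follows from the binomial theorem, $\sum_{n_r} \binom{N}{n_r} = (1+1)^N = 2^N$.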
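Relation between end-to-end distance (R) and $n_r$
$R = (n_r - n_l)\, a$, $N = n_r + n_l$
$P(R, N)\, dR = p(n_r, N)\, dn_r$ ($n_r \leftrightarrow R$)
$P(R, N) = p(n_r, N)\, \frac{dn_r}{dR} = \frac{p(n_r, N)}{2a}$
Probability distribution function for the end-to-end distance (Gaussian distribution): $P(R; N) = \frac{1}{\sqrt{2\pi N a^2}}\, e^{-R^2/(2 N a^2)}$
[Figure: P(R; N) versus R for N = 100, a = 1/2. Line: Gaussian distribution. Dots: binomial distribution]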
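The comparison in the figure (N = 100, a = 1/2) can be reproduced numerically. This sketch assumes the Gaussian has zero mean and variance $N a^2$, as implied by $\langle R \rangle = 0$ and $\langle R^2 \rangle = N a^2$; the function names are hypothetical.

```python
from math import comb, exp, pi, sqrt

def binomial_density(n_r, n_steps, a):
    """P(R, N) = p(n_r, N) / (2a), evaluated at R = (2 n_r - N) a."""
    return comb(n_steps, n_r) / 2**n_steps / (2 * a)

def gaussian_density(R, n_steps, a):
    """Gaussian with zero mean and variance N a^2."""
    variance = n_steps * a * a
    return exp(-R * R / (2 * variance)) / sqrt(2 * pi * variance)

def max_abs_difference(n_steps, a):
    """Largest gap between the two densities over all reachable R."""
    worst = 0.0
    for n_r in range(n_steps + 1):
        R = (2 * n_r - n_steps) * a
        gap = abs(binomial_density(n_r, n_steps, a) - gaussian_density(R, n_steps, a))
        worst = max(worst, gap)
    return worst

# Parameters of the figure: N = 100, a = 1/2.
print(binomial_density(50, 100, 0.5), gaussian_density(0.0, 100, 0.5))
print(max_abs_difference(100, 0.5))
```

The gap shrinks as N grows, which is the content of the central limit theorem for this walk.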
Central limit theorem: the probability distribution of $x_1 + x_2 + \ldots + x_N$ (a sum of identically distributed independent random variables) is Gaussian in the limit of large N
Problem: prove that $\langle R \rangle = 0$, $\langle R^2 \rangle = N a^2$
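3D case
Central limit theorem: $P(\mathbf{R}; N) = A\, e^{-\kappa R^2}$
Normalization: $\int P(\mathbf{R}; N)\, d^3R = 1$
Variance: $\langle R^2 \rangle = N a^2$
$\Rightarrow P(\mathbf{R}; N) = \left(\frac{3}{2\pi N a^2}\right)^{3/2} e^{-3R^2/(2 N a^2)}$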
● Entropic elasticity
Sharp peak of $P(R; N)$ at $R = 0$
Stretch a polymer so that R is nonzero; after release it will quickly find itself in the $R \approx 0$ state.
The $R \approx 0$ state is a much more likely state.
[Figure: chain with one end at 0, pulled by a force F]
This is not the result of a physical force (e.g. electric force), but purely a result of statistics.
Other example: pressure
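The statistical origin of the restoring force can be made explicit with a short worked derivation. It is a sketch under two assumptions not spelled out on the slide: the 1D Gaussian form of $P(R; N)$ with variance $N a^2$, and a chain energy that does not depend on R, so that only entropy contributes to the free energy.

```latex
% Entropy of the macrostate with end-to-end distance R (up to a constant):
S(R) = k_B \ln P(R; N) + \text{const}
     = -\frac{k_B R^2}{2 N a^2} + \text{const}
% Free energy when the energy is independent of R:
F(R) = -T S(R) = \frac{k_B T\, R^2}{2 N a^2} + \text{const}
% Force needed to hold the ends at separation R:
f = \frac{\partial F}{\partial R} = \frac{k_B T}{N a^2}\, R
```

Under these assumptions the chain behaves like a Hookean spring whose stiffness $k_B T / (N a^2)$ grows with temperature. With $z = R/(N a)$ this reads $z = f a / k_B T$, the same linear law as the small-force limit of the 1D force-extension result.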
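● Persistence length
The length scale over which the tangent-tangent correlation function decays along the chain: $\langle \mathbf{t}(s) \cdot \mathbf{t}(u) \rangle = e^{-|s-u|/\xi_p}$
[Figure: chain with positions r(s), r(u) and tangents t(s), t(u)]
$\langle R^2 \rangle \approx 2 L \xi_p$ for $L \gg \xi_p$
On the other hand, for an ($N = L/a \gg 1$)-step random walk, $\langle R^2 \rangle = N a^2 = L a$
$\Rightarrow a = 2\xi_p$: Kuhn length = 2 × persistence length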
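The link between the persistence length and the random-walk step can be worked out in a few lines. The derivation below assumes an exponentially decaying tangent-tangent correlation, $\langle \mathbf{t}(s) \cdot \mathbf{t}(u) \rangle = e^{-|s-u|/\xi_p}$, with unit tangents and contour length L.

```latex
% End-to-end vector as an integral of the unit tangent:
\mathbf{R} = \int_0^L \mathbf{t}(s)\, ds
% Mean square end-to-end distance:
\langle R^2 \rangle
  = \int_0^L ds \int_0^L du\, \langle \mathbf{t}(s) \cdot \mathbf{t}(u) \rangle
  = 2 \int_0^L ds \int_0^s du\, e^{-(s-u)/\xi_p}
  = 2 \xi_p \int_0^L \left(1 - e^{-s/\xi_p}\right) ds
  = 2 \xi_p \left[ L - \xi_p \left(1 - e^{-L/\xi_p}\right) \right]
% Long-chain limit:
\langle R^2 \rangle \approx 2 L \xi_p \qquad (L \gg \xi_p)
```

Comparing with the random-walk result $\langle R^2 \rangle = N a^2 = L a$ gives $a = 2 \xi_p$. In the opposite limit $L \ll \xi_p$ the same expression reduces to $\langle R^2 \rangle \approx L^2$, the rigid-rod behavior.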
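The size of genome
● Radius of gyration (回转半径)
It measures the average distance between the monomers and the center of mass of the polymer.
[Figure: random walk starting at 0, with $\mathbf{R}_i$ the position of monomer i between segments i and i+1]
$\langle (\mathbf{R}_{k+l} - \mathbf{R}_k)^2 \rangle = l a^2 = 4 l \xi_p^2$
For DNA or RNA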
● Estimate: size of viral and bacterial genomes
Bacteriophage genomes of T2 and T4: $N_{bp} \approx 150$ kb
Bacterium:
Persistence length
Observed result: slightly smaller than the estimated value
[Figure: DNA from a bacterium]
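An order-of-magnitude estimate of this kind can be sketched in code. Everything numerical here except $N_{bp} \approx 150$ kb is an assumption not taken from the transcript: the ideal-chain relation $R_G^2 = N a^2 / 6 = L \xi_p / 3$, a rise of 0.34 nm per base pair, and a dsDNA persistence length of 50 nm. The function name is hypothetical.

```python
from math import sqrt

def radius_of_gyration_nm(n_bp, bp_rise_nm=0.34, xi_p_nm=50.0):
    """Ideal-chain estimate R_G = sqrt(L * xi_p / 3), with L = n_bp * bp_rise_nm.

    Assumed inputs: 0.34 nm per base pair and xi_p = 50 nm (typical dsDNA values).
    """
    contour_length_nm = n_bp * bp_rise_nm
    return sqrt(contour_length_nm * xi_p_nm / 3.0)

# T2/T4 bacteriophage genome size quoted on the slide: about 150 kb.
r_g = radius_of_gyration_nm(150_000)
print(f"Estimated R_G for a 150 kb genome: {r_g:.0f} nm")
```

Because $R_G \propto \sqrt{N_{bp}}$, a genome four times longer is only about twice as large in this estimate.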
Geography of Chromosomes
● Chromosomes have separate territories (领地) within the nucleus
[Figure: chromosome territories in a human cell nucleus]
● Chromosomes are tethered at different locations in the nucleus
Two possible ways of tethering: (A) at the centromeres and the two telomeres; (B) at discrete chromosomal loci that interact with the nuclear envelope.
[Figure labels: centromere, telomere]
● Simple tether model
Without tethers: $P(\mathbf{r})$
With tethers ($\mathbf{R}$ is fixed): $P(\mathbf{r}) \to P(\mathbf{r} - \mathbf{R})$
N: number of segments between markers
[Figure: experimental data on Chr. III of E. coli compared with the tether model and the non-tether model]
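DNA looping
● Examples of looping
Long-distance DNA looping of a chromosome before genetic recombination
● Probability of looping for long DNA fragments
Based on 1D random walk
$p(n_r, N) = \frac{N!}{n_r!\,(N - n_r)!}\, \frac{1}{2^N}$, $R = (n_r - n_l)\, a$, $N = n_r + n_l$
Let $R = 0$, i.e. $n_r = n_l = N/2$:
$p_o = \frac{N!}{[(N/2)!]^2}\, \frac{1}{2^N} \approx \sqrt{\frac{2}{\pi N}} \propto N^{-1/2}$ (Stirling formula)
Based on 1D Gaussian distribution of end-to-end distance
$p_o = \int_{-\delta}^{\delta} P(R; N)\, dR \approx \frac{2\delta}{\sqrt{2\pi N a^2}} \propto N^{-1/2}$, for $-\delta < R < \delta$ with $\delta \ll \sqrt{N}\, a$
Based on 3D Gaussian distribution of end-to-end distance
$p_o \approx \frac{4\pi \delta^3}{3} \left(\frac{3}{2\pi N a^2}\right)^{3/2} \propto N^{-3/2}$
Thus $p_o$ depends on the dimension of space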
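The 1D looping probability and its Stirling approximation can be compared directly. In this sketch the function names are hypothetical; the exact value is the central binomial term $N!/[(N/2)!]^2 / 2^N$ and the approximation is $\sqrt{2/(\pi N)}$.

```python
from math import comb, pi, sqrt

def p_loop_exact(n_steps):
    """Exact probability that an N-step 1D walk returns to R = 0."""
    if n_steps % 2:            # an odd number of steps can never return to 0
        return 0.0
    return comb(n_steps, n_steps // 2) / 2**n_steps

def p_loop_stirling(n_steps):
    """Stirling approximation sqrt(2 / (pi N)), valid for large even N."""
    return sqrt(2.0 / (pi * n_steps))

for n in (10, 100, 1000):
    print(f"N = {n:4d}: exact = {p_loop_exact(n):.5f}, Stirling = {p_loop_stirling(n):.5f}")

# Quadrupling N should roughly halve p_o if p_o scales as N^(-1/2).
print(p_loop_exact(400) / p_loop_exact(100))
```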
PCR, DNA Melting & DNA Bubbles
● PCR
● DNA melting
dsDNA: min energy, min entropy
ssDNA: max entropy, max energy
$F = E - TS$: competition between energy and entropy
When T increases, the decrease of $-TS$ overcomes the increase of E
DNA melting <=> min F
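● Single bubble model
Note: ssDNA is more flexible than dsDNA
Bubble length: n bp
Total DNA length: N bp
Free energy of forming 1 bubble:
$G_1(n) = E_{in} + n E_{el} - k_B T \ln[\Omega_n (N - n + 1)]$
$E_{in}$: energy for initiating a bubble by one base pair
$n E_{el}$: energy for elongating a bubble with n base pairs
$\Omega_n$: number of ways of making a bubble
$N - n + 1$: number of ways of choosing the position of the bubble on the DNA chain
Recall the probability for an N-step random walk: $p(n_r, N) = \frac{N!}{n_r!\,(N - n_r)!}\, \frac{1}{2^N}$
Number of loops for a 2n-step random walk: $\Omega_n = \frac{(2n)!}{n!\, n!}$
Stirling formula: $\frac{G_1(n)}{k_B T} \approx n \varepsilon_{el} - 2n \ln 2 + \frac{1}{2} \ln n - \ln(N - n + 1) + \text{const.}$
Min free energy: $\frac{d}{dn} \frac{G_1(n)}{k_B T} = 0 \Rightarrow 2 \ln 2 - \frac{1}{2n} - \frac{1}{N - n + 1} = \varepsilon_{el} \equiv \frac{E_{el}}{k_B T}$
[Figure: $y = 2 \ln 2 - \frac{1}{2n} - \frac{1}{N - n + 1}$ versus n for N = 100 bp, lying below the line $y = 2 \ln 2$]
(1) Low enough temperature: $\varepsilon_{el} > 2 \ln 2 \approx 1.39$
No solution!
Free energy monotonically increases with n
Min free energy <=> n = 1 <=> dsDNA stable
(2) High enough temperature: $\varepsilon_{el} < 2 \ln 2 \approx 1.39$
Two solutions!
The larger one is more stable (the smaller one is a local maximum of the free energy, the larger one a local minimum)
Large bubble <=> dsDNA melting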
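The two temperature regimes can be explored by evaluating the single-bubble free energy directly. This sketch assumes the form $G_1(n)/k_B T = \varepsilon_{in} + n \varepsilon_{el} - \ln[(2n)!/(n!\,n!)] - \ln(N - n + 1)$; the symbol $\varepsilon_{in}$ and its default value of 0, the function names, and the two sample values of $\varepsilon_{el}$ are illustrative assumptions. The factorials are evaluated exactly through `lgamma` rather than with the Stirling formula.

```python
from math import lgamma, log

def bubble_free_energy(n, n_total, eps_el, eps_in=0.0):
    """G_1(n) / k_B T for one bubble of n bp in a duplex of n_total bp.

    eps_el = E_el / k_B T; eps_in is an assumed initiation term (shifts G only).
    """
    ln_loops = lgamma(2 * n + 1) - 2 * lgamma(n + 1)   # ln[(2n)! / (n! n!)]
    ln_positions = log(n_total - n + 1)                # ln(N - n + 1)
    return eps_in + n * eps_el - ln_loops - ln_positions

def most_stable_bubble(n_total, eps_el):
    """Bubble size n in 1..N that minimizes the free energy."""
    return min(range(1, n_total + 1),
               key=lambda n: bubble_free_energy(n, n_total, eps_el))

N_BP = 100  # same chain length as in the slide's figure
for eps_el in (1.5, 1.0):   # above and below 2 ln 2 ~ 1.39 (sample values)
    n_star = most_stable_bubble(N_BP, eps_el)
    print(f"eps_el = {eps_el}: most stable bubble size n = {n_star}")
```

For $\varepsilon_{el}$ above $2 \ln 2$ the minimum sits at n = 1 (intact duplex); below it the minimum jumps to a bubble spanning almost the whole chain, which is the melting transition of this model.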
§8.3 Single-molecule mechanics
Single-Molecule Techniques
● Atomic-force microscopy
Measures tension force & extension
Accuracy: 1 nN, 0.01 nm
● Optical tweezers
Example: measuring the rate of transcription
Tension force: 1-50 pN
Extension accuracy: < 1 nm
● Magnetic tweezers
Apply tension force and twist moment
Example: measuring the torsional properties of DNA
● Pipette-based force apparatus
Example: measuring ligand-receptor adhesion forces
Force-Extension Curves: Force Spectroscopy
● Different macromolecules have different force signatures when subjected to loading
[Figure: force-extension curves for dsDNA, RNA, and a protein made of repeats of the Ig module]
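Random walk models for force-extension curves
● 1D model
[Figure: chain stretched by a hanging weight, $mg = f$]
Total length: $L_{tot} = N a$
Extension: $L = (n_r - n_l)\, a$
Min G: $G = -fL - k_B T \ln W(n_r, N)$
The most probable ratio $n_r / n_l$: $n_r / n_l = e^{2fa/k_B T}$
Relative extension: $z = \langle L \rangle / L_{tot} = \tanh\frac{fa}{k_B T}$
For $fa \ll k_B T$: $z = fa / k_B T$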
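The tanh law can be checked numerically without the Stirling formula. The sketch below is an illustration rather than the slide's derivation: instead of minimizing G it computes the thermal average $\langle L \rangle / L_{tot}$ from a Boltzmann-weighted sum over $n_r$, with weight $W(n_r, N)\, e^{f a (n_r - n_l)/k_B T}$. For this model the average also equals $\tanh(fa/k_B T)$. The function name is hypothetical.

```python
from math import comb, exp, tanh

def relative_extension_1d(n_steps, fa_over_kT):
    """<L> / L_tot for an N-segment 1D chain under force f (fa in units of k_B T)."""
    partition_sum = 0.0
    extension_sum = 0.0
    for n_r in range(n_steps + 1):
        n_l = n_steps - n_r
        # Multiplicity times Boltzmann factor exp(f L / k_B T), with L = (n_r - n_l) a.
        weight = comb(n_steps, n_r) * exp(fa_over_kT * (n_r - n_l))
        partition_sum += weight
        extension_sum += weight * (n_r - n_l)
    return extension_sum / (partition_sum * n_steps)

for x in (0.01, 0.5, 1.0, 3.0):
    print(f"fa/kT = {x:4.2f}: sum = {relative_extension_1d(50, x):.6f}, tanh = {tanh(x):.6f}")
```

At small $fa/k_B T$ both columns approach $fa/k_B T$ (linear response); at large force they saturate at 1, the fully stretched chain.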
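● Results of 3D random walks and others
Small f: $z = fa / 3 k_B T$
On lattice: $z = \frac{2 \sinh(fa/k_B T)}{4 + 2 \cosh(fa/k_B T)}$
Off lattice: $z = \coth\frac{fa}{k_B T} - \frac{k_B T}{fa}$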
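The common small-force limit of the two 3D models can be checked numerically. The transcript keeps only the labels "on lattice" and "off lattice", so the expressions used here are the standard ones and should be read as assumptions: for the cubic-lattice walk, $z = 2\sinh x / (4 + 2\cosh x)$, and for the freely jointed (off-lattice) chain, the Langevin function $z = \coth x - 1/x$, with $x = fa/k_B T$. The function names are hypothetical.

```python
from math import cosh, sinh, tanh

def z_lattice_3d(x):
    """Relative extension on the cubic lattice: six step directions,
    with weights e^x (along f), e^-x (against f) and 1 (four perpendicular)."""
    return 2.0 * sinh(x) / (4.0 + 2.0 * cosh(x))

def z_off_lattice(x):
    """Langevin function coth(x) - 1/x for a freely jointed chain; requires x != 0."""
    return 1.0 / tanh(x) - 1.0 / x

for x in (0.01, 0.1, 1.0, 5.0, 30.0):
    print(f"x = {x:5.2f}: lattice = {z_lattice_3d(x):.5f}, "
          f"off lattice = {z_off_lattice(x):.5f}, x/3 = {x / 3:.5f}")
```

Both expressions reduce to $x/3$ at small force, but they saturate differently: the lattice result approaches 1 exponentially fast, while the Langevin function approaches 1 only as $1 - 1/x$.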
● Homework
Figure 8.37(B)
§8.4 Proteins as random walks
Compact random walk
● Native states of proteins are usually compact
[Figure: mapping between a native protein structure and a compact random walk]
Protein folding
● HP model [Science 273 (1996) 666]
A protein is represented by a self-avoiding chain of beads placed on a discrete lattice, with two types of beads used to mimic polar (P) and hydrophobic (H) amino acids.
$E = \sum_{i=4}^{N} \sum_{j=1}^{i-3} J_{\sigma_i \sigma_j}\, \delta(|\mathbf{r}_i - \mathbf{r}_j| - 1)$
$\sigma_i = P$ if site i has a P monomer; $\sigma_i = H$ if site i has an H monomer
$J_{PP} = 0$, $J_{HH} = -2.3$, $J_{HP} = J_{PH} = -1$
The 1 inside $\delta$ is the unit lattice length: only beads on neighboring lattice sites contribute.
Native configurations of proteins might minimize E!
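The contact energy can be evaluated for any lattice conformation with a few lines of code. This is a minimal sketch assuming the energy has the form $E = \sum_{i \ge 4} \sum_{j \le i-3} J_{\sigma_i \sigma_j}\, \delta(|\mathbf{r}_i - \mathbf{r}_j| - 1)$ with the J values quoted on the slide; the function name `hp_energy`, the 4-bead square conformation, and the test sequences are hypothetical illustrations, and the code does not check that the chain is self-avoiding or connected.

```python
# Contact energies from the slide: J_PP = 0, J_HH = -2.3, J_HP = J_PH = -1.
J = {("P", "P"): 0.0, ("H", "H"): -2.3, ("H", "P"): -1.0, ("P", "H"): -1.0}

def hp_energy(sequence, coords):
    """Sum J over bead pairs at least 3 apart along the chain that sit
    on neighboring lattice sites (distance exactly 1)."""
    energy = 0.0
    n = len(sequence)
    for i in range(3, n):              # 0-based index 3 is the 4th bead
        for j in range(0, i - 2):      # j runs up to i - 3
            dx, dy, dz = (coords[i][k] - coords[j][k] for k in range(3))
            if dx * dx + dy * dy + dz * dz == 1:
                energy += J[(sequence[i], sequence[j])]
    return energy

# Hypothetical 4-bead chain bent into a square: beads 1 and 4 are in contact.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
for seq in ("HHHH", "HPPP", "PHHP"):
    print(seq, hp_energy(seq, square))
```

Only the two end beads touch in this conformation, so the energy depends on the first and last letters of the sequence alone. Finding native states in the full model would mean minimizing this energy over all compact self-avoiding conformations, which the sketch does not attempt.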
You may ask: why do the parameters take these values?
Intuitive reasons:
(1) H monomers are buried as much as possible (note: buried inside => more contact neighbors), which is expressed by the relation $J_{PP} > J_{HP} > J_{HH}$; this lowers the energy of configurations in which H residues are hidden from water.
(2) Different types of monomers tend to segregate, which is expressed by $2J_{HP} > J_{PP} + J_{HH}$.
Consider a chain of 27 beads that fills a 3x3x3 lattice. Simulations tell us there are 51704 structures unrelated by rotational, reflection, or reverse-labeling symmetries.
Sequence space ----------> Structure space
Among $2^{27}$ possible sequences, simulations show that 4.75% (=6039797) of the sequences have unique ground states.
● Designability (可设计性)
A structure can correspond to more than one sequence
$N_S$: the number of sequences that correspond to a structure S
Larger $N_S$ implies that structure S has higher designability.
[Figure: a highly designable structure]
Conclusion: Structures differ markedly in their designability. Highly designable structures are thermodynamically more stable than other structures and exhibit certain secondary structures. The structures with the 10 largest $N_S$ values all have parallel running lines (like a β-sheet) folded in a regular manner.
Suggestion: Protein structures are selected in nature because they are readily designed, and such a selection simultaneously leads to thermodynamic stability. The protein structures in nature should have high designability.
§. Summary & further reading
Summary
● Random walk model of macromolecules
– End-to-end distance: $\langle r_N^2 \rangle = N a^2$
– Probability distribution function (1D & 3D)
– Radius of gyration
– Probability of looping: $p_o \propto N^{-1/2}$ (1D), $p_o \propto N^{-3/2}$ (3D)
– DNA melting: $\varepsilon_{el} \equiv E_{el}/k_B T$ decreases as T increases
● Single-molecule mechanics
– 1D random walk: $z = \langle L \rangle / L_{tot} = \tanh\frac{fa}{k_B T}$
– 3D random walk on lattice
– 3D random walk off lattice
Further reading
● Phillips et al., Physical biology of the cell, Ch8
● de Gennes, Scaling concepts in polymer physics
● Doi & Edwards, The theory of polymer dynamics
● Li et al., Emergence of Preferred Structures in a Simple Model of Protein Folding, Science 273 (1996) 666