RNA 3D Structure Prediction with NAST
Xinpei Liu
刘欣培
Content
1. Background
2. Introduction to NAST
3. Test Simulations with NAST
4. Effect of Secondary Structure
5. System Consistency
Background
RNA folding vs. protein folding
RNA 3D Structure Prediction Tools
• Manual
• Automatic
• Full atomic
• Coarse grained
• Physics based
• Knowledge based
Introduction to NAST
Nucleic Acid Simulation Toolkit (NAST)
• Funded by the Simbios National Center for Biomedical Computing
• A knowledge-based coarse-grained tool for modeling RNA structures. It produces a diverse set of plausible 3D structures that satisfy user-provided constraints based on:
  1. Primary sequence
  2. Known or predicted secondary structure
  3. Known or predicted tertiary contacts (optional)
Requirements:
• Python 2.6.x
• PyOpenMM 2.0.0 (3.0.0 won't work!)
https://simtk.org/home/nast
Jonikas MA, Radmer RJ, Laederach A, Das R, Pearlman S, Herschlag D, Altman RB. Coarse-grained modeling of large RNA molecules with knowledge-based potentials and structural filters. RNA. 2009 Feb;15(2):189-99.
Advantages
• Provides information about the likely topology of a molecule
• Provides a good starting point for higher-resolution atomic models
• Can handle large molecules (> 76 nt)
• Much faster than full-atomic simulation tools
  • 1,000,000 steps within 138 s
• Allows uncertainty in the secondary structure (within a certain level)
How to use NAST?
• Primary Sequence File
  • Go to http://www.rnasoft.ca/strand/
  • Search for your structure and get a BPSEQ file
  • Use the "parseBPseq.py" file in the package to generate a sequence file
• Secondary Structure File
  • Use a secondary structure prediction tool
  • e.g., Mcgenus: http://eole2.lsce.ipsl.fr/ipht/tt2ne/mcgenus.php
• Tertiary Contacts File (optional)
  • From experiments or phylogenetic analysis
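A BPSEQ file stores one residue per line in three columns: position, base, and the position of its pairing partner (0 if unpaired). The sketch below reads that layout into a sequence string and a list of base pairs. It is not the parseBPseq.py script shipped with NAST; the function name parse_bpseq and its return format are assumptions made for illustration, and the input format NAST itself expects is not described here.

```python
def parse_bpseq(text):
    """Parse BPSEQ text into (sequence, pairs).

    Hypothetical helper, not part of NAST. Each data line is expected
    to look like "<position> <base> <partner>", with partner 0 meaning
    the residue is unpaired.
    """
    sequence = []
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and header lines such as "Filename: ..."
        if len(fields) != 3 or not (fields[0].isdigit() and fields[2].isdigit()):
            continue
        index, base, partner = int(fields[0]), fields[1], int(fields[2])
        sequence.append(base)
        # Record each pair once, from the side with the smaller index
        if partner > index:
            pairs.append((index, partner))
    return "".join(sequence), pairs


example = """1 G 6
2 C 5
3 A 0
4 A 0
5 G 2
6 C 1
"""
seq, pairs = parse_bpseq(example)
print(seq)    # GCAAGC
print(pairs)  # [(1, 6), (2, 5)]
```

The six-residue example is made up; a real file from the STRAND database would be read the same way after loading it with open(...).read().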
Test Molecule Used: PDB ID 1ZIH, 389 atoms, 12 residues
Definition of q value
q is a normalized measure of similarity between a reference and comparison structure.
Simulations: 1ZIH from an Unfolded Circle State, 1,000,000 steps
RMSD: Mean 2.683, Sd 0.449
q value (ref.: crystal structure): Mean 0.250, Variance 0.00686
Simulations: 1ZIH from an Unfolded Circle State, 1,000,000 steps
q value: Mean 0.246, Variance 0.00686
RMSD: Mean 2.704, Sd 0.454
Reference value:
Definition of GDT_TS Score
The Global Distance Test Total Score (GDT_TS) of Cα atoms is used to assess the correctness of the predicted model. GDT_TS has been commonly used in modeling studies and in the CASP community. GDT_TS is defined as:
GDT_TS = 100% × (GDT_1 + GDT_2 + GDT_4 + GDT_8) / (4N)
where N is the total number of residues of a target, GDT_d is the number of aligned residues whose Cα-atom distance between the native structure and the predicted model is less than d Å after superposition of the two structures, and d is 1, 2, 4, and 8 Å.
• Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003, 31: 3370-3374.
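The definition above can be sketched in a few lines of Python. The function name gdt_ts is hypothetical, and the input is assumed to be the per-residue distances (in Å) already measured after superposition; the superposition search itself, as performed by LGA, is not reproduced here.

```python
def gdt_ts(distances, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """Return GDT_TS in percent from per-residue distances in angstroms.

    Assumes `distances` holds one distance per residue of the target,
    measured after the two structures were superposed.
    """
    n = len(distances)
    if n == 0:
        raise ValueError("need at least one residue")
    # GDT_d: number of residues closer than d, summed over all cutoffs
    total = sum(sum(1 for dist in distances if dist < d) for d in thresholds)
    return 100.0 * total / (len(thresholds) * n)


# Four made-up residue distances: 1, 2, 3 and 3 of them fall under
# the 1, 2, 4 and 8 angstrom cutoffs, so the score is 100 * 9 / 16.
print(gdt_ts([0.5, 1.5, 3.0, 9.0]))  # 56.25
```

Following the wording of the definition, the comparison is strict (less than d); implementations that count distances equal to the cutoff would differ only in those edge cases.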
Simulations: 1ZIH from an Unfolded Circle State, 1,000,000 steps
GDT_TS: Mean 57.656%, Sd 7.223%
Test Molecule Used: PDB ID 4JF2, 1829 atoms, 77 residues
Simulations: 4JF2 from an Unfolded Circle State, 1,000,000 steps
RMSD: Mean 11.830, Sd 2.591
q value: Mean 0.128, Variance 0.000788
Simulations: 4JF2 from an Unfolded Circle State, 1,000,000 steps
Reference value: RMSD 10.3 ± 2.3
q value: Mean 0.125, Variance 0.000964
Simulations: 4JF2 from an Unfolded Circle State, 1,000,000 steps
GDT_TS: Mean 9.620%, Sd 4.681%
Reference value: 14% ± 5% (the best cluster)
Effect of Secondary Structure: 1ZIH from Crystal Structure, 1,000,000 steps
• RMSD
  Without Secondary Structure Constraints: Mean 8.364, Sd 1.710
  With Secondary Structure Constraints: Mean 2.704, Sd 0.454
• q value (ref: crystal)
  Without Secondary Structure Constraints: Mean 0.130, Variance 0.00458
  With Secondary Structure Constraints: Mean 0.246, Variance 0.00686
Effect of Secondary Structure: 4JF2 from Crystal Structure, 1,000,000 steps
• RMSD
  Without Secondary Structure Constraints: Mean 22.860, Sd 4.798
  With Secondary Structure Constraints: Mean 11.378, Sd 1.176
• q value (ref: crystal)
  Without Secondary Structure Constraints: Mean 0.0761, Variance 0.00136
  With Secondary Structure Constraints: Mean 0.125, Variance 0.000964
Effect of Secondary Structure
• Simulations with different percentages of wrong pairs in the secondary structure (600,000 steps)

Wrong pairs   Mean     Std.
0%            6.0969   2.5971
15%           5.4951   1.8054
25%           4.4746   1.2748
35%           2.6558   2.0450
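How the wrong pairs were introduced is not described here. As one hypothetical way to prepare such inputs, the sketch below replaces a chosen fraction of base pairs with pairs to previously unpaired positions. The function name corrupt_pairs, the re-pairing rule, and the fixed random seed are all assumptions made for illustration, not the procedure used for the results above.

```python
import random


def corrupt_pairs(pairs, length, fraction, seed=0):
    """Return a copy of `pairs` with about `fraction` of them made wrong.

    pairs    : list of (i, j) tuples, 1-based positions
    length   : number of residues in the sequence
    fraction : share of pairs to corrupt, e.g. 0.15 for 15%
    """
    rng = random.Random(seed)
    n_wrong = round(fraction * len(pairs))
    wrong_idx = set(rng.sample(range(len(pairs)), n_wrong))

    # Wrong partners are drawn from positions not used by any original pair
    used = {pos for pair in pairs for pos in pair}
    free = [p for p in range(1, length + 1) if p not in used]
    if len(free) < n_wrong:
        raise ValueError("not enough unpaired positions to build wrong pairs")
    rng.shuffle(free)

    result = []
    for k, (i, j) in enumerate(pairs):
        if k in wrong_idx:
            # Keep residue i, but pair it with a previously unpaired position
            result.append(tuple(sorted((i, free.pop()))))
        else:
            result.append((i, j))
    return result


# A made-up 12-residue hairpin with a four-pair stem; corrupt 25% of it
stem = [(1, 12), (2, 11), (3, 10), (4, 9)]
noisy = corrupt_pairs(stem, length=12, fraction=0.25)
print(sum(a != b for a, b in zip(stem, noisy)))  # 1
```

Other corruption schemes (deleting pairs, or shifting a pair by one position) would be equally plausible; the point is only that the error rate is a controlled input to the simulation.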
System Consistency: 1ZIH from an Unfolded Circle State
Reference Model: structure resulting from the simulation started from the crystal structure (1,000,000 steps)
q value: Mean 0.3969, Variance 0.02268
Reference Model: Crystal Structure (1,000,000 steps)
q value: Mean 0.246, Variance 0.00686
System Consistency: 4JF2 from an Unfolded Circle State
Reference Model: structure resulting from the simulation started from the crystal structure (1,000,000 steps)
q value: Mean 0.223, Variance 0.00276
Reference Model: Crystal Structure
q value: Mean 0.128, Variance 0.000788
Conclusion
• The folding result from NAST provides a basic idea of the structure for a given sequence.
• A small proportion of mistakes in the secondary structure does not strongly influence the folding result, but this holds only up to a certain level.
• The simulation is more likely to generate a fold that resembles other simulated models (run for the same number of steps) than the crystal structure.
• More tests with GDT_TS may be needed.
Any Questions or Comments?