Structural Bioinformatics Chih-Hao Lu http://140.128.63.6/~CADD/0916.ppt

Upload: jacob-flynn

Post on 30-Dec-2015


Page 1: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Structural Bioinformatics

Chih-Hao Lu

http://140.128.63.6/~CADD/0916.ppt

Page 2: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

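Assistant Professor Chih-Hao Lu

[email protected]

Education: Ph.D., Institute of Bioinformatics, National Chiao Tung University

Expertise: structural bioinformatics, computational biology, evolutionary computation, and machine learning

Research areas:
• Protein local structural motifs and function prediction
• Protein structure and dynamics
• Interactions between proteins and molecules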

Page 3: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Mechanism of drug actions

• To identify drugs that inhibit target proteins involved in diseases and have a therapeutic effect against those diseases
– Drugs often have stronger binding affinities than natural compounds

[Figure: a pathway of disease, showing the proteins in the pathway, the target protein, the natural compound, and the drug]

Page 4: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

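Classification of Drug Development

• Protein (receptor) structure known, compound structure known: Structure-based Drug Design (SBDD)
• Protein (receptor) structure known, compound structure unknown: SBDD or de novo design
• Protein (receptor) structure unknown, compound structure known: Compound similarity search
• Protein (receptor) structure unknown, compound structure unknown: High-Throughput Screening (HTS)

[Figure: compound similarity search, showing a query compound and the similar compounds retrieved]

DDT 2002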

Page 5: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Central Dogma

Page 6: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt
Page 7: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Why study protein structure?

• Proteins play crucial functional roles in all biological processes: enzymatic catalysis, signaling messengers …

• Function depends on 3D structure.

• Easy to obtain protein sequences, difficult to determine structure.


Page 8: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

From primary to quaternary

Page 9: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Primary Structure

• The backbone of a protein is a long linear sequence built from twenty kinds of amino acids

• The amino acids are linked to one another by peptide bonds

Page 10: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Proteins are polypeptide chains

Page 11: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Fundamentals of protein structure 20 Amino Acids

?

Page 12: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Amino acid      3-letter  1-letter  Mr    Occurrence in proteins (%)
Glycine         Gly       G         75    7.2
Alanine         Ala       A         89    7.8
Valine          Val       V         117   6.6
Leucine         Leu       L         131   9.1
Isoleucine      Ile       I         131   5.3
Methionine      Met       M         149   2.3
Phenylalanine   Phe       F         165   3.9
Tyrosine        Tyr       Y         181   3.2
Tryptophan      Trp       W         204   1.4
Serine          Ser       S         105   6.8
Proline         Pro       P         115   5.2
Threonine       Thr       T         119   5.9
Cysteine        Cys       C         121   1.9
Asparagine      Asn       N         132   4.3
Glutamine       Gln       Q         146   4.2
Lysine          Lys       K         146   5.9
Histidine       His       H         155   2.3
Arginine        Arg       R         174   5.1
Aspartic acid   Asp       D         133   5.3
Glutamic acid   Glu       E         147   6.3
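
The weights in this table can be used to estimate the molecular weight of a peptide: add up the free amino acid weights and subtract one water (about 18) for every peptide bond formed. The Python sketch below is only an illustration under that assumption; it uses the rounded values from the table, and the name `peptide_weight` is hypothetical.

```python
# Free amino acid molecular weights (rounded), copied from the table above.
AA_WEIGHT = {
    "G": 75, "A": 89, "V": 117, "L": 131, "I": 131,
    "M": 149, "F": 165, "Y": 181, "W": 204, "S": 105,
    "P": 115, "T": 119, "C": 121, "N": 132, "Q": 146,
    "K": 146, "H": 155, "R": 174, "D": 133, "E": 147,
}
WATER = 18  # one water molecule is released per peptide bond (rounded)


def peptide_weight(sequence):
    """Approximate weight of a linear peptide given its one-letter sequence."""
    if not sequence:
        return 0
    total = sum(AA_WEIGHT[aa] for aa in sequence)
    # A chain of n residues contains n - 1 peptide bonds.
    return total - WATER * (len(sequence) - 1)


print(peptide_weight("GA"))  # Gly-Ala dipeptide: 75 + 89 - 18
```

Because the table values are rounded to whole numbers, the result is an estimate rather than an exact mass.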

Page 13: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCII

Sequence

Secondary Structure

Page 14: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

α helix

• On average, every 3.6 residues form one turn

• The α helix structure is formed through hydrogen-bond interactions

Page 15: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

3₁₀ helix, α helix, π helix

Page 16: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

The α helix has a dipole moment

Page 17: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Some amino acids are preferred in α helices

• Good: Ala, Glu, Leu, Met

• Poor: Pro, Gly, Tyr, Ser

• The structure is amphipathic
– Hydrophobic
– Hydrophilic

Helical wheel
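
A helical wheel places each residue at its angular position when the helix is viewed down its axis. With 3.6 residues per turn, consecutive residues are 360/3.6 = 100 degrees apart. The Python sketch below assumes an ideal α helix; the function name `helical_wheel` is illustrative only.

```python
DEGREES_PER_RESIDUE = 100  # 360 degrees / 3.6 residues per turn (ideal alpha helix)


def helical_wheel(sequence):
    """Return (residue, angle in degrees) pairs for a helical wheel projection."""
    return [
        (aa, (i * DEGREES_PER_RESIDUE) % 360)
        for i, aa in enumerate(sequence)
    ]


for residue, angle in helical_wheel("AELMK"):
    print(residue, angle)
```

In an amphipathic helix, the hydrophobic residues cluster on one side of the wheel and the hydrophilic residues on the other.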

Page 18: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

β sheet

• A β sheet is a plane formed by several ribbon-like β strands

• Each pair of neighboring β strands can be arranged in a parallel or an antiparallel fashion

Page 19: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Antiparallel β sheets

Page 20: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Parallel β sheets

Page 21: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Turn or Loop

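• When connecting α helices or β strands, the peptide chain has to make a turn of nearly 180 degrees; these regions are called turns

• In addition, there are some irregular structures, collectively called loops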

Turn

Loop

Page 22: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Hairpin loops

Page 23: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Secondary structure elements are connected to form simple motifs

Page 24: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Schematic diagrams of the calcium-binding motif

Page 25: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

(Luscombe, Genome Biology 2000)

Page 26: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

The hairpin β motif occurs frequently in protein structures

Page 27: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

The Greek key motif is found in antiparallel β sheets

Page 28: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Tertiary Structure

• Several secondary structure elements pack together to form the tertiary structure of the protein

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

[Figure: from sequence to secondary structure (helix, β sheet, loop) to tertiary structure]

Page 29: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Simple motifs combine to form complex motifs

Page 30: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Quaternary structure

• A complex assembled from several identical or different tertiary-structure molecules (subunits) is called a quaternary structure.

Page 31: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

How to determine the protein structure?

• By experimentation
– X-ray crystallography
– NMR (nuclear magnetic resonance spectroscopy)

• Sequence-Structure gap


Page 32: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Structure Determination (X-ray)

• Target Selection
• Crystallomics: Isolation, Expression, Purification, Crystallization
• Data Collection
• Structure Solution
• Structure Refinement
• Functional Annotation
• PDB Deposition
• Publication

Page 33: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

The first X-ray crystallographic structural results in 1958

First determination of a 3D globular protein structure (myoglobin) in 1958 – John Kendrew

Page 34: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Molecular visualization

• Abstract views of macromolecules
– well-defined secondary structure elements (α-helices and β-strands)
– Jane Richardson, 1985

• α-helix as simple cylinder or broad, spiral ribbon
• β-strand as broad, flat ribbon

Page 35: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt
Page 36: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

The structure of myoglobin

Page 37: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Molecular visualization

RasMol

PyMOL

Swiss-Pdb Viewer

MOLMOL

MolScript

MDL Chime

Page 38: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Green Fluorescent Protein (GFP)

Page 39: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Green Fluorescent Protein (GFP)

Page 40: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Green Fluorescent Protein (GFP)

Page 41: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Green Fluorescent Protein (GFP)

Page 42: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

The Protein Data Bank http://www.rcsb.org/pdb/home/home.do

Page 43: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Number of Structures Available

Page 44: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Structure-based databases

• Popular software and resources for protein structure validation
– PDBsum, Procheck, What_Check

• Resources classifying protein structure
– SCOP, CATH, DALI, VAST, CE

• Popular resources of protein interactions
– Protein-Protein(DNA) interaction server, DIP, MINT

• Popular resources visualizing macromolecular structures
– PDBsum, NDB Atlas, STING

Page 45: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Protein evolution and the SCOP database

http://scop.berkeley.edu/

Page 46: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

SCOP

• Classes
– all-β proteins
  • can have small adornment of α or 3₁₀ helix
– all-α structures
  • may have several regions of 3₁₀ helix, and small β-sheet outside the α-helical core
– α/β (alpha and beta)
  • mainly parallel β sheets (β-α-β units)
– α+β (alpha plus beta)
  • mainly antiparallel β sheets (segregated α and β regions)
– others
  • multidomain proteins, membrane and cell surface proteins, small proteins, coiled coil proteins, low-resolution structures, peptides, and designed proteins

Page 47: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Class, Fold, Superfamily and Family classification

Page 48: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

SCOP Sample Hierarchy

• scop Root
• Class
• Fold: Rossmann fold, Flavodoxin-like, Barrel
• Superfamily: TIM, Trp biosynthesis, Glycosyltransferase, RuBisCo (C)
• Family: β-Glucanase, α-Amylase (N), β-Amylase, β-Galactosidase (3)
• Protein: Acid α-amylase, Oligo-1,6 glucosidase, Cyclodextrin glycosyltransferase
• Species: A. niger, B. cereus, B. circulans, B. stearothermophilus
• PDB/Ref: 2aaa:1-353 (A. niger); J. Biochem 113:646-649 (B. cereus); 1cdg:1-382, 1cgt:1-382 (B. circulans); 1cyg:1-378 (B. stearothermophilus)

[Figure annotations: the upper levels are determined by structure; the lower levels are related by homology]

Page 49: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt
Page 50: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt
Page 51: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt
Page 52: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

The CATH domain structure database http://www.cathdb.info/index.html

Page 53: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

CATH http://www.cathdb.info/index.html

Page 54: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt
Page 55: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt
Page 56: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Structure quality assurance

• Not all structures are of equally high quality
• Models from X-ray crystallography
• Models from NMR spectroscopy
• Errors in deposited structures
• Procheck, What_Check

2YSB

Page 57: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Ramachandran Plot

• A plot of the backbone dihedral angles (φ against ψ) of each amino acid residue in a protein.

• Due to steric hindrance from amino acid side chains, only certain angle combinations are allowed in a folded protein.

• Plotting the dihedral angles of the individual residues can therefore indicate how well the structure has been determined.

• Deviations from the allowed values are called outliers and usually indicate bad geometry.

Dihedral Angles
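
A dihedral angle is defined by four consecutive atoms. For φ these are C(n-1), N(n), Cα(n), C(n), and for ψ they are N(n), Cα(n), C(n), N(n+1). The Python sketch below shows one standard way to compute a signed dihedral from four 3D points; the helper names are illustrative, and the coordinates in the usage lines are made-up test points, not real atoms.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up points: a cis arrangement gives 0 degrees, a trans arrangement 180.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))))
```

Applying this function to the backbone atoms of every residue yields the (φ, ψ) pairs that are drawn on a Ramachandran plot.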

Page 58: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Ramachandran Plot

Standard plot showing where different secondary structures fit into the plot.

A real-life example. All non-glycine residues are in allowed regions.

Page 59: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Validation

• Ideally, there should be no outliers in the Ramachandran plot, except for glycine and proline, which are “special” amino acids.

• However, the scientist depositing the structure may have a rational explanation for outliers. (Always refer to the publication!)

• Expect more than 85-90% of residues to fall into the red regions.

So what do you think about this?

Page 60: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt
Page 61: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Secondary structure assignment http://swift.cmbi.ru.nl/gv/dssp/

http://e106.life.nctu.edu.tw/~hwhuang/dssp/

http://140.128.63.6/~bioinformatics/MDLChime26SP4.exe

Page 62: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

The role of secondary structure

• In structural genomics
– basic unit for structure classification
– main uses
  • it is indicative of the fold
  • it is an intuitive means of visualizing protein structure
  • it influences the sequence alignment
  • it is related to function
– applications (e.g., secondary structure elements)
  • speed up large-scale all-against-all alignment of 3D structures
  • comparative modeling and threading

Page 63: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Hydrogen Bonding is Key to Automated Methods

• Why? About 90% of backbone donors (NH) and acceptors (C=O) form hydrogen bonds

• Basic definition
– Angle N–H…O greater than 120 degrees
– H…O distance less than 2.5 Å
– Note: H atoms are not usually identified directly
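
The basic definition above can be written as a simple geometric test. The Python sketch below is a minimal illustration, assuming the H position is already available (in practice it usually has to be placed computationally); the function names and the test coordinates are hypothetical.

```python
import math


def angle_deg(a, b, c):
    """Angle at vertex b (in degrees) formed by the points a-b-c."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(ba, bc)) / (math.dist(a, b) * math.dist(c, b))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
    return math.degrees(math.acos(cos_t))


def is_hydrogen_bond(n, h, o, max_ho=2.5, min_angle=120.0):
    """Angle-distance test: H...O < 2.5 Å and angle N-H...O > 120 degrees."""
    return math.dist(h, o) < max_ho and angle_deg(n, h, o) > min_angle


# Made-up collinear geometry: N-H = 1 Å, H...O = 2 Å, angle = 180 degrees.
print(is_hydrogen_bond((0, 0, 0), (1, 0, 0), (3, 0, 0)))
```

The two thresholds are exposed as keyword arguments so that other cutoffs can be tried without editing the function.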

Page 64: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Angle-distance hydrogen bond assignment

• Baker and Hubbard assigned hydrogen bonds according to the angle N-H-O and to the distance rHO (1984)

[Figure: N–H…O geometry, with the H…O distance below 2.5 Å and the N–H…O angle above 120°]

Page 65: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

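[Figure: limiting geometries of the N–H…O hydrogen bond, with N–H = 1 Å and H…O = 2.5 Å. At an N–H…O angle of 120°, the H…O vector has a component of 1.25 Å along the N–H axis and 2.165 Å perpendicular to it (a 30°/60° right triangle), giving an N…O distance of about 3.122 Å. At 180°, N, H, and O are collinear.]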

Page 66: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Coulomb hydrogen bond calculation – used by DSSP

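$$E = f\,\delta^{+}\delta^{-}\left(\frac{1}{r_{NO}} + \frac{1}{r_{HC'}} - \frac{1}{r_{HO}} - \frac{1}{r_{NC'}}\right)$$

• f is a constant, 332 Å·kcal/(mol·e²)
• δ+ and δ− are the magnitudes of the positive and negative polar charges, in units of the electron charge
• Weakest H-bond: –0.5 kcal/mol in DSSP
• H positions are not given and require extrapolation; note that this assumes planar geometry for the peptide bond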
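
The Coulomb calculation can be sketched in a few lines of Python. This is an illustration, not the DSSP source code: the partial charges 0.42e and 0.20e are the values given by Kabsch and Sander (1983) and do not appear on the slide, and the coordinates in the usage lines are made-up collinear test points.

```python
import math

F = 332.0      # Å·kcal/(mol·e²), the constant from the slide
Q1 = 0.42      # partial charge on the C=O group (Kabsch & Sander, 1983)
Q2 = 0.20      # partial charge on the N-H group (Kabsch & Sander, 1983)
CUTOFF = -0.5  # kcal/mol, weakest hydrogen bond accepted by DSSP


def hbond_energy(n, h, c, o):
    """Electrostatic energy between an N-H donor and a C=O acceptor."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    # Like-charge pairs (O...N, C...H) repel; unlike pairs (O...H, C...N) attract.
    return F * Q1 * Q2 * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(n, h, c, o):
    return hbond_energy(n, h, c, o) < CUTOFF


# Made-up geometry along one axis: N-H = 1 Å, H...O = 2 Å, C=O = 1.23 Å.
energy = hbond_energy((0, 0, 0), (1, 0, 0), (4.23, 0, 0), (3, 0, 0))
print(round(energy, 2), is_hbond((0, 0, 0), (1, 0, 0), (4.23, 0, 0), (3, 0, 0)))
```

Unlike the angle-distance test, this criterion needs no separate angle cutoff, because bent geometries already give a less favorable energy.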

Page 67: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

DSSP

• H = alpha helix
• G = 3₁₀ helix
• I = pi helix
• B = bridge (single-residue sheet)
• E = extended beta strand
• T = beta turn
• S = bend
• C = coil

http://e106.life.nctu.edu.tw/~hwhuang/dssp/
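
Prediction methods often work with three states (helix, sheet, loop) rather than the eight DSSP codes. The Python sketch below shows one common reduction; the grouping is an assumption, since other groupings are also in use, and the name `reduce_dssp` is illustrative.

```python
# One common 8-state to 3-state reduction (assumed here; conventions differ).
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",  # helices
    "E": "E", "B": "E",            # strands and bridges
    "T": "L", "S": "L", "C": "L",  # turns, bends, coil
}


def reduce_dssp(states):
    """Map a string of DSSP codes to H (helix), E (sheet), or L (loop)."""
    # Any unlisted symbol, such as a blank, is treated as loop.
    return "".join(DSSP_TO_3.get(s, "L") for s in states)


print(reduce_dssp("HGIEBTSC"))
```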

Page 68: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

DSSP as Implemented in the PDB

1ATP

Page 69: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Identifying structural domain and function in proteins

1NTY

Page 70: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt
Page 71: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Prediction of protein-protein or protein-DNA interaction

• Sequence-based methods
– Homology
– Correlated mutation

• Structure-based methods
– Physical docking

• Hybrid methods

Page 72: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Principles and methods of docking and ligand design

• Structure-based design
– Docking

• Analog-based design
– QSAR (quantitative structure-activity relationships)

Page 73: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Most force fields consist of a summation of bonded forces associated with chemical bonds, bond angles, and bond dihedrals, and non-bonded forces associated with van der Waals forces and electrostatic charge.
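
As an illustration, one widely used functional form of such a summation is written out below. It is a generic sketch, not the definition of any particular force field: harmonic bond and angle terms, a cosine dihedral term, a Lennard-Jones term for van der Waals forces, and a Coulomb term for electrostatics are all assumed, and the exact terms and parameters differ between force fields.

```latex
E_{\text{total}} =
    \sum_{\text{bonds}} k_b \,(r - r_0)^2
  + \sum_{\text{angles}} k_\theta \,(\theta - \theta_0)^2
  + \sum_{\text{dihedrals}} k_\phi \,\bigl[1 + \cos(n\phi - \gamma)\bigr]
  + \sum_{i<j} \left\{
        4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
                          - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]
      + \frac{q_i q_j}{4\pi\epsilon_0\, r_{ij}}
    \right\}
```

The first three sums are the bonded terms; the last sum runs over non-bonded atom pairs.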

Page 74: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt
Page 75: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Fold recognition method

Page 76: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Prediction in 1D

– Secondary structure prediction
– Solvent accessibility prediction
– Disulfide bond prediction
– Fold recognition
– Enzyme class prediction
– Subcellular localization prediction
– Metal binding sites prediction
– Disulfide connectivity prediction
– Phi psi angle prediction

Page 77: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Secondary structure prediction

[Figure: protein structure with the helix, β sheet, and loop regions labeled]

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

EEEELLLLLHHHHHHHHHHHHLLLLLHHHHHHLLLLEEEELLLLL

H = helix, E = β sheet, L = loop

Page 78: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Solvent accessibility prediction

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

EEEEBBBBEEEEEBBBBBBBEEEEEEBBBBBBBEEEEEEEBBBBEE

B = buried, E = exposed

[Figure: protein cross-section with buried (B) residues in the interior and exposed (E) residues on the surface]

Page 79: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Disulfide bond prediction

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

O O R O O R

Page 80: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Fold recognition

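SCOP (Structural Classification Of Proteins)

• Hierarchy: root, classes, folds, superfamily, family, proteins, species
• Classes: α, β, α/β, α+β, multi-domain, membrane, small protein, …
• Example: TIM barrel (fold); TIM, Aldolase, … (superfamily); TIM, … (family); TIM (protein); chicken, human, … (species)
• SCOP statistics: 11 classes, 800 folds, 1294 superfamilies, 2327 families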

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

?

Page 81: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Subcellular localization prediction

Eukaryotic Cellular compartments

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

?

Page 82: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Metal binding sites prediction

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

NNNNBNNBNNNNNBNNNNNBNNNNNNNNNNNNNNNNNNBNNN

B Binding

N Non-binding

Page 83: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Phi psi angle prediction

Ramachandran plot
• Phi (φ): C(n-1) – N(n) – Cα(n) – C(n)
• Psi (ψ): N(n) – Cα(n) – C(n) – N(n+1)

[Figure: Ramachandran plot divided into regions labeled A to R]

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

AADGJJKKCPGDANOOEEAAAAJJJJJJJJKKNNQQCCJJJJAAAA

Page 84: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Disulfide Connectivity Prediction

TTC1C2PSIVARSNFNVC3RLPGTPEAIC4ATYTGC5IIIPGATC6PGDYAN

[Figure: disulfide bonds drawn between C1 and C6, C2 and C5, C3 and C4]

Connectivity pattern: 1-6, 2-5, 3-4
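
To show why connectivity prediction is a non-trivial search problem, the Python sketch below enumerates every way of pairing the cysteines. It assumes that all cysteines are bonded; the function name `connectivity_patterns` is illustrative.

```python
def connectivity_patterns(cysteines):
    """Enumerate all ways to pair up an even number of cysteines."""
    if not cysteines:
        return [[]]
    first, rest = cysteines[0], cysteines[1:]
    patterns = []
    for i, partner in enumerate(rest):
        # Pair the first cysteine with each candidate, then recurse on the rest.
        remaining = rest[:i] + rest[i + 1:]
        for sub in connectivity_patterns(remaining):
            patterns.append([(first, partner)] + sub)
    return patterns


patterns = connectivity_patterns([1, 2, 3, 4, 5, 6])
print(len(patterns))                               # 5 * 3 * 1 = 15
print([(1, 6), (2, 5), (3, 4)] in patterns)        # the pattern shown above
```

For n disulfide bonds there are (2n-1)·(2n-3)·…·1 possible patterns, so the number grows quickly with the number of cysteines.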

Page 85: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Training Data

• Class 1: Features 1~N
• Class 2: Features 1~N
• Class 3: Features 1~N
• Class 4: Features 1~N
• …
• Class K: Features 1~N

Training Data → SVM → SVM Model

Page 86: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Testing Data

• Input: Feature 1, Feature 2, Feature 3, …, Feature N
• Output: Class 1, Class 2, Class 3, …, Class K

Testing Data → SVM (SVM Model) → predicted class

Page 87: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Protein Structure Prediction

• Sequence: check the sequence homology to a known fold
• >30%: Homology modeling → Model
• <30%: Threading → Match found?
– Yes: Model
– No: Ab initio
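
The decision flow on this slide can be summarized as a small function. The Python sketch below is only an illustration: the function and parameter names are hypothetical, and treating exactly 30% identity as the low-identity branch is an assumption, since the slide does not say which way it goes.

```python
def choose_method(identity_percent, threading_match_found=False):
    """Pick a structure prediction route following the flowchart."""
    if identity_percent > 30:
        return "homology modeling"
    # Low sequence identity (30% or below, by assumption): try threading first.
    if threading_match_found:
        return "threading"
    return "ab initio"


print(choose_method(45))
print(choose_method(20, threading_match_found=True))
print(choose_method(20))
```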

Page 88: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Homology modeling

• The goal of protein modeling is to predict a structure from its sequence
– Template recognition and initial alignment
– Alignment correction
– Backbone generation
– Loop modeling
– Side-chain modeling
– Model optimization
– Model validation

Page 89: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

What is Homology Modeling?

Target: 1alc (structure unknown: ??)
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE

Template: 8lyz (use as template)
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL

• The target and the template share a similar sequence
• They are homologous

Page 90: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Structure prediction by homology modeling


Step 1

Step 2

Step 3

Step 4

Page 91: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

Structure comparison and alignment

1CRN 1JXX

CE

http://cl.sdsc.edu/ce.html

DALI

http://ekhidna.biocenter.helsinki.fi/dali_server/

Page 92: Structural Bioinformatics Chih-Hao Lu CADD/0916.ppt

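Homework (due before class on 10/6)

1. Search the Protein Data Bank using the following criteria and list the PDB IDs that are found
– Hemoglobin
– Has ligand: Yes
– X-ray resolution < 1.5 Å
– Homo sapiens

2. Compare the secondary structure assignments of PDB ID 1ema given by DSSP and by STRIDE

3. Analyze PDB ID 1atp using Procheck in PDBsum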