proteinmodelling-12523931616815-phpapp01

7/31/2019 proteinmodelling-12523931616815-phpapp01

1/19

Protein Modelling(Building 3D models of proteins)

By:-

Syed Asif Husain Naqvi


2/19

Why make a structural model

for your protein ? The structure can provide clues to the

function.

With a structure it is easier to guess thelocation of functional sites. We can do docking experiments (both with

other proteins and with small molecules).

With a structure we can plan more preciseexperiments in the lab.


3/19

INTRODUCTION

Proteins from different sources and sometimes diverse biologicalfunctions can have similar sequences, and it is generally accepted thathigh sequence similarity is reflected by distinct structure similarity.Indeed, the relative mean square deviation (rmsd) of the alpha-carbonco-ordinates for protein cores sharing 50% residue identity is expected

to be around 1. This fact served as the premise for the developmentof comparative protein modelling (also often called modelling byhomology or knowledge-based modelling), which is presently the mostreliable method.Comparative model building consist of the extrapolation of thestructure for a new (target) sequence from the known 3D-structure ofrelated family members (templates).


4/19

While the high precision structures required for detailed studies of protein-

ligand interaction can only be obtained experimentally, theoretical proteinmodelling provides the molecular biologists with "low-resolution" modelswhich hold enough essential information about the spatial arrangement ofimportant residues to guide the design of experiments.

The rational design of many site-directed mutagenesis experiments couldtherefore be improved if more of these "low-resolution" theoretical modelstructures were available.


5/19

Basic principles for structural

modeling Use any piece of information available from the existing

databases regarding the protein you wish to model and its

family.

Choose the most suitable algorithm according to theavailable data that you have.

Create alternative models. Check your models.


6/19

Building by Homology

There are hundreds of thousands of protein sequences but only severalthousands protein folds.

For every second protein that we randomly pick from thestructural data base there is close homolog (identity >30%). This

homolog almost always has the same fold.

In the current projects for experimental determination of protein structures,priority is given to determine structures of protein without homologs in thestructural databases

(structural genomics).

We believe that in several years we will have almost all the basic folds.


7/19

Steps of building by homology

Look for homology between your sequence and proteins in the structural databases.

Known algorithms such as Blast can be used.

If no hits were obtained, it is possible to use multiple alignment of the family for thesearch. This might be more sensitive.

Construct accurate alignment between the query sequence and all the hits that will beserve as the template during the building.

Correct alignment is crucial for this step. Any mistake can lead to significant errors inthe final model.

A possible alignment algorithms are for example ClustalWandd T-Cofee.

Manual intervention is sometimes required, especially for weak homology.More sequences within the protein family increase the chance for correct alignment.Therefore, it is frequently recommended to search sequence databases (such asSwissProt) and not only structural databases.


8/19

Identification of modelling templatesComparative protein modelling requires at least one sequence of known 3D-structure with significant similarity to the target sequence. In order to determineif a modelling request can be carried out, one compares the target sequencewith a database of sequences derived from the Brookhaven Protein Data Bank(PDB), using programs such as FastA and BLAST.

Sequences with a FastA score 10.0 standard deviations above the mean of therandom scores or a Poisson unlikelyhood probability P(N) lower than 10-5

(BLAST) can be considered for the model building procedure. The choice oftemplate structures can be further restricted to those which share at least 30%residue identity as determined by SIM.

STEPS


9/19

The above procedure might allow the selection of several suitable

templates for a given target sequence, and up to ten templates are used inthe modelling process. The best template structure - the one with thehighest sequence similarity to the target - will serve as the reference. Allthe other selected templates will be superimposed onto it in 3D. The 3Dmatch is carried out by superimposing corresponding Ca atom pairs

selected automatically from the highest scoring local sequence alignmentdetermined by SIM. This superposition can then be optimised bymaximising the number of Ca pairs in the common core while minimisingtheir relative mean square deviation. Each residue of the referencestructure is then aligned with a residue from every other available

template structure if their Ca atoms are located within 3.0 . Thisgenerates a structurally corrected multiple sequence alignment.


10/19

Aligning the target sequence with the template sequence

The target sequence now needs to be aligned with the template sequence or, if severaltemplates were selected, with the structurally corrected multiple sequence alignment.This can be achieved by using the best-scoring diagonals obtained by SIM. Residueswhich should not be used for model building, for example those located in non-conserved loops, will be ignored during the modelling process. Thus, the commoncore of the target protein and the loops completely defined by at least one suppliedtemplate structure will be built.


11/19

Building the model

Framework construction

When more than one template is available, the relative contribution, or weight, of eachstructure is determined by its local degree of sequence identity with the target sequence.


12/19

Building non-conserved loops

Following framework generation, loops for which no structural information wasavailable in the template structures are not defined and therefore must be constructed. .

Using a "spare part" algorithm, one searches for fragments which could beaccommodated onto the framework among the Brookhaven Protein Data Bank (PDB)entries determined with a resolution better than 2.5 . Each loop is defined by its lengthand its "stems", namely the alpha carbon (Ca) atom co-ordinates of the four residuespreceding and following the loop. The fragments which correspond to the loopdefinition are extracted from the PDB entries and rejected if the relative mean square

deviation (rmsd) computed for their "stems" is greater than a specified cut-off value.Furthermore, only fragments which do not overlap with neighbouring segments shouldbe retained.


13/19

Completing the backbone

Since the loop building only adds Ca atoms, the backbone carbonyl and nitrogens mustbe completed in these regions. This step can be performed by using a library ofpentapeptide backbone fragments derived from the PDB entries determined with a

resolution better than 2.0 . These fragments are then fitted to overlapping runs of fiveCa atoms of the target model. The co-ordinates of each central tripeptide are thenaveraged for each target backbone atom (N, C, O) and added to the model. This processyields modelled backbones that differ from experimental co-ordinates by approx. 0.2 rms.


14/19

Adding side chains

For many of the protein side chains there is no structural information available in thetemplates. These cannot therefore be built during the framework generation and mustbe added later. The number of side chains that need to be built is dictated by the

degree of sequence identity between target and template sequences. To this end oneuses a table of the most probable rotamers for each amino acid side chain dependingon their backbone conformation. All the allowed rotamers of the residues missingfrom the structure are analysed to see if they are acceptable by a Van der Waalsexclusion test. The most favoured rotamer is added to the model. The atoms defining

the c1 and c2 angles of incomplete side chains can be used to restrict the choice ofrotamers to those fitting these angles. If some side chains cannot be rebuilt in a firstattempt, they will be assigned initially in a second pass. This allows some side chainsto be rebuilt even if the most probable allowed rotamer of a neighbouring residuealready occupies some of this portion of space. The latter may then switch to a lessprobable but allowed rotamer. In case that not all of the side chains can be added, anadditional tolerance of 0.15 can be introduced in the van der Waals exclusion testand the procedure repeated.


15/19

Model refinement

Idealisation of bond geometry and removal of unfavourable non-bonded contacts can beperformed by energy minimisation with force fields such as CHARMM, AMBER or

GROMOS. The refinement of a primary model should be performed by no more than100 steps of steepest descent, followed by 200-300 steps of conjugate gradient energyminimisation. Experience has shown models optimised that energy minimisation (ormolecular dynamics) usually move away from a control structure. It is thus necessary tokeep the number of minimisation steps to a minimum. Constraining the positions of

selected atoms (such as Ca, or using a B-factor based function) in each residuegenerally helps avoiding excessive structural drift during force field computations.


16/19

Several web pages for HomologyModelling

COMPOSERfelix.bioccam.ac.uksoft-base.html

MODELLERguitar.rockefeller.edu/modeller/modeller.html

WHAT IFwww.sander.embl-heidelberg.de/whatif/

SWISS-MODELwww.expasy.ch/SWISS-MODEL.html


17/19

Model evaluation

After the model is built we can check its validity by various ways. Wecan check that the model has a reasonable shape and that it is usuallyobey geometric constraints.

If the model turns out to be bad, it is necessary to repeat several stagesof the model building.

We can easily assess homology modeling procedures by building

models for proteins which have already solved structure and comparebetween the model and the native structure. It is always possible that information from the native structure will be

used in direct or indirect ways for model building A more objective test is prediction of structures before they are

publicly distributed (this is the idea of the CASP {Critical Assessment of

Techniques for Protein Structure Prediction} competitions).


18/19

The SWISS-MODEL serverThe aim of the Internet-based SWISS-MODEL server is to provide a comparativeprotein modelling tool independent from expensive computer hardware and software.

SWISS-MODEL provides four distinct modes of function accessible through

separate forms:

TheFirst Approach Mode:This mode allows the user to submit a sequence or its

Swiss-Prot identification code. In this mode, SWISS-MODEL will go through thecomplete procedure described above. The First Approach Mode also allows the user todefine a choice of pre-selected template structures, thereby overruling the automatedselection procedure. The results of the modelling procedure will be returned to the uservia E-mail.

The Optimise Mode:allows the user to re-compute a model by submitting alteredsequence alignments and ProMod command files. The sequence alignment procedure,which is fully automatic in the First Approach Mode, may yield sub-optimal alignmentsand consequently lead to erroneous models. The automated alignment of moderately

similar sequences is indeed often imprecise and the boundaries of non-conserved loopsare frequently ill defined and miss-aligned.

Th i f h li h f b d b h d i


19/19

These regions of the sequence alignments must therefore be corrected by hand inorder to overcome these weaknesses. The Optimise Mode allows the user to do suchcorrections, and to request the remodelling of the sequence by submitting his ownsequence alignment. This is best done by preparing a modelling request within theSwiss-PdbViewer (see chapter 12). These requests can then be saved as "HTML"files and then submitted through a Web browser.

The Combine Mode: allows the user to combine independently modelled proteinchains in a quaternary complex, based on an existing assembly template. In thecurrent version of the server, this assembly template must be a PDB file containing

the desired protein complex. The server provides detailed guidelines on how tosubmit requests to this particular facility. The actions taken by the server include asuperposition of each modelled chain onto its respective counterpart on the assemblytemplate and an energy minimisation of the whole complex.

The GPCR (G- Protein Coupled Receptor) Mode. To get a GPCR sequencemodelled, the first step consists of selecting a template for comparative modellingfrom the list of available ones. The current version of the SWISS-MODEL serverdoes not allow the user to create a new template using his own experimental data.

proteinmodelling-12523931616815-phpapp01

Documents