proteinmodelling-12523931616815-phpapp01

Upload: rednri

Post on 05-Apr-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    1/19

    Protein Modelling(Building 3D models of proteins)

    By:-

    Syed Asif Husain Naqvi

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    2/19

    Why make a structural model

    for your protein ? The structure can provide clues to the

    function.

    With a structure it is easier to guess thelocation of functional sites. We can do docking experiments (both with

    other proteins and with small molecules).

    With a structure we can plan more preciseexperiments in the lab.

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    3/19

    INTRODUCTION

    Proteins from different sources and sometimes diverse biologicalfunctions can have similar sequences, and it is generally accepted thathigh sequence similarity is reflected by distinct structure similarity.Indeed, the relative mean square deviation (rmsd) of the alpha-carbonco-ordinates for protein cores sharing 50% residue identity is expected

    to be around 1. This fact served as the premise for the developmentof comparative protein modelling (also often called modelling byhomology or knowledge-based modelling), which is presently the mostreliable method.Comparative model building consist of the extrapolation of thestructure for a new (target) sequence from the known 3D-structure ofrelated family members (templates).

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    4/19

    While the high precision structures required for detailed studies of protein-

    ligand interaction can only be obtained experimentally, theoretical proteinmodelling provides the molecular biologists with "low-resolution" modelswhich hold enough essential information about the spatial arrangement ofimportant residues to guide the design of experiments.

    The rational design of many site-directed mutagenesis experiments couldtherefore be improved if more of these "low-resolution" theoretical modelstructures were available.

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    5/19

    Basic principles for structural

    modeling Use any piece of information available from the existing

    databases regarding the protein you wish to model and its

    family.

    Choose the most suitable algorithm according to theavailable data that you have.

    Create alternative models. Check your models.

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    6/19

    Building by Homology

    There are hundreds of thousands of protein sequences but only severalthousands protein folds.

    For every second protein that we randomly pick from thestructural data base there is close homolog (identity >30%). This

    homolog almost always has the same fold.

    In the current projects for experimental determination of protein structures,priority is given to determine structures of protein without homologs in thestructural databases

    (structural genomics).

    We believe that in several years we will have almost all the basic folds.

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    7/19

    Steps of building by homology

    Look for homology between your sequence and proteins in the structural databases.

    Known algorithms such as Blast can be used.

    If no hits were obtained, it is possible to use multiple alignment of the family for thesearch. This might be more sensitive.

    Construct accurate alignment between the query sequence and all the hits that will beserve as the template during the building.

    Correct alignment is crucial for this step. Any mistake can lead to significant errors inthe final model.

    A possible alignment algorithms are for example ClustalWandd T-Cofee.

    Manual intervention is sometimes required, especially for weak homology.More sequences within the protein family increase the chance for correct alignment.Therefore, it is frequently recommended to search sequence databases (such asSwissProt) and not only structural databases.

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    8/19

    Identification of modelling templatesComparative protein modelling requires at least one sequence of known 3D-structure with significant similarity to the target sequence. In order to determineif a modelling request can be carried out, one compares the target sequencewith a database of sequences derived from the Brookhaven Protein Data Bank(PDB), using programs such as FastA and BLAST.

    Sequences with a FastA score 10.0 standard deviations above the mean of therandom scores or a Poisson unlikelyhood probability P(N) lower than 10-5

    (BLAST) can be considered for the model building procedure. The choice oftemplate structures can be further restricted to those which share at least 30%residue identity as determined by SIM.

    STEPS

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    9/19

    The above procedure might allow the selection of several suitable

    templates for a given target sequence, and up to ten templates are used inthe modelling process. The best template structure - the one with thehighest sequence similarity to the target - will serve as the reference. Allthe other selected templates will be superimposed onto it in 3D. The 3Dmatch is carried out by superimposing corresponding Ca atom pairs

    selected automatically from the highest scoring local sequence alignmentdetermined by SIM. This superposition can then be optimised bymaximising the number of Ca pairs in the common core while minimisingtheir relative mean square deviation. Each residue of the referencestructure is then aligned with a residue from every other available

    template structure if their Ca atoms are located within 3.0 . Thisgenerates a structurally corrected multiple sequence alignment.

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    10/19

    Aligning the target sequence with the template sequence

    The target sequence now needs to be aligned with the template sequence or, if severaltemplates were selected, with the structurally corrected multiple sequence alignment.This can be achieved by using the best-scoring diagonals obtained by SIM. Residueswhich should not be used for model building, for example those located in non-conserved loops, will be ignored during the modelling process. Thus, the commoncore of the target protein and the loops completely defined by at least one suppliedtemplate structure will be built.

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    11/19

    Building the model

    Framework construction

    When more than one template is available, the relative contribution, or weight, of eachstructure is determined by its local degree of sequence identity with the target sequence.

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    12/19

    Building non-conserved loops

    Following framework generation, loops for which no structural information wasavailable in the template structures are not defined and therefore must be constructed. .

    Using a "spare part" algorithm, one searches for fragments which could beaccommodated onto the framework among the Brookhaven Protein Data Bank (PDB)entries determined with a resolution better than 2.5 . Each loop is defined by its lengthand its "stems", namely the alpha carbon (Ca) atom co-ordinates of the four residuespreceding and following the loop. The fragments which correspond to the loopdefinition are extracted from the PDB entries and rejected if the relative mean square

    deviation (rmsd) computed for their "stems" is greater than a specified cut-off value.Furthermore, only fragments which do not overlap with neighbouring segments shouldbe retained.

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    13/19

    Completing the backbone

    Since the loop building only adds Ca atoms, the backbone carbonyl and nitrogens mustbe completed in these regions. This step can be performed by using a library ofpentapeptide backbone fragments derived from the PDB entries determined with a

    resolution better than 2.0 . These fragments are then fitted to overlapping runs of fiveCa atoms of the target model. The co-ordinates of each central tripeptide are thenaveraged for each target backbone atom (N, C, O) and added to the model. This processyields modelled backbones that differ from experimental co-ordinates by approx. 0.2 rms.

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    14/19

    Adding side chains

    For many of the protein side chains there is no structural information available in thetemplates. These cannot therefore be built during the framework generation and mustbe added later. The number of side chains that need to be built is dictated by the

    degree of sequence identity between target and template sequences. To this end oneuses a table of the most probable rotamers for each amino acid side chain dependingon their backbone conformation. All the allowed rotamers of the residues missingfrom the structure are analysed to see if they are acceptable by a Van der Waalsexclusion test. The most favoured rotamer is added to the model. The atoms defining

    the c1 and c2 angles of incomplete side chains can be used to restrict the choice ofrotamers to those fitting these angles. If some side chains cannot be rebuilt in a firstattempt, they will be assigned initially in a second pass. This allows some side chainsto be rebuilt even if the most probable allowed rotamer of a neighbouring residuealready occupies some of this portion of space. The latter may then switch to a lessprobable but allowed rotamer. In case that not all of the side chains can be added, anadditional tolerance of 0.15 can be introduced in the van der Waals exclusion testand the procedure repeated.

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    15/19

    Model refinement

    Idealisation of bond geometry and removal of unfavourable non-bonded contacts can beperformed by energy minimisation with force fields such as CHARMM, AMBER or

    GROMOS. The refinement of a primary model should be performed by no more than100 steps of steepest descent, followed by 200-300 steps of conjugate gradient energyminimisation. Experience has shown models optimised that energy minimisation (ormolecular dynamics) usually move away from a control structure. It is thus necessary tokeep the number of minimisation steps to a minimum. Constraining the positions of

    selected atoms (such as Ca, or using a B-factor based function) in each residuegenerally helps avoiding excessive structural drift during force field computations.

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    16/19

    Several web pages for HomologyModelling

    COMPOSERfelix.bioccam.ac.uksoft-base.html

    MODELLERguitar.rockefeller.edu/modeller/modeller.html

    WHAT IFwww.sander.embl-heidelberg.de/whatif/

    SWISS-MODELwww.expasy.ch/SWISS-MODEL.html

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    17/19

    Model evaluation

    After the model is built we can check its validity by various ways. Wecan check that the model has a reasonable shape and that it is usuallyobey geometric constraints.

    If the model turns out to be bad, it is necessary to repeat several stagesof the model building.

    We can easily assess homology modeling procedures by building

    models for proteins which have already solved structure and comparebetween the model and the native structure. It is always possible that information from the native structure will be

    used in direct or indirect ways for model building A more objective test is prediction of structures before they are

    publicly distributed (this is the idea of the CASP {Critical Assessment of

    Techniques for Protein Structure Prediction} competitions).

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    18/19

    The SWISS-MODEL serverThe aim of the Internet-based SWISS-MODEL server is to provide a comparativeprotein modelling tool independent from expensive computer hardware and software.

    SWISS-MODEL provides four distinct modes of function accessible through

    separate forms:

    TheFirst Approach Mode:This mode allows the user to submit a sequence or its

    Swiss-Prot identification code. In this mode, SWISS-MODEL will go through thecomplete procedure described above. The First Approach Mode also allows the user todefine a choice of pre-selected template structures, thereby overruling the automatedselection procedure. The results of the modelling procedure will be returned to the uservia E-mail.

    The Optimise Mode:allows the user to re-compute a model by submitting alteredsequence alignments and ProMod command files. The sequence alignment procedure,which is fully automatic in the First Approach Mode, may yield sub-optimal alignmentsand consequently lead to erroneous models. The automated alignment of moderately

    similar sequences is indeed often imprecise and the boundaries of non-conserved loopsare frequently ill defined and miss-aligned.

    Th i f h li h f b d b h d i

  • 7/31/2019 proteinmodelling-12523931616815-phpapp01

    19/19

    These regions of the sequence alignments must therefore be corrected by hand inorder to overcome these weaknesses. The Optimise Mode allows the user to do suchcorrections, and to request the remodelling of the sequence by submitting his ownsequence alignment. This is best done by preparing a modelling request within theSwiss-PdbViewer (see chapter 12). These requests can then be saved as "HTML"files and then submitted through a Web browser.

    The Combine Mode: allows the user to combine independently modelled proteinchains in a quaternary complex, based on an existing assembly template. In thecurrent version of the server, this assembly template must be a PDB file containing

    the desired protein complex. The server provides detailed guidelines on how tosubmit requests to this particular facility. The actions taken by the server include asuperposition of each modelled chain onto its respective counterpart on the assemblytemplate and an energy minimisation of the whole complex.

    The GPCR (G- Protein Coupled Receptor) Mode. To get a GPCR sequencemodelled, the first step consists of selecting a template for comparative modellingfrom the list of available ones. The current version of the SWISS-MODEL serverdoes not allow the user to create a new template using his own experimental data.