multiple sequence alignment


Upload: heman

Post on 24-Jan-2016


TRANSCRIPT

  • Multiple sequence alignment: Lesson 4

  • Example alignment:

    VTISCTGSSSNIGAG-NHVKWYQQLPG
    VTISCTGTSSNIGS--ITVNWYQQLPG
    LRLSCSSSGFIFSS--YAMYWVRQAPG
    LSLTCTVSGTSFDD--YYSTWVRQPPG
    PEVTCVVVDVSHEDPQVKFNWYVDG--
    ATLVCLISDFYPGA--VTVAWKADS--
    AALGCLVKDYFPEP--VTVSWNSG---
    VSLTCLVKGFYPSD--IAVEWWSNG--

    Like pairwise alignment BUT compare n sequences instead of 2

    Each row represents an individual sequence
    Each column represents the same position

    May be gaps in some sequences

  • MSA & Evolution: MSA can give you a picture of the forces that shape evolution!

    Important amino acids or nucleotides are not allowed to mutate
    Less important positions change more easily

  • Conserved positions: Columns where all the sequences contain the same amino acids or nucleotides
    Important for the function or structure

    VTISCTGSSSNIGAG-NHVKWYQQLPG
    VTISCTGSSSNIGS--ITVNWYQQLPG
    LRLSCTGSGFIFSS--YAMYWYQQAPG
    LSLTCTGSGTSFDD-QYYSTWYQQPPG
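
    The definition above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the function name `conserved_columns` is made up for illustration, and treating an all-gap column as not conserved is an assumption.

```python
def conserved_columns(rows):
    """Return 0-based indices of columns in which every row holds the same residue."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return [
        i
        for i, column in enumerate(zip(*rows))
        # Assumption: a column made only of gaps does not count as conserved.
        if len(set(column)) == 1 and column[0] != "-"
    ]


# The four rows shown on the slide.
alignment = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGSSSNIGS--ITVNWYQQLPG",
    "LRLSCTGSGFIFSS--YAMYWYQQAPG",
    "LSLTCTGSGTSFDD-QYYSTWYQQPPG",
]

conserved = conserved_columns(alignment)
# One '*' under every fully conserved column, in the style of alignment viewers.
marker = "".join("*" if i in conserved else " " for i in range(len(alignment[0])))
print("\n".join(alignment))
print(marker)
```

    For the slide's example this marks the CTGS block and the WYQQ, P and G columns near the end.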

  • Consensus sequence: A consensus sequence holds the most frequent character of the alignment at each column
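
    As an illustration, a consensus can be computed column by column. This is a small Python sketch under stated assumptions, not the lecture's own code: the name `consensus` is hypothetical, gaps are counted like any other character, and ties go to the character seen first (top row first).

```python
from collections import Counter


def consensus(rows):
    """Return the most frequent character of each alignment column as a string."""
    result = []
    for column in zip(*rows):
        # Counter.most_common keeps first-seen order among equal counts,
        # so a tie is resolved in favour of the topmost row (an assumption).
        result.append(Counter(column).most_common(1)[0][0])
    return "".join(result)


# Toy alignment: the fourth column has two gaps and one 'A', so the gap wins.
toy = ["ACG-T", "ACGAT", "TCG-T"]
print(consensus(toy))
```

    Real tools often add a frequency threshold or ambiguity codes; none of that is modelled here.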

  • Profile: Profile = PSSM, Position-Specific Score (probability) Matrix
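
    To make the idea concrete, here is a minimal Python sketch of the probability form of a profile: one frequency table per alignment column. The function name `column_profile`, the optional `pseudocount` parameter, and the choice to skip gap characters are illustrative assumptions, not something taken from the slides.

```python
from collections import Counter


def column_profile(rows, alphabet, pseudocount=0.0):
    """Return one dict per column mapping each residue to its observed probability."""
    profile = []
    for column in zip(*rows):
        # Assumption: characters outside the alphabet (such as '-') are ignored.
        counts = Counter(ch for ch in column if ch in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        if total == 0:
            # A column holding only gaps carries no information.
            profile.append({ch: 0.0 for ch in alphabet})
        else:
            profile.append(
                {ch: (counts[ch] + pseudocount) / total for ch in alphabet}
            )
    return profile


toy_rows = ["AC", "AG", "AC"]
toy_profile = column_profile(toy_rows, "ACG")
for position, probabilities in enumerate(toy_profile):
    print(position, probabilities)
```

    A score matrix is usually derived from such probabilities by taking log-odds against background frequencies; that step is left out of this sketch.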

  • Alignment methods: No optimal solution is available in practice for MSA, so all methods are heuristics:

    Progressive/hierarchical alignment (Clustal)
    Iterative alignment (MAFFT, MUSCLE)

  • Progressive alignment
    [Figure labels: A, B, C, D, E]

    First step:
    Compute the pairwise alignments for all against all (6 pairwise alignments)
    The similarities are stored in a table
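
    The all-against-all step can be sketched in Python as follows. This is only an illustration under assumptions: the scoring scheme (match +1, mismatch -1, gap -1), the toy sequences, and the names `nw_score` and `similarity_table` are made up here, and real progressive aligners use substitution matrices and more refined gap penalties.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score of two sequences, score only."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity_table(seqs):
    """Score every unordered pair of named sequences: n * (n - 1) / 2 entries."""
    return {
        (x, y): nw_score(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }


# Hypothetical toy sequences; four sequences give six pairs.
toy_seqs = {"A": "ACGT", "B": "ACGA", "C": "AGT", "D": "TTTT"}
table = similarity_table(toy_seqs)
for pair, score in table.items():
    print(pair, score)
```

    The table holds one score per pair, which is what the clustering step consumes.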

  • Second step:
    Cluster the sequences to create a tree (guide tree):
    It represents the order in which pairs of sequences are to be aligned
    Similar sequences are neighbors in the tree
    Distant sequences are distant from each other in the tree

    The guide tree is imprecise and is NOT the tree which truly describes the relationship between the sequences!
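
    One simple way to turn a distance table into a guide tree is average-linkage agglomerative clustering (UPGMA-style). The Python sketch below is an assumption chosen for illustration; the slides do not say which clustering method is used, and the function name `guide_tree` and the distances are hypothetical.

```python
from itertools import combinations


def guide_tree(names, dist):
    """Build a nested-tuple tree by repeatedly merging the two closest clusters.

    dist maps frozenset({a, b}) to the distance between sequences a and b.
    """
    # Each cluster is (subtree, list of leaf names under that subtree).
    clusters = [(name, [name]) for name in names]

    def average_distance(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_distance(*p))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]


# Made-up distances: A is close to B, C is close to D.
toy_dist = {
    frozenset("AB"): 1, frozenset("CD"): 2,
    frozenset("AC"): 6, frozenset("AD"): 7,
    frozenset("BC"): 6, frozenset("BD"): 7,
}
tree = guide_tree(["A", "B", "C", "D"], toy_dist)
print(tree)
```

    The nesting of the result gives the alignment order: innermost pairs are aligned first, then pairs of pairs.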

  • Third step:
    1. Align the most similar (neighboring) pairs
    [Figure: sequence aligned with sequence, twice]

  • Third step:
    2. Align pairs of pairs
    [Figure labels: sequence, profile]

  • Main disadvantages:
    Sub-optimal tree topology
    Misalignments resulting from globally aligning a pair of sequences will only cause further deterioration

  • Iterative alignment
    [Figure: diagram with labels: sequences A to E, pairwise distance table, guide tree, MSA]
    Iterate until the MSA doesn't change (convergence)

  • Searching for remote homologs
    Sometimes BLAST isn't enough.
    Large protein family, and BLAST only gives close members. We want more distant members

    PSI-BLAST
    Profile HMMs

  • Profile HMM
    Similar to PSI-BLAST: also uses a profile
    Takes into account:
    Dependence among sites (if site n is conserved, it is likely that site n+1 is conserved, as part of a domain)
    The probability of a certain column in an alignment

  • PSI-BLAST vs. profile HMM
    Profile HMM: more exact, slower
    PSI-BLAST: less exact, faster

  • Case study: Using homology searching
    The human kinome

  • Kinases and phosphatases

  • Multi-tasking enzymes
    Signal transduction
    Metabolism
    Transcription
    Cell cycle
    Differentiation
    Function of nervous and immune system
    And more

  • How many kinases in the human genome?
    1950s: discovery that reversible phosphorylation regulates the activity of glycogen phosphorylase

    1970s: the advent of cloning and sequencing led to speculation that the vertebrate genome encodes as many as 1001 kinases

  • How many kinases in the human genome?
    2001: the human genome sequence, as well as the GenBank, Swiss-Prot, and dbEST databases

    How can we find out how many kinases are out there?

  • The human kinome
    In 2002, Manning, Whyte, Martinez, Hunter and Sudarsanam set out to:
    Search and cross-reference all these databases for all kinases
    Characterize all found kinases

  • ePKs and aPKs
    ePKs: eukaryotic protein kinases (the majority)
    Sequence homology of the catalytic domain; additional regulatory domains are non-homologous
    aPKs: atypical protein kinases
    No sequence homology to ePKs; some aPK subfamilies have structural similarity to ePKs

  • The search
    Several profiles were built, based on the catalytic domain of:
    (a) 70 known ePKs from yeast, worm, fly, and human with >50% identity in the ePK domain
    (b) each subfamily of known aPKs

    HMM-profile searches and PSI-BLAST searches were performed

  • The results
    478 ePKs
    40 aPKs

    Total of 518 kinases in the human genome (about half of the prediction made in the 1970s)

  • Classifying the kinases
    Classification based on the catalytic domain
    Classification based on the regulatory domains
    189 sub-families of kinases

  • Comparison to other species
    209 subfamilies of ePKs in human, worm, yeast and fly

  • The human genome has about twice as many kinases as fly or worm. Many are aPKs. Most of them are receptor tyrosine kinases (RTKs)

    The human-expanded kinase families function predominantly in processes of the:
    Nervous system
    Immune system
    Angiogenesis
    Hemopoiesis

  • The discovery of new kinases: a new front for battling human diseases

  • Correlating with human diseases
    160 kinases mapped to amplicons seen in tumors
    80 kinases mapped to amplicons in other major illnesses
    Usually kinases are over-expressed in cancer and other diseases

  • Correlating with human diseases
    6 kinase inhibitors have been approved to date for use against cancer
    >70 other inhibitors are in clinical trials