multiple sequence alignment
-
Multiple sequence alignment
Lesson 4
-
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWWSNG--
Like pairwise alignment, but compares n sequences instead of 2
Each row represents an individual sequence
Each column represents the same (aligned) position in every sequence
Some sequences may contain gaps (-)
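Here is a minimal Python sketch of this row/column view. It assumes the alignment is stored as a list of equal-length strings with `-` for gaps. The helper names `alignment_columns` and `ungapped` are made up for illustration.

```python
def alignment_columns(rows):
    """Return the columns of an MSA given as equal-length row strings."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an MSA must have the same length")
    # zip(*rows) walks the rows in parallel, yielding one column at a time
    return ["".join(column) for column in zip(*rows)]


def ungapped(row):
    """Recover the original sequence by dropping gap characters."""
    return row.replace("-", "")


rows = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
]
cols = alignment_columns(rows)
print(len(cols), cols[4])   # 27 columns; column 5 reads "CCC"
print(ungapped(rows[1]))    # VTISCTGTSSNIGSITVNWYQQLPG
```

Each sequence keeps its own row, and gaps only shift residues so that each column lines up.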
-
MSA & evolution
MSA can give you a picture of the forces that shape evolution!
Important amino acids or nucleotides are not allowed to mutate (mutations there are removed by selection)
Less important positions change more easily
-
Conserved positions
Columns where all the sequences contain the same amino acid or nucleotide
Important for the function or structure
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGSSSNIGS--ITVNWYQQLPG
LRLSCTGSGFIFSS--YAMYWYQQAPG
LSLTCTGSGTSFDD-QYYSTWYQQPPG
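Finding conserved positions is a simple column scan. In this sketch (the function name is hypothetical), a column counts as conserved only if every sequence has the same residue and none has a gap. On the four sequences above it reports the C/T/G/S block, the W/Y/Q/Q block and the final P/G.

```python
def conserved_columns(rows, gap="-"):
    """Return 0-based indices of columns where every row has the same residue."""
    conserved = []
    for index, column in enumerate(zip(*rows)):
        residues = set(column)
        # fully conserved: exactly one distinct character, and it is not a gap
        if len(residues) == 1 and gap not in residues:
            conserved.append(index)
    return conserved


rows = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGSSSNIGS--ITVNWYQQLPG",
    "LRLSCTGSGFIFSS--YAMYWYQQAPG",
    "LSLTCTGSGTSFDD-QYYSTWYQQPPG",
]
print([i + 1 for i in conserved_columns(rows)])
# → [5, 6, 7, 8, 21, 22, 23, 24, 26, 27]
```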
-
Consensus sequence
A consensus sequence holds the most frequent character of the alignment at each column
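Below is a consensus sketch using `collections.Counter`. Two choices here are assumptions, not rules from the slides. First, gaps are counted like any other character. Second, ties go to the character seen first in the column, which is the documented tie-breaking of `Counter.most_common`.

```python
from collections import Counter


def consensus(rows):
    """Most frequent character in each column (gaps are counted like residues here)."""
    result = []
    for column in zip(*rows):
        # most_common(1) returns [(char, count)]; ties go to the character seen first
        result.append(Counter(column).most_common(1)[0][0])
    return "".join(result)


print(consensus(["ACGT", "ACGA", "TCGA"]))  # ACGA
```

Real tools often skip gaps or require a minimum frequency before reporting a residue. Both changes are one-line edits to the loop.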
-
Profile
Profile = PSSM, a Position-Specific Scoring Matrix: for every column of the alignment, a score (or probability) for each possible amino acid or nucleotide
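This sketch shows one way to turn an alignment into a profile, using DNA as the example. The pseudocount of 1, the uniform background and the log2-odds scale are assumptions chosen for illustration. Tools such as PSI-BLAST use their own sequence weighting and background frequencies.

```python
import math
from collections import Counter

DNA = "ACGT"


def build_pssm(rows, alphabet=DNA, pseudocount=1.0):
    """One {residue: log2-odds score} dict per alignment column."""
    background = 1.0 / len(alphabet)  # uniform background frequency (assumption)
    pssm = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c in alphabet)  # gaps are not counted
        total = sum(counts.values()) + pseudocount * len(alphabet)
        scores = {}
        for residue in alphabet:
            # pseudocount keeps unseen residues from getting probability 0
            probability = (counts[residue] + pseudocount) / total
            scores[residue] = math.log2(probability / background)
        pssm.append(scores)
    return pssm


pssm = build_pssm(["ACGT", "ACGA", "TCGA"])
print(round(pssm[1]["C"], 3), round(pssm[1]["A"], 3))  # 1.193 -0.807
```

Column 2 is an all-C column, so C gets a positive score and every other base a negative one. This is the "position-specific" part of a PSSM.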
-
Alignment methods
There is no practical way to guarantee the optimal MSA: exact dynamic programming grows exponentially with the number of sequences, so all methods are heuristics:
Progressive/hierarchical alignment (Clustal)
Iterative alignment (MAFFT, MUSCLE)
-
Progressive alignment
Example: five sequences, A, B, C, D and E
First step:
Compute the pairwise alignments for all against all (for 5 sequences, 5 × 4 / 2 = 10 pairwise alignments)
The similarities are stored in a table
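The first step can be sketched in Python. The slides don't specify the pairwise method or the scores, so the following are assumptions:

- Each pair gets a Needleman-Wunsch global alignment with match +1, mismatch -1 and gap -2.
- The table stores a distance: 1 minus the fraction of identical aligned columns.

`itertools.combinations` makes the n(n-1)/2 pairs explicit.

```python
from itertools import combinations


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences; returns the two aligned (gapped) strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance_table(seqs):
    """All-against-all distances: 1 - fraction of identical columns per pairwise alignment."""
    table = {}
    for x, y in combinations(sorted(seqs), 2):
        aligned_x, aligned_y = needleman_wunsch(seqs[x], seqs[y])
        identical = sum(p == q for p, q in zip(aligned_x, aligned_y))
        table[(x, y)] = 1 - identical / len(aligned_x)
    return table


# first 20 columns of the example alignment, with gaps removed
seqs = {
    "A": "VTISCTGSSSNIGAGNHVK",
    "B": "VTISCTGTSSNIGSITVN",
    "C": "LRLSCSSSGFIFSSYAMY",
    "D": "LSLTCTVSGTSFDDYYST",
    "E": "PEVTCVVVDVSHEDPQVKFN",
}
table = distance_table(seqs)
print(len(table))  # 10 pairs for 5 sequences
```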
-
Second step:
Cluster the sequences to create a tree (the guide tree):
It represents the order in which pairs of sequences are to be aligned
Similar sequences are neighbors in the tree
Distant sequences are distant from each other in the tree
The guide tree is imprecise and is NOT the tree that truly describes the relationship between the sequences!
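Clustal W builds its guide tree with neighbor-joining. The sketch below swaps in the simpler UPGMA clustering, which repeatedly merges the closest pair of clusters and averages their distances. It is only meant to show how a distance table becomes a nested guide tree. The input format (a dict keyed by name pairs) and the toy distances are assumptions.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a guide tree as nested tuples from {(x, y): distance} entries."""
    # store distances under unordered keys so (x, y) and (y, x) are the same pair
    d = {frozenset(pair): value for pair, value in dist.items()}
    size = {name: 1 for name in names}  # cluster -> number of sequences in it
    while len(size) > 1:
        # merge the two closest clusters
        x, y = min(combinations(size, 2), key=lambda pair: d[frozenset(pair)])
        merged = (x, y)
        for z in size:
            if z not in (x, y):
                # UPGMA: size-weighted average of the distances to the two old clusters
                d[frozenset((merged, z))] = (
                    size[x] * d[frozenset((x, z))] + size[y] * d[frozenset((y, z))]
                ) / (size[x] + size[y])
        size[merged] = size.pop(x) + size.pop(y)
    return next(iter(size))


dist = {
    ("A", "B"): 0.1, ("C", "D"): 0.2,
    ("A", "C"): 0.8, ("A", "D"): 0.8, ("B", "C"): 0.8, ("B", "D"): 0.8,
}
print(upgma(["A", "B", "C", "D"], dist))  # (('A', 'B'), ('C', 'D'))
```

The nesting reads as the alignment order: A with B, C with D, then the two pairs with each other.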
-
Third step:
1. Align the most similar (neighboring) pairs: sequence with sequence
-
Third step:
2. Align pairs of pairs: a profile (an aligned pair) with another profile, or a sequence with a profile
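Aligning "pairs of pairs" means aligning two profiles instead of two sequences. The sketch below uses one common approach, taken here as an assumption rather than Clustal's exact scoring. It runs ordinary global dynamic programming, and two columns are scored by the average sum-of-pairs score over all residue pairs drawn from them. A single sequence is just a one-row profile, so the same function also covers step 1.

```python
def column_score(col_a, col_b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Average sum-of-pairs score over every residue pair drawn from two columns."""
    total = 0.0
    for x in col_a:
        for y in col_b:
            if x == "-" and y == "-":
                continue  # gap against gap costs nothing
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total / (len(col_a) * len(col_b))


def align_profiles(prof_a, prof_b, gap=-2.0):
    """Globally align two profiles (lists of equal-length rows); return merged rows."""
    cols_a, cols_b = list(zip(*prof_a)), list(zip(*prof_b))
    n, m = len(cols_a), len(cols_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]), "diag"),
                (score[i - 1][j] + gap, "up"),    # gap column inserted into prof_b
                (score[i][j - 1] + gap, "left"),  # gap column inserted into prof_a
            ]
            # max() keeps the first best option, so ties prefer "diag"
            score[i][j], move[i][j] = max(options, key=lambda option: option[0])
    gaps_a, gaps_b = ("-",) * len(prof_a), ("-",) * len(prof_b)
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if move[i][j] == "diag":
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            merged.append(cols_a[i - 1] + gaps_b)
            i -= 1
        else:
            merged.append(gaps_a + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(col[k] for col in merged) for k in range(len(prof_a) + len(prof_b))]


print(align_profiles(["ACGT"], ["AGT"]))  # ['ACGT', 'A-GT']
print(align_profiles(["ACGT", "ACGA"], ["AGT", "AGA"]))
```

Gaps already inside either profile are copied through unchanged and never reconsidered.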
-
Main disadvantages:
Sub-optimal tree topology
Misalignments made when globally aligning a pair of sequences are never corrected later; they only cause further deterioration as more sequences are added
-
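Iterative alignment
Sequences A-E → pairwise distance table → guide tree → MSA
The MSA is used to compute a new pairwise distance table and guide tree, and the sequences are realigned
Iterate until the MSA doesn't change (convergence)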
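The iteration loop can be sketched independently of any particular aligner. Distances are recomputed from the current MSA as a simple p-distance over gap-free columns (an assumption). The realignment step is a hypothetical callback `realign` that the caller supplies, for example "build a guide tree, then redo progressive alignment".

```python
from itertools import combinations


def p_distance(row_a, row_b, gap="-"):
    """Fraction of differing residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != gap and y != gap]
    if not pairs:
        return 1.0  # no comparable columns: treat as maximally distant (assumption)
    return sum(x != y for x, y in pairs) / len(pairs)


def distances_from_msa(msa):
    """Pairwise distance table recomputed from the current MSA (dict name -> row)."""
    return {(a, b): p_distance(msa[a], msa[b]) for a, b in combinations(sorted(msa), 2)}


def iterate_until_converged(msa, realign, max_iterations=10):
    """Repeat: distances from MSA -> realign -> compare, until the MSA stops changing.

    `realign` is a hypothetical callback mapping a distance table to a new MSA.
    """
    for iteration in range(1, max_iterations + 1):
        new_msa = realign(distances_from_msa(msa))
        if new_msa == msa:  # convergence: this round changed nothing
            return msa, iteration
        msa = new_msa
    return msa, max_iterations  # stopped without converging


start = {"A": "AC-GT", "B": "ACAGT"}
print(distances_from_msa(start))  # {('A', 'B'): 0.0}
print(iterate_until_converged(start, lambda table: start))
```

The `max_iterations` cap is a safety net. Heuristic refinement is not guaranteed to reach a fixed point quickly.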
-
Searching for remote homologs
Sometimes BLAST isn't enough.
For a large protein family, BLAST may return only the close members, but we want the more distant members too
PSI-BLAST
Profile HMMs
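PSI-BLAST and profile HMMs both search with a profile instead of a single query sequence. The sketch below shows only the simplest part of that idea: an ungapped scan of a PSSM along a target sequence. Real PSI-BLAST also allows gaps and rebuilds its profile over several iterations. The `unknown` penalty for residues missing from a column and the toy PSSM are assumptions.

```python
def best_window(target, pssm, unknown=-4.0):
    """Slide an ungapped PSSM along target; return (best_score, start_index)."""
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(target) - width + 1):
        window = target[start:start + width]
        # sum the position-specific score of each residue in the window
        score = sum(column.get(res, unknown) for column, res in zip(pssm, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


pssm = [{"A": 2.0, "C": -1.0}, {"C": 2.0, "A": -1.0}]
print(best_window("GGACGG", pssm))  # (4.0, 2)
```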
-
Profile HMM
Similar to PSI-BLAST: also uses a profile
Takes into account:
Dependence among sites (if site n is conserved, it is likely that site n+1 is also conserved, e.g. because both are part of a domain)
The probability of a certain column in an alignment
-
PSI-BLAST vs. profile HMM
Profile HMM: more exact, slower
PSI-BLAST: less exact, faster
-
Case study: using homology searching
The human kinome
-
Kinases and phosphatases
-
Multi-tasking enzymes
Signal transduction
Metabolism
Transcription
Cell cycle
Differentiation
Function of the nervous and immune systems
And more
-
How many kinases in the human genome?
1950s: discovery that reversible phosphorylation regulates the activity of glycogen phosphorylase
1970s: the advent of cloning and sequencing led to speculation that the vertebrate genome encodes as many as 1001 kinases
-
2001: the human genome sequence, as well as the GenBank, Swiss-Prot and dbEST databases
How can we find out how many kinases are out there?
-
The human kinome
In 2002, Manning, Whyte, Martinez, Hunter and Sudarsanam set out to:
Search and cross-reference all these databases for all kinases
Characterize all the kinases found
-
ePKs and aPKs
ePKs, eukaryotic protein kinases (the majority): sequence homology of the catalytic domain; additional regulatory domains are non-homologous
aPKs, atypical protein kinases: no sequence homology to ePKs; some aPK subfamilies have structural similarity to ePKs
-
The search
Several profiles were built, based on the catalytic domain of:
(a) 70 known ePKs from yeast, worm, fly, and human with >50% identity in the ePK domain
(b) each subfamily of known aPKs
Profile-HMM searches and PSI-BLAST searches were performed
-
The results
478 ePKs
40 aPKs
A total of 518 kinases in the human genome (about half the number predicted in the 1970s)
-
Classifying the kinases
Classification based on the catalytic domain
Classification based on the regulatory domains
189 subfamilies of kinases
-
Comparison to other species
209 subfamilies of ePKs in human, worm, yeast and fly
-
The human genome encodes about twice as many kinases as the fly or worm genome. Many of the additional kinases are tyrosine kinases; most of them are receptor tyrosine kinases (RTKs)
The human-expanded kinase families function predominantly in:
the nervous system
the immune system
angiogenesis
hemopoiesis
-
The discovery of new kinases: a new front for battling human diseases
-
Correlating with human diseases
160 kinases mapped to amplicons seen in tumors
80 kinases mapped to amplicons in other major illnesses
Usually, kinases are over-expressed in cancer and other diseases
-
Correlating with human diseases
6 kinase inhibitors have been approved to date for use against cancer
More than 70 other inhibitors are in clinical trials