Chapter 4 Biomolecular Databases


Upload: lamar-daniels

Post on 03-Jan-2016


DESCRIPTION

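Lecturer: Sun Xiao; slides prepared by: Liu Zhihua. Chapter 4 Biomolecular Databases. Chien-Shiung Wu Laboratory, Southeast University. Section 1 Introduction. Biomolecular data are growing rapidly, and researchers in molecular biology and related fields need fast access to the latest experimental data, so biomolecular databases are built. A biomolecular database should meet five main requirements: (1) timeliness, (2) annotation, (3) supporting data, (4) data quality, (5) integration. Types of biomolecular databases: primary databases, whose data come directly from raw experimental results and receive only simple classification, organization and annotation; secondary databases.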

TRANSCRIPT


  • Nucleotide sequence databases: (1) EMBL http://www.embl-heidelberg.de (2) GenBank http://www.ncbi.nlm.nih.gov/Web/Genbank/index.html (3) DDBJ http://www.ddbj.nig.ac.jp/

  • EMBL: DNA and RNA sequences, with lengths given in base pairs (bp)

  • EMBL database growth, in Gigabases (EST: expressed sequence tags; STS: sequence tagged sites). Source: http://www3.ebi.ac.uk/Services/DBStats/

  • 21 Mar 2003 37,943,364,438 bases in 24,353,128 records.

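  • EMBL entry format. Each line begins with a two-letter line code: ID (identification), AC (accession number), XX (spacer line), DT (date), DE (description), KW (keywords), OG (organelle), OS (organism species), OC (organism classification), RN (reference number), RP (reference positions), RA (reference authors), RT (reference title), RL (reference location), RC (reference comment), RX (reference cross-reference), DR (database cross-reference), FH (feature table header), FT (feature table data), SQ (sequence header), // (end of entry).

    Each feature table entry has three parts: 1) Feature Key, 2) Location, 3) Qualifiers. EMBL entries are plain ASCII text carrying the annotation.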
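Because every EMBL line starts with a two-letter code and the residues follow the SQ line, an entry can be split mechanically. The following is a minimal Python sketch, not EMBL's own software: the helper `parse_embl_entry` and the tiny entry `SAMPLE` (accession `TEST01`) are hypothetical, and it assumes the line content begins in column 6 as in the EMBL flat-file layout.

```python
from collections import defaultdict

def parse_embl_entry(text: str) -> tuple[dict[str, list[str]], str]:
    """Split one EMBL flat-file entry into line-code fields and a sequence.

    Hypothetical helper: returns (fields, sequence), where fields maps each
    two-letter line code (ID, AC, DE, FT, ...) to the contents of its lines
    and sequence joins the residues listed after the SQ line.
    """
    fields: dict[str, list[str]] = defaultdict(list)
    seq_parts: list[str] = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):        # terminator: end of this entry
            break
        if in_sequence:
            # Sequence lines hold blocks of residues plus a trailing position count.
            seq_parts.extend(tok for tok in line.split() if tok.isalpha())
            continue
        code, content = line[:2], line[5:].rstrip()
        if code == "XX":                 # spacer line, carries no data
            continue
        fields[code].append(content)
        if code == "SQ":                 # sequence header; residues start on the next line
            in_sequence = True
    return dict(fields), "".join(seq_parts).lower()

# A made-up minimal entry, for illustration only.
SAMPLE = """\
ID   TEST01; SV 1; linear; mRNA; STD; HUM; 20 BP.
XX
AC   TEST01;
XX
DE   Hypothetical example entry.
XX
FT   CDS             1..18
XX
SQ   Sequence 20 BP; 5 A; 5 C; 5 G; 5 T; 0 other;
     acgtacgtac gtacgtacgt                                                    20
//
"""

fields, sequence = parse_embl_entry(SAMPLE)
print(fields["AC"], sequence)
```

Keying the fields by line code keeps multi-line fields such as FT or RA together in order, which is usually all a quick script needs.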

  • Data submission: 1) …; 2) Authorin; 3) WWW

  • Access to EMBL: 1) CD-ROM; 2) ftp; 3) Gopher; 4) WWW

  • EMBL via the WWW: 1) example entry X58929 (SCARGC)

    HTML

    MEDLINE

  • Example:

    entry J00231 has a DR line cross-referencing SWISS-PROT P01860 (GC3_HUMAN); 2) similarity searches with FastA through the WWW

  • (2) GDB (Genome Database), human genome mapping data: 1) PCR, EST, contigs; 2) contig; 3) …

  • MGD (Mouse Genome Database): http://www.informatics.jax.org/

    SGD (Saccharomyces Genome Database): http://genome-www.stanford.edu/Saccharomyces/

  • Ensembl (http://www.ensembl.org/): automatic annotation of eukaryotic genome DNA

    Gene prediction: GenScan

    SNP data

  • Ensembl

  • Ensembl

    BLAST


  • (4) dbEST: ESTs (Expressed Sequence Tags). dbEST (http://www.ncbi.nlm.nih.gov/dbEST/) is the division of GenBank that holds EST sequences, partial sequences derived from mRNA. Access via the WEB, email and FTP.

  • (5) dbSTS: STS (Sequence Tagged Sites)

    dbSTS (http://www.ncbi.nlm.nih.gov/dbSTS/) is NCBI's database of STS sequences.

    STS sequences can be searched with BLAST.

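  • (6) UniGene (http://www.ncbi.nlm.nih.gov/UniGene/): partitions GenBank sequences into gene-oriented clusters; each UniGene cluster groups the ESTs and other sequences that appear to come from the same gene.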

  • (1) PIR (Protein Information Resource)

  • PIR is organized into four sections, (1) to (4).

  • PIR services:

    similarity searches with BLAST and FastA

  • (2) SWISS-PROT (http://www.expasy.ch/sprot/sprot-top.html): points 1) to 4), with a comparison to PIR

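  • 1) Annotation

    SWISS-PROT annotations describe:

    (A) protein function, (B) post-translational modifications, (C) domains and sites, such as ATP-binding sites, (D) secondary structure, (E) quaternary structure, (F) similarities to other proteins, (G) diseases associated with deficiencies in the protein, (H) sequence conflicts and variants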

  • 2) Minimal redundancy; 3) Integration with other databases, through cross-references to EMBL, PROSITE, PDB and others

  • Data submission: a) …; b) Authorin; c) WWW

    Access to SWISS-PROT: a) CD-ROM; b) ftp; c) Gopher; d) WWW (SRS)

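  • (3) TrEMBL (http://www.ebi.ac.uk/trembl/index.html): a computer-annotated supplement to SWISS-PROT, containing translations of the coding sequences (CDS) in EMBL that are not yet in SWISS-PROT.

    TrEMBL sections: 1) SP-TrEMBL (SWISS-PROT TrEMBL): entries that will eventually be incorporated into SWISS-PROT;

    2) REM-TrEMBL (REMaining TrEMBL): entries that will not be included in SWISS-PROT;

    3) TrEMBL …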

  • (4) UniProt: combines Swiss-Prot, TrEMBL and PIR, and offers BLAST searches and FTP downloads

  • UniProt has three parts: 1) UniProt Knowledgebase (UniProt); 2) UniProt Non-redundant Reference (UniRef); 3) UniProt Archive (UniParc)

  • (1) PDB (Protein Data Bank): three-dimensional structures determined mainly by X-ray crystallography and NMR

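  • Explicit sequence: the sequence given in the PDB SEQRES records.

    Implicit sequence: the sequence derived from the residues in the PDB atomic coordinate records.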

  • PDB entry 1ADZ (excerpt):

    HEADER    HYDROLASE    19-FEB-97    1ADZ
    TITLE     THE SOLUTION STRUCTURE OF THE SECOND KUNITZ DOMAIN OF
    TITLE    2 TISSUE FACTOR PATHWAY INHIBITOR, NMR, 30 STRUCTURES
    COMPND    MOL_ID: 1;
    COMPND   2 MOLECULE: TISSUE FACTOR PATHWAY INHIBITOR;
    COMPND   8 BIOLOGICAL_UNIT: MONOMER
    SOURCE    MOL_ID: 1;
    SOURCE   7 EXPRESSION_SYSTEM_PLASMID: PFLAG
    KEYWDS    HYDROLASE, INHIBITOR, COAGULATION
    EXPDTA    NMR, 30 STRUCTURES
    AUTHOR    M.J.M.BURGERING,L.P.M.ORBONS
    REVDAT   1   25-FEB-98 1ADZ    0
    JRNL        AUTH   M.J.BURGERING,L.P.ORBONS,A.VAN DER DOELEN,
    REMARK   1 REFERENCE 1
    REMARK   1  AUTH   M.T.STUBBS
    REMARK   1  TITL   STRUCTURAL ASPECTS OF FACTOR XA INHIBITION
    REMARK 999 SEQUENCE
    REMARK 999 1ADZ SWS P10646     1 - 111 NOT IN ATOMS LIST
    REMARK 999 1ADZ SWS P10646   183 - 304 NOT IN ATOMS LIST
    REMARK 999 THE FIRST NINE RESIDUES ARE NOT PART OF THE TFPI DOMAIN II
    REMARK 999 SEQUENCE BUT ARE FROM THE PFLAG PEPTIDE CLONING VECTOR.
    DBREF  1ADZ    1    71  SWS    P10646   TFPI_HUMAN     112    182
    SEQADV 1ADZ ASP    1  SWS  P10646  ILE  112 ENGINEERED
    SEQADV 1ADZ TYR    2  SWS  P10646  ILE  113 ENGINEERED
    SEQRES   1   71  ASP TYR LYS ASP ASP ASP ASP LYS LEU LYS PRO ASP PHE
    SEQRES   2   71  CYS PHE LEU GLU GLU ASP PRO GLY ILE CYS ARG GLY TYR
    SEQRES   3   71  ILE THR ARG TYR PHE TYR ASN ASN GLN THR LYS GLN CYS
    SEQRES   4   71  GLU ARG PHE LYS TYR GLY GLY CYS LEU GLY ASN MET ASN
    SEQRES   5   71  ASN PHE GLU THR LEU GLU GLU CYS LYS ASN ILE CYS GLU
    SEQRES   6   71  ASP GLY PRO ASN GLY PHE
    HELIX    1   1 ASP     12  PHE     15  5   4
    HELIX    2   2 ASN     34  THR     36  5   3
    HELIX    3   3 LEU     57  ILE     63  1   7
    SHEET    1   A 2 ARG    29  ASN    33  0
    SHEET    2   A 2 GLN    38  PHE    42 -1  N  PHE    42   O  ARG    29
    CRYST1    1.000    1.000    1.000  90.00  90.00  90.00 P 1           1
    ORIGX1      1.000000  0.000000  0.000000        0.00000
    ORIGX2      0.000000  1.000000  0.000000        0.00000
    ORIGX3      0.000000  0.000000  1.000000        0.00000
    SCALE1      1.000000  0.000000  0.000000        0.00000
    SCALE2      0.000000  1.000000  0.000000        0.00000
    SCALE3      0.000000  0.000000  1.000000        0.00000

    Figure 4.5: a PDB entry in PDB format
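The explicit sequence can be read straight from the SEQRES records. Below is a minimal Python sketch that assumes a single protein chain and the 20 standard residues; `seqres_to_sequence` and `THREE_TO_ONE` are hypothetical names, and modified or unknown residues are simply mapped to `X`. It uses the six SEQRES lines of entry 1ADZ shown above.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def seqres_to_sequence(pdb_text: str) -> str:
    """Return the explicit (SEQRES) sequence of a single-chain entry in one-letter code."""
    residues = []
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES"):
            continue
        # Tokens after the record name: serial number, optional one-letter
        # chain ID, residue count, then three-letter residue names.
        for token in line[6:].split():
            if len(token) == 3 and token.isalpha():
                residues.append(THREE_TO_ONE.get(token.upper(), "X"))
    return "".join(residues)

SEQRES_1ADZ = """\
SEQRES   1   71  ASP TYR LYS ASP ASP ASP ASP LYS LEU LYS PRO ASP PHE
SEQRES   2   71  CYS PHE LEU GLU GLU ASP PRO GLY ILE CYS ARG GLY TYR
SEQRES   3   71  ILE THR ARG TYR PHE TYR ASN ASN GLN THR LYS GLN CYS
SEQRES   4   71  GLU ARG PHE LYS TYR GLY GLY CYS LEU GLY ASN MET ASN
SEQRES   5   71  ASN PHE GLU THR LEU GLU GLU CYS LYS ASN ILE CYS GLU
SEQRES   6   71  ASP GLY PRO ASN GLY PHE
"""

seq = seqres_to_sequence(SEQRES_1ADZ)
print(len(seq), seq[:9])   # 71 DYKDDDDKL
```

The first nine one-letter codes, DYKDDDDKL, are the residues that REMARK 999 attributes to the pFLAG cloning vector rather than to TFPI domain II.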

  • Structure viewers: RasMol, ChemView

  • (2) MMDB (Molecular Modeling Database): NCBI's structure database, part of the Entrez system; its entries are derived from PDB.

  • MMDB

  • SNPs: single nucleotide polymorphisms

    SNP databases: (1) dbSNP (http://www3.ncbi.nlm.nih.gov/SNP/)

  • GTTTGTGATT ACTTTGTAAA AACAGTGTAA TAAGTACTCA CTAAAGGAAA TTTAGAAAAT GATAAGCTTA Aggccgggca tggtgcctca tgcctgtaat cctagcactt tgggaggctg aggtgggtgg atcacctgag ctcaggagtt ccagatcatc ctggacaata tggtgaaacc ctgtctacgc ttaaaatacg R aaattagccg ggcgtggtgg ggcatgcctg tggtctcagc tactttggag actaaggtag aaggatcact tgaatcctgg aggtggaggt tgcagagtga gccaatatcg tgccactgca ctccagccta ggtgacagag gaagactctg tctcaaaaaa aagaaaaTAA GGCCAGACAC GGGGGCTCAT GCTTGTAATC

    R=A/G
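In the flanking sequence the polymorphic base is written with an IUPAC ambiguity code (R = A/G). A minimal Python sketch that expands such codes into the concrete allele sequences; `expand_alleles` is a hypothetical helper, and the short flank used below is copied from the sequence above.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_alleles(seq: str) -> list[str]:
    """Return every plain A/C/G/T sequence encoded by seq (case-insensitive)."""
    choices = [IUPAC[base] for base in seq.upper()]
    # Cartesian product: one sequence per combination of ambiguous positions.
    return ["".join(combo) for combo in product(*choices)]

for allele in expand_alleles("ttaaaatacgRaaattagccg"):
    print(allele)
```

The output grows multiplicatively with the number of ambiguous positions, which is fine for a single SNP but not for long, heavily ambiguous sequences.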

  • (2) SCOP (Structural Classification of Proteins, http://scop.mrc-lmb.cam.ac.uk/scop/): classifies PDB structures hierarchically: (1) family, (2) superfamily, (3) fold

  • (3) DSSP (http://www.sander.embl-heidelberg.de/dssp/): secondary structure assignments computed by the DSSP program for PDB entries

  • The DSSP code:
    H = alpha helix
    B = residue in isolated beta-bridge
    E = extended strand, participates in beta ladder
    G = 3-helix (3/10 helix)
    I = 5 helix (pi helix)
    T = hydrogen bonded turn
    S = bend
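Secondary-structure prediction work often collapses the eight DSSP codes into three states. The mapping below (H, G, I to helix; E, B to strand; everything else, including T, S and blank, to coil) is one common convention and an assumption here, not part of DSSP itself; `reduce_dssp` and `segments` are hypothetical helpers.

```python
# Assumed 8-to-3 state reduction; other mappings are also in use.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_dssp(ss: str) -> str:
    """Map a DSSP string to H/E/C; T, S, blank and anything else become C."""
    return "".join(DSSP_TO_3.get(code, "C") for code in ss)

def segments(ss3: str) -> list[tuple[str, int, int]]:
    """Return (state, start, end) runs of H or E, 1-based and inclusive."""
    runs = []
    start = 0
    for i in range(1, len(ss3) + 1):
        # A run ends at the end of the string or where the state changes.
        if i == len(ss3) or ss3[i] != ss3[start]:
            if ss3[start] in "HE":
                runs.append((ss3[start], start + 1, i))
            start = i
    return runs

print(segments(reduce_dssp("  HHHHTTEEEE S")))   # [('H', 3, 6), ('E', 9, 12)]
```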

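  • (4) HSSP (homology-derived structures of proteins, http://www.sander.embl-heidelberg.de/hssp/): for each PDB structure, an alignment of the SWISS-PROT sequences homologous to it, so that PDB structural information can be transferred to those sequences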

  • From PDBFrom Swiss-prot

  • (5) OMIM (Online Mendelian Inheritance in Man): a catalog of human genes and genetic disorders. http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=OMIM

  • (6) EPD (Eukaryotic Promoter Database, http://www.epd.isb-sib.ch/): eukaryotic promoters, referenced to sequences in EMBL

  • (7) TRRD (Transcription Regulatory Regions Database): TRRD 6.0 consists of seven tables: 1) TRRDGENES, 2) TRRDLCR, 3) TRRDUNITS, 4) TRRDSITES, 5) TRRDFACTORS, 6) TRRDEXP, 7) TRRDBIB

  • (8) TRANSFAC (http://transfac.gbf.de/): a database of transcription factors and their binding sites, organized in six tables: 1) SITE, 2) GENE, 3) FACTOR, 4) CELL, 5) CLASS, 6) MATRIX

  • (9) BODYMAP (http://bodymap.ims.u-tokyo.ac.jp/): gene expression data based on 3' ESTs derived from mRNA

  • (10) PROSITE (http://www.expasy.ch/prosite/): a database of protein families, domains and functional sites, described by patterns and profiles
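PROSITE patterns use a compact syntax: elements separated by `-`, `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the N- and C-terminus, and a final period. A minimal Python sketch that translates such a pattern into a regular expression, shown on the N-glycosylation pattern `N-{P}-[ST]-{P}.`; `prosite_to_regex` is a hypothetical helper and it does not handle `<` or `>` inside brackets.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern string into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Terminal anchors: '<' = N-terminus, '>' = C-terminus.
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate an optional repeat count such as x(3) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            rx = "."                         # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"     # any residue except those listed
        else:
            rx = core                        # single residue or [..] class
        if lo:
            rx += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + rx + suffix)
    return "".join(parts)

rx = prosite_to_regex("N-{P}-[ST]-{P}.")
print(rx, re.search(rx, "MKNGSAL").start())   # N[^P][ST][^P] 2
```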

  • (11) DBCat (http://www.infobiogen.fr/services/dbcat/): a catalog of about 500 biological databases, covering DNA, RNA and other data

  • DBCat

  • (12) PubMed (http://www.ncbi.nlm.nih.gov/): NCBI's literature database, covering MEDLINE and Pre-MEDLINE and searched through Entrez

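  • Exhaustive (dynamic programming) alignment of two sequences of length n takes O(n^2) time,

    so database searches use the faster heuristic programs FastA and BLAST.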
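To see where the O(n^2) cost comes from, here is a minimal Python sketch of a global (Needleman-Wunsch) alignment score; it assumes dynamic programming is the exhaustive method meant, and the scores (match +1, mismatch -1, linear gap -1) are illustrative assumptions. The nested loops fill one table cell per pair of positions.

```python
def global_alignment_score(s: str, t: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch score with a linear gap penalty.

    Fills a (len(s)+1) x (len(t)+1) table, so time and memory are O(n*m),
    i.e. O(n^2) for two sequences of length n.
    """
    n, m = len(s), len(t)
    # F[i][j] = best score for aligning s[:i] with t[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F[n][m]

print(global_alignment_score("ACGT", "AGT"))   # 2
```

Against a database of N sequences this cost is paid N times, which is what the k-tuple and word-hit heuristics of FastA and BLAST avoid.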

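  • FASTA, for sequences s and t: (1) FASTA looks for matching residues that share the same offset (diagonal). In this example the matches in s are at positions 6, 8 and 10 and those in t at positions 3, 5 and 7, so all three lie on the diagonal with offset 3:
    s: -----A-A-T---
    t: --A-A-T-----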

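  • FASTP (the protein version)

    Sequences are compared through words of length k (k-tuples, the ktup parameter); for proteins k = 1 or 2.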

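  • 1) Build a lookup table listing the positions of every k-tuple in s.

    2) For each k-tuple match s[i] = t[j], record the offset i - j.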

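  • Example with k = 1:
    s = H A R F Y A A Q I V L (positions 1 to 11)
    Lookup table for s: A 2,6,7; F 4; H 1; I 9; L 11; Q 8; R 3; V 10; Y 5
    t = V D M A A Q I A (positions 1 to 8)
    Offsets i - j for each residue of t: V(1) +9; D(2) none; M(3) none; A(4) -2, +2, +3; A(5) -3, +1, +2; Q(6) +2; I(7) +2; A(8) -6, -2, -1
    Hits per offset (possible range -7 to +10): -6: 1, -3: 1, -2: 2, -1: 1, +1: 1, +2: 4, +3: 1, +9: 1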
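The two steps of the example (lookup table, then offset counting) can be reproduced in a few lines. A minimal Python sketch using the s and t above with k = 1; `ktuple_offsets` is a hypothetical helper, and it covers only this diagonal-finding stage, not FASTA's later rescoring and joining of regions.

```python
from collections import Counter, defaultdict

def ktuple_offsets(s: str, t: str, k: int = 1) -> Counter:
    """Count k-tuple hits per diagonal offset i - j (1-based positions)."""
    # Step 1: lookup table from each k-tuple of s to its positions in s.
    table = defaultdict(list)
    for i in range(len(s) - k + 1):
        table[s[i:i + k]].append(i + 1)
    # Step 2: for each k-tuple of t at position j, record i - j per match.
    offsets = Counter()
    for j in range(len(t) - k + 1):
        for i in table.get(t[j:j + k], []):
            offsets[i - (j + 1)] += 1
    return offsets

counts = ktuple_offsets("HARFYAAQIVL", "VDMAAQIA", k=1)
best, hits = counts.most_common(1)[0]
print(best, hits)   # 2 4
```

The winning diagonal, offset +2 with 4 hits, aligns AAQI at positions 6 to 9 of s with AAQI at positions 4 to 7 of t.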

  • 12k5PAMst

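  • The FastA package (current version FastA3) includes:

    FastA: protein against protein, DNA against DNA; FASTX/FASTY: DNA query against a protein database; TFastA: protein query against a DNA database; TFASTX/TFASTY: protein query against a DNA database, allowing frameshifts; FASTS/TFASTS: short peptide fragments against a protein / DNA database; FASTF/TFASTF: mixed peptides against a protein / DNA database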

  • FastA

  • (2) BLAST (Basic Local Alignment Search Tool): like FastA a heuristic method, but generally faster than FastA

  • BLASTS:t:

  • BLASTS

    SSS

  • BLAST

  • BLAST

    1

    2

    3

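  • For each query word of length w, BLAST lists the words that score at least T against it (the neighborhood words; here w = 4) and stores them in a hash table.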
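The word step can be illustrated with a toy example. This minimal Python sketch uses a DNA alphabet and a +5/-4 match/mismatch score (assumptions chosen so the whole neighborhood can be enumerated); protein BLAST instead scores words with a substitution matrix such as BLOSUM62 over 20 letters. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"
MATCH, MISMATCH = 5, -4   # toy scoring scheme (assumption)

def word_score(a: str, b: str) -> int:
    return sum(MATCH if x == y else MISMATCH for x, y in zip(a, b))

def neighborhood(word: str, T: int) -> list[str]:
    """All words of the same length scoring at least T against word."""
    candidates = ("".join(p) for p in product(ALPHABET, repeat=len(word)))
    return [c for c in candidates if word_score(word, c) >= T]

def build_word_table(query: str, w: int, T: int) -> dict[str, list[int]]:
    """Hash table: neighborhood word -> query positions (0-based) it came from."""
    table = defaultdict(list)
    for i in range(len(query) - w + 1):
        for nb in neighborhood(query[i:i + w], T):
            table[nb].append(i)
    return dict(table)

def find_hits(table: dict[str, list[int]], subject: str, w: int) -> list[tuple[int, int]]:
    """Scan the subject once; return (query_pos, subject_pos) seed hits."""
    hits = []
    for j in range(len(subject) - w + 1):
        for i in table.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits

print(len(neighborhood("ACG", 6)))   # 10: the word itself plus 9 one-mismatch words
```

Usage: with T = 15 only exact words qualify, so `find_hits(build_word_table("ACGTT", 3, 15), "TTACGT", 3)` gives the seed hits (0, 2) and (1, 3), both on the same diagonal and ready for extension.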

  • For DNA, each base can be encoded in 2 bits, so 4 bases fit into one byte.
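A minimal Python sketch of the 2-bit packing; the code assignment A=0, C=1, G=2, T=3 and the helper names are assumptions, not BLAST's actual database layout. Ambiguity codes such as N cannot be stored in 2 bits and would need separate handling (not shown).

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # assumed 2-bit code per base
BASES = "ACGT"

def pack(seq: str) -> bytes:
    """Pack an ACGT string 4 bases per byte, first base in the high bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))   # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def unpack(data: bytes, length: int) -> str:
    """Inverse of pack; length is needed because the last byte may be padded."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])

packed = pack("GATTACA")
print(len(packed), unpack(packed, 7))   # 2 GATTACA
```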

  • BLAST

  • BLAST

  • 4 vs. 20: the DNA alphabet has 4 letters, the protein alphabet 20

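  • (3) VAST (Vector Alignment Search Tool): NCBI's tool for comparing three-dimensional structures, applied to the PDB structures in MMDB; results can be viewed with Cn3D (Wang et al., 2000). http://www.ncbi.nlm.nih.gov/Structure/VAST/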

  • VAST

  • Integrated retrieval systems: Entrez and SRS

  • (1) Entrez: NCBI's integrated retrieval system, linking databases such as OMIM and PubMed. http://www.ncbi.nlm.nih.gov/gorf/gorf.html

  • Entrez (Figure 4.8)

    Figure 4.8: Entrez

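  • 2. SRS: SRS (Sequence Retrieval System), developed at EMBL, a WEB-based retrieval system

    Databases indexed by SRS include EMBL, EMBL_NEW, SwissProt, PIR, Prosite, ReBase, PDB, NRL_3D, EPD, E.coli (ECD), ENZYME and SEQANALREF, about 80 in total

    SRS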

  • (3) ExPASy (Expert Protein Analysis System, http://www.expasy.org/): a WWW server for protein analysis providing SWISS-PROT, TrEMBL and PROSITE, plus 2D and 3D data in SWISS-2DPAGE and SWISS-3DIMAGE

  • GCG: the GCG (Genetics Computer Group) package

    about 140 programs

  • Databases used by GCG: nucleic acids from GenBank and EMBL; proteins from PIR, SWISS-PROT and SP-TrEMBL

  • (1) Pairwise comparison: Gap, BestFit, FrameAlign, Compare, DotPlot, GapShow, ProfileGap

  • (2) Multiple sequence analysis: PileUp, HmmerAlign, PlotSimilarity, Pretty, PrettyBox, MEME, HmmerBuild, HmmerCalibrate, ProfileMake, ProfileGap, Overlap, NoOverlap, OldDistances

  • (3) Database text search: LookUp, StringSearch, Names

  • (4) Database sequence searching: BLAST, NetBLAST, FastA, SSearch, TFastA/TFastX/FastX, FrameSearch, MotifSearch, HmmerSearch, ProfileSearch, ProfileSegments, FindPatterns, Motifs, WordSearch, HmmerPfam, Segments

  • (5) DNA/RNA secondary structure: MFold, PlotFold, StemLoop

  • (6) Evolutionary analysis: PAUPSearch, PAUPDisplay, Distances, Diverge

  • (7) Fragment assembly: GelStart, GelEnter, GelMerge, GelAssemble, GelView, GelDisassemble

  • (8) Coding-region and composition analysis: TestCode, CodonPreference, Frames, Repeat, Composition, CodonFrequency, Correspond

  • (9) Mapping: Map, MapPlot, MapSort, PeptideMap, PlasmidMap, PeptideSort

  • (10) Primer selection: Prime, PrimePair, MeltTemp

  • (11) Protein analysis: ProfileScan, CoilScan, HTHScan, SPScan, Isoelectric, PepPlot, PeptideStructure, PlotStructure

  • (12) Sequence utilities: Reverse, Shuffle, Corrupt, SampleDataSet, GCGToBLAST