Introduction of Genome Research
Bioinformatics Research Center (BRC), Institute of Biomedical Sciences, ACADEMIA SINICA
90/4/9 pm (ROC calendar year 90, i.e. April 9, 2001)
莊樹諄 www.sinica.edu.tw/~trees/bioinformatics E-mail: [email protected]
Outline
Introduction
Some Research Topics
Related Links and Resources
Bioinformatics Research Center (BRC)
Introduction
[Figure: chromosome]
[Figure: gene structure on the DNA sequence, 5' to 3'. A gene consists of exons (coding regions) and introns; the mRNA carries the 5' UTR, the ORF, and the 3' UTR; cDNA is complementary DNA.]
DNA → RNA → Protein → Function
DNA sequence: A, C, G, T --- 4 letters
RNA sequence: A, C, G, U (uracil) --- 4 letters
DNA nucleotide
  Phosphoric acid
  Deoxyribose
  Nitrogenous base
    Purines: Adenine (A), Guanine (G)
    Pyrimidines: Cytosine (C), Thymine (T)
5' ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA 3'
3' TGGCACACCGTCACGTGTCCATAAACCGGTATCTGT 5'
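The base pairing between the two strands above (A with T, C with G) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function names `complement` and `reverse_complement` are chosen here for the example.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def complement(seq: str) -> str:
    """Return the complementary strand in the same left-to-right order (3' to 5')."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq: str) -> str:
    """Return the complementary strand rewritten 5' to 3'."""
    return complement(seq)[::-1]

top = "ACCGTGTGGCAGTGCACAGGTATTTGGCCATAGACA"
print("5' " + top + " 3'")
print("3' " + complement(top) + " 5'")
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on any such helper.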
Codon → Amino acid
ACC GTG TGG CAG TGC ACA GGT ATT TGG CCA TAG ACA
4^3 = 64 codons → 20 amino acids
Amino acid sequence: --- 20 letters

First      Second position                              Third
(5')       U          C          A          G           (3')
U          Phe (F)    Ser (S)    Tyr (Y)    Cys (C)     U
           Phe (F)    Ser (S)    Tyr (Y)    Cys (C)     C
           Leu (L)    Ser (S)    Stop       Stop        A
           Leu (L)    Ser (S)    Stop       Trp (W)     G
C          Leu (L)    Pro (P)    His (H)    Arg (R)     U
           Leu (L)    Pro (P)    His (H)    Arg (R)     C
           Leu (L)    Pro (P)    Gln (Q)    Arg (R)     A
           Leu (L)    Pro (P)    Gln (Q)    Arg (R)     G
A          Ile (I)    Thr (T)    Asn (N)    Ser (S)     U
           Ile (I)    Thr (T)    Asn (N)    Ser (S)     C
           Ile (I)    Thr (T)    Lys (K)    Arg (R)     A
           Met (M)    Thr (T)    Lys (K)    Arg (R)     G
G          Val (V)    Ala (A)    Asp (D)    Gly (G)     U
           Val (V)    Ala (A)    Asp (D)    Gly (G)     C
           Val (V)    Ala (A)    Glu (E)    Gly (G)     A
           Val (V)    Ala (A)    Glu (E)    Gly (G)     G
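The table above can be rebuilt programmatically, which also confirms the count 4^3 = 64. The sketch below is illustrative only: it assumes the standard genetic code as shown in the table, and the names `BASES`, `AMINO`, `CODON_TABLE`, and `translate` are introduced here for the example. Stop codons are written as `*`.

```python
from itertools import product

BASES = "UCAG"  # order used for the rows and columns of the table

# One-letter amino acids, listed with the third position cycling fastest,
# then the second, then the first (each in U, C, A, G order). '*' = Stop.
AMINO = (
    "FFLLSSSSYY**CC*W"   # first position U
    "LLLLPPPPHHQQRRRR"   # first position C
    "IIIMTTTTNNKKSSRR"   # first position A
    "VVVVAAAADDEEGGGG"   # first position G
)

CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))

print(len(CODON_TABLE))                      # 64 codons
print(len(set(CODON_TABLE.values()) - {"*"}))  # 20 amino acids
```

For example, `translate("AUGUUUUAA")` gives `MF*`: Met, Phe, then a stop codon.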
6-frame translations ("-" marks a stop codon)

5'3' Frame 1   aagctgatcgatcgattttagatagagaaaaaact   K L I D R F - I E K K
5'3' Frame 2   aagctgatcgatcgattttagatagagaaaaaact   S - S I D F R - R K N
5'3' Frame 3   aagctgatcgatcgattttagatagagaaaaaact   A D R S I L D R E K T
3'5' Frame 1   agttttttctctatctaaaatcgatcgatcagctt   S F F S I - N R S I S
3'5' Frame 2   agttttttctctatctaaaatcgatcgatcagctt   V F S L S K I D R S A
3'5' Frame 3   agttttttctctatctaaaatcgatcgatcagctt   F F L Y L K S I D Q L
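The six frames above can be reproduced with a short Python sketch: three reading offsets on the given strand and three on its reverse complement. This is a minimal illustration assuming the standard genetic code; the function names are chosen for the example and are not taken from any tool mentioned in the slides. Stop codons are written as `-` to match the slide.

```python
from itertools import product

BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY--CC-W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, so the result reads 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq: str, offset: int) -> str:
    """Translate complete codons starting at the given offset (0, 1, or 2)."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))

def six_frames(seq: str) -> dict:
    """Return the three forward and three reverse-strand translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for k in range(3):
        frames["5'3' Frame %d" % (k + 1)] = translate(seq, k)
        frames["3'5' Frame %d" % (k + 1)] = translate(rc, k)
    return frames

for name, protein in sorted(six_frames("aagctgatcgatcgattttagatagagaaaaaact").items()):
    print(name, " ".join(protein))
```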
EST (Expressed Sequence Tags) DB
HGI (Human Gene Index) DB
Gene: Exon & Intron
cDNA Database
UniGene DB
Human Genome Sequencing (2/11/2001)
Draft 61.0%
Finished 32.5%
Total 93.5%
[Figure: chromosome with gaps marked]
Genome Database -- 3×10^9 bp
HTGS (High Throughput Genomic Sequences)
Phase 0: Single-pass or few-pass reads of a single clone (not contigs).
Phase 1: Unfinished, may be unordered, unoriented contigs, with gaps.
Phase 2: Unfinished, ordered, oriented contigs, with or without gaps.
Phase 3: Finished, no gaps (with or without annotations).
Size range (kb)   Contigs   Aggregate size (kb)   Percent of total
<30                    44                   666               0.1%
30-100                479                 32172               4.9%
100-250              1628                260933              39.9%
250-500               421                144518              22.1%
500-1000              145                 98623              15.1%
>1000                  43                116557              17.8%
total                2760                653471             100.0%
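The "Percent of total" column follows from the aggregate sizes. The sketch below recomputes it against the printed total of 653471 kb; the variable names are illustrative. Note that the six printed aggregate sizes sum to 653469 kb, 2 kb short of the printed total, so the total is taken here as given rather than recomputed.

```python
# (size range in kb, number of contigs, aggregate size in kb), as printed in the table
rows = [
    ("<30", 44, 666),
    ("30-100", 479, 32172),
    ("100-250", 1628, 260933),
    ("250-500", 421, 144518),
    ("500-1000", 145, 98623),
    (">1000", 43, 116557),
]
TOTAL_KB = 653471  # total as printed in the table

percents = [round(100 * kb / TOTAL_KB, 1) for _, _, kb in rows]

for (size_range, contigs, kb), pct in zip(rows, percents):
    print("%-10s %6d %8d %6.1f%%" % (size_range, contigs, kb, pct))

print("contigs:", sum(n for _, n, _ in rows))
print("summed aggregate size (kb):", sum(kb for _, _, kb in rows))
```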
Some Research Topics
Number of human genes
Early estimate: 60,000~100,000
By Ch22: ~45,000
By EST: ~140,000
By Ch22 & HGI-5.0: ~120,000 (Ch22 is 1.38-fold gene-rich; after an extremely rigorous cleaning and assembly process)
By 2/16/2001 Science: ~30,000
There are many more genes awaiting discovery within the sequence.
Some Research Topics
Alternative Splicing
Human Diversity
Gene Signature
Genome Annotation
Human genome: 3×10^9 bp
Variations: 10^6-10^7 Single Nucleotide Polymorphisms (SNPs)
Functional variants: 5%
[Figure: the genomic sequence divided into genes and inter-genic regions, and into coding and non-coding regions, with the SNP classes gSNP, cSNP, rSNP, iSNP, nSNP]
Gene-based SNPs
[Figure: Gene 1 and Gene 2 with P1 and P2, exons and introns, and the positions of nSNP, rSNP, cSNP, iSNP]
Human Diversity
SNP (Single Nucleotide Polymorphism), cSNP (Coding SNP)
acccgctcgtcgct tgt gtt cggctaattgcgcgaat   (codon tgt = C)
tgt → tgc (C → C): synonymous (silent)
tgt → tgg (C → W): non-synonymous, non-conservative (C: polar, W: nonpolar)
tgt → tat (C → Y): non-synonymous, conservative (Y: polar)
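The cSNP cases above can be checked mechanically by translating the reference and variant codons and comparing the amino acids. The sketch below is illustrative only: the function name `classify_csnp` is hypothetical, and the `POLARITY` table deliberately contains just the three residues whose polarity the slide states (C, W, Y), so the conservative/non-conservative call is only made for those.

```python
from itertools import product

BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Polarity as stated on the slide; other residues are intentionally left out.
POLARITY = {"C": "polar", "W": "nonpolar", "Y": "polar"}

def classify_csnp(ref_codon: str, alt_codon: str):
    """Return (ref amino acid, alt amino acid, description) for a coding SNP."""
    ref_aa = CODONS[ref_codon.upper()]
    alt_aa = CODONS[alt_codon.upper()]
    if ref_aa == alt_aa:
        return ref_aa, alt_aa, "synonymous (silent)"
    kind = "non-synonymous"
    if ref_aa in POLARITY and alt_aa in POLARITY:
        same = POLARITY[ref_aa] == POLARITY[alt_aa]
        kind += ", conservative" if same else ", non-conservative"
    return ref_aa, alt_aa, kind

for alt in ("tgc", "tgg", "tat"):
    print("tgt ->", alt, classify_csnp("tgt", alt))
```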
Purines (A/G) & Pyrimidines (C/T)
Transition: A ↔ G, C ↔ T
Transversion: A/G ↔ C/T
CD-CV: common diseases - common variants.
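The transition/transversion rule above reduces to one test: does the substitution stay within the purines or within the pyrimidines? A minimal sketch, with the function name `substitution_type` chosen for illustration:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    pair = {ref, alt}
    # Same chemical class (purine-purine or pyrimidine-pyrimidine) is a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

print(substitution_type("A", "G"))  # transition
print(substitution_type("A", "C"))  # transversion
```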
Pseudogene
Ch22: 134 pseudogenes (134/679, 19%)
Processed pseudogene (cDNA → genebank; 82% of the 134 pseudogenes)
  a) Single block
  b) Lacks the characteristic intron-exon structure
Spliced pseudogene: segments of duplicated gene families
Repetitive Sequence
Interspersed Repeats
  SINEs (Short Interspersed Elements): Alu, MIR, MER, LTR, PTR, ...
  LINEs (Long Interspersed Elements): LINE1, LINE2, ...
Tandem Repeats
  Minisatellite (Variable Number Tandem Repeats (VNTR)): 15~100 bp
  Microsatellite (Short Tandem Repeats (STR)): 2~5 bp
  α-Satellite: at centromere
  Telomere Repeats
[Figure: chromosome showing the centromere and telomere]
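As an illustration of the microsatellite definition above (a 2~5 bp unit repeated in tandem), the sketch below scans a sequence with a regular expression. It is a toy example, not a production repeat finder: the threshold of at least three copies, the function name `find_microsatellites`, and the test sequence are all assumptions made for the example.

```python
import re

# A 2-5 bp unit (shortest unit preferred) followed by at least two more copies.
STR_PATTERN = re.compile(r"([ACGT]{2,5}?)\1{2,}")

def find_microsatellites(seq: str):
    """Return (start, unit, copies) for each tandem repeat of a 2-5 bp unit."""
    hits = []
    for m in STR_PATTERN.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip single-base runs such as AAAAAA
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), unit, copies))
    return hits

# Hypothetical test sequence: five copies of CA and four copies of GATA.
seq = "TTG" + "CA" * 5 + "TGG" + "GATA" * 4 + "TCC"
print(find_microsatellites(seq))
```

The reported unit is the phase at which the repeat is first encountered, so a CA repeat and an AC repeat describe the same kind of locus.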
Related Links and Resources
TIGR (The Institute for Genomic Research): http://www.tigr.org/
Japan Science and Technology Corporation - Advanced Lifescience Information System (JST - ALIS): http://www-alis.tokyo.jst.go.jp/HGS/top.pl
NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/
Sanger: http://www.ensembl.org/
Gene Prediction Programs:
  http://www.bork.embl-heidelberg.de/genepredict.html
  http://linkage.rockefeller.edu/wli/gene/programs.html
ExPASy Translate Tool: http://expasy.nhri.org.tw/tools/dna.html
Bioinformatics Research Center, Academia Sinica: http://www.sinica.edu.tw/~trees/bioinformatics/bioinformatics.html
Bioinformatics Research Center (BRC)
[Figure: network layout with a firewall, a local server, and Lab. 1, Lab. 2, Lab. 3]
CRASA: Complexity Reduction Algorithm for Sequence Analysis
Genome Annotation, Alternative Splicing, SNP (Single Nucleotide Polymorphism)
cDNA database
Genome Sequences: Chromosome 1~22, X, Y
Environment
  PC Clustering: 10 PC (PIII-667), 1 Server
  Win2000 (NT)
  HD: IDE, RAID support
  DB2
Algorithm
  Progressive Processing: Pyramid Structure
  Pattern Match
  Direct Search
  Parallel Processing
Parallel Processing
[Figure: the server sends a query to p1, p2, p3]
HD: I/O bound
Network: I/O bound
Sorting & assembling: CPU bound
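The server/worker layout above can be illustrated with a small Python sketch: the data is partitioned across three workers (standing in for p1, p2, p3), each runs a direct substring search, and the server sorts and merges the hits. This is a generic illustration of the pattern and should not be read as CRASA's actual implementation; the record format, the function names, and the use of threads in place of separate PCs are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def search_partition(partition, query):
    """Worker: direct search for every occurrence of query in each (name, sequence) record."""
    hits = []
    for name, seq in partition:
        pos = seq.find(query)
        while pos != -1:
            hits.append((name, pos))
            pos = seq.find(query, pos + 1)
    return hits

def parallel_search(records, query, n_workers=3):
    """Server: split the records, send the query to each worker, then sort and merge."""
    partitions = [records[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(search_partition, partitions, [query] * n_workers)
        merged = [hit for part in results for hit in part]
    return sorted(merged)  # the sorting and assembling step runs on the server

# Hypothetical records for illustration only.
records = [("s1", "ACGTACGT"), ("s2", "TTTT"), ("s3", "GGACG")]
print(parallel_search(records, "ACG"))
```

On a real cluster the disk and network transfers would dominate the worker step, which is why the slide labels those stages I/O bound and only the final sort and assembly CPU bound.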
Bioinformatics: Computer Science + Biology ??