Bioinformatics, Lecture 10
Hannam University, Department of Life Systems Science
Fall semester 2014
Lecture 10, 2014.10.28
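Syllabus
Week      Topic
Week 1    Overview of bioinformatics and basic theory
Week 2    Chuseok holiday (no class)
Week 3    Principles of sequence analysis I
Week 4    Principles of sequence analysis II
Week 5    Protein structure and function prediction
Week 6    Genome sequencing and sequence assembly
Week 7    Midterm exam
Week 8    Next Generation Sequencing
Week 9    Personal genomics I
Week 10   Personal genomics II
Week 11   Transcriptomics
Week 12   Metagenomics
Week 13   Recent research trends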
Personal Genome
Last time..
• What kinds of variation can a personal genome contain?
• What methods are available to detect them?
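Disease and heredity
Causes of disease
Environmental factors
- Climate
- Food
- Pollution
- Lifestyle
Genetic factors
- Mendelian diseases: caused mainly by a variant in a single gene; inherited according to Mendel's laws
- Complex diseases: determined by many genes; do not follow Mendel's laws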
Single-gene variants and disease
• Diseases caused by a variant in a single gene
• Follow Mendelian inheritance
- Sex-linked / Autosomal: is the variant on a sex chromosome (e.g., color blindness) or on an autosome?
- Dominant / Recessive: is the variant dominant or recessive?
• Mostly rare diseases
- Sickle cell anemia
- Huntington's disease
- Duchenne muscular dystrophy
- Cystic fibrosis
- Spinal muscular atrophy
• Breast cancer
- BRCA1/BRCA2
Sickle cell anemia
http://www.snpedia.com/index.php/Rs334
The 6th residue of beta-globin changes from glutamate (allele A) to valine (allele T): GAG (E) -> GTG (V)
<- T:T: about 10-fold increased resistance to malaria
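As a rough illustration of the codon change GAG (E) -> GTG (V), the sketch below translates both codons with the standard genetic code. The names `CODON_TABLE` and `translate` are introduced here for illustration only and are not part of the lecture.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon ('*' marks a stop)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


normal_codon = "GAG"  # glutamate (E)
sickle_codon = "GTG"  # valine (V), the T allele of rs334
print(translate(normal_codon), "->", translate(sickle_codon))
```

A single A -> T substitution in the middle of the codon is enough to swap the amino acid, which is why this variant can be typed as one SNP.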
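Duchenne muscular dystrophy
Dystrophin
A protein that links actin to the cell membrane in muscle
The longest gene in the human genome
- Length: 2.4 Mb
- Number of exons: 79
- Protein size: about 427 kDa
The disease arises when the gene is disrupted by a nonsense (stop codon) mutation or an indel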
Huntington's disease
- Neurodegenerative genetic disorder
- Autosomal dominant disease
- Caused by expansion of the CAG repeat within the Huntingtin gene
A mutation in either of the two copies of Huntingtin acts as a dominant allele and is enough to cause the disease
http://www.uniprot.org/uniprot/P42858
CAG repeats
When there are 36 or more CAG repeats, normal protein formation is impaired
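The repeat-count criterion can be sketched as a small check. This is only an illustration: the function names and the toy sequences are made up here, and only the threshold of 36 comes from the slide.

```python
import re

PATHOGENIC_THRESHOLD = 36  # repeat count quoted in the lecture


def longest_cag_run(dna: str) -> int:
    """Return the number of CAG units in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def is_expanded(dna: str) -> bool:
    """True when the longest CAG run reaches the threshold."""
    return longest_cag_run(dna) >= PATHOGENIC_THRESHOLD


# Toy sequences invented for illustration (not real HTT sequence)
normal = "ATG" + "CAG" * 20 + "CCGCCA"
expanded = "ATG" + "CAG" * 42 + "CCGCCA"
print(longest_cag_run(normal), is_expanded(normal))
print(longest_cag_run(expanded), is_expanded(expanded))
```

Real repeat sizing from sequencing reads is harder than this, because reads must span the whole repeat; the sketch only shows the counting logic.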
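BRCA1/BRCA2
- Tumor suppressor genes
Mutations that impair their function increase the risk of breast and ovarian cancer
- BRCA1/BRCA2 mutations account for 5-10% of all breast cancers (about 20-25% of hereditary breast cancers)
- Lifetime risk of breast cancer among all women: 12%
- Probability that a BRCA1/2 mutation carrier develops breast cancer before age 70: 40-50%
- Preventive mastectomy: when a BRCA1/2 mutation makes the risk of breast cancer high, the breasts are removed in advance
Angelina Jolie had a preventive double mastectomy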
BRCA1/BRCA2 Genetic Testing
The patents were held by a company called Myriad Genetics
"Are patents on genes valid?"
In 2013 the U.S. Supreme Court invalidated the gene patents
Genomic DNA isolation
PCR of the exon regions of BRCA1/BRCA2
Sequencing
Identify mutations
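Limits of single-gene analysis
• Cases where a mutation in a single gene has a large effect on a trait, such as a disease, are not that common.
• Usually a trait changes through the sum of small changes in multiple genes
• In most chronic diseases the trait does not arise from changes in one or two genes; many genes are involved
- Diabetes, hypertension, obesity, heart disease, atherosclerosis, Alzheimer's disease..
• How, then, do we study the genes involved?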
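Genome-Wide Association Study (GWAS)
• Select a patient group and a control group
• Test all of their DNA variation, such as SNPs/SNVs
• Which of these variants appear significantly more often in patients than in controls?
• Analyze them to discover genetic variants associated with a particular disease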
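For a single SNP, the case/control comparison can be sketched as a chi-square test on a 2x2 table of allele counts. This is one common way to do it, not necessarily the test used in any particular study; the function name and the counts below are invented for illustration.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (odds_ratio, chi_square, p_value).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, chi2, p_value


# Hypothetical allele counts for one SNP (invented numbers)
odds_ratio, chi2, p = allelic_association(
    case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60
)
print(f"OR={odds_ratio:.2f} chi2={chi2:.2f} p={p:.4f}")
```

In a real GWAS the same test is repeated for every SNP on the array, so the per-SNP p-values need a multiple-testing correction before they are interpreted.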
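Common Disease-Common Variant Hypothesis
• If frequent diseases (heart disease, cancer, hypertension, obesity, diabetes…) have a genetic component..
• then, because these diseases are frequent, the variants that cause them should mostly be found among the variants that are common in the population
• So by selecting only the SNPs that are frequent in the population and surveying their distribution, we should be able to identify the genetic factors behind many chronic diseases.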
SNP Microarray for GWAS
Affymetrix, http://www.affymetrix.com
Assays ~0.7-5M SNPs
- Sample binds to the array
- Labeled probes bind to the sample, differentiating between the two alleles
- Make it bright enough, then measure the intensity of the array
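• Key idea: look for genetic differences between healthy people and people with the disease
• Use microarrays to assay millions of genetic variants at once,
• locate the variants that patients have in common
• Large cohorts are needed to find statistically significant variants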
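Because millions of variants are tested at once, a per-SNP p-value has to clear a much stricter cutoff than the usual 0.05. The sketch below uses Bonferroni correction, which is an assumption made here (the slides do not name a correction method); the SNP IDs and p-values are hypothetical.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance cutoff after Bonferroni correction."""
    return alpha / n_tests


def significant_snps(p_values: dict, alpha: float = 0.05) -> dict:
    """Return {snp_id: -log10(p)} for SNPs passing the Bonferroni cutoff."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {
        snp: -math.log10(p) for snp, p in p_values.items() if p < threshold
    }


# Hypothetical SNP IDs and p-values
p_values = {"snp_1": 0.20, "snp_2": 1e-9, "snp_3": 0.03, "snp_4": 0.004}
hits = significant_snps(p_values)
print(hits)

# With about one million tests the cutoff becomes very small
print(bonferroni_threshold(0.05, 1_000_000))
```

The −log10(p) values returned here are the quantity a Manhattan plot shows on its y-axis, which is why strong associations appear as tall peaks.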
Genome-wide association studies (GWAS)
Manolio et al., J Clin Invest 2008
Linkage Map
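Recombination
[Figure: crossing over between homologous chromosomes carrying alleles A/a and B/b]
The farther apart two genes are, the higher the probability of recombination between them
[Figure: linkage map of genes W, V, M with distances 30 and 3]
Recombination frequencies: W-V 30%, W-M 33%, V-M 3%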
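The three frequencies are enough to order the genes: the pair that recombines most often sits at the two ends of the map, and the remaining gene lies between them (30 + 3 = 33). A minimal sketch of that reasoning, with a function name made up for illustration:

```python
from itertools import combinations


def order_three_genes(freq: dict) -> tuple:
    """Order three linked genes from pairwise recombination frequencies (%).

    The pair with the largest frequency sits at the two ends of the map;
    the remaining gene lies between them.
    """
    genes = sorted({g for pair in freq for g in pair})
    if len(genes) != 3:
        raise ValueError("this sketch handles exactly three genes")
    lookup = {frozenset(pair): f for pair, f in freq.items()}
    end_a, end_b = max(
        combinations(genes, 2), key=lambda p: lookup[frozenset(p)]
    )
    (middle,) = set(genes) - {end_a, end_b}
    return end_a, middle, end_b


freq = {("W", "V"): 30, ("W", "M"): 33, ("V", "M"): 3}
order = order_three_genes(freq)
print("-".join(order))
```

The map has no inherent direction, so W-V-M and M-V-W describe the same order.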
Linkage Disequilibrium (LD)
Manhattan plot
Filter for Mendelian inheritance
Assuming we have data for the mother, the father, and the child
A|A-----A|A
    |
   A|T  ← Both parents are A|A, so where did this T pop up from?
(It could be a de novo mutation, but usually it is a genotyping error)
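The trio check can be sketched as a small function: a child's genotype is consistent if one allele can come from the mother and the other from the father. The function name and the "A|T" string format are assumptions chosen to match the slide's notation.

```python
def is_mendelian_consistent(mother: str, father: str, child: str) -> bool:
    """Check whether a child's genotype can be formed from one allele of each parent.

    Genotypes are strings such as "A|T"; allele order is ignored here.
    """
    m, f, c = (g.split("|") for g in (mother, father, child))
    c1, c2 = c
    return (c1 in m and c2 in f) or (c2 in m and c1 in f)


# The case from the slide: both parents A|A, child A|T
print(is_mendelian_consistent("A|A", "A|A", "A|T"))
# A consistent trio for comparison
print(is_mendelian_consistent("A|T", "A|A", "A|T"))
```

Sites that fail this check are usually filtered out as genotyping errors, with the caveat from the slide that a small number may be real de novo mutations.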
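• The final product of a GWAS: a list of SNPs that appear significantly more often in patients than in healthy controls
• Can a GWAS answer the following questions?
– Can disease risk be predicted from genotype?
– Which genetic variants cause the disease?
– Which genes are involved in the disease?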
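What GWAS results tell us
• Can disease risk be predicted from genotype?
– Perhaps.. but only a very small part of the genetic contribution can be predicted through GWAS
– "Missing heritability"
• Can we tell which variant causes the disease?
– No.
– GWAS tracks a disease using SNPs that travel together with the causal variant as markers, so the associated SNP itself need not be what causes the disease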
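What GWAS results tell us
• Which genes are involved?
– This is what we need to know to understand the actual disease mechanism.
– GWAS is often used to nominate 'suspect' genes, but these are usually inferred from earlier research
– GWAS results alone almost never pinpoint a specific gene; most variants found by GWAS lie between genes or in introns
– In transcriptional regulation, a variant can regulate the transcription of a gene at a different location, but this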
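Interpreting GWAS results
• 93% of the 'variants that may be associated with disease' found through GWAS lie in intergenic regions
• If a variant changes neither a protein-coding sequence nor a promoter that controls transcription, how does it affect gene expression or otherwise become linked to disease?
• Various attempts have been made to assign a meaning to these variants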
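Drop the assumption that all DNA is the same
Regions of open chromatin
Regions where chromatin is open are likely to play 'some' biologically meaningful role
These regions are sensitive to DNase treatment
How can we find such regions?
Relationship between GWAS SNPs and DHS (DNase I hypersensitive site) regions
Interaction between GWAS SNPs and distant genes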
Missing heritability
• Complex diseases are caused by genetic factors, yet
• the 'genetic factors' discovered through GWAS explain less than 5% of the total genetic contribution
Height: 80-90% genetic, GWAS explains <5%
Autism: 90% genetic, GWAS explains <5%
• Where did the rest go?
• Common Disease-Common Variant Hypothesis: is it right?
Common Disease, Rare Variant
• Common diseases are caused not by commonly seen genetic variants
• but by the combined effect of low-frequency variants
• Such low-frequency variants are mostly absent from SNP microarrays
Importance of Non-SNP Variants
• GWAS ignored all but SNPs – no structural or copy-number variants (CNVs):
– Detection of CNVs using SNP arrays is very limited
• These have been shown to be important in schizophrenia, autism, microcephaly, heart disease… and many more.
• Also, we know of major genome differences between humans (even monozygotic twins)
• Good evidence that these regions are very dynamic, i.e. non-Mendelian
Next generation sequencing
● Rare variants cannot be discovered with microarrays
● In this case sequencing by NGS is needed
● Discovers "all" SNPs present in the genome
● Expensive -.-
● WGS or exome
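The 1000 Genomes Project
- Goal: a project to find out, through sequencing, how far the human genome can vary
- Why is this needed?
• We knew that a disease-associated variant lies "somewhere around here",
• but which variant is actually involved in causing the disease?
• To answer that, we need to know every variant that can occur in the population!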
1000 Genomes Project
• An international collaboration to establish baseline data for human genetics
- Purpose: sequence many genomes to find nearly all of the "common" variants that occur in humans
- Goals
• Discover population level human genetic variations of all types (95% of variation > 1% frequency)
• Define haplotype structure in the human genome
• Develop sequence analysis methods, tools, and other reagents that can be transferred to other sequencing projects
As a result, the fraction of variants from a sequenced personal genome that are not in dbSNP is..
Date              Fraction not in dbSNP
February, 2000    98%
February, 2001    80%
April, 2008       10%
February, 2011    2%
Now               <1%
http://www.1000genomes.org
http://browser.1000genomes.org
Genes and SNPs
[Figure labels: UTR, Coding, Intron. Line indicates number of SNPs; each line is one SNP]
Region in Detail
File upload to view with 1000 Genomes data
• Supports popular file types:
– BAM, BED, bedGraph, BigWig, GBrowse, Generic, GFF, GTF, PSL, VCF*, WIG
Manage your data
Uploaded VCF Example:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz
Gene View
Click a Gene then ‘Variation Table’ or ‘Variation Image’
Download as csv
Get in vcf format
Gene Tab
Variation Image
• Gene variation zoom
Transcript Tab:
Variations
Effect on Protein:
• SIFT
• PolyPhen
Variation Pages
http://browser.1000genomes.org
Exome Aggregation Consortium
http://exac.broadinstitute.org/
A database that pools exome data sequenced for various purposes and catalogs the variants that are likely to actually affect gene function
DEMO
1000 Genome Browsers
Exome Aggregation Consortium
Personal Genome and medicine
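• Off-the-rack clothes vs. a tailored suit
• Off-the-rack medicine vs. tailored medicine
• Everyone carries genetic variants, and responses to a given drug differ as well
• Tailored treatment based on each person's genetic information
• Obtaining personal genome information is the first priority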
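Drug efficacy
Pharmacogenetics
• The study of genetic variation that causes different responses to drugs
• Genetic factors account for 20-95% of the variability in drug response
• Non-genetic factors: age, organ function, drug interactions, type of disease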
Personalized medicine today yesterday
• Cytochrome P450 genotyping test
– Enzyme group 'cytochrome P450' (CYP450)
– Many types of medications (including antidepressants, anticoagulants, proton pump inhibitors, etc.)
– Determines dosing and effects of these drugs.
• Thiopurine methyltransferase test
– Thiopurine
– Thiopurine methyltransferase (TPMT)
• UGT1A1 TA repeat genotype test
– Irinotecan (Camptosar)
– UGT1A1 enzyme
• Dihydropyrimidine dehydrogenase test
– 5-fluorouracil (5-FU)
– Dihydropyrimidine dehydrogenase enzyme
– Responsible for breaking down 5-FU
Uses in Muscular Dystrophy:
• Becker and Duchenne MD – same family of disease; Duchenne is more severe than Becker because the reading frame is generally preserved in BMD but not in DMD.
• DMD – death around age 20; BMD – life expectancy may be reduced, but some have a normal life span. Severity partially depends on mutation.
• Dystrophin is the largest known gene in the human genome, located on the X chromosome.
• 79 exons
• ~15% caused by premature stop codons
• Phenotype-genotype correlation studies
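The reading-frame distinction between BMD and DMD comes down to whether the number of deleted coding bases is a multiple of 3. The sketch below is a simplification with hypothetical function names and deletion sizes; real genotype-phenotype correlations have exceptions.

```python
def frame_preserved(deleted_bp: int) -> bool:
    """A deletion keeps the reading frame only if its length is a multiple of 3."""
    return deleted_bp % 3 == 0


def expected_phenotype(deleted_bp: int) -> str:
    # Simplified reading-frame rule; real cases have exceptions
    if frame_preserved(deleted_bp):
        return "BMD-like (in-frame)"
    return "DMD-like (frameshift)"


# Hypothetical deletion sizes in coding sequence (not real DMD exon lengths)
for size in (150, 176):
    print(size, expected_phenotype(size))
```

An in-frame deletion removes part of the protein but leaves the rest readable, while a frameshift scrambles everything downstream, which fits the milder course of BMD.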
Gentamicin treatment in DMD/BMD
• Aminoglycoside antibiotic synthesized by Micromonospora
• Works by binding the 30S subunit (inhibition site) of the bacterial ribosome, interrupting protein synthesis (stop codon readthrough)
Gentamicin treatment of Duchenne and Becker muscular dystrophy due to nonsense mutations. (Wagner et al 2001)
• Some success in mdx mouse model – suppressed truncation of protein and improved phenotype.
• Cons: highly nephrotoxic; can have psychiatric side effects.
Ataluren (PTC-124) and PRO051
• Mutation specific
• Both aim to restore reading frame:
– Ataluren does this through ribosomal stop codon readthrough
– PRO051 does this through exon skipping (block splicing machinery)
Duchenne Becker phenotype
• Nonsense mutations result in a premature stop codon (UAG, UAA, or UGA) and cause a truncated protein.
• Works best on UGA stop codon.
Concept applicable to other diseases that also result from nonsense mutations, such as cystic fibrosis and nonsense-mutation hemophilia A and B (nmHA/B).
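To make the idea of a premature stop codon concrete, the sketch below scans a coding sequence in frame and reports the first stop codon and its type. The function name and both toy sequences are invented for illustration; they are not dystrophin sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA spelling of UAA, UAG, UGA


def first_stop(cds: str):
    """Return (codon_index, codon) of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in STOP_CODONS:
            return i // 3, codon
    return None


# Toy coding sequences (invented): the mutant changes CAG (Gln) to TAG (stop)
normal = "ATGGCTCAGGAAAAGTGA"
mutant = "ATGGCTTAGGAAAAGTGA"
print(first_stop(normal))
print(first_stop(mutant))
```

Knowing which of the three stop codons a nonsense mutation creates matters here, since the slide notes that readthrough works best on UGA.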
Ataluren mechanism
Cancer Pharmacogenomics and Tumor and Germline Genomes
Wang L et al. N Engl J Med 2011;364:1144-1153.
Anticancer drugs approved by the Food and Drug Administration with labeling regarding pharmacogenomic biomarkers
Wang L et al. N Engl J Med 2011;364:1144-1153.
Lebrikizumab treatment in asthma: efficacy related to serum periostin
Corren et al NEJM 2011
Lebrikizumab is a monoclonal antibody that neutralizes IL-13.
IL-13 induces bronchial epithelial cells to secrete periostin.
Patients with high serum periostin respond better.