Bioinformatics Lecture 10

Bioinformatics, 2014 Fall Semester, Department of Life Systems Science, Hannam University, Lecture 10, 2014.10.28

Upload: suk-namgoong

Post on 03-Jul-2015


DESCRIPTION

Hannam University Bioinformatics Lecture 10

TRANSCRIPT

Page 1: Bioinformatics Lecture 10

Bioinformatics

2014 Fall Semester

Department of Life Systems Science

Hannam University

Lecture 10, 2014.10.28

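Page 2: Bioinformatics Lecture 10

Syllabus

Week: Topic

Week 1: Overview and basic theory of bioinformatics
Week 2: Chuseok holiday (no class)
Week 3: Principles of sequence analysis I
Week 4: Principles of sequence analysis II
Week 5: Protein structure and function prediction
Week 6: Genome sequencing and sequence assembly
Week 7: Midterm exam
Week 8: Next Generation Sequencing
Week 9: Personal genomics I
Week 10: Personal genomics II
Week 11: Transcriptomics
Week 12: Metagenomics
Week 13: Recent research trends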

Page 3: Bioinformatics Lecture 10

Personal Genome

Last time...

• What kinds of variation can a personal genome contain?

• What methods are there for discovering them?

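Page 4: Bioinformatics Lecture 10

Disease and Heredity

Causes of disease

Environmental factors
- Climate
- Food
- Pollution
- Lifestyle, etc.

Genetic factors
- Mendelian diseases: mainly caused by a variant in a single gene; inherited according to Mendel's laws
- Complex diseases: determined by many genes; do not follow Mendel's laws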

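Page 5: Bioinformatics Lecture 10

Single-Gene Variants and Disease Association

• Diseases caused by a variant in a single gene

• Follow Mendel's laws of inheritance

- Sex-linked / Autosomal: is the variant on a sex chromosome (e.g., color blindness) or on an autosome?
- Dominant / Recessive: is the variant dominant or recessive?

• Mostly rare diseases

- Sickle Cell Anemia
- Huntington's Disease
- Duchenne Muscular Dystrophy
- Cystic Fibrosis
- Spinal Muscular Atrophy

• Breast Cancer

- BRCA1/BRCA2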

Page 6: Bioinformatics Lecture 10

Sickle Cell Anemia

http://www.snpedia.com/index.php/Rs334

The 6th residue of beta-globin changes from glutamate (A allele) to valine (T allele): GAG (E) -> GTG (V)

<- T:T: resistance to malaria is increased 10-fold

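Page 7: Bioinformatics Lecture 10

Duchenne Muscular Dystrophy

Dystrophin

A protein that links actin to the cell membrane in muscle

The longest gene in the human body

- 2.4 Mb
- Number of exons: 79
- Protein size: 425 kDa

Disease results when the gene is disrupted by a nonsense mutation (stop codon) or an indel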

Page 8: Bioinformatics Lecture 10

Huntington's Disease

- Neurodegenerative genetic disorder
- Autosomal dominant disease

- Caused by expansion of the CAG repeat within the Huntingtin gene

If the mutation occurs in either of the two Huntingtin copies, it acts dominantly and the disease develops

http://www.uniprot.org/uniprot/P42858

Page 9: Bioinformatics Lecture 10

CAG Repeats

With 36 or more CAG repeats, normal protein formation is impaired
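The repeat threshold can be illustrated with a short sketch that counts the longest uninterrupted run of CAG units in a DNA string and compares it with the 36-repeat cutoff from the slide. The function names and the toy sequences are assumptions made for illustration; they are not real Huntingtin sequence, and real repeat sizing works on aligned sequencing reads or fragment analysis rather than a plain string scan.

```python
import re

# Cutoff taken from the slide: 36 or more CAG repeats.
EXPANSION_THRESHOLD = 36


def longest_cag_run(seq: str) -> int:
    """Return the number of units in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat(seq: str):
    """Return (repeat count, label) using the slide's threshold."""
    n = longest_cag_run(seq)
    label = "expanded" if n >= EXPANSION_THRESHOLD else "normal range"
    return n, label


# Toy sequences (hypothetical, for illustration only).
toy_normal = "ATG" + "CAG" * 20 + "CCGCCA"
toy_expanded = "ATG" + "CAG" * 42 + "CCGCCA"

print(classify_repeat(toy_normal))
print(classify_repeat(toy_expanded))
```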

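Page 10: Bioinformatics Lecture 10

BRCA1/BRCA2

- Tumor suppressor genes

Loss-of-function mutations increase the incidence of breast and ovarian cancer

- BRCA1/BRCA2 mutations account for 5-10% of all breast cancers (about 20-25% of hereditary breast cancers)

- Probability of breast cancer among all women: 12%
- Probability that a BRCA1/2 mutation carrier develops breast cancer before age 70: 40-50%

- Prophylactic mastectomy: when a BRCA1/2 mutation puts a person at high risk of breast cancer, the breasts are removed in advance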

Page 11: Bioinformatics Lecture 10

Angelina Jolie got a preventive double mastectomy

Page 12: Bioinformatics Lecture 10

BRCA1/BRCA2 Genetic Testing

A company called Myriad Genetics held the patents

"Is a patent on a gene valid?"

In 2013 the patents were invalidated by the US Supreme Court

Genomic DNA isolation

PCR of the exon regions of BRCA1/BRCA2

Sequencing

Identify mutations

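Page 13: Bioinformatics Lecture 10

Limits of Single-Gene Analysis

• Cases where a mutation in a single gene has a large effect on a trait, such as a disease, are not that common.

• In most cases, the sum of small changes across multiple genes produces the change in the trait

• For most chronic diseases, the trait does not arise from changes in one or two genes; a great many genes are involved

- Diabetes, hypertension, obesity, heart disease, atherosclerosis, Alzheimer's disease, etc.

• In such cases, how should the genes involved be studied?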

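Page 14: Bioinformatics Lecture 10

Genome-Wide Association Study (GWAS)

• Select a patient (case) group and a healthy (control) group
• Genotype all DNA variation, such as SNPs/SNVs, in both groups
• Which of these DNA variants appear significantly more often, in statistical terms, in patients than in controls?
• Analyze them to discover genetic variants associated with a specific disease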
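The case-control comparison can be sketched for a single SNP as a Pearson chi-square test on a 2x2 table of allele counts. This is a minimal illustration, not the method of any particular GWAS package: the allele counts are invented, the function name is hypothetical, and the 5e-8 threshold is the commonly used genome-wide significance convention rather than something stated in the slides.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) and odds ratio for allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Invented counts: 1000 cases and 1000 controls, two alleles per person.
chi2, p_value, odds_ratio = allelic_association(600, 1400, 400, 1600)

GENOME_WIDE_THRESHOLD = 5e-8  # conventional cutoff, assumed here
print(f"chi2={chi2:.2f} p={p_value:.2e} OR={odds_ratio:.2f}")
print("genome-wide significant:", p_value < GENOME_WIDE_THRESHOLD)
```

A real study repeats this test for every genotyped SNP, which is why such a strict threshold is used.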

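Page 15: Bioinformatics Lecture 10

Common Disease-Common Variant Hypothesis

• If diseases with high incidence (heart disease, cancer, hypertension, obesity, diabetes, ...) have a genetic component...

• Because these diseases are common, the genetic variants that cause them should mostly be found among the variants that are common in the population

• Therefore, by selecting only the SNPs that are frequent in the population and surveying their distribution, it should be possible to identify the genetic factors behind many chronic diseases.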

Page 16: Bioinformatics Lecture 10

SNP Microarray for GWAS

Affymetrix, http://www.affymetrix.com

Assay ~0.7-5M SNPs

Sample binds to array

Labeled probes bind to sample, differentiating between the two alleles

Make it bright enough, then measure intensity of array

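Page 17: Bioinformatics Lecture 10

Genome-wide association studies (GWAS)

• Key idea: look for genetic variants that differ between healthy people and people with the disease

• Use microarrays to assay millions of genetic variants at once,

• locate the variants that are shared in common, and

• recruit large cohorts, because finding statistically significant variants requires very many people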

Page 18: Bioinformatics Lecture 10
Page 19: Bioinformatics Lecture 10

Manolio et al., J Clin Invest 2008

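Page 20: Bioinformatics Lecture 10

Linkage Map

Recombination

[Figure: crossing over between homologous chromosomes carrying alleles A/a and B/b]

The farther apart two genes are, the higher the probability of recombination between them

Map: W --30-- V --3-- M (alleles w, v, m)

Recombination frequencies: W-V 30%, W-M 33%, V-M 3%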
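The three-point reasoning behind this map can be sketched as follows: the pair of loci with the largest recombination frequency is taken as the outer pair, the remaining locus sits in the middle, and the two adjacent distances should roughly add up to the outer distance. The function names are hypothetical, and the sketch assumes simple additive distances with no correction for double crossovers.

```python
def pair_frequency(rf, a, b):
    """Look up a pairwise recombination frequency regardless of key order."""
    return rf[(a, b)] if (a, b) in rf else rf[(b, a)]


def order_three_loci(rf):
    """Infer the order of three loci from pairwise recombination frequencies (%).

    Returns (order, deviation), where deviation is the sum of the two adjacent
    distances minus the outer distance (0 means perfectly additive).
    """
    loci = {name for pair in rf for name in pair}
    if len(loci) != 3 or len(rf) != 3:
        raise ValueError("expected exactly three loci and three pairwise values")

    left, right = max(rf, key=rf.get)          # largest distance: outer loci
    middle = (loci - {left, right}).pop()      # remaining locus lies between
    deviation = (pair_frequency(rf, left, middle)
                 + pair_frequency(rf, middle, right)
                 - pair_frequency(rf, left, right))
    return (left, middle, right), deviation


# Frequencies from the slide.
frequencies = {("W", "V"): 30.0, ("W", "M"): 33.0, ("V", "M"): 3.0}
order, deviation = order_three_loci(frequencies)
print(order, deviation)
```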

Page 21: Bioinformatics Lecture 10

Linkage Disequilibrium (LD)

Page 22: Bioinformatics Lecture 10

Manhattan plot

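Page 23: Bioinformatics Lecture 10

Filter for Mendelian inheritance

Assuming genotype data are available for the mother, father, and child

A|A-----A|A

|

A|T ← Both parents are A|A, so where did the T come from?

(It could be a de novo mutation, but it is usually a genotyping error)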
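The trio filter above can be sketched as a check that the child's genotype can be formed by taking one allele from each parent. Genotypes are written here as two-character strings such as "AT"; that encoding, the function names, and the example site names are assumptions for illustration.

```python
def is_mendelian_consistent(mother: str, father: str, child: str) -> bool:
    """True if the child's unordered genotype is one maternal plus one paternal allele."""
    child_alleles = sorted(child)
    return any(
        sorted((m_allele, f_allele)) == child_alleles
        for m_allele in mother
        for f_allele in father
    )


def filter_trio_sites(sites):
    """Keep only sites whose (mother, father, child) genotypes are consistent."""
    return {
        name: trio
        for name, trio in sites.items()
        if is_mendelian_consistent(*trio)
    }


# Hypothetical sites: (mother, father, child) genotypes.
example_sites = {
    "site1": ("AA", "AA", "AT"),  # the case from the slide: T has no parental source
    "site2": ("AT", "AA", "AT"),
    "site3": ("AT", "AT", "TT"),
}
print(sorted(filter_trio_sites(example_sites)))
```

Sites that fail the check are candidates for genotyping error (or, more rarely, de novo mutation) and are typically excluded before association testing.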

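Page 24: Bioinformatics Lecture 10

• Final output of a GWAS: a list of SNPs that appear significantly more often, in statistical terms, in patients than in healthy controls

• Can GWAS answer the following questions?

– Can disease risk be predicted from genotype?

– Which genetic variant causes the disease?

– Which gene is involved in the disease?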

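Page 25: Bioinformatics Lecture 10

What GWAS results tell us

• Can disease risk be predicted from genotype?
– Perhaps, but the genetic factors that GWAS can predict are only a tiny fraction

– "Missing heritability"

• Can we tell which variant causes the disease?
– No.

– GWAS tracks disease using SNPs that travel together with the causal variant as markers, so the associated SNP itself does not necessarily cause the disease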

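Page 26: Bioinformatics Lecture 10

What GWAS results tell us

• Which genes are involved?

– This must be known in order to understand the actual disease mechanism.

– GWAS is often used to infer 'suspect' genes, but these are usually genes already suggested by earlier studies

– GWAS results alone rarely pinpoint a specific gene; most variants found by GWAS lie between genes or in introns

– In transcriptional regulation, a variant may regulate the transcription of a gene at a different location, but this...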

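Page 27: Bioinformatics Lecture 10

Interpreting GWAS Results

• 93% of the 'genetic variants possibly associated with disease' found by GWAS lie in intergenic regions

• If a variant is neither a change in a protein-coding region nor a change in a promoter region that controls transcription, how does it affect gene expression and so become linked to disease?

• Various attempts have been made to assign meaning to these variants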

Page 28: Bioinformatics Lecture 10

Let go of the assumption that all DNA is the same

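Page 29: Bioinformatics Lecture 10

Regions of Open Chromatin

Regions where chromatin is open are likely to play 'some' biologically meaningful role

Such regions are sensitive to DNase treatment

How can such regions be found?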

Page 30: Bioinformatics Lecture 10
Page 31: Bioinformatics Lecture 10

Relationship between GWAS SNPs and DHS (DNase I hypersensitive site) regions

Page 32: Bioinformatics Lecture 10
Page 33: Bioinformatics Lecture 10

Interaction between GWAS SNPs and distant genes

Page 34: Bioinformatics Lecture 10

Missing heritability

• Complex diseases arise from genetic factors, yet

• the 'genetic factors' discovered through GWAS explain 5% or less of the total genetic contribution

Height = 80-90% genetic; GWAS explains <5%
Autism = 90% genetic; GWAS explains <5%

• Where did the rest go?

• Is the Common Disease-Common Variant Hypothesis correct?

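Page 35: Bioinformatics Lecture 10

Common Disease, Rare Variant

• Common diseases are not caused by commonly seen genetic variants;

• rather, they arise from the combined effect of low-frequency genetic variants

• Such low-frequency variants are usually not present on SNP microarrays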

Page 36: Bioinformatics Lecture 10

Importance of Non-SNP Variants

• GWAS ignored all but SNPs: no structural or copy-number variants (CNVs)
– Detection of CNVs using SNP arrays is very limited

• These have been shown to be important in schizophrenia, autism, microcephaly, heart disease, and many more.

• Also, we know of major genome differences between humans (even monozygotic twins)

• Good evidence that these regions are very dynamic, i.e. non-Mendelian

Page 37: Bioinformatics Lecture 10

Next generation sequencing

● Rare variants cannot be discovered with microarrays

● In this case, sequencing by NGS is required

● Discovers "all" SNPs present in the genome

● Expensive

● WGS or Exome

Page 38: Bioinformatics Lecture 10

The 1000 Genomes Project

- Goal: a project to find out, through sequencing, how far the human genome can vary

- Why is this needed?

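Page 39: Bioinformatics Lecture 10
Page 40: Bioinformatics Lecture 10

• We knew that a disease-associated variant lay "somewhere near here", but
• which variant is actually involved in causing the disease?
• To answer this, we need to know every variant that can occur in the population!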

Page 41: Bioinformatics Lecture 10

1000 Genomes Project

• An international collaborative project to establish baseline data for human genetics

- Purpose: sequence many genomes to find essentially all of the "common" variants that can occur in humans

- Goals
• Discover population-level human genetic variations of all types (95% of variation > 1% frequency)
• Define haplotype structure in the human genome
• Develop sequence analysis methods, tools, and other reagents that can be transferred to other sequencing projects
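A rough feel for the "95% of variation > 1% frequency" goal comes from asking how likely it is that a variant of a given allele frequency appears at least once among the sampled chromosomes. The sketch below is an idealized calculation, not the project's actual power analysis: it assumes each of the 2n chromosomes from n diploid individuals carries the variant independently with probability equal to its frequency, and that every carried copy is detected, which low-coverage sequencing does not guarantee. The function names are hypothetical.

```python
def detection_probability(allele_freq: float, n_individuals: int) -> float:
    """Chance that a variant is present at least once among 2n chromosomes."""
    return 1.0 - (1.0 - allele_freq) ** (2 * n_individuals)


def min_individuals(allele_freq: float, target: float) -> int:
    """Smallest number of diploid individuals reaching the target probability."""
    n = 1
    while detection_probability(allele_freq, n) < target:
        n += 1
    return n


for n in (10, 100, 1000):
    print(n, round(detection_probability(0.01, n), 4))

print("individuals needed for 95% at 1% frequency:", min_individuals(0.01, 0.95))
```

Under these idealized assumptions a few hundred genomes already capture most 1%-frequency variants; imperfect detection and the need to cover several populations push the real number higher.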

Page 42: Bioinformatics Lecture 10

As a result, the fraction of variants found by sequencing a personal genome that are not in dbSNP is:

Date Fraction not in dbSNP

February, 2000 98%

February, 2001 80%

April, 2008 10%

February, 2011 2%

Now <1%

Page 43: Bioinformatics Lecture 10

http://www.1000genomes.org


Page 44: Bioinformatics Lecture 10

http://browser.1000genomes.org

Page 45: Bioinformatics Lecture 10

Genes and SNPs

UTR / Coding / Intron

Line indicates number of SNPs

Each line is one SNP

Page 46: Bioinformatics Lecture 10

Region in Detail

Page 47: Bioinformatics Lecture 10

File upload to view with 1000 Genomes data

• Supports popular file types:

– BAM, BED, bedGraph, BigWig, GBrowse, Generic, GFF, GTF, PSL, VCF*, WIG

Manage your data

Page 48: Bioinformatics Lecture 10

Uploaded VCF Example:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz
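For readers who have not opened a sites VCF before, the sketch below splits one tab-separated data line into the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and turns the INFO field into a dictionary. The example line is invented for illustration and is not copied from the file above; the function names are hypothetical, and a production pipeline would more likely use an established VCF library than hand-rolled parsing.

```python
def parse_vcf_site(line: str) -> dict:
    """Parse the eight fixed columns of one VCF data line."""
    chrom, pos, variant_id, ref, alt, qual, filt, info = (
        line.rstrip("\n").split("\t")[:8]
    )
    info_fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        info_fields[key] = value if sep else True  # flags carry no value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),  # multiple ALT alleles are comma-separated
        "qual": qual,
        "filter": filt,
        "info": info_fields,
    }


def iter_sites(lines):
    """Yield parsed records, skipping header lines that start with '#'."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield parse_vcf_site(line)


# Invented example content (hypothetical values).
example_lines = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t10583\t.\tG\tA,T\t100\tPASS\tAF=0.14;VT=SNP;EXAMPLE_FLAG\n",
]
records = list(iter_sites(example_lines))
print(records[0]["pos"], records[0]["alt"], records[0]["info"])
```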

Page 49: Bioinformatics Lecture 10

Gene View

Click a Gene then 'Variation Table' or 'Variation Image'

Download as csv

Get in vcf format

Gene Tab

Page 50: Bioinformatics Lecture 10

Variation Image

• Gene variation

zoom

Page 51: Bioinformatics Lecture 10

Transcript Tab: Variations

Effect on Protein:

• SIFT

• PolyPhen

Page 52: Bioinformatics Lecture 10

Variation Pages

Page 53: Bioinformatics Lecture 10

http://browser.1000genomes.org

Page 54: Bioinformatics Lecture 10

Exome Aggregation Consortium

http://exac.broadinstitute.org/

A database that pools exome data sequenced for various purposes and catalogs the variants likely to actually affect gene function

Page 55: Bioinformatics Lecture 10
Page 56: Bioinformatics Lecture 10
Page 57: Bioinformatics Lecture 10

DEMO

1000 Genomes Browser

Exome Aggregation Consortium

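Page 58: Bioinformatics Lecture 10

Personal Genome and medicine

• Ready-to-wear clothes vs. tailored suits

• One-size-fits-all medicine vs. personalized medicine

• Each person carries genetic variants, and responses to a given drug differ as well

• Personalized treatment based on each individual's genetic information

• Obtaining personal genome information is the first priority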

Page 59: Bioinformatics Lecture 10

Drug efficacy

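Page 60: Bioinformatics Lecture 10

Pharmacogenetics

• The study of genetic variants that cause different responses to drugs

• Genetic factors account for 20-95% of the variability in drug response

• Non-genetic factors: age, organ function, drug interactions, type of disease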

Page 61: Bioinformatics Lecture 10

Personalized medicine today yesterday

• Cytochrome P450 genotyping test
– Enzyme group 'cytochrome P450' (CYP450)
– Many types of medications (including antidepressants, anticoagulants, proton pump inhibitors, etc.)
– Determine dosing and effects of these drugs.

• Thiopurine methyltransferase test
– Thiopurine
– Thiopurine methyltransferase (TPMT)

• UGT1A1 TA repeat genotype test
– Irinotecan (Camptosar)
– UGT1A1 enzyme

• Dihydropyrimidine dehydrogenase test
– 5-fluorouracil (5-FU)
– Dihydropyrimidine dehydrogenase enzyme
– Responsible for breaking down 5-FU

Page 62: Bioinformatics Lecture 10

Uses in Muscular Dystrophy:

• Becker and Duchenne MD: same family of disease; Duchenne's is more severe than Becker's because generally the reading frame is preserved in BMD while it is not in DMD.

• DMD: death around age 20; BMD: life expectancy may be reduced, but some have a normal life span. Severity partially depends on mutation.

• Dystrophin is the largest known gene in the human body, located on the X chromosome.

• 79 exons
• ~15% caused by premature stop codons
• Phenotype-genotype correlation studies
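The reading-frame rule mentioned on this slide can be sketched as simple arithmetic: if the total number of deleted coding bases is a multiple of three, the downstream reading frame is preserved (the pattern the slide associates with BMD); otherwise the frame shifts (the pattern associated with DMD). The exon lengths below are invented and are not real dystrophin exon sizes, and the function name is hypothetical. As the slide notes with "generally", the rule has exceptions.

```python
def deletion_frame_effect(deleted_exon_lengths):
    """Classify an exon deletion by whether it preserves the reading frame."""
    total_bases = sum(deleted_exon_lengths)
    if total_bases % 3 == 0:
        return "in-frame"    # reading frame preserved downstream
    return "frameshift"      # downstream codons are misread


# Hypothetical exon lengths in base pairs.
print(deletion_frame_effect([150, 102]))  # 252 bases removed
print(deletion_frame_effect([150, 101]))  # 251 bases removed
```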

Page 63: Bioinformatics Lecture 10

Gentamicin treatment in DMD/BMD

• Aminoglycoside antibiotic synthesized by Micromonospora

• Works by binding the 30S subunit (inhibition site) of the bacterial ribosome, interrupting protein synthesis (stop codon readthrough)

Gentamicin treatment of Duchenne and Becker muscular dystrophy due to nonsense mutations. (Wagner et al 2001)

• Some success in mdx mouse model – suppressed truncation of protein and improved phenotype.

• Cons: highly nephrotoxic; can have psychiatric side effects.

Page 64: Bioinformatics Lecture 10

Ataluren (PTC-124) and PRO051

• Mutation specific

• Both aim to restore the reading frame:
– Ataluren does this through ribosomal stop codon readthrough

– PRO051 does this through exon skipping (blocking the splicing machinery)

Duchenne Becker phenotype

Page 65: Bioinformatics Lecture 10

Ataluren mechanism

• Nonsense mutations result in a premature stop codon (UAG, UAA, or UGA) and cause a truncated protein.

• Works best on the UGA stop codon.

Concept applicable to other diseases that also result from nonsense mutations, such as cystic fibrosis and nonsense-mutation hemophilia A and B (nmHA/B).
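What a premature stop codon means for translation can be shown with a small sketch that walks an mRNA codon by codon and reports the first in-frame stop codon (UAG, UAA, or UGA). The two toy transcripts are invented for illustration and do not correspond to any real gene; the function name is hypothetical.

```python
STOP_CODONS = {"UAG", "UAA", "UGA"}


def first_stop_codon(mrna: str):
    """Return (codon index, codon) of the first in-frame stop codon, or None."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3]
        if codon in STOP_CODONS:
            return start // 3, codon
    return None


# Toy transcripts (hypothetical): AUG GAG CAG AAA UGG UAA
normal_mrna = "AUGGAGCAGAAAUGGUAA"
# Same transcript with CAG -> UAG at the third codon (a nonsense mutation).
mutant_mrna = "AUGGAGUAGAAAUGGUAA"

print(first_stop_codon(normal_mrna))
print(first_stop_codon(mutant_mrna))
```

In the mutant, translation would stop after two codons instead of five, which is the truncation that stop codon readthrough aims to bypass.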

Page 66: Bioinformatics Lecture 10

Cancer Pharmacogenomics and Tumor and Germline Genomes

Wang L et al. N Engl J Med 2011;364:1144-1153.

Page 67: Bioinformatics Lecture 10

Anticancer drugs approved by the Food and Drug Administration with labeling regarding pharmacogenomic biomarkers

Wang L et al. N Engl J Med 2011;364:1144-1153.

Page 68: Bioinformatics Lecture 10

Lebrikizumab treatment in asthma: efficacy related to serum periostin

Corren et al., NEJM 2011

Lebrikizumab is a monoclonal antibody that neutralizes IL-13.

IL-13 induces bronchial epithelial cells to secrete periostin.

Patients with high serum periostin respond better.

Page 69: Bioinformatics Lecture 10
Page 70: Bioinformatics Lecture 10
Page 71: Bioinformatics Lecture 10