Computational Skills for Biology Research, Lecture 11
Computational Skills for Modern Biology Research
Department of Biology, Chungbuk National University
11th Lecture 2015.11.24
NGS Analysis IV : Gene Set Analysis
Syllabus
Week 1 - Introduction: Why do we need to learn this stuff?
Week 2 - Basics of Unix and running BLAST on your PC
Week 3 - Unix Command Prompt II and shell scripts
Week 4 - Basics of programming (Python programming)
Week 5 - Python Scripting II and sequence manipulation
Week 6 - IPython Notebook and Pandas
Week 7 - Basics of Next Generation Sequencing and Tutorial
Week 8
Week 9 - Next Generation Sequencing Analysis I
Week 10 - Next Generation Sequencing Analysis II
Week 11 - Next Generation Sequencing Analysis III
Week 12 - Next Generation Sequencing Analysis IV
Week 13
Week 14
Differential Expression Data Mining
Sleuth
Analysis-Test Table
Download Table
Data Mining with IPython Notebook
Read ‘test_table.csv’ as a DataFrame
P values
FDR (False Discovery Rate)
Mean expression level (logged)
Fold Change
Remove rows without data (missing values)
Filtering
Fold change greater than 2, FDR less than 0.01
Observation (expression level greater than 2)
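The reading, cleanup and filtering steps above can be sketched in pandas as follows. This is a minimal sketch: the inline CSV text stands in for the downloaded `test_table.csv`, and the column names `target_id`, `pval`, `qval`, `b` and `mean_obs` are assumptions modeled on sleuth's Wald-test output, so check them against the header of your own file. Whether `b` is on a natural-log or log2 scale, and therefore what "fold change greater than 2" means numerically, depends on how the table was produced. The code applies the slide's threshold to the absolute value of `b` so that down-regulated transcripts are kept as well.

```python
import io

import pandas as pd

# Stand-in for the downloaded file; in practice: pd.read_csv("test_table.csv")
# Column names are assumed (modeled on sleuth's Wald-test output).
csv_text = """target_id,pval,qval,b,mean_obs
tx1,1e-6,1e-4,2.5,5.0
tx2,0.2,0.5,0.3,4.0
tx3,1e-5,5e-4,-3.1,3.5
tx4,,,,1.0
tx5,1e-8,1e-6,2.2,1.2
"""
test_table = pd.read_csv(io.StringIO(csv_text))

# Remove rows without data (e.g. transcripts that could not be tested)
test_table = test_table.dropna()

# Filtering: |fold change| > 2, FDR (qval) < 0.01, mean expression (logged) > 2
is_de = (
    (test_table["b"].abs() > 2)
    & (test_table["qval"] < 0.01)
    & (test_table["mean_obs"] > 2)
)
de_table = test_table[is_de]
print(de_table["target_id"].tolist())  # ['tx1', 'tx3']
```

Combining the three conditions with `&` into one boolean mask keeps every criterion visible in a single place, which makes it easy to tighten or relax thresholds later.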
Read the abundance table for each sample
Save it as abundance.csv
Read the abundance table in Pandas
Same Transcripts
Different Samples
Extract the IDs of differentially expressed transcripts
Select the transcripts that meet the differential-expression criteria
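In the abundance table each transcript appears once per sample (long format), so selecting the differentially expressed transcripts reduces to a membership test on the transcript ID column. Here is a minimal sketch, assuming columns named `target_id`, `sample`, `condition` and `tpm`. These names are modeled on the long table returned by sleuth's `kallisto_table()`, and the sample and condition names are hypothetical.

```python
import pandas as pd

# Toy long-format abundance table: same transcripts, different samples.
# In practice: abundance = pd.read_csv("abundance.csv")
abundance = pd.DataFrame({
    "target_id": ["tx1", "tx2", "tx3"] * 4,
    "sample": ["s1"] * 3 + ["s2"] * 3 + ["s3"] * 3 + ["s4"] * 3,
    "condition": ["control"] * 6 + ["treated"] * 6,
    "tpm": [10.0, 5.0, 40.0, 12.0, 6.0, 44.0,
            80.0, 5.5, 4.0, 76.0, 4.5, 6.0],
})

# IDs of differentially expressed transcripts (e.g. from the filtered test table)
de_ids = ["tx1", "tx3"]

# Keep only the rows whose transcript ID is in the DE list (all samples kept)
de_abundance = abundance[abundance["target_id"].isin(de_ids)]
print(de_abundance.shape)  # (8, 4)
```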
Reshape the DataFrame using ‘pivot’
Calculate the average TPM of each treatment across its samples, and filter the transcripts
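`pivot` turns the long table into a transcripts-by-samples matrix. Averaging the sample columns within each treatment then gives one value per condition, and that value can drive an extra filter. The following is a minimal sketch on a toy long table. The sample-to-condition mapping and the "average TPM above 5 in at least one condition" cutoff are hypothetical choices for illustration, not values from the lecture.

```python
import pandas as pd

# Toy long-format table of DE transcripts (hypothetical values)
de_abundance = pd.DataFrame({
    "target_id": ["tx1", "tx3", "tx4"] * 4,
    "sample": ["s1"] * 3 + ["s2"] * 3 + ["s3"] * 3 + ["s4"] * 3,
    "tpm": [10.0, 40.0, 1.0, 12.0, 44.0, 2.0,
            80.0, 4.0, 1.0, 76.0, 6.0, 1.0],
})
condition_of = {"s1": "control", "s2": "control", "s3": "treated", "s4": "treated"}

# Reshape: one row per transcript, one column per sample
tpm_matrix = de_abundance.pivot(index="target_id", columns="sample", values="tpm")

# Average TPM per condition: transpose, group sample rows by condition, transpose back
mean_by_condition = tpm_matrix.T.groupby(condition_of).mean().T

# Hypothetical filter: keep transcripts with mean TPM > 5 in at least one condition
keep = mean_by_condition.max(axis=1) > 5
filtered = tpm_matrix.loc[keep]
print(mean_by_condition)
print(filtered.index.tolist())  # ['tx1', 'tx3']
```

Grouping the transposed matrix by a dict avoids column-wise `groupby(axis=1)`, which recent pandas versions deprecate.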
Draw Clustermap
Use the package called ‘seaborn’ (install it if it is not available)
On the command line: conda install seaborn
Clustermap
Red: overexpressed genes
Blue: downregulated genes
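A clustermap like this one could be drawn with `seaborn.clustermap`. In the sketch below, rows are z-scored (`z_score=0`) so that each transcript is compared against its own mean. The diverging `RdBu_r` colormap, centered at 0, then draws values above the row mean in red and values below it in blue. The toy matrix, the colormap choice and the output file name are assumptions for illustration. The row z-score is also computed by hand in pandas to show the standardization that `z_score=0` applies. The plotting step is skipped when seaborn (or the scipy it needs for clustering) is not installed.

```python
import pandas as pd

# Toy transcripts-by-samples TPM matrix (hypothetical values)
tpm_matrix = pd.DataFrame(
    {"s1": [10.0, 40.0, 3.0], "s2": [12.0, 44.0, 2.5],
     "s3": [80.0, 4.0, 3.2], "s4": [76.0, 6.0, 2.8]},
    index=pd.Index(["tx1", "tx3", "tx7"], name="target_id"),
)

# Row-wise z-score: (value - row mean) / row standard deviation
row_z = tpm_matrix.sub(tpm_matrix.mean(axis=1), axis=0).div(
    tpm_matrix.std(axis=1), axis=0)
print(row_z.round(2))

try:
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import scipy  # noqa: F401  (clustermap uses scipy for hierarchical clustering)
    import seaborn as sns
except ImportError:
    sns = None  # install with: conda install seaborn

if sns is not None:
    # Red = above the transcript's mean, blue = below it
    grid = sns.clustermap(tpm_matrix, z_score=0, cmap="RdBu_r", center=0)
    grid.savefig("clustermap.png")
```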
Zoom in on these regions
Find the gene names corresponding to the regions above
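After the interesting rows have been picked out, their transcript IDs still need to be translated into gene names. For a seaborn `ClusterGrid`, the clustered row order is available as `grid.dendrogram_row.reordered_ind`. One way to do the translation is a pandas merge with a transcript-to-gene annotation table. The table and gene names below are hypothetical placeholders. In practice, such a table might be exported from Ensembl BioMart or built from a GTF file.

```python
import pandas as pd

# Transcript IDs picked from the region of interest (hypothetical)
selected = pd.DataFrame({"target_id": ["tx1", "tx3", "tx9"]})

# Hypothetical transcript-to-gene annotation table
t2g = pd.DataFrame({
    "target_id": ["tx1", "tx2", "tx3"],
    "gene_name": ["GENE_A", "GENE_B", "GENE_C"],
})

# Left merge keeps every selected transcript; unannotated ones get NaN
annotated = selected.merge(t2g, on="target_id", how="left")
print(annotated)
```

A left merge is used so that transcripts missing from the annotation remain visible, with NaN as the gene name, rather than silently disappearing.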
Application of NGS technology
- DNA: Genome Sequencing
• Genome sequence
• Personal genome sequencing: variant discovery
- RNA: RNA-Seq
• Expression levels of mRNA
- Anything else?
- Epigenetic states of the cell
• DNA methylation
• Histone methylation
- Transcription factor binding: ChIP Sequencing
- Chromatin status
- RIP-Seq: RNA-protein interactions
Application of NGS to Epigenetics
Epigenetics: changes in gene expression without changes in the DNA sequence
During the development of an organism, cells pass through various differentiation stages. Although they share the same DNA, they show different expression patterns.
How are these different expression patterns determined?
DNA Methylation
Histone Modification
Two Factors in epigenetics
Histone Modification
NGS and Epigenetics
- How can we detect DNA methylation or histone marks?
- DNA methylation: Bisulfite Sequencing
- Histone mark: Chromatin Immunoprecipitation – Sequencing (ChIP-Seq)
* How can we tell which C is methylated?
Bisulfite Sequencing
• Treating DNA with bisulfite converts unmethylated cytosine to uracil (read as T)
• Methylated cytosine is resistant to bisulfite treatment (it remains C)
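The effect of bisulfite treatment on a sequence can be simulated in a few lines. This minimal sketch assumes we already know which cytosine positions are methylated. The sequence and positions are made up. Unmethylated C becomes T, and methylated C stays C. Complete conversion is assumed, which real experiments only approximate.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Return the sequence as read after bisulfite treatment.

    Unmethylated C -> U -> read as T; cytosines at the 0-based positions in
    `methylated` are protected and stay C. Assumes 100% conversion.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


seq = "ACGTCCGA"
print(bisulfite_convert(seq, methylated={1, 5}))  # ACGTTCGA
print(bisulfite_convert(seq, methylated=set()))   # ATGTTTGA
```

Comparing the two outputs shows where the information lives: only positions that were methylated still read as C.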
Genome-wide BS-Seq
Analysis of Bisulfite Sequencing
1. Reference sequence
CGGGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGGAGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGG
2. C-to-T conversion of every C except in CG
3. Converted sequence
CGGGCGTGGTGGCGCGCGTTTGTAATTTTAGTTATTCGGGAGGTTGAGGTAGGAGAATCGTTTGAATTCGGGAGGCGGAGGTTGTAGTGAGTCGAGATCGCGTTATTGTATTTTAGTTTGG
4. Align the sequencing results to the converted sequence
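Step 2, which converts every C that is not part of a CG dinucleotide to T, can be written as a single regular-expression substitution. Applied to the reference sequence above, it reproduces the converted sequence shown on the slide. This sketch encodes only the slide's rule, which amounts to assuming that CpG cytosines are methylated and all other cytosines are not.

```python
import re

REFERENCE = (
    "CGGGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG"
    "AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGG"
)


def convert_except_cpg(seq: str) -> str:
    """Replace every C that is not immediately followed by G with T."""
    # (?!G) is a negative lookahead: match a C only when the next base is not G
    return re.sub(r"C(?!G)", "T", seq.upper())


converted = convert_except_cpg(REFERENCE)
print(converted)
```

Using a lookahead means the G itself is not consumed by the match, so overlapping patterns such as CGCG are handled correctly.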
Analysis of a small portion of a sequence
http://services.ibc.uni-stuttgart.de/BDPC/index.php
http://services.ibc.uni-stuttgart.de/BDPC/BISMA/examples_unique.php
<- sequencing data
<- reference data
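The web tool above compares sequencing data with reference data. The core comparison at CpG sites can be sketched as follows, assuming the read is already aligned to the reference without gaps (real data needs a bisulfite-aware aligner). At each CpG cytosine of the reference, a C in the read means methylated and a T means unmethylated. The reference and read are toy sequences, and the 'M'/'U'/'?' labels are a hypothetical convention.

```python
def call_cpg_methylation(reference: str, read: str) -> dict[int, str]:
    """Call methylation at each CpG of `reference` from a gap-free aligned read.

    Returns {0-based position: 'M' (methylated), 'U' (unmethylated) or
    '?' (neither C nor T in the read, e.g. a sequencing error)}.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to the same length")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G":
            base = read[i]
            calls[i] = "M" if base == "C" else "U" if base == "T" else "?"
    return calls


reference = "ACGTTCGACGA"
read = "ACGTTTGACGA"
calls = call_cpg_methylation(reference, read)
level = sum(c == "M" for c in calls.values()) / len(calls)
print(calls)  # {1: 'M', 5: 'U', 8: 'M'}
```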
DNA Methylation in Genome Browser
- Histone mark: Chromatin Immunoprecipitation – Sequencing (ChIP-Seq)
Histone Methylation
Histone Acetylation
-> align to the reference genome
Shows which histone marks are present in which regions of the genome
Histone Mark
After sequencing:
1. Quality control
2. Align to the reference genome
3. Analyze the alignment file (peak finding)
4. Motif discovery / secondary analysis
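For the quality-control step, one simple check is the mean Phred quality of each read. The sketch below assumes FASTQ quality strings use the Phred+33 encoding (Sanger / Illumina 1.8+). The reads and the cutoff of 20 are hypothetical, and dedicated QC tools report much more than this single statistic.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def passes_qc(quality: str, min_mean: float = 20.0) -> bool:
    """Keep a read only if its mean Phred score is at least `min_mean`."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores) >= min_mean


reads = {
    "read1": "IIIIIIIIII",  # 'I' = 73 - 33 = Q40
    "read2": "##########",  # '#' = 35 - 33 = Q2
}
kept = [name for name, qual in reads.items() if passes_qc(qual)]
print(kept)  # ['read1']
```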
ChIP results in Genome Browser
H3K4me3: mark for active promoters
H3K27ac: mark for active promoters and enhancers
Transcription Start
H3K27me3: mark for inactive (repressed) chromatin
ChIP with other factors
Transcription Factors
“Yamanaka Factors”
- Oct4, Sox2, Klf4, c-Myc (OSKM)
- Transcription factors that are highly expressed in embryonic stem cells (ESCs)
- Screened from 24 transcription factors expressed in ESCs
- Retroviral expression of these 4 genes in embryonic or adult fibroblasts transforms the cells into ‘stem cell-like’ cells
iPSC (induced Pluripotent Stem Cell)
Molecular events of induced pluripotency
Questions
How do we know which DNA a specific transcription factor binds?
Electrophoretic Mobility Shift Assay (EMSA)
Binding of a protein to DNA slows the DNA's migration in the gel
Label the DNA with an isotope
Drawbacks: low throughput; binding cannot be tested at the genome-wide level
Genome Sequence
Target Site of Transcription Factor
Chromatin Immunoprecipitation Sequencing
Genome Sequence
Transcription factor binding sites are identified by read depth (how many sequencing reads pile up at a particular position?)
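The read-depth idea can be sketched with NumPy. Add +1 over the genomic interval covered by each aligned read, then report stretches where the depth reaches a threshold. This toy illustration uses made-up read positions and a hypothetical fixed threshold. Real peak callers such as MACS model the background and input controls statistically instead.

```python
import numpy as np


def read_depth(starts, read_len: int, genome_len: int) -> np.ndarray:
    """Per-base depth from 0-based read start positions (all reads same length)."""
    diff = np.zeros(genome_len + 1, dtype=int)
    for s in starts:
        diff[s] += 1                                # read begins covering here
        diff[min(s + read_len, genome_len)] -= 1    # and stops covering here
    return np.cumsum(diff)[:genome_len]


def find_peaks(depth: np.ndarray, min_depth: int) -> list[tuple[int, int]]:
    """Return half-open [start, end) intervals where depth >= min_depth."""
    above = depth >= min_depth
    # +1 where a run of positions above threshold starts, -1 where it ends
    edges = np.diff(np.concatenate(([0], above.astype(int), [0])))
    run_starts = np.flatnonzero(edges == 1)
    run_ends = np.flatnonzero(edges == -1)
    return list(zip(run_starts.tolist(), run_ends.tolist()))


read_starts = [10, 12, 13, 15, 40]  # made-up read start positions
depth = read_depth(read_starts, read_len=5, genome_len=50)
print(find_peaks(depth, min_depth=3))  # [(13, 17)]
```

The difference-array trick adds each read in constant time, and a single cumulative sum then produces the depth at every position.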
Sequence Mapping
Transcription Factor
Genes Expressed upon Estrogen Stimulation
Transcription Factor Binding
Transcription
Chromatin Status
“Identifying regions that are tightly wound and regions that are not”
RIP-Seq
ChIP-Seq: finds the DNA regions bound by a specific protein
Then, what about RNA? How can we find the RNAs bound by specific proteins?
RIP-Seq: RNA Immunoprecipitation Sequencing
http://rbpdb.ccbr.utoronto.ca//
Higher organisms have about 200-400 RNA-binding proteins
http://cistrome.org/dc