DNase I Seq Data Analysis Strategy - Fudan University, admis.fudan.edu.cn/ds2013/sta/day4-s.pdf

DNase I Seq data Analysis Strategy

Dragon Star 2013, Qian Qin, Tongji University

Workflow

•  Mapping (BWA / Bowtie)
•  Reads filtering and format conversion (SAMtools / Picard)
•  Peak calling (MACS / Hotspot)
•  Pileup (convert to bigWig)
•  Peaks BED 1, Peaks BED 2
•  1. Down-sample by mappable reads  2. Scale by mappable reads
•  1. Data comparison (bedops, BEDTOOLS)  2. Union BED  3. Motif discovery
•  Correlation
•  Filtering bedGraph and BED (BEDTOOLS, bedClip)
•  QC: qrqc, FastQC
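
The "scale by mappable reads" step can be sketched with awk. This is a minimal illustration, not the method used in the course: it assumes a four-column bedGraph without track or header lines, and it assumes a signal-per-million-reads convention. The function name scale_bedgraph is hypothetical.

```shell
# scale_bedgraph TOTAL_READS BEDGRAPH
# Multiply the signal column (4) by 1000000 / TOTAL_READS, i.e. convert raw
# pileup values to signal per million mappable reads (an assumed convention).
# Assumes a tab-separated bedGraph with no track or header lines.
scale_bedgraph() {
  awk -v total="$1" 'BEGIN { FS = "\t"; factor = 1000000 / total }
    { printf "%s\t%s\t%s\t%.4f\n", $1, $2, $3, $4 * factor }' "$2"
}
```

The total number of mappable reads has to be supplied by the caller, for example from a mapping statistics report. Scaling two replicates this way makes their pileup tracks comparable without discarding reads, whereas down-sampling equalizes depth by dropping reads.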

Warm up

Examples on DHS

He, H. H., Meyer, C. A., Chen, M. W., Jordan, V. C., Brown, M., & Liu, X. S. (2012). Genome Research, 22(6), 1015–25. doi:10.1101/gr.133280.111

Neph, S., Vierstra, J., Stergachis, A. B., Reynolds, A. P., Haugen, E., Vernot, B., Thurman, R. E., et al. (2012). Nature, 489(7414), 83–90. doi:10.1038/nature11212

Convert BAM to FASTQ

•  Single-end data
bamToFastq -i path_to_bam -fq output.fastq

-i  input BAM file
-fq  output FASTQ file
-fq2  second output FASTQ file (paired-end data)

File formats

•  FASTQ
   –  http://en.wikipedia.org/wiki/FASTQ_format
•  SAM/BAM
•  BED, bedGraph, bigBed
•  Wiggle, bigWig
•  narrowPeak, broadPeak
•  bed.starch

https://genome.ucsc.edu/FAQ/FAQformat.html

http://code.google.com/p/bedops/wiki/starchAndUnstarch

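SAM/BAM files

•  BAM is the compressed binary form of SAM
•  FLAG values for single-end reads:
   –  0: forward (+) strand
   –  16: reverse (-) strand
   –  4: unmapped
•  FLAG values for paired-end reads:
   –  In the string form of the FLAG, R means the mate is on the reverse strand and r means the read itself is on the reverse strand
   –  99: pair 1, + strand; 147: pair 2, - strand
   –  83: pair 1, - strand; 163: pair 2, + strand
•  Common optional tags (these are tags, not FLAG values):
   –  NM: edit distance to the reference
   –  XT: aligner-specific tag (tags starting with X are reserved for custom use)

http://genome.sph.umich.edu/wiki/SAM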
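
The paired-end FLAG values above are sums of individual bits, so they can be decoded with integer arithmetic. The sketch below is only an illustration: the function name decode_flag and the short labels are made up here, and only the eight lowest bits of the FLAG are handled.

```shell
# decode_flag FLAG
# Print a label for each of the eight lowest bits set in a SAM FLAG value.
# The labels are informal names chosen for this example.
decode_flag() (
  flag=$1
  out=""
  for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
              16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair; do
    bit=${pair%%:*}    # numeric bit value before the colon
    name=${pair#*:}    # label after the colon
    if [ $(( $flag & $bit )) -ne 0 ]; then
      out="${out:+$out }$name"
    fi
  done
  echo "${out:-none}"
)
```

For example, decode_flag 99 prints "paired proper_pair mate_reverse first_in_pair" (pair 1 on the + strand, mate on the - strand), and decode_flag 147 prints "paired proper_pair reverse second_in_pair".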

Tips on shell

du -h file
du -sh .
grep A input.fastq
grep 0 input.fastq

cut -f 5 input.sam
cut -f 3,4 input.sam | uniq | wc -l
cut -f 3,4 input.sam | grep chr21 | wc -l
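
Two details are easy to miss with these one-liners: uniq only collapses adjacent duplicate lines, so unsorted input needs a sort first, and SAM header lines (starting with @) should be skipped before cutting columns. A minimal sketch follows, with hypothetical function names, assuming a tab-separated SAM file and a FASTQ file with exactly four lines per record.

```shell
# count_positions SAM
# Number of distinct (reference name, position) pairs among alignment lines.
count_positions() {
  grep -v '^@' "$1" | cut -f 3,4 | sort -u | wc -l | tr -d ' '
}

# count_fastq_reads FASTQ
# Number of records, assuming exactly four lines per record.
count_fastq_reads() {
  awk 'END { print int(NR / 4) }' "$1"
}
```

Using sort -u instead of a bare uniq gives the same count whether or not the SAM file is coordinate-sorted.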

Task 1: get reads mapping location

Bowtie/Bowtie2

•  Index genome
bowtie-build chr21.fa chr21
bowtie2-build chr21.fa chr21

•  Single End
bowtie -S chr21 input.fastq output.sam
bowtie2 -x chr21 -U input.fastq -S output.sam

BWA

•  Index genome
bwa index -a bwtsw chr21.fa

•  Mapping
bwa aln -t 4 -f output.sai chr21.fa input.fastq
bwa samse -f output.sam chr21.fa output.sai input.fastq

Task 2: Alignment conversion and mapping statistics

Samtools / Picard for Conversion

 

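Convert BAM to SAM
samtools view -h input.bam -o output.sam
samtools view -X input.bam -o output.sam
samtools view -x input.bam -o output.sam
(-h keeps the header; in samtools 0.1.x, -X prints the FLAG as a string and -x prints it in hex)

Convert SAM to BAM, then sort and merge
samtools view -bS input.sam -o output.bam
samtools sort input.bam output_sorted
samtools merge merge.bam input1.bam input2.bam
(the sort command uses samtools 0.1.x syntax; samtools 1.x uses: samtools sort -o output_sorted.bam input.bam)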

Samtools / Picard for reads filter and statistics

Mapping statistics
samtools flagstat input.bam

Get reliably aligned reads
samtools view -bq 1 input.bam > output.bam
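
The -q option skips alignments whose mapping quality (MAPQ, column 5 of a SAM record) is below the given value. On a text SAM file the same filter can be sketched with awk; the function name filter_mapq is hypothetical and the sketch is only meant to show what the filter does, not to replace samtools.

```shell
# filter_mapq MIN SAM
# Keep header lines (starting with @) and alignments with MAPQ >= MIN.
filter_mapq() {
  awk -v min="$1" 'BEGIN { FS = "\t" } /^@/ || $5 + 0 >= min + 0' "$2"
}
```

With MIN set to 1, reads with MAPQ 0 are dropped; aligners commonly assign MAPQ 0 to reads that map equally well to several places.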

BEDTOOLS/BEDOPS for reads format conversion

Convert BAM to BED
bamToBed -i input.bam > input.bed

Merge BED files
bedops -u input1.bed input2.bed > output.bed

Equivalent to:
cat input1.bed input2.bed | sort-bed - > output.bed

Task 3: Predict open chromatin regions

Peak calling tools

•  MACS14 / MACS2
   –  https://github.com/taoliu/MACS/
   –  Built into Cistrome, user-friendly
   –  Supports paired-end mode
•  Hotspot
   –  Requires shell and Linux experience
   –  Has many dependencies
   –  http://www.uwencode.org/proj/hotspot/

MACS14

macs14 -t test.bam -n test
Rscript test_model.r  ## model image

Keep duplicates or not
macs14 --keep-dup all -t test.bam -n test

If model building fails
macs14 --keep-dup all -t test.bam -n test --nomodel --shiftsize 73

MACS2

•  Peak calling
   –  macs2 callpeak -t test.sam -n test
   –  macs2 callpeak --nomodel --shiftsize 73 -t test.sam -n test
•  Down-sampling
   –  macs2 randsample -t test.sam -n 5000 --seed 25 -o test.bed
•  Filter duplicates
   –  macs2 filterdup -i test.bam -o test.bed
•  Pileup
   –  macs2 pileup -i test.bam --extsize 3 -o test.bed
   –  sort -k1,1 -k2,2n test.bed > sort.bed
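
When sorting BED or bedGraph output by coordinate, the start column should be compared as a number; a plain text comparison would place position 1000 before position 200. A minimal sketch, assuming tab-separated input and using a hypothetical function name:

```shell
# sort_bed FILE
# Sort by chromosome name (byte order), then numerically by start and end.
sort_bed() {
  LC_ALL=C sort -k1,1 -k2,2n -k3,3n "$1"
}
```

Setting LC_ALL=C makes the chromosome order independent of the user's locale, which matters when downstream tools expect a specific sort order.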

Task 4: Replicates consistency

bedtools/bedops for comparison

Get intersection regions
bedops -i input1.bed input2.bed > output.bed
bedtools intersect -a input1.bed -b input2.bed > output.bed

Get input1.bed overlapped regions only
bedops -e 1 input1.bed input2.bed
intersectBed -u -a input1.bed -b input2.bed

Get input1.bed complementary regions
bedops -d input1.bed input2.bed
intersectBed -v -a input1.bed -b input2.bed
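
To make the "overlapped regions only" and "no overlap" selections concrete, the sketch below reimplements them with awk for small files. It is an illustration under stated assumptions, not a substitute for bedops or bedtools: it assumes tab-separated BED with half-open intervals, reports whole intervals of the first file, compares every pair of regions, and uses a hypothetical function name.

```shell
# overlap_filter MODE A.bed B.bed
# MODE u: print each region of A that overlaps at least one region of B.
# MODE v: print each region of A that overlaps no region of B.
overlap_filter() {
  awk -v mode="$1" 'BEGIN { FS = "\t" }
    FILENAME == ARGV[1] {            # first file operand: regions of B
      n++; c[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; next
    }
    {                                # second file operand: regions of A
      hit = 0
      for (i = 1; i <= n; i++)
        if (c[i] == $1 && s[i] < $3 + 0 && $2 + 0 < e[i]) { hit = 1; break }
      if ((mode == "u" && hit) || (mode == "v" && !hit)) print
    }' "$3" "$2"
}
```

Two half-open intervals on the same chromosome overlap when each one starts before the other ends, which is the test inside the loop. For replicate comparison, the fraction of peaks in one replicate that pass mode u against the other gives a quick consistency measure.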

Task 5: Data visualization, annotation and motif discovery

MDSeqPos

Get most accessible chromatin regions
sort -r -g -k 5,5 peaks.bed > input.bed

Motif analysis
MDSeqPos.py input.bed -d -m cistrome.xml -p 0.05 hg19 -s hs

-p  p-value cutoff
-s  species
-d  whether to run de novo motif discovery
-m  motif databases: transfac.xml, cistrome.xml
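
Sorting alone keeps every peak; to restrict motif analysis to the most accessible regions, the sorted list can be truncated. A minimal sketch, assuming a tab-separated peaks BED whose fifth column is the peak score written as a plain decimal number; the function name top_peaks and the choice of N are illustrative.

```shell
# top_peaks N PEAKS.bed
# Print the N peaks with the highest score (column 5), best first.
top_peaks() {
  LC_ALL=C sort -t "$(printf '\t')" -k5,5nr "$2" | head -n "$1"
}
```

For instance, top_peaks 1000 peaks.bed > input.bed would pass only the 1000 strongest peaks on to motif analysis.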

Data visualization and Cistrome application annotation

IGV
•  Set data ranges
•  Auto scale
•  Find the most enriched regions
•  Load wiggle and peaks BED

Get open chromatin regions near genes
RegPotential.py -t test_peaks.bed -g /mnt/Storage/data/sync_cistrome_lib/ceaslib/GeneTable/hg19 -n test -d 10000

Task summary

•  Get FASTQ
•  Mapping
•  Get proper format
•  Peak calling
•  Comparison of replicate peaks
•  Data visualization and motif analysis

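Faculty

曹志伟, 江赐忠, 张勇

Full-time professors

Shirley Liu (Harvard), Zhiping Weng (UMass), Wei Li (Baylor)

Overseas chair professors

Thousand Talents Program

973 Program chief scientist; Shanghai Pujiang Talent Program; Shanghai Eastern Scholar Program; Shanghai Shuguang Program

Ministry of Education New Century Excellent Talents

Shanghai Science and Technology Commission Rising-Star Program; Ministry of Education New Century Excellent Talents

Adjunct professor

李亦学

Recruited with assistance

张帆, 刘雷

Thousand Talents Program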

Welcome to join us!