hands-on tutorial on snp calling - transplant · hands-on tutorial on snp calling...

28
Hands-on Tutorial on SNP Calling Georg Haberer Manuel Spannagl Plant Genome and Systems Biology Group/PGSB

Upload: trinhnguyet

Post on 10-Apr-2018

331 views

Category:

Documents


8 download

TRANSCRIPT

Page 1: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Hands-on Tutorial on SNP Calling

Georg  Haberer  Manuel  Spannagl  

Plant  Genome  and  Systems  Biology  Group/PGSB  

Page 2: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Types of Genomic Variation

Page 3: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

(Some) Applications of Genomic Variation

•  SNPs  have  broad  applica=ons  

•  High  frequency  •  Advanced  

subs=tu=on  models  –  Jukes-­‐Cantor  –  Generalized  =mes  

reversible  …  

•  NGS:  drama=c  impact  on  SNP  studies  

Page 4: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

NGS Snp Calling: A Simple Task?

Page 5: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Our Little Project in the Course

Page 6: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Aims of the practical course

You will learn …

•  run  programs  via  cmd  line  •  sketchy  understanding  of  

underlying  algorithms  •  Elements  of  a  basic  SNP  

pipeline  •  Interpret,  understand  and  

read  important  file  formats  •  Founda=on  to  develop  your  

own  SNP  pipeline  

You will NOT

•  Complete  overview  of  SNP  calling  methods  and  soNware  tools  

•  In-­‐depth  discussion  of  algorithms  

•  The  all-­‐in-­‐one  Swiss  army  knife  for  all  possible  applica=ons,  datasets  and  species    

Page 7: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

The FASTQ File Format

Page 8: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

ASCII

Page 9: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Phred Scores

Page 10: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Base Qualities in FASTQ

Page 11: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Diversion: The LINUX command line

Page 12: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Principle Cmd-Structure of our Programms

Page 13: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Practical Part I

•  Part  A:  The  LINUX  command  line  •  Part  B:  Read  mapping  

•  Please  finish  aNer  you  have  typed  both  commands  of  B.2,  they  will  run  in  the  background  while  we  will  proceed  with  the  presenta=on  

Page 14: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

BWA and samtools

•  BWA:  Burrow-­‐Wheeler  Alignment  –  Short  read  mapper  based  on  suffix  arrays  –  Modules  to  map  long  reads  –  Generates  SAM  (Sequence  Alignment/Map)  format    

•  Samtools  is  a  collec=on  of  programs  to  manipulate  SAM  forma[ed  files  –  Sor=ng,  Merging,  Indexing,  Viewing    

•  Alterna=ve  to  samtools:  java-­‐based  Picard  toolkit  –  h[p://sourceforge.net/projects/picard/  

Page 15: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Why do have to index the genome? The Alignment Problem for NGS Data

Page 16: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Genome Indexing: Fast (nearly) Exact Searches

Page 17: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

NGS Aligners/Mappers

•  NGS  aligner  are  rather  mappers  NOT  aligners!  

•  Considera=ons  for  selec=ng  an  aligner  –  Maintenance/updates?  –  PE  and  single  reads  –  Long/short  reads  (miSeq,  Illumina  …)  –  Pla`orm  (SOLID,  Illumina,  454)  –  Gapped/ungapped  alignments  –  Handling  of  unmapped  reads  and  

mul=ple  hits  

Page 18: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

SAM/BAM: The NGS Alignment Format

Page 19: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

SAM/BAM Format: Flags and Cigar Notation

•  h[ps://broadins=tute.github.io/picard/explain-­‐flags.html  •  Cigar  nota=on:  comprehensive  nota=on  of  pairwise  alignment  

Page 20: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Practical Part II

•  Please  complete  Part  C  

Page 21: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

SNP Calling •  Hardfilters  

–  eg.  mpileup  as  input  –  Use  #of  observa=ons,  mapping    and  base  quality  etc  etc  

•  Bayesian/Probabilis=c  models  –  Use  bayesian  sta=s=cs  to  derive  genotype  probabili=es  under  

data  observa=on  (~read  amappings)  –  Use  error  models  

•  Postprocessing  –  Hard  quality  filters  –  Machine  learning  methods  –  Training  and  evalua=on  on  known  SNPs  (eg.  1000  genome  

project),  literature  or  genotyping  arrays  

Page 22: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Bcftools and Tabix

•  Bc#ools  is  a  collec=on  of  u=li=es  to  call  SNPs  and  manipulate  VCF  (variant  call  format)  files  –  Call  SNPs  and  small  indels  –  Annotate  and  subselect  entries  from  VCF  files  –  Query,  filter,  merge    …  VCF  files  –  h[ps://samtools.github.io/bcNools/bcNools.html    

•  Tabix  generates  indices  for  tab-­‐delimited  files  (eg  VCF)  –  h[p://www.htslib.org/doc/tabix.html  

Page 23: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

VCF Format (1)

Page 24: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

VCF Format (2)

Page 25: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Practical Part 3

•  Please  finish  part  D  +  E  •  ANer  this  we  will  have  some  concluding  remarks,  •  And  you  will  have  just  developed  your  first  basic  SNP  pipeline,  congrats!  

Page 26: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Some directions to go further …

•  Look  at  exis=ng  workflows  and  soNware  •  Bash  scripts  to  chain  your  commands  •  Divide  &  Conquer:  paralleliza=on  in  a  batch  queue    •  Basic  knowledge  of  a  scrip=ng  language,  eg.  python  

Page 27: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

A (real) Workflow for SNP Calling

Page 28: Hands-on Tutorial on SNP Calling - transPLANT · Hands-on Tutorial on SNP Calling Georg&Haberer& Manuel&Spannagl& PlantGenome& and&Systems&BiologyGroup/PGSB

Additional Popular SNP Callers

•  GATK:  Genome  Analysis  Toolkit    –  h[ps://www.broadins=tute.org/gatk/index.php  

•  soapSNP  –  h[p://soap.genomics.org.cn/soapsnp.html  

•  freebayes:  calls  on  pooled  data  possible  –  h[ps://github.com/ekg/freebayes  

•  varscan  –  h[p://varscan.sourceforge.net/  

•  Galaxy:  web-­‐based  &  local,  workflows  –  h[ps://usegalaxy.org/  

•  Commercial  products  like  CLS,  Golden  Helix