primary analysis tutorial deprecated

Bioinformatics Primary Analysis Tutorial

Phil Richmond, PRA Dowell Lab

University of Colorado, BioFrontiers Institute

Outline

•  Intro
   –  Things that will be covered
   –  Things that won’t be covered

•  Workflow
•  Mapping with Bowtie
•  File Conversion with Samtools
•  Visualization with IGV
•  Extras

Sequencing

•  There are many different types of sequencing including 454, Illumina, SOLiD, IonTorrent, and more.

•  If you are interested in each type of sequencing…

Things that will be covered

•  The primary analysis that I will walk through is a “bare bones” analysis, meant to take your reads from the Illumina sequencer to a visualizer, along with some organizational practices
   –  Mapping (Bowtie/BWA)
   –  File format conversion
   –  Visualization

Things that won’t be covered

•  Post/preprocessing steps that I’m leaving out include:
   –  FastX analysis of raw reads and adapter clipping, etc.
   –  PCR duplicate marking (Illumina) on raw reads
   –  Base Quality Score Recalibration (GATK) on mapped reads
   –  Local Realignment around indels on mapped reads

•  Any secondary or tertiary analysis or scripting techniques
   –  Secondary analysis by personal appt.
   –  Scripting techniques by joining Dave Knox’s Python class

Login to Tuxedo

•  Log in with the -X option to open the X11 viewer.
•  On a PC, see me for separate instructions to pipe the visualization.

•  ssh -X richmonp@tuxedo.colorado.edu

Working Directory

•  We will be working in /data/Tutorial/<Student>
   –  cd /data/Tutorial/Phil/

•  The necessary files for the tutorial are in /data/Tutorial/Files/
   –  Parent113010.fa is the reference (E. coli) genome
   –  Parent120710.gff is the annotation file
   –  Sample1_single.fastq is the reads file we are working with

Organization

•  In your own directory (/data/Tutorial/<Student>/), create the following sub-directories:
   –  Genome/
      •  Keep the fasta and gff files here
   –  Bowtie/
      •  Keep the Bowtie alignments, and any post-processing of those alignments, here
   –  Fastq/
      •  Keep the raw fastq files here
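The layout above can be created in one step. This is a minimal sketch, assuming a POSIX shell. The TUTORIAL_DIR variable and its default of ./Tutorial_Student are placeholders introduced for this example, not part of the tutorial; point it at your own /data/Tutorial/<Student>/ directory.

```shell
# TUTORIAL_DIR is a hypothetical variable for this sketch; set it to your
# own directory, e.g. TUTORIAL_DIR=/data/Tutorial/<Student>
TUTORIAL_DIR="${TUTORIAL_DIR:-./Tutorial_Student}"

# -p creates missing parent directories and does not fail if they already exist
mkdir -p "$TUTORIAL_DIR/Genome" "$TUTORIAL_DIR/Bowtie" "$TUTORIAL_DIR/Fastq"

# List the result to confirm the three sub-directories are in place
ls "$TUTORIAL_DIR"
```

Because of -p, the commands are safe to rerun if some of the directories already exist.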

Workflow

Raw Reads (Fastq) → Mapping (Bowtie) → Mapped Reads (SAM) → File Conversion (SAMTOOLS) → Binary Mapped Reads (SORTED.BAM) → Visualization (IGV)

Fastq file

•  File extension .fastq or .fq
•  Example (four lines per read):
   @Read_identifier_and_flowcell_info     Read ID
   ACGTCCGGTTNNN…                         Read Sequence
   +                                      Read QV ID
   B$!?NP\\\[%&C…                         Read QV Sequence

•  For more info on the ASCII encoding of QV scores, see Wikipedia
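The quality line stores one ASCII character per base. Here is a minimal sketch of decoding it, assuming the Phred+33 offset (Sanger and Illumina 1.8+). Older Illumina pipelines used an offset of 64, so check which one your data uses. The function name phred33 is made up for this example.

```shell
# phred33 <quality_string>
# Prints the Phred score of each character, assuming an ASCII offset of 33.
phred33() {
    qual="$1"
    out=""
    i=1
    while [ "$i" -le "${#qual}" ]; do
        ch=$(printf '%s' "$qual" | cut -c "$i")   # i-th character
        code=$(printf '%d' "'$ch")                # its ASCII code
        out="$out$((code - 33)) "
        i=$((i + 1))
    done
    printf '%s\n' "${out% }"
}

# First four quality characters from the example read above
phred33 'B$!?'
# prints: 33 3 0 30
```

A score of Q corresponds to an error probability of 10^(-Q/10), so the "!" character (Q = 0) marks a base call with no confidence at all.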

Mapping the Short Reads

•  Taking each read and mapping it to a reference genome
   –  Bowtie

TGCATGCATGCATGCATGCATGCATGCATGCATGCAAAAAGCATGCATGCA

TGCATGAATGCAAAAAGCATGCA

Bowtie-Build Command

•  In order to map the reads to a genome, you must acquire the genome in the .fasta (.fa) format, and then index it.

•  bowtie-build -f <in.fasta> <out_prefix>
   –  $ bowtie-build SGDv4.fasta SGDv4_bowtie

Bowtie command

•  Now we map back to the reference we just indexed.

•  bowtie <reference_in.prefix> -q <in.fastq> -S <out.SAM> 2> <out.stderr>
   –  $ bowtie /data/Tutorial/Phil/Genome/Bowtie_index/SGDv3_bowtie -q Sample1.fastq -S Sample1_bowtie.sam 2> Sample1_bowtie.stderr
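When several FASTQ files need mapping, it helps to generate the command for each one and review it before running anything. The sketch below is hypothetical (the bowtie_cmd helper is not part of Bowtie). It only prints a command in the form shown above, deriving the output names from the FASTQ file name, and it assumes paths without spaces.

```shell
# bowtie_cmd <index_prefix> <reads.fastq>
# Prints (does not run) the Bowtie command for one FASTQ file, naming the
# outputs <sample>_bowtie.sam and <sample>_bowtie.stderr.
bowtie_cmd() {
    index="$1"
    fastq="$2"
    sample=$(basename "$fastq")
    sample="${sample%.fastq}"   # strip either accepted extension
    sample="${sample%.fq}"
    printf 'bowtie %s -q %s -S %s_bowtie.sam 2> %s_bowtie.stderr\n' \
        "$index" "$fastq" "$sample" "$sample"
}

bowtie_cmd Genome/Bowtie_index/SGDv3_bowtie Fastq/Sample1.fastq
```

Once the printed line looks right, it can be pasted into the terminal or piped into sh to run it.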

Sam File

•  Tab Delimited
•  http://genome.sph.umich.edu/wiki/SAM
•  Open Example SAM
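Because SAM is tab delimited, standard text tools can summarize it. The sketch below counts mapped and unmapped reads from the FLAG column (column 2), where bit 0x4 marks an unmapped read. The count_mapped function and the three demo records are invented for illustration; they are not tutorial data.

```shell
# count_mapped <file.sam>: tally mapped vs. unmapped reads from the FLAG column
count_mapped() {
    awk -F '\t' '
        /^@/ { next }                       # skip header lines
        {
            # bit 0x4 of FLAG (column 2) is set when the read is unmapped
            if (int($2 / 4) % 2 == 1) unmapped++
            else mapped++
        }
        END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
    ' "$1"
}

# Tiny made-up SAM file: one header line and three alignment records
demo_sam=$(mktemp)
printf '@SQ\tSN:chr1\tLN:50\n' > "$demo_sam"
printf 'read1\t0\tchr1\t5\t255\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n' >> "$demo_sam"
printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTTTTTTT\tIIIIIIIIII\n' >> "$demo_sam"
printf 'read3\t16\tchr1\t20\t255\t10M\t*\t0\t0\tGGGGCCCCAA\tIIIIIIIIII\n' >> "$demo_sam"

count_mapped "$demo_sam"
# prints: mapped=2 unmapped=1
```

FLAG 16 (read3) means the read mapped to the reverse strand, so it still counts as mapped.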

Samtools Commands

•  samtools view -bS <in.sam> -o <out.bam>
   –  $ samtools view -bS Sample1_bowtie.sam -o Sample1_bowtie.bam

•  samtools sort <in.bam> <out.sorted>
   –  $ samtools sort Sample1_bowtie.bam Sample1_bowtie.sorted

•  samtools index <in.sorted.bam>
   –  $ samtools index Sample1_bowtie.sorted.bam
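The three conversion steps can be chained for a given file prefix. This is a sketch only: the run and sam_to_sorted_bam helpers and the DRY_RUN switch are invented for this example. It uses the sort form shown above (output prefix as the second argument). I believe newer samtools releases (1.x) expect samtools sort -o <out.bam> <in.bam> instead, so check the usage message on your system.

```shell
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# sam_to_sorted_bam <prefix>: turn <prefix>.sam into a sorted, indexed BAM
sam_to_sorted_bam() {
    prefix="$1"
    # options are placed before the input file; equivalent to the form above
    run samtools view -bS -o "$prefix.bam" "$prefix.sam" &&
    run samtools sort "$prefix.bam" "$prefix.sorted" &&
    run samtools index "$prefix.sorted.bam"
}

sam_to_sorted_bam Sample1_bowtie
```

Chaining with && stops the conversion at the first failing step, so a broken SAM file does not produce a misleading index.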

Visualization (IGV)

•  Located at /data2/IGV/
•  Several different versions are available; recommend either:
   –  /data2/IGV/IGV_2.1.19/igv.jar
   –  /data2/IGV/IGV_1.5.64/igv.jar

•  To run IGV:
   –  java -Xmx5g -jar <igv.jar>
   –  $ java -Xmx5g -jar /data2/IGV/IGV_1.5.64/igv.jar &

IGV: Creating a genome

•  Refer to the instructions on the sheet.

Bowtie and Bfast IGV

Advantages to Bfast Gapped Mapping

Bfast Mapping Loosely

If you are getting the hang of it quickly…

•  Try going through the next few commands

BWA Paired end

•  Index the reference:
•  /usr/local/src/bwa-0.6.2/bwa index -a is <in.fasta>

•  Map each read in the pair independently (run once for <in_1.fq> and once for <in_2.fq>):
•  /usr/local/src/bwa-0.6.2/bwa aln <reference.prefix> <in_1.fq> > <out.sai>

•  Finalize the mapping by combining the .sai and .fq files for both reads into a final SAM alignment:

•  /usr/local/src/bwa-0.6.2/bwa sampe <reference.prefix> <in_1.sai> <in_2.sai> <in_1.fq> <in_2.fq> > <out_paired.sam>

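Bowtie Unique Mapping

•  Investigate the different Bowtie options:
   –  Look at -m (suppress reads that have more than this many valid alignments; -m 1 keeps only uniquely mapping reads) and -v (maximum number of mismatches allowed across the whole read; -n is the option that limits mismatches in the seed)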

TopHat Spliced Mapping

•  /usr/local/src/tophat-2.0.4.Linux_x86_64/tophat -G <in.gff> -o <output_directory> <bowtie_index> <in.fastq>

The end…for now.
