Primary analysis tutorial (deprecated)
Bioinformatics Primary Analysis Tutorial
Phil Richmond, PRA Dowell Lab
University of Colorado, Biofrontiers Institute
Outline
• Intro
– Things that will be covered
– Things that won’t be covered
• Workflow
• Mapping with Bowtie
• File Conversion with Samtools
• Visualization with IGV
• Extras
Sequencing
• There are many different types of sequencing including 454, Illumina, SOLiD, IonTorrent, and more.
• If you are interested in each type of sequencing…
Things that will be covered
• The primary analysis that I will walk through is a “bare bones” analysis, meant to take your reads from Illumina sequencer to visualizer, as well as some organizational practices
– Mapping (Bowtie/BWA)
– File format conversion
– Visualization
Things that won’t be covered
• Post/preprocessing steps that I’m leaving out include:
– FastX analysis of raw reads and adapter clipping, etc.
– PCR duplicate marking (Illumina) on raw reads
– Base Quality Score Recalibration (GATK) on mapped reads
– Local Realignment around indels on mapped reads
• Any Secondary or Tertiary analysis or scripting techniques
– Secondary analysis by personal appt.
– Scripting techniques by joining Dave Knox’s python class
Login to Tuxedo
• Login with -X option to open X11 viewer.
• On a PC…see me for separate instructions to pipe visualization
• ssh -X [email protected]
Working Directory
• We will be working in /data/Tutorial/<Student>
– cd /data/Tutorial/Phil/
• The necessary files for the tutorial are in /data/Tutorial/Files/
– Parent113010.fa is the reference (E. coli) genome
– Parent120710.gff is the annotation file
– Sample1_single.fastq is the reads file we are working with
Organization
• In your own directory (/data/Tutorial/<Student>/) create the following sub-directories:
– Genome/
  • Keep the fasta and gff files here
– Bowtie/
  • Keep the Bowtie alignments, and post-processing of Bowtie alignments here
– Fastq/
  • Keep the raw fastq files here
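The directory layout above can be created in one step. This is a minimal sketch, not part of the original slides: TUTORIAL_DIR is a hypothetical variable that stands in for /data/Tutorial/<Student>/, and it defaults to a scratch directory so the commands can be tried anywhere.

```shell
#!/bin/sh
# Sketch: create the three tutorial sub-directories under one student directory.
# TUTORIAL_DIR is a hypothetical variable; on Tuxedo it would point at
# /data/Tutorial/<Student>/ instead of this local scratch directory.
TUTORIAL_DIR=${TUTORIAL_DIR:-./Tutorial_sketch}

# -p creates missing parent directories and does not fail if they already exist.
mkdir -p "$TUTORIAL_DIR/Genome" "$TUTORIAL_DIR/Bowtie" "$TUTORIAL_DIR/Fastq"

# List the result to confirm that all three directories are present.
ls "$TUTORIAL_DIR"
```

After this, the fasta and gff files would be copied into Genome/ and the raw reads into Fastq/, following the layout on the slide.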
Workflow
Raw Reads (Fastq) → Mapping (Bowtie) → Mapped Reads (SAM) → File Conversion (Samtools) → Binary Mapped Reads (sorted BAM) → Visualization (IGV)
Fastq file
• File extension .fastq or .fq
• Example (four lines per read):
– Line 1, Read ID: @Read_identifier_and_flowcell_info
– Line 2, Read Sequence: ACGTCCGGTTNNN…
– Line 3, Read QV ID: +
– Line 4, Read QV Sequence: B$!?NP\\\[%&C…
• For more info on ASCII encoding of QV scores…go to Wikipedia
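Because each read takes up four lines, the number of reads in a file can be checked with a line count. The sketch below is an illustration only: example.fastq, its two made-up reads, and the helper name count_reads are assumptions, and it assumes the usual unwrapped layout where sequence and quality each sit on a single line.

```shell
#!/bin/sh
# Sketch: write a tiny two-read FASTQ file, then count its reads.
# example.fastq and its contents are made up for illustration.
cat > example.fastq <<'EOF'
@read1
ACGTCCGGTT
+
IIIIIIIIII
@read2
TTGACCNNNN
+
IIIIII####
EOF

# Each read occupies exactly four lines, so reads = lines / 4.
# count_reads is a hypothetical helper name.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Print only the sequence lines (line 2 of every 4-line record).
awk 'NR % 4 == 2' example.fastq

count_reads example.fastq
```

On the tutorial data the same helper could be pointed at Sample1_single.fastq to see how many reads go into the mapping step.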
Mapping the Short Reads
• Taking each read and mapping it to a reference genome
– Bowtie
Reference: TGCATGCATGCATGCATGCATGCATGCATGCATGCAAAAAGCATGCATGCA
Read: TGCATGAATGCAAAAAGCATGCA
Bowtie-Build Command
• In order to map the reads to a genome, you must acquire the genome in the .fasta (.fa) format, and then index it.
• bowtie-build -f <in.fasta> <out_prefix>
– $ bowtie-build SGDv4.fasta SGDv4_bowtie
Bowtie command
• Now we map back to the reference we just indexed.
• bowtie <reference_in.prefix> -q <in.fastq> -S <out.SAM> 2> <out.stderr>
– $ bowtie /data/Tutorial/Phil/Genome/Bowtie_index/SGDv3_bowtie -q Sample1.fastq -S Sample1_bowtie.sam 2> Sample1_bowtie.stderr
Sam File
• Tab Delimited
• http://genome.sph.umich.edu/wiki/SAM
• Open Example SAM
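Since SAM is tab delimited, individual columns can be pulled out with awk. The sketch below is illustrative only: example.sam, the reference name chr_example, both reads, and the helper name count_mapped are made up. It relies on two facts from the SAM format: header lines start with "@", and bit 0x4 of the FLAG column (column 2) marks an unmapped read.

```shell
#!/bin/sh
# Sketch: build a tiny SAM file and summarize it with awk.
# The file name, reference name, and reads are made up for illustration.
printf '@SQ\tSN:chr_example\tLN:1000\n' > example.sam
printf 'read1\t0\tchr_example\t101\t255\t10M\t*\t0\t0\tACGTCCGGTT\tIIIIIIIIII\n' >> example.sam
printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCNNNN\tIIIIII####\n' >> example.sam

# Count alignment lines whose FLAG does not have bit 0x4 (unmapped) set.
# Header lines (starting with "@") are skipped. count_mapped is a hypothetical name.
count_mapped() {
    awk -F '\t' '!/^@/ { if (int($2 / 4) % 2 == 0) mapped++ } END { print mapped + 0 }' "$1"
}

# Print read name, reference name, and 1-based position for each alignment line.
awk -F '\t' '!/^@/ { print $1, $3, $4 }' example.sam

count_mapped example.sam
```

Running the same one-liners on the SAM file produced by the mapping step gives a quick sanity check before converting it to BAM.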
Samtools Commands
• samtools view -bS <in.sam> -o <out.bam>
– $ samtools view -bS Sample1_bowtie.sam -o Sample1_bowtie.bam
• samtools sort <in.bam> <out.sorted>
– $ samtools sort Sample1_bowtie.bam Sample1_bowtie.sorted
• samtools index <in.sorted.bam>
– $ samtools index Sample1_bowtie.sorted.bam
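The three samtools steps always derive their file names from the same sample prefix, so they can be wrapped in a small script. This is a sketch under assumptions: run, sam_to_sorted_bam, and DRY_RUN are hypothetical names, not part of samtools, and with DRY_RUN=1 the commands are only printed. The sort line uses the older `<in.bam> <out.sorted>` form shown on the slide; samtools 1.x releases expect `samtools sort -o <out.bam> <in.bam>` instead, so that line would need adapting to the installed version.

```shell
#!/bin/sh
# Sketch: derive the BAM file names from one SAM file name and run the
# three samtools steps from the slide in order.
# run, sam_to_sorted_bam, and DRY_RUN are hypothetical names for illustration.

run() {
    # Print the command; execute it only when DRY_RUN is not set to 1.
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

sam_to_sorted_bam() {
    sam=$1
    # Strip the .sam suffix: Sample1_bowtie.sam -> Sample1_bowtie
    base=${sam%.sam}
    # SAM -> BAM, then sort (legacy syntax from the slide), then index.
    run samtools view -bS "$sam" -o "$base.bam" &&
    run samtools sort "$base.bam" "$base.sorted" &&
    run samtools index "$base.sorted.bam"
}

# Dry run: only print the commands that would be executed.
DRY_RUN=1
sam_to_sorted_bam Sample1_bowtie.sam
```

Setting DRY_RUN=0 on a machine with samtools installed would execute the commands and stop at the first step that fails.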
IGV
• Located at /data2/IGV/
• Several different versions available, recommend either:
– /data2/IGV/IGV_2.1.19/igv.jar
– /data2/IGV/IGV_1.5.64/igv.jar
• To run IGV:
– java -Xmx5g -jar <igv.jar>
– $ java -Xmx5g -jar /data2/IGV/IGV_1.5.64/igv.jar &
IGV: Creating a genome
• Reference Instructions on sheet.
Bowtie and Bfast IGV
[Figure: tracks labeled Bowtie, Bfast, and Gene]
Advantages to Bfast Gapped Mapping
[Figure: tracks labeled Bowtie, Bfast, and Gene]
Bfast Mapping Loosely
[Figure: tracks labeled Bowtie, Bfast, and Gene]
If you are getting the hang of it quickly…
• Try going through the next few commands
BWA Paired end
• /usr/local/src/bwa-0.6.2/bwa index -a is -f <in.fasta>
• Map each read in the pair independently (run once for <in_1.fq> and once for <in_2.fq>)
– /usr/local/src/bwa-0.6.2/bwa aln <reference.prefix> <in_1.fq> > <out.sai>
• Finalize the mapping by combining the .sai and .fq files for both reads into a final SAM alignment:
– /usr/local/src/bwa-0.6.2/bwa sampe <reference.prefix> <in_1.sai> <in_2.sai> <in_1.fq> <in_2.fq> > <out_paired.sam>
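Bowtie Unique Mapping
• Investigate the different Bowtie options:
– Look at -m (suppresses reads that have more than the given number of reportable alignments, so -m 1 keeps only uniquely mapping reads) and -v (maximum number of mismatches allowed across the whole read; -n is the option that limits mismatches in the seed)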
TopHat Spliced Mapping
• /usr/local/src/tophat-2.0.4.Linux_x86_64/tophat -G <in.gff> -o <output_directory> <bowtie_index> <in.fastq>
The end…for now.