20140711 3 t_clark_ercc2.0_workshop

Post on 17-Jul-2015

FIND MEANING IN COMPLEXITY © Copyright 2012 by Pacific Biosciences of California, Inc. All rights reserved.

Tyson Clark 7/11/14

Single Molecule, Real-Time Sequencing of Full-length cDNA Transcripts

Single Molecule, Real-Time (SMRT) DNA Sequencing

PacBio RS II

P5-C3 Sequencing Chemistry

Transcript Diversity

Current State of Transcript Assembly

“The way we do RNA-seq now is… you take the transcriptome, you blow it up into pieces and then you try to figure out how they all go back together again… If you think about it, it’s kind of a crazy way to do things.”

Michael Snyder, Stanford University

Tal Nawy (2013) End-to-end RNA sequencing. Nature Methods 10: 1144–1145.

Ian Korf (2013) Genomics: the state of the art in RNA-seq analysis. Nature Methods 10: 1165-1166.

PacBio Iso-Seq for High-quality, Full-length Transcripts

Experimental Pipeline

1. PolyA mRNA
2. cDNA synthesis with adapters
3. Size partitioning & PCR amplification
4. SMRTbell ligation
5. PacBio RS II sequencing

Informatics Pipeline

1. PacBio raw sequence reads (with 5’ primer and 3’ primer)
2. Remove adapters, remove artifacts
3. Clean sequence reads
4. Reads clustering
5. Isoform clusters
6. Consensus calling
7. Nonredundant transcript isoforms
8. Quality filtering
9. Final isoforms
10. Map to reference genome
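The "remove adapters" step of the informatics pipeline can be illustrated with a minimal sketch. It assumes a sense-strand read laid out as 5' primer, transcript, polyA tail, 3' primer, and it uses exact string matching. The primer sequences, the function name `trim_full_length` and the `min_poly_a` threshold are hypothetical placeholders, not the primers or software from the slides. Real reads contain sequencing errors and can appear in either orientation, so a production tool would need alignment-based primer detection.

```python
# Hypothetical primer sequences, for illustration only (not the kit's primers).
FIVE_PRIME = "ACGTACGTAC"
THREE_PRIME = "GTCAGTCAGT"


def trim_full_length(read, five=FIVE_PRIME, three=THREE_PRIME, min_poly_a=8):
    """Return the trimmed insert of a full-length read, or None.

    Assumes a sense-strand read laid out as:
    5' primer + transcript + polyA tail + 3' primer.
    """
    if not (read.startswith(five) and read.endswith(three)):
        return None  # a primer is missing: not full-length
    body = read[len(five):len(read) - len(three)]
    insert = body.rstrip("A")  # drop the polyA tail
    if len(body) - len(insert) < min_poly_a:
        return None  # no convincing polyA tail
    return insert


read = FIVE_PRIME + "GATTACAGGC" + "A" * 12 + THREE_PRIME
print(trim_full_length(read))
```

Reads that lack either primer or the polyA tail return `None`, which mirrors the idea of keeping only full-length reads for clustering.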

Figure 1. (a) Experimental pipeline: polyA mRNA, cDNA synthesis with adapters, size partitioning & PCR amplification, SMRTbell ligation, RS sequencing. (b) Informatics pipeline: remove adapters and artifacts, clean sequence reads, reads clustering, isoform clusters, consensus calling, nonredundant transcript isoforms, quality filtering, final isoforms, map to reference genome, evidence-based gene models.

Figure: full-length cDNA (5’ UTR, coding sequence, 3’ UTR, polyA tail) with (AAA)n / (TTT)n ends and a SMRT adapter on each side; sequencing yields Reads of Insert.

https://github.com/PacificBiosciences/cDNA_primer/

Detailed Clontech workflow for conversion of cDNA into SMRTbell libraries

1. Input: polyA+ RNA, or Total RNA with Optional Poly-A Selection
2. Reverse Transcription
3. Full Length 1st Strand cDNA
4. PCR Optimization
5. Large Scale Amplification
6. Amplified cDNA
7. Size Selection (Blue Pippin or Gel): 1-2 kb, 2-3 kb, 3-6 kb
8. Re-Amplification: 1-2 kb, 2-3 kb, 3-6 kb
9. SMRTbell Template Preparation: 1-2 kb, 2-3 kb, 3-6 kb
10. Optional Size Selection (Blue Pippin): 3-6 kb
11. SMRT Sequencing
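As a rough illustration of the size partitioning in this workflow, the sketch below assigns a transcript length to one of the three fractions (1-2 kb, 2-3 kb, 3-6 kb). The function name `size_bin` and the half-open boundary convention (lower edge included, upper edge excluded) are assumptions made for the example; a gel or Blue Pippin cut is not this sharp in practice.

```python
from collections import Counter

# Size fractions from the workflow, in kb.
SIZE_BINS_KB = [(1, 2), (2, 3), (3, 6)]


def size_bin(length_bp, bins=SIZE_BINS_KB):
    """Return the 'lo-hi kb' label of the bin containing length_bp, or None."""
    for lo, hi in bins:
        # Assumed convention: lower edge inclusive, upper edge exclusive.
        if lo * 1000 <= length_bp < hi * 1000:
            return f"{lo}-{hi} kb"
    return None  # outside every fraction


# Made-up transcript lengths, for illustration only.
lengths = [850, 1500, 2400, 2999, 4200, 7000]
print(Counter(size_bin(n) for n in lengths))
```

Lengths that fall outside all three fractions map to `None`, which makes it easy to see how much of a library the chosen cuts would miss.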

Brain Amplified cDNA – Testing PCR Enzymes

Gel lanes: Phusion, Kapa HiFi, SeqAmp

Brain Amplified cDNA (zoom)

Gel lanes: Phusion, Kapa HiFi, SeqAmp

2nd Amplification (after Blue Pippin size selection)

Kapa Polymerase; gel ladder marks at 4000, 2000, 1250, 800 and 500.

Brain: 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb, 6-10 kb, 8-12 kb, 10-15 kb
Heart: 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb, 8-12 kb
Liver: 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb

Amplified cDNA from Multiple Human Tissues

Brain, Heart, Liver

SageELF

Brain Amplified cDNA – Size Selected

Gel lanes: M (marker); SageELF fractions 12 through 1; BluePippin fractions 800-1600, 1600-2700, 2700-4800 and 4800-8000. Ladder marks: 3000, 1500, 800, 500, 300, 100. Kapa Polymerase.

SageELF – 12 size bins (Amplified cDNA)

Brain cDNA – ELF Size Selected – 2nd Amplification

Actual FL Lengths from each ELF Fraction

(25th percentile – 75th percentile)

ELF 12 (400 bp) Actual: 181 – 266 bp
ELF 11 (550 bp) Actual: 370 – 480 bp
ELF 10 (800 bp) Actual: 617 – 727 bp
ELF 9 (1.2 kb) Actual: 955 – 1113 bp
ELF 8 (1.5 kb) Actual: 1355 – 1544 bp
ELF 7 (1.8 kb) Actual: 1800 – 2033 bp
ELF 6 (2.5 kb) Actual: 2398 – 2737 bp
ELF 5 (3 kb) Actual: 3193 – 3574 bp
ELF 4 (4 kb) Actual: 2127 – 4664 bp
ELF 3 (5.5 kb) Actual: 1342 – 6075 bp
ELF 2 (7 kb) Actual: 1229 – 7446 bp
ELF 1 (9 kb) 180 min Actual: 1295 – 1814 bp
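The "Actual" ranges are reported as 25th to 75th percentile of full-length read lengths. A minimal sketch of that summary follows, using Python's `statistics.quantiles`. The function name `fl_length_range`, the choice of the "inclusive" interpolation method and the example lengths are assumptions; the slides do not say which percentile definition was used.

```python
from statistics import quantiles


def fl_length_range(lengths):
    """Return the (25th percentile, 75th percentile) of a list of read lengths."""
    # n=4 yields the three quartile cut points; keep the first and third.
    q1, _median, q3 = quantiles(lengths, n=4, method="inclusive")
    return q1, q3


# Made-up full-length read lengths (bp), for illustration only.
example_lengths = [100, 200, 300, 400, 500]
low, high = fl_length_range(example_lengths)
print(f"Actual: {low:.0f} - {high:.0f} bp")
```

Reporting the interquartile range rather than the minimum and maximum keeps a handful of very short or very long reads from dominating the summary of a fraction.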

Summarizing ELF for Size Selection

ELF Lane #     Actual FL range
ELF12-400bp    181 - 266 bp
ELF11-500bp    370 - 480 bp
ELF10-800bp    617 - 727 bp
ELF9-1.2kb     955 - 1113 bp
ELF8-1.5kb     1355 - 1544 bp
ELF7-1.8kb     1800 - 2033 bp
ELF6-2.5kb     2398 - 2737 bp
ELF5-3kb       3193 - 3574 bp
ELF4-4kb       2127 - 4664 bp
ELF3-5.5kb     1342 - 6075 bp
ELF2-7kb       1229 - 7446 bp
ELF1-9kb       1295 - 1814 bp

The Good:
1. One run, 12 fractions
2. Finer size fractions (~200 bp)
3. 100 bp – 10 kb spread

The Not-Good-Yet:
1. Fractions > 4 kb pick up competing small inserts

To Work On:
1. New beta machine
2. Combining fractions
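The small-insert problem in the larger fractions can be read directly off the summary numbers. The sketch below takes the nominal sizes and actual 25th to 75th percentile ranges from the ELF summary and flags fractions that are either very broad or well below their nominal size. The function name `flag_fractions` and both thresholds (`max_spread`, `min_fraction_of_nominal`) are assumptions chosen for illustration, not criteria from the slides.

```python
# (lane, nominal size in bp, 25th percentile, 75th percentile) from the ELF summary.
ELF_FRACTIONS = [
    ("ELF12", 400, 181, 266),
    ("ELF11", 500, 370, 480),
    ("ELF10", 800, 617, 727),
    ("ELF9", 1200, 955, 1113),
    ("ELF8", 1500, 1355, 1544),
    ("ELF7", 1800, 1800, 2033),
    ("ELF6", 2500, 2398, 2737),
    ("ELF5", 3000, 3193, 3574),
    ("ELF4", 4000, 2127, 4664),
    ("ELF3", 5500, 1342, 6075),
    ("ELF2", 7000, 1229, 7446),
    ("ELF1", 9000, 1295, 1814),
]


def flag_fractions(fractions, max_spread=2.0, min_fraction_of_nominal=0.5):
    """Return lanes whose actual range is too broad or far below nominal size."""
    flagged = []
    for lane, nominal, q1, q3 in fractions:
        too_broad = q3 / q1 > max_spread  # assumed threshold
        off_target = q3 < min_fraction_of_nominal * nominal  # assumed threshold
        if too_broad or off_target:
            flagged.append(lane)
    return flagged


print(flag_fractions(ELF_FRACTIONS))
```

With these thresholds the flagged lanes are ELF4, ELF3, ELF2 and ELF1, that is, the fractions with a nominal size of 4 kb and above.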

Targeted Sequencing

ERCC 2.0 Controls (from the PacBio perspective)

• Long Transcripts (>10kb, if possible)

• Transcript Isoforms that span size bins

• Complex alternative splicing patterns

• Diversity of GC contents
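Two of the properties in this wish list, GC content and whether a gene's isoforms span size bins, are straightforward to compute for candidate control sequences. The sketch below is one way to do it. The function names and the bin edges (taken from the 1-2 kb, 2-3 kb and 3-6 kb fractions used earlier in the deck) are assumptions for illustration, not part of any ERCC specification.

```python
from bisect import bisect_right

# Bin edges in bp, assumed from the 1-2 kb, 2-3 kb and 3-6 kb fractions.
BIN_EDGES_BP = (1000, 2000, 3000, 6000)


def gc_content(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def bins_spanned(isoform_lengths_bp, edges_bp=BIN_EDGES_BP):
    """Return the set of bin indices hit by a gene's isoform lengths.

    Index 0 is below the first edge; index len(edges_bp) is above the last.
    """
    return {bisect_right(edges_bp, n) for n in isoform_lengths_bp}


print(gc_content("GGCCAATT"))
print(len(bins_spanned([1500, 2500])) > 1)
```

A candidate gene whose isoforms land in more than one bin would exercise the size-selection step, and a set of candidates covering a wide range of `gc_content` values would cover the GC diversity item.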

Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, and SMRTbell are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.
