eve161 lecture 2

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

EVE 161:Microbial Phylogenomics

!Lecture #2:

Evolution of Sequencing !

UC Davis, Winter 2014 Instructor: Jonathan Eisen

!1


Where we are going and where we have been

• Previous lecture: !1. Introduction

• Current Lecture: !2. Sequencing Evolution

• Next Lecture: !3. Era I: The Tree of Life

!2


Main Paper

!3


ANRV353-GG09-20 ARI 25 July 2008 14:57

Next-Generation DNASequencing MethodsElaine R. MardisDepartments of Genetics and Molecular Microbiology and Genome Sequencing Center,Washington University School of Medicine, St. Louis MO 63108; email: [email protected]

Annu. Rev. Genomics Hum. Genet. 2008.9:387–402

First published online as a Review in Advance onJune 24, 2008

The Annual Review of Genomics and Human Geneticsis online at genom.annualreviews.org

This article’s doi:10.1146/annurev.genom.9.081307.164359

Copyright c⃝ 2008 by Annual Reviews.All rights reserved

1527-8204/08/0922-0387$20.00

Key Wordsmassively parallel sequencing, sequencing-by-synthesis, resequencing

AbstractRecent scientific discoveries that resulted from the application of next-generation DNA sequencing technologies highlight the striking impactof these massively parallel platforms on genetics. These new meth-ods have expanded previously focused readouts from a variety of DNApreparation protocols to a genome-wide scale and have fine-tuned theirresolution to single base precision. The sequencing of RNA also hastransitioned and now includes full-length cDNA analyses, serial analysisof gene expression (SAGE)-based methods, and noncoding RNA dis-covery. Next-generation sequencing has also enabled novel applicationssuch as the sequencing of ancient DNA samples, and has substantiallywidened the scope of metagenomic analysis of environmentally derivedsamples. Taken together, an astounding potential exists for these tech-nologies to bring enormous change in genetic and biological researchand to enhance our fundamental biological knowledge.

387

Click here for quick links to Annual Reviews content online, including:

• Other articles in this volume• Top cited articles• Top downloaded articles• Our comprehensive search

FurtherANNUALREVIEWS

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

8.9:

387-

402.

Dow

nloa

ded

from

arjo

urna

ls.a

nnua

lrevi

ews.o

rgby

Uni

vers

idad

Nac

iona

l Aut

onom

a de

Mex

ico

on 1

1/17

/09.

For

per

sona

l use

onl

y.

ANRV353-GG09-20 ARI 25 July 2008 14:57

Next-Generation DNASequencing MethodsElaine R. MardisDepartments of Genetics and Molecular Microbiology and Genome Sequencing Center,Washington University School of Medicine, St. Louis MO 63108; email: [email protected]

Annu. Rev. Genomics Hum. Genet. 2008.9:387–402

First published online as a Review in Advance onJune 24, 2008

The Annual Review of Genomics and Human Geneticsis online at genom.annualreviews.org

This article’s doi:10.1146/annurev.genom.9.081307.164359

Copyright c⃝ 2008 by Annual Reviews.All rights reserved

1527-8204/08/0922-0387$20.00

Key Wordsmassively parallel sequencing, sequencing-by-synthesis, resequencing

AbstractRecent scientific discoveries that resulted from the application of next-generation DNA sequencing technologies highlight the striking impactof these massively parallel platforms on genetics. These new meth-ods have expanded previously focused readouts from a variety of DNApreparation protocols to a genome-wide scale and have fine-tuned theirresolution to single base precision. The sequencing of RNA also hastransitioned and now includes full-length cDNA analyses, serial analysisof gene expression (SAGE)-based methods, and noncoding RNA dis-covery. Next-generation sequencing has also enabled novel applicationssuch as the sequencing of ancient DNA samples, and has substantiallywidened the scope of metagenomic analysis of environmentally derivedsamples. Taken together, an astounding potential exists for these tech-nologies to bring enormous change in genetic and biological researchand to enhance our fundamental biological knowledge.

387

Click here for quick links to Annual Reviews content online, including:

• Other articles in this volume• Top cited articles• Top downloaded articles• Our comprehensive search

FurtherANNUALREVIEWS

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

8.9:

387-

402.

Dow

nloa

ded

from

arjo

urna

ls.a

nnua

lrevi

ews.o

rgby

Uni

vers

idad

Nac

iona

l Aut

onom

a de

Mex

ico

on 1

1/17

/09.

For

per

sona

l use

onl

y.

!4


Many other papers of interest

• http://www.microbialinformaticsj.com/content/2/1/3/

• http://www.hindawi.com/journals/bmri/2012/251364/abs/

• http://m.cancerpreventionresearch.aacrjournals.org/content/5/7/887.full

!5

http://m.cancerpreventionresearch.aacrjournals.org/content/5/7/887.full


Sequencing Technology

!6


Key Issues

• Cost / bp • Read length • Paired end • Ease of feeding • Error profiles • Barcoding potential

!7


Timeline

!8


TimelineApproaching to NGS

Discovery of DNA structure(Cold Spring Harb. Symp. Quant. Biol. 1953;18:123-31)

1953

Sanger sequencing method by F. Sanger(PNAS ,1977, 74: 560-564)

1977

PCR by K. Mullis(Cold Spring Harb Symp Quant Biol. 1986;51 Pt 1:263-73)

1983

Development of pyrosequencing(Anal. Biochem., 1993, 208: 171-175; Science ,1998, 281: 363-365)

1993

1980

1990

2000

2010

Single molecule emulsion PCR 1998

Human Genome Project(Nature , 2001, 409: 860–92; Science, 2001, 291: 1304–1351)

Founded 454 Life Science 2000

454 GS20 sequencer(First NGS sequencer) 2005

Founded Solexa 1998

Solexa Genome Analyzer(First short-read NGS sequencer) 2006

GS FLX sequencer(NGS with 400-500 bp read lenght) 2008

Hi-Seq2000(200Gbp per Flow Cell) 2010

Illumina acquires Solexa(Illumina enters the NGS business) 2006

ABI SOLiD(Short-read sequencer based upon ligation) 2007

Roche acquires 454 Life Sciences(Roche enters the NGS business) 2007

NGS Human Genome sequencing(First Human Genome sequencing based upon NGS technology) 2008

From Slideshare presentation of Cosentino Cristian http://www.slideshare.net/cosentia/high-throughput-equencing

!9

http://www.slideshare.net/cosentia/high-throughput-equencing


TimelineApproaching to NGS

Discovery of DNA structure(Cold Spring Harb. Symp. Quant. Biol. 1953;18:123-31)

1953

Sanger sequencing method by F. Sanger(PNAS ,1977, 74: 560-564)

1977

PCR by K. Mullis(Cold Spring Harb Symp Quant Biol. 1986;51 Pt 1:263-73)

1983

Development of pyrosequencing(Anal. Biochem., 1993, 208: 171-175; Science ,1998, 281: 363-365)

1993

1980

1990

2000

2010

Single molecule emulsion PCR 1998

Human Genome Project(Nature , 2001, 409: 860–92; Science, 2001, 291: 1304–1351)

Founded 454 Life Science 2000

454 GS20 sequencer(First NGS sequencer) 2005

Founded Solexa 1998

Solexa Genome Analyzer(First short-read NGS sequencer) 2006

GS FLX sequencer(NGS with 400-500 bp read lenght) 2008

Hi-Seq2000(200Gbp per Flow Cell) 2010

Illumina acquires Solexa(Illumina enters the NGS business) 2006

ABI SOLiD(Short-read sequencer based upon ligation) 2007

Roche acquires 454 Life Sciences(Roche enters the NGS business) 2007

NGS Human Genome sequencing(First Human Genome sequencing based upon NGS technology) 2008 Miseq

Roche Jr Ion Torrent PacBio Oxford


!10



Generation I: Manual

!11


Maxam-Gilbert Sequencing

!12



http://www.pnas.org/content/74/2/560.full.pdf

!13

http://www.pnas.org/content/74/2/560.full.pdf



!14



!15


Sanger Sequencing

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC431765/

!16

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC431765/


PhiX174

!17


Sanger Sequencing

!18


Sanger Sequencing

!19


Nobel Prize 1980

!20


Generation II: Automation

!21


Automation ISanger method with labeled dNTPs

The Sanger mehtods is based on the idea that inhibitors can terminate elongation of DNA at specific points

-

Roche 454

ABi SOLiD

HeliScope

Nanopore

Sanger method

Illumina GAII

!22


Sanger method with labeled dNTPsThe Sanger mehtods is based on the idea that inhibitors can

terminate elongation of DNA at specific points -

Roche 454

ABi SOLiD

HeliScope

Nanopore

Sanger method

Illumina GAII

Automation I

!23


Automation 2

!24


Automated Sanger Highlights

• 1991: ESTs by Venter • 1995: Haemophilus influenzae genome • 1996: Yeast, archaea genomes • 1999: Drosophila genome • 2000: Arabidopsis genome • 2000: Human genome • 2004: Shotgun metagenomics

!25


Generation III: Clusters not clones

!26


Generation III: Clusters not clones

!27


Next Gen Sequencing OutlineNext-generation sequencing platforms

Isolation and purification of target DNA

Sample preparation

Library validation

Cluster generationon solid-phase Emulsion PCR

Sequencing by synthesis with 3’-blocked reversible

terminatorsPyrosequencing Sequencing by ligation

Single colour imaging

Sequencing by synthesis with 3’-unblocked reversible

terminators

Am

plifi

catio

nSe

quen

cing

Imag

ing

Four colour imaging

Data analysis

Roche 454Illumina GAII ABi SOLiD Helicos HeliScope


!28



454

!29


454Pyrosequencing

Sanger method

-

ABi SOLiD

HeliScope

Nanopore

Roche 454

Illumina GAII


!30



Roche 454Pyrosequencing

Sanger method

-

ABi SOLiD

HeliScope

Nanopore

Roche 454

Illumina GAII


!31



454 -> Roche• 1st “Next-generation” sequencing

system to become commercially available, in 2004

• Uses pyrosequencing • Polymerase incorporates nucleotide • Pyrophosphate released • Eventually light from luciferase released

• Three main steps in 454 method • Library prep • Emulsion PCR • Sequencing

!32


454

!33


454 Step1: Library Prep

ANRV353-GG09-20 ARI 25 July 2008 14:57

Anneal sstDNA to an excess ofDNA capture beads

Emulsify beads and PCRreagents in water-in-oilmicroreactors

Clonal amplification occursinside microreactors

Break microreactors andenrich for DNA-positivebeads

Amplified sstDNA library beads Quality filtered bases

a

b

c

DNA library preparation

Emulsion PCR

Sequencing

A

A

A B

B

B

4.5 hours

8 hours

7.5 hours

Ligation

Selection(isolate ABfragmentsonly)

•Genome fragmented by nebulization•No cloning; no colony picking•sstDNA library created with adaptors•A/B fragments selected using avidin-biotin purification

gDNA sstDNA library

sstDNA library Bead-amplified sstDNA library

•Well diameter: average of 44 µm•400,000 reads obtained in parallel•A single cloned amplified sstDNA bead is deposited per well

390 Mardis

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

8.9:

387-

402.

Dow

nloa

ded

from

arjo

urna

ls.a

nnua

lrevi

ews.o

rgby

Uni

vers

idad

Nac

iona

l Aut

onom

a de

Mex

ico

on 1

1/17

/09.

For

per

sona

l use

onl

y.

gDNA fragmented by nebulization or sonication

Fragments are end- repaired and ligated to adaptors containing universal priming sites

Fragments are denatured and AB ssDNA are selected by avidin/biotin purification (ssDNA library)

From Mardis 2008. Annual Rev. Genetics 9: 387.

!34


454 Step 2: Emulsion PCR

ANRV353-GG09-20 ARI 25 July 2008 14:57






a

b

c


Emulsion PCR

Sequencing

A

A

A B

B

B

4.5 hours

8 hours

7.5 hours

Ligation



gDNA sstDNA library



390 Mardis

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

8.9:

387-

402.

Dow

nloa

ded

from

arjo

urna

ls.a

nnua

lrevi

ews.o

rgby

Uni

vers

idad

Nac

iona

l Aut

onom

a de

Mex

ico

on 1

1/17

/09.

For

per

sona

l use

onl

y.


!35


454 Step 3: Sequencing

ANRV353-GG09-20 ARI 25 July 2008 14:57






a

b

c


Emulsion PCR

Sequencing

A

A

A B

B

B

4.5 hours

8 hours

7.5 hours

Ligation



gDNA sstDNA library



390 Mardis

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

8.9:

387-

402.

Dow

nloa

ded

from

arjo

urna

ls.a

nnua

lrevi

ews.o

rgby

Uni

vers

idad

Nac

iona

l Aut

onom

a de

Mex

ico

on 1

1/17

/09.

For

per

sona

l use

onl

y.


!36


Pyrosequencing

44 µm

Pyrosequecning

Reads are recorded as flowgrams

Annu. Rev. Genomics Hum. Genet., 2008, 9: 387-402Nature Reviews genetics, 2010, 11: 31-46

Sanger method

-

ABi SOLiD

HeliScope

Nanopore

Roche 454

Illumina GAII


454 Step 3: Sequencing

!37



454 Key Issues• Number of repeated nucleotides

estimated by amount of light ... many errors

• Reasonable number of failures in EM-PCR and other steps

• Systems • GS20 • FLX • FLX Titanium • FLX Titanium XL • Junior

!38

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 !39

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014XDISCONTINUED

!40


Solexa

!41


SolexaSequecning by synthesis with reversible terminator

Sanger method

Roche 454

ABi SOLiD

HeliScope

Nanopore

Illumina GAII

-


!42



IlluminaSequecning by synthesis with reversible terminator

Sanger method

Roche 454

ABi SOLiD

HeliScope

Nanopore

Illumina GAII

-


!43



Illumina

• Sequencing by synthesis • 1st system released in 2006 by Solexa • Acquired by Illumina in 2007 • Systems

• GA • GAII • HiSeqs • HiScan • MiSeq

!44


AccessoriesCluster station

Genome Analyzer IIxPaired-end module Linux server

Bioanalyzer 2100

Instrumentation

Sample preparation

Clusters amplification

Sequencing by synthesis

Analysis pipeline

Introduction

Illumina GAII

High throughput

!45


Illumina

!46


AC06CH13-Mardis ARI 13 May 2013 16:4

DNA fragments

Blunting by fill-inand exonuclease

Addition ofA-overhang

Phosphorylation

Ligationto adapters

P

P P

P

PP

A

A

c

A

A

A

T

T

T

G

G

GC

C

C

3'

5'

5'

Emission

Excitation

IncorporateDetect

DeblockCleave fluor

5'

3'

DNA NHN

O

OH free 3' end

OOO

x

3'

NHN

O

Block

Cleavagesite

Fluor

OOPPP

a Illumina’s library-preparation work flow b

Graftedflow cell

Templatehybridization

Initialextension

Denaturation

OH OH

P7 P5

Firstdenaturation

First cycleannealing

First cycleextension

Second cycledenaturation

Clusteramplification

P5linearization

Block withddNTPs

Denaturation andsequencing primer

hybridization


Second cycleannealing

Second cycleextension

n = 35total

Figure 3(a) Illumina R⃝ library-construction process. (b) Illumina cluster generation by bridge amplification. (c) Sequencing by synthesis withreversible dye terminators.

synthesized strands by denaturation and regenerates the clusters by performing a limited bridgeamplification to improve the signal-to-noise ratio in the second read. After the amplification step,the opposite ends of the fragments are released from the flow cell surfaces by a different chemicalcleavage reagent (corresponding to a labile group on the reverse adapter), and the fragments areprimed with the reverse primer. Sequencing proceeds as described above. All of these steps occuron-instrument with the flow cell in place and without manual intervention, so the correlation ofposition from forward (first) to reverse (second) reads is maintained and yields a very high read-pairconcordance upon read alignment to the reference genome.

Illumina data have an error model that is described as having decreasing accuracy withincreasing nucleotide addition steps. When errors occur, they are predominantly substitutionerrors, in which an incorrect nucleotide identity is assigned to the base. The error percentageof most Illumina reads is approximately 0.5% at best (i.e., 1 error in 200 bases). Sources of noiseinclude (a) phasing, wherein increasing numbers of fragments fall out of phase with the majority

www.annualreviews.org • Next-Generation Sequencing Platforms 295

Ann

ual R

evie

w o

f Ana

lytic

al C

hem

istry

201

3.6:

287-

303.

Dow

nloa

ded

from

ww

w.a

nnua

lrevi

ews.o

rgby

Uni

vers

ity o

f Cal

iforn

ia -

Dav

is o

n 01

/09/

14. F

or p

erso

nal u

se o

nly.

Illumina

!47


Illumina Flow CellFlow cell

Sample preparation



Analysis pipeline

Introduction

Illumina GAII

High throughput

From Slideshare presentation of Cosentino Cristian http://www.slideshare.net/cosentia/high-throughput-equencing !48




DNA fragments



Phosphorylation

Ligationto adapters

P

P P

P

PP

A

A

c

A

A

A

T

T

T

G

G

GC

C

C

3'

5'

5'

Emission

Excitation

IncorporateDetect

DeblockCleave fluor

5'

3'

DNA NHN

O

OH free 3' end

OOO

x

3'

NHN

O

Block

Cleavagesite

Fluor

OOPPP


Graftedflow cell


Initialextension

Denaturation

OH OH

P7 P5

Firstdenaturation





P5linearization

Block withddNTPs


hybridization




n = 35total





Ann

ual R

evie

w o

f Ana

lytic

al C

hem

istry

201

3.6:

287-

303.

Dow

nloa

ded

from

ww

w.a

nnua

lrevi

ews.o

rgby

Uni

vers

ity o

f Cal

iforn

ia -

Dav

is o

n 01

/09/

14. F

or p

erso

nal u

se o

nly.

Illumina

!49


Illumina Step 1: Attach DNA

ANRV353-GG09-20 ARI 25 July 2008 14:57

to prepare each strand for the next incorpora-tion by DNA polymerase. This series of stepscontinues for a specific number of cycles, as de-termined by user-defined instrument settings,which permits discrete read lengths of 25–35

bases. A base-calling algorithm assigns se-quences and associated quality values to eachread and a quality checking pipeline evaluatesthe Illumina data from each run, removingpoor-quality sequences.

Adapter

DNA fragment

Dense lawnof primers

Adapter

Attached

DNA

Adapters

Prepare genomic DNA sampleRandomly fragment genomic DNAand ligate adapters to both ends ofthe fragments.

Attach DNA to surfaceBind single-stranded fragmentsrandomly to the inside surfaceof the flow cell channels.

Bridge amplificationAdd unlabeled nucleotidesand enzyme to initiate solid-phase bridge amplification.

Denature the doublestranded molecules

Nucleotides

a

Figure 2The Illumina sequencing-by-synthesis approach. Cluster strands created by bridge amplification are primed and all four fluorescentlylabeled, 3′-OH blocked nucleotides are added to the flow cell with DNA polymerase. The cluster strands are extended by onenucleotide. Following the incorporation step, the unused nucleotides and DNA polymerase molecules are washed away, a scan buffer isadded to the flow cell, and the optics system scans each lane of the flow cell by imaging units called tiles. Once imaging is completed,chemicals that effect cleavage of the fluorescent labels and the 3′-OH blocking groups are added to the flow cell, which prepares thecluster strands for another round of fluorescent nucleotide incorporation.

392 Mardis

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

8.9:

387-

402.

Dow

nloa

ded

from

arjo

urna

ls.a

nnua

lrevi

ews.o

rgby

Uni

vers

idad

Nac

iona

l Aut

onom

a de

Mex

ico

on 1

1/17

/09.

For

per

sona

l use

onl

y.

The Illumina sequencing-by-synthesis approach. Cluster strands created by bridge amplification are primed and all four fluorescently labeled, 3′-OH blocked nucleotides are added to the flow cell with DNA polymerase. The cluster strands are extended by one nucleotide. Following the incorporation step, the unused nucleotides and DNA polymerase molecules are washed away, a scan buffer is added to the flow cell, and the optics system scans each lane of the flow cell by imaging units called tiles. Once imaging is completed, chemicals that effect cleavage of the fluorescent labels and the 3′-OH blocking groups are added to the flow cell, which prepares the cluster strands for another round of fluorescent nucleotide incorporation


!50


Illumina Step 2: Bridge PCR

ANRV353-GG09-20 ARI 25 July 2008 14:57

to prepare each strand for the next incorpora-tion by DNA polymerase. This series of stepscontinues for a specific number of cycles, as de-termined by user-defined instrument settings,which permits discrete read lengths of 25–35

bases. A base-calling algorithm assigns se-quences and associated quality values to eachread and a quality checking pipeline evaluatesthe Illumina data from each run, removingpoor-quality sequences.

Adapter

DNA fragment

Dense lawnof primers

Adapter

Attached

DNA

Adapters

Prepare genomic DNA sampleRandomly fragment genomic DNAand ligate adapters to both ends ofthe fragments.

Attach DNA to surfaceBind single-stranded fragmentsrandomly to the inside surfaceof the flow cell channels.

Bridge amplificationAdd unlabeled nucleotidesand enzyme to initiate solid-phase bridge amplification.

Denature the doublestranded molecules

Nucleotides

a

Figure 2The Illumina sequencing-by-synthesis approach. Cluster strands created by bridge amplification are primed and all four fluorescentlylabeled, 3′-OH blocked nucleotides are added to the flow cell with DNA polymerase. The cluster strands are extended by onenucleotide. Following the incorporation step, the unused nucleotides and DNA polymerase molecules are washed away, a scan buffer isadded to the flow cell, and the optics system scans each lane of the flow cell by imaging units called tiles. Once imaging is completed,chemicals that effect cleavage of the fluorescent labels and the 3′-OH blocking groups are added to the flow cell, which prepares thecluster strands for another round of fluorescent nucleotide incorporation.

392 Mardis

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

8.9:

387-

402.

Dow

nloa

ded

from

arjo

urna

ls.a

nnua

lrevi

ews.o

rgby

Uni

vers

idad

Nac

iona

l Aut

onom

a de

Mex

ico

on 1

1/17

/09.

For

per

sona

l use

onl

y.

The Illumina sequencing-by-synthesis approach. Cluster strands created by bridge amplification are primed and all four fluorescently labeled, 3′-OH blocked nucleotides are added to the flow cell with DNA polymerase. The cluster strands are extended by one nucleotide. Following the incorporation step, the unused nucleotides and DNA polymerase molecules are washed away, a scan buffer is added to the flow cell, and the optics system scans each lane of the flow cell by imaging units called tiles. Once imaging is completed, chemicals that effect cleavage of the fluorescent labels and the 3′-OH blocking groups are added to the flow cell, which prepares the cluster strands for another round of fluorescent nucleotide incorporation


!51


Clusters

!52


Sequencing by Synthesis


DNA fragments



Phosphorylation

Ligationto adapters

P

P P

P

PP

A

A

c

A

A

A

T

T

T

G

G

GC

C

C

3'

5'

5'

Emission

Excitation

IncorporateDetect

DeblockCleave fluor

5'

3'

DNA NHN

O

OH free 3' end

OOO

x

3'

NHN

O

Block

Cleavagesite

Fluor

OOPPP


Graftedflow cell


Initialextension

Denaturation

OH OH

P7 P5

Firstdenaturation





P5linearization

Block withddNTPs


hybridization




n = 35total





Ann

ual R

evie

w o

f Ana

lytic

al C

hem

istry

201

3.6:

287-

303.

Dow

nloa

ded

from

ww

w.a

nnua

lrevi

ews.o

rgby

Uni

vers

ity o

f Cal

iforn

ia -

Dav

is o

n 01

/09/

14. F

or p

erso

nal u

se o

nly.

!53


Illumina Step 3: Sequencing

ANRV353-GG09-20 ARI 25 July 2008 14:57

Applied Biosystems SOLiDTM

SequencerThe SOLiD platform uses an adapter-ligatedfragment library similar to those of the othernext-generation platforms, and uses an emul-sion PCR approach with small magnetic beadsto amplify the fragments for sequencing. Un-like the other platforms, SOLiD uses DNA lig-ase and a unique approach to sequence the am-plified fragments, as illustrated in Figure 3a.Two flow cells are processed per instrumentrun, each of which can be divided to containdifferent libraries in up to four quadrants. Readlengths for SOLiD are user defined between25–35 bp, and each sequencing run yields be-tween 2–4 Gb of DNA sequence data. Once

the reads are base called, have quality values,and low-quality sequences have been removed,the reads are aligned to a reference genome toenable a second tier of quality evaluation calledtwo-base encoding. The principle of two-baseencoding is shown in Figure 3b, which illus-trates how this approach works to differenti-ate true single base variants from base-callingerrors.

Two key differences that speak to the utilityof next-generation sequence reads are (a) thelength of a sequence read from all current next-generation platforms is much shorter than thatfrom a capillary sequencer and (b) each next-generation read type has a unique error modeldifferent from that already established for

b

Laser

First chemistry cycle:determine first baseTo initiate the firstsequencing cycle, addall four labeled reversibleterminators, primers, andDNA polymerase enzymeto the flow cell.

Image of first chemistry cycleAfter laser excitation, capture the imageof emitted fluorescence from eachcluster on the flow cell. Record theidentity of the first base for each cluster.

Sequence read over multiple chemistry cyclesRepeat cycles of sequencing to determine the sequenceof bases in a given fragment a single base at a time.

Before initiating thenext chemistry cycleThe blocked 3' terminusand the fluorophorefrom each incorporatedbase are removed.

GCTGA...

Figure 2(Continued )

www.annualreviews.org • Next-Generation DNA Sequencing Methods 393

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

8.9:

387-

402.

Dow

nloa

ded

from

arjo

urna

ls.a

nnua

lrevi

ews.o

rgby

Uni

vers

idad

Nac

iona

l Aut

onom

a de

Mex

ico

on 1

1/17

/09.

For

per

sona

l use

onl

y.


!54


Illumina SBSSBS technology

Sample preparation



Analysis pipeline

Introduction

Illumina GAII

High throughput




Illumina

!56


ABI Solid

!57


ABI SolidSequecning by ligationSanger method

Roche 454

-

HeliScope

Nanopore

ABi SOLiD

Illumina GAII




ABI Solid DetailsANRV353-GG09-20 ARI 25 July 2008 14:57

A C G T

1st base

2nd base

ACGT

3'TAnnnzzz5'

3'TCnnnzzz5'

3'TGnnnzzz5'

3'TTnnnzzz5'

Cleavage site

Di base probesSOLiD™ substrate

3'TA

AT

Universal seq primer (n)

3'P1 adapter Template sequence

P OH

Universal seq primer (n–1)

Ligase

Phosphatase

+

1. Prime and ligate

2. Image

4. Cleave off fluor

5. Repeat steps 1–4 to extend sequence

3'

Universal seq primer (n–1)1. Melt off extended sequence

2. Primer reset3'

AA AC G

G GG

C C

C

T AA

A GG

CC

T T TT

6. Primer reset

7. Repeat steps 1–5 with new primer

8. Repeat Reset with , n–2, n–3, n–4 primers

TA

AT

AT3'

TA

AT3'

Excite Fluorescence

Cleavage agent

P

HO

TAAA AG AC AAATTT TC TG TT AC

TGCGGC 3'

3. Cap unextended strands

3'

PO4

1 2 3 4 5 6 7 ... (n cycles)Ligation cycle

3'

3'1 μmbead

1 μmbead

1 μmbead

–1


Universal seq primer (n)




3'

3'

3'

3'

3'

1

2

3

4

5

Primer round 1

Template

Primer round 2 1 base shift

Glass slide

3'5' Template sequence1 μmbead P1 adapter

Bridge probe

Bridge probe

Bridge probe

Read position

Indicates positions of interrogation

35 34 3332313029282726252423222120191817161514131211109876543210

Ligation cycle

Prim

er ro

und

1 2 3 4 5 6 7

a

394 Mardis

Ann

u. R

ev. G

enom

. Hum

an G

enet

. 200

8.9:

387-

402.

Dow

nloa

ded

from

arjo

urna

ls.a

nnua

lrevi

ews.o

rgby

Uni

vers

idad

Nac

iona

l Aut

onom

a de

Mex

ico

on 1

1/17

/09.

For

per

sona

l use

onl

y.

The ligase-mediated sequencing approach of the Applied Biosystems SOLiD sequencer. In a manner similar to Roche/454 emulsion PCR amplification, DNA fragments for SOLiD sequencing are amplified on the surfaces of 1-μm magnetic beads to provide sufficient signal during the sequencing reactions, and are then deposited onto a flow cell slide. Ligase-mediated sequencing begins by annealing a primer to the shared adapter sequences on each amplified fragment, and then DNA ligase is provided along with specific fluorescent- labeled 8mers, whose 4th and 5th bases are encoded by the attached fluorescent group. Each ligation step is followed by fluorescence detection, after which a regeneration step removes bases from the ligated 8mer (including the fluorescent group) and concomitantly prepares the extended primer for another round of ligation. (b) Principles of two-base encoding. Because each fluorescent group on a ligated 8mer identifies a two-base combination, the resulting sequence reads can be screened for base-calling errors versus true polymorphisms versus single base deletions by aligning the individual reads to a known high-quality reference sequence.From Mardis 2008. Annual Rev. Genetics 9: 387. !59


Comparison in 2008Roche (454) Illumina SOLiD

Chemistry Pyrosequencing Polymerase-based Ligation-based

Amplification Emulsion PCR Bridge Amp Emulsion PCR

Paired ends/sep Yes/3kb Yes/200 bp Yes/3 kb

Mb/run 100 Mb 1300 Mb 3000 Mb

Time/run 7 h 4 days 5 days

Read length 250 bp 32-40 bp 35 bp

Cost per run (total)

$8439 $8950 $17447

Cost per Mb $84.39 $5.97 $5.81From “Introduction to Next Generation Sequencing” by Stefan Bekiranov prometheus.cshl.org/twiki/pub/Main/CdAtA08/CSHL_nextgen.ppt !60

http://prometheus.cshl.org/twiki/pub/Main/CdAtA08/CSHL_nextgen.ppt


Comparison in 2012Roche (454) Illumina SOLiD

Chemistry Pyrosequencing Polymerase-based Ligation-based

Amplification Emulsion PCR Bridge Amp Emulsion PCR

Paired ends/sep Yes/3kb Yes/200 bp Yes/3 kb

Mb/run 100 Mb 1300 Mb 3000 Mb

Time/run 7 h 4 days 5 days

Read length 250 bp 32-40 bp 35 bp

Cost per run (total)

$8439 $8950 $17447

Cost per Mb $84.39 $5.97 $5.81From “Introduction to Next Generation Sequencing” by Stefan Bekiranov prometheus.cshl.org/twiki/pub/Main/CdAtA08/CSHL_nextgen.ppt !61

http://prometheus.cshl.org/twiki/pub/Main/CdAtA08/CSHL_nextgen.ppt


Bells and Whistles

• Multiplexing • Paired end • Mate pair • ????

!62


Multiplexing

!63


Small amounts of DNA

!64


Capture MethodsHigh throughput sample preparation

Sample preparation



Analysis pipeline

Introduction

Illumina GAII

High throughput

Nature Methods, 2010, 7: 111-118

RainDanceMicrodroplet PCR

Roche NimblegenSalid-phase capture with custom-

designed oligonucleotide microarray

Reported 84% of capture efficiency

Reported 65-90% of capture efficiency


!65



High throughput sample preparation

Sample preparation



Analysis pipeline

Introduction

Illumina GAII

High throughput

Agilent SureSelectSolution-phase capture with

streptavidin-coated magnetic beads

Reported 60-80% of capture efficiency


Capture Methods

!66



Moleculo

Large fragmentsDNA

Isolate and amplify

CACC GGAA TCTC ACGT AAGG GATC AAAA

Sublibrary w/ unique barcodes

Sequence w/ Illumina

Assemble seqs w/ same codes

!67


Generation 3.5

• Even faster

!68


Ion Torrent PGM

!69


Applied Biosystems Ion Torrent PGM

Workflow similar to that for Roche/454 systems. !Not surprising, since invented by people from 454.

!70


Ion Torrent PGM

!71


Generation IV??

• Single molecule sequencing

!72


Pacific Biosciences

!73


Pacific Biosciences

!74


Pacific Biosciences

4332 dx.doi.org/10.1021/ac2010857 |Anal. Chem. 2011, 83, 4327–4341

Analytical Chemistry REVIEW

Φ29 polymerase. Each amplified product of a circularizedfragment is called a DNA nanoball (DNB). DNBs are selectivelyattached to a hexamethyldisilizane (HMDS) coated silicon chipthat is photolithographically patterned with aminosilane activesites. Figure 3A illustrates the DNB array design.The use of the DNBs coupled with the highly patterned array

offers several advantages. The production of DNBs increasessignal intensity by simply increasing the number of hybridizationsites available for probing. Also, the size of the DNB is on thesame length scale as the active site or “sticky” spot patterned onthe chip, which results in attachment of one DNB per site. Sincethe active sites are spaced roughly 1 μm apart, up to three billionDNB can be fixed to a 1 in. by 3 in. silicon chip.51 In addition toincreasing the number of sequencing fragments per chip, thelength scales of the size and spacing of the DNBs maximizes thepixel use in the detector. This highly efficient approach togenerating a hybridization array results in decreased reagentcosts and increased throughput compared to other secondgeneration DNA sequencing arrays that have been used tosequence complete human genomes.19,22,52

Once the DNB array chip is generated, a library of fortycommon probes is used in combination with standard anchorsand extended anchors to perform an unchained hybridizationand ligation assay. The forty common probes consist of twosubsets: probes that interrogate 50 of the distinct adapter site inthe DNB and probes that interrogate 30 of the distinct adaptersite in the DNB. In each subset, there are five sets of fourcommon probes; each probe is 9 bases in length. Each setcorresponds to positions 1 to 5 bases away from the distinctadapter sites in the sequencing substrate, and within each set,there are four distinct markers corresponding to each base. Thestandard anchors bind directly to the 50 or 30 end of the adaptersite on the DNB and allow for hybridization and ligation of thecommon probes. The extended anchor scheme consists ofligation of a pair of oligo anchors (degenerate and standard) toexpand the hybridized anchor region 5 bases beyond the adaptersites in the DNB and into the sequencing region. This combi-natorial probe-anchor ligation (cPAL) method developed byComplete Genomics extends read lengths from 5 bases to 10bases and results in a total of 62 to 70 bases sequenced per DNB.A schematic demonstrating both the standard anchor schemeand the extended anchor scheme is shown in Figure 3B.

Each hybridization and ligation cycle is followed by fluorescentimaging of the DNB spotted chip and subsequently regenerationof the DNBs with a formamide solution. This cycle is repeateduntil the entire combinatorial library of probes and anchors isexamined. This formula of the use of unchained reads andregeneration of the sequencing fragment reduces reagent con-sumption and eliminates potential accumulation errors that canarise in other sequencing technologies that require close tocompletion of each sequencing reaction.19,52,53

Complete Genomics showcased their DNB array and cPALtechnology by resequencing three human genomes and reportedan average reagent cost of $4,400 per genome.50 The threegenome samples sequenced in this study (NA07022, NA19240,and NA20431) were then compared to previous sequencedata.54,55 The average coverage of these samples ranged from45X to 87X, and the percent of genome identified ranged from86% to 95%. While this technology greatly increases throughputcompared to Sanger/CE and second-generation sequencingtechnologies, there are several drawbacks to CompleteGenomics’approach. First, the construction of circular sequencing fragmentsresults in an underrepresentation of certain genome regions,which leads to partial genome assembly downstream. Also, thesize of the circular sequencing fragments (∼400 bases) as well asthe very short read lengths (∼10 bases) prevents complete andaccurate genome assembly, given that these fragments are shorterthan a number of the long repetitive regions.Just five months after Complete Genomics’ proof-of-concept

study was published, the first externally published application ofComplete Genomic’s sequencing technology was released. Agroup at the Institute for Systems Biology in Seattle, Washington(USA), studied the genetic differences in a human family offour.56 In the study, whole-genome sequencing was usedto determine four candidate genes responsible for two rareMendelian disorders: Miller syndrome and primary ciliary dys-kinesia. The subjects were a set of parents and their two childrenwho both suffer from the disorders. This study highlighted thebenefits of whole genome sequencing within a family whendetermining Mendelian traits. The ability to recognize inheri-tance patterns greatly reduced the genetic search space forrecessive disorders and increased the sequencing accuracy. Inthe end, sequencing the entire family instead of just the twosiblings affected by Miller syndrome and primary ciliary

Figure 2. Schematic of PacBio’s real-time single molecule sequencing. (A) The side view of a single ZMW nanostructure containing a single DNApolymerase (Φ29) bound to the bottom glass surface. The ZMW and the confocal imaging system allow fluorescence detection only at the bottomsurface of each ZMW. (B) Representation of fluorescently labeled nucleotide substrate incorporation on to a sequencing template. The correspondingtemporal fluorescence detection with respect to each of the five incorporation steps is shown below. Reprinted with permission from ref 39. Copyright2009 American Association for the Advancement of Science.

Figure 2. Schematic of PacBio’s real-time single molecule sequencing. (A) The side view of a single ZMW nanostructure containing a single DNA polymerase (Φ29) bound to the bottom glass surface. The ZMW and the confocal imaging system allow fluorescence detection only at the bottom surface of each ZMW. (B) Representation of fluorescently labeled nucleotide substrate incorporation on to a sequencing template. The corresponding temporal fluorescence detection with respect to each of the five incorporation steps is shown below.

From Niedringhaus et al. Analytical Chemistry 83: 4327. 2011.

!76


Complete Genomics

!77


Complete Genomics



dyskinesia greatly decreased the number of false positive genecandidates which ultimately reduced the number gene candidatesfrom 34 to just four.Just one month later, the second externally published applica-

tion of Complete Genomics’s sequencing technology was re-leased by a research group at Genetech.57 The study comparedthe genome of primary lung tumor cells to that of adjacentnormal tissue obtained from a 51-year-old Caucasian male whoreported a heavy 15-year smoking history. When the completegenomes of different tissue samples were compared, over 50,000single-base variations were identified and 530 previously re-ported single-base mutations were confirmed. Consequently,the importance of complete cancer genome analysis in under-standing cancer evolution and treatment was brought to light dueto the large number of single nucleotide mutations locatedoutside of oncogenes as well as chromosomal structural varia-tions found in the primary lung tumor.A third application of the high-throughput cPAL method

developed by Complete Genomics was published by a researchgroup from the University of Texas Southwestern MedicalCenter in Dallas, Texas (USA).58 This group used whole genomesequencing to diagnose a hypercholesterolemic 11-month-oldgirl with sitosterolemia after a series of blood tests and selectivegenetic sequencing were unable to confer a reasonable diagnosis.The gene and the subsequent mutations responsible for the

sitosterolemia phenotype were determined after comparison ofthe patient’s genome to a collection of reference genomes.Ultimately, it was determined that the patient failed the standardblood test due to low levels of plant sterols that were the result ofa heavy diet of breast milk. This study illustrated the importanceof whole genome sequencing in effectively diagnosing a disease inthe presence of complex environmental factors that can influencestandard assays.

’SEQUENCING BY SYNTHESIS

The idea of sequencing by synthesis has been around for sometime and is the basis for several second-generation sequencingtechnologies including Roche’s 454 sequencing platform andIllumina’s line of sequencing systems. 454’s pyrosequencingmethod, which uses an enzyme cascade to produce light from apyrophosphate released during nucleotide incorporation, wasfirst piloted in the late 1980s and developed for DNA sequencingin themid-1990s.26,59!63 Illumina’s fluorescently labeled sequen-cing by synthesis technique employs fluorescently labeled nu-cleotides with reversible termination chemistry and modifiedpolymerases for improved incorporation of nucleotideanalogues.19 These sequencing by synthesis methods increasedthroughput compared to first-generation sequencing methods;however, optical imaging is needed to detect each sequencing

Figure 3. Schematic of Complete Genomics’ DNB array generation and cPAL technology. (A) Design of sequencing fragments, subsequent DNBsynthesis, and dimensions of the patterned nanoarray used to localize DNBs illustrate the DNB array formation. (B) Illustration of sequencing with a setof common probes corresponding to 5 bases from the distinct adapter site. Both standard and extended anchor schemes are shown. Reprinted withpermission from ref 50. Copyright XXXX American Association for the Advancement of Science.



dyskinesia greatly decreased the number of false positive genecandidates which ultimately reduced the number gene candidatesfrom 34 to just four.Just one month later, the second externally published applica-

tion of Complete Genomics’s sequencing technology was re-leased by a research group at Genetech.57 The study comparedthe genome of primary lung tumor cells to that of adjacentnormal tissue obtained from a 51-year-old Caucasian male whoreported a heavy 15-year smoking history. When the completegenomes of different tissue samples were compared, over 50,000single-base variations were identified and 530 previously re-ported single-base mutations were confirmed. Consequently,the importance of complete cancer genome analysis in under-standing cancer evolution and treatment was brought to light dueto the large number of single nucleotide mutations locatedoutside of oncogenes as well as chromosomal structural varia-tions found in the primary lung tumor.A third application of the high-throughput cPAL method

developed by Complete Genomics was published by a researchgroup from the University of Texas Southwestern MedicalCenter in Dallas, Texas (USA).58 This group used whole genomesequencing to diagnose a hypercholesterolemic 11-month-oldgirl with sitosterolemia after a series of blood tests and selectivegenetic sequencing were unable to confer a reasonable diagnosis.The gene and the subsequent mutations responsible for the

sitosterolemia phenotype were determined after comparison ofthe patient’s genome to a collection of reference genomes.Ultimately, it was determined that the patient failed the standardblood test due to low levels of plant sterols that were the result ofa heavy diet of breast milk. This study illustrated the importanceof whole genome sequencing in effectively diagnosing a disease inthe presence of complex environmental factors that can influencestandard assays.

’SEQUENCING BY SYNTHESIS

The idea of sequencing by synthesis has been around for sometime and is the basis for several second-generation sequencingtechnologies including Roche’s 454 sequencing platform andIllumina’s line of sequencing systems. 454’s pyrosequencingmethod, which uses an enzyme cascade to produce light from apyrophosphate released during nucleotide incorporation, wasfirst piloted in the late 1980s and developed for DNA sequencingin themid-1990s.26,59!63 Illumina’s fluorescently labeled sequen-cing by synthesis technique employs fluorescently labeled nu-cleotides with reversible termination chemistry and modifiedpolymerases for improved incorporation of nucleotideanalogues.19 These sequencing by synthesis methods increasedthroughput compared to first-generation sequencing methods;however, optical imaging is needed to detect each sequencing

Figure 3. Schematic of Complete Genomics’ DNB array generation and cPAL technology. (A) Design of sequencing fragments, subsequent DNBsynthesis, and dimensions of the patterned nanoarray used to localize DNBs illustrate the DNB array formation. (B) Illustration of sequencing with a setof common probes corresponding to 5 bases from the distinct adapter site. Both standard and extended anchor schemes are shown. Reprinted withpermission from ref 50. Copyright XXXX American Association for the Advancement of Science.

Figure 3. Schematic of Complete Genomics’ DNB array generation and cPAL technology. (A) Design of sequencing fragments, subsequent DNB synthesis, and dimensions of the patterned nanoarray used to localize DNBs illustrate the DNB array formation. (B) Illustration of sequencing with a set of common probes corresponding to 5 bases from the distinct adapter site. Both standard and extended anchor schemes are shown.

From Niedringhaus et al. Analytical Chemistry 83: 4327. 2011. !78


Oxford Nanopore

!79


Oxford Nanopore

This diagram shows a protein nanopore set in an electrically resistant membrane bilayer. An ionic current is passed through the nanopore by setting a voltage across this membrane. If an analyte passes through the pore or near its aperture, this event creates a characteristic disruption in current. By measuring that current it is possible to identify the molecule in question. For example, this system can be used to distinguish the four standard DNA bases and G, A, T and C, and also modified bases. It can be used to identify target proteins, small molecules, or to gain rich molecular information for example to distinguish the enantiomers of ibuprofen or molecular binding dynamics.

From Oxford Nanopores Web Site

!80


Oxford Nanopore

• Figure6. BiologicalnanoporeschemeemployedbyOxfordNanopore.(A)SchematicofRHLproteinnanoporemutantdepictingthepositionsofthe cyclodextrin (at residue 135) and glutamines (at residue 139). (B) A detailed view of the β barrel of the mutant nanopore shows the locations of the arginines (at residue 113) and the cysteines. (C) Exonuclease sequencing: A processive enzyme is attached to the top of the nanopore to cleave single nucleotides from the target DNA strand and pass them through the nanopore. (D) A residual current-vs-time signal trace from an RHL protein nanopore that shows a clear discrimination between single bases (dGMP, dTMP, dAMP, and dCMP). (E) Strand sequencing: ssDNA is threaded through a protein nanopore and individual bases are identified, as the strand remains intact. Panels A, B, and D reprinted with permission from ref 91. Copyright 2009 Nature Publishing Group. Panels C and E reprinted with permission from Oxford Nanopore Technologies (Zoe McDougall).#



detected across a metal oxide-silicon layered structure. Thevoltage signal is induced across the capacitor by the passage ofcharged nucleotides in longitudinal direction.79 A different read-out approach is optical detection (Figure 5B). A typical opticalrecognition of nucleotides is essentially executed in two steps.First, each base (A, C, G, or T) in the target sequence isconverted into a sequence of oligonucleotides, which are thenhybridized to two-color molecular beacons (with fluorophoresattached).80 Because the four nucleotides (A, C, G, or T) have tobe determined, the two fluorescent probes are coupled in pairs touniquely define each base. For example, if the two probes are Aand B, the four unique permutations will be AA, AB, BA, and BB.As the hybridized DNA strand is threaded through the nanopore,the fluorescent tag is stripped off from its quencher and an opticalsignal is detected. Both protein81 and solid-state nanopores canbe used.80,82 Detailed electronic measurement schemes andoptical readout methods have been reviewed in more detail inpreviously published papers.71,83,84

In a 2008 review article,71 Daniel Branton and colleaguesdiscussed the nanopore sequencing development and the pros-pect of low sample preparation cost at high throughput. Theyestimated that purified genomic DNA sufficient for sequencing(∼108 copies or 700 μg) could be extracted and purified fromblood at a cost of less than $40/sample using commercial kits. Allexisting sequencing techniques require breaking the DNA intosmall fragments of ∼100 bps and sequencing those chunksmultiple times to find overlapping regions, so that they can bereassembled together. Because one of most appealing advantagesof nanopores is achieving long read lengths, the genomic assemblyprocess should be considerably simplified. In practice, the readlength may be limited only by the DNA shearing that occursduring pipetting in the sample preparation step. For example,Meller and Branton85 demonstrated that 25 kb ssDNA could bethreaded through a biological nanopore and 5.4 kb ssDNA

translocated through a solid-state nanopore. In addition, severalgroups have shown very high throughput of small oligonucleo-tides (∼5.8 oligomers (s μM)!1)85 and native ssDNA anddsDNA (∼3!10 kb at 10!20 nM concentration).86,87

Protein Nanopore Sequencing.Oxford Nanopore Technol-ogies, formerly Oxford Nanolabs, together with leading academiccollaborators, has addressed some of the aforementioned tech-nological challenges and implemented the nanopore technologyin a commercial product (GridION system). Oxford Nanopore,founded by Prof. Hagan Bayley at University of Oxford, aimed atcommercializing the research work on biological nanoporescoming out of his laboratory. The company works in collabora-tion with Professors Daniel Branton, George Church, and JeneGolovchenko at Harvard; David Deamer and Mark Akeson atUCSC; and John Kasianowicz at NIST.Recently, Gordon Sanghera, CEO of Oxford Nanopore,

announced that the company is preparing to launch the GridIONsystem88 for direct single-molecule analysis, which would adoptexonuclease sequencing. The system is based on “lab on a chip”technology and integrates multiple electronic cartridges into arack-like device. A single protein nanopore is integrated in a lipidbilayer across the top of a microwell, equipped with electrodes.Multiple microwells are incorporated onto an array chip, andeach cartridge holds a single chip with integrated fluidics andelectronics for sample preparation, detection, and analysis. Thesample is introduced into the cartridge, which is then inserted inan instrument called a GridION node. Each node can be usedseparately or in a cluster, and all nodes communicate with eachother and with the user’s network and storage system in real time.Although the main application of the platform is sequencing ofDNA, it can be adapted (by proper modification of the RHLnanopore) for the detection of proteins and small molecules.Oxford Nanopore’s first-generation systems utilize the hepta-

meric protein R-hemolysin (RHL) (Figure 6A).72,89!91 RHL is

Figure 6. Biological nanopore scheme employed by Oxford Nanopore. (A) Schematic of RHL protein nanopore mutant depicting the positions of thecyclodextrin (at residue 135) and glutamines (at residue 139). (B) A detailed view of the β barrel of the mutant nanopore shows the locations ofthe arginines (at residue 113) and the cysteines. (C) Exonuclease sequencing: A processive enzyme is attached to the top of the nanopore to cleave singlenucleotides from the target DNA strand and pass them through the nanopore. (D) A residual current-vs-time signal trace from anRHLprotein nanoporethat shows a clear discrimination between single bases (dGMP, dTMP, dAMP, and dCMP). (E) Strand sequencing: ssDNA is threaded through aprotein nanopore and individual bases are identified, as the strand remains intact. Panels A, B, and D reprinted with permission from ref 91. Copyright2009 Nature Publishing Group. Panels C and E reprinted with permission from Oxford Nanopore Technologies (Zoe McDougall).

Figure6. BiologicalnanoporeschemeemployedbyOxfordNanopore.(A)SchematicofRHLproteinnanoporemutantdepictingthepositionsofthe cyclodextrin (at residue 135) and glutamines (at residue 139). (B) A detailed view of the β barrel of the mutant nanopore shows the locations of the arginines (at residue 113) and the cysteines. (C) Exonuclease sequencing: A processive enzyme is attached to the top of the nanopore to cleave single nucleotides from the target DNA strand and pass them through the nanopore. (D) A residual current-vs-time signal trace from an RHL protein nanopore that shows a clear discrimination between single bases (dGMP, dTMP, dAMP, and dCMP). (E) Strand sequencing: ssDNA is threaded through a protein nanopore and individual bases are identified, as the strand remains intact. Panels A, B, and D reprinted with permission from ref 91. Copyright 2009 Nature Publishing Group. Panels C and E reprinted with permission from Oxford Nanopore Technologies (Zoe McDougall). From Niedringhaus et al. Analytical Chemistry 83: 4327. 2011.

!81


Nanopores



would result, in theory, in detectably altered current flow throughthe pore. Theoretically, nanopores could also be designed tomeasure tunneling current across the pore as bases, each with adistinct tunneling potential, could be read. The nanopore ap-proach, while still in development, remains an interesting potentialfourth-generation technology. This “fourth-generation”moniker issuggested, since optical detection is eliminated along with therequirement for synchronous reagent wash steps.

Nanopore technologies may be broadly categorized into twotypes, biological and solid-state. The protein alpha hemolysin,which natively bridges cellular membranes causing lysis, was firstused as a model biological nanopore. The protein was insertedinto a biolayer membrane separating two chambers while sensi-tive electronics measured the blockade current, which changed asDNA molecules moved through the pore. However, chemicaland physical similarities between the four nucleotides made thesequence much more difficult to read than envisioned. Further,sufficient reduction of electronic noise remains a constantchallenge, which is achievable in part by slowing the rate ofDNA translocation. Recently, Oxford Nanopore and severalother academic groups working on this concept have madeprogress toward addressing these challenges.

The second class is based on the use of nanopores fabricatedmechanically in silicon or other derivative. The use of thesesynthetic nanopores alleviates the difficulties ofmembrane stabilityand protein positioning that accompany the biological nanoporesystem Oxford Nanopore has established. For example, Nabsyscreated a system using a silicon wafer drilled with nanopores usinga focused ion beam (FIB), which detects differences in blockadecurrent as single-strandedDNAboundwith specific primers passesthrough the pore. IBM created a more complex device that aimsto actively pause DNA translocation and interrogate each base fortunneling current during the pause step. The technology of bothof these nanopore types are presented in detail below.

John Kasianowicz and colleagues73 were the first to show thetranslocation of polynucleotides (poly[U]) through a Staphylo-coccal R-hemolysin (RHL) biological nanopore suspended in a

lipid bilayer, using ionic current blockage method. The authorspredicted that single nucleotides could be discriminated as longas: (1) each nucleotide produces a unique signal signature; (2)the nanopore possesses proper aperture geometry to accommo-date one nucleotide at a time; (3) the current measurements havesufficient resolution to detect the rate of strand translocation; (4)the fragment should translocate in a single direction whenpotential is applied; and (5) the nanopore/supporting mem-brane assembly should be sufficiently robust. All of the biologicaland synthetic nanopores have barrels of ∼5 nm (which isconsiderably longer than the base-to-base distance of 3.4 Å) inthickness and accommodate∼10!15 nucleotides at a time. It is,therefore, impossible to achieve single-base resolution usingblockage current measurements. In addition, the average rateat which a polymer typically translocates through a nanopore ison the order of 1 nucleotide/μs (i.e., on the order of MHzdetection), which is too fast to resolve. The nucleotide strandshould be slowed down to ∼1 nucleotide/ms to allow for a pA-current signal at 120!150 mV applied potential.74 Furthermore,the translocation of a polymer strand should be uniform betweentwo events. The time distribution of two processes (capture,entry, and translocation) is nonPoisson and often differs by anorder of magnitude. This means that two molecules pass througha nanopore at considerably different rates and the slower onecould be missed or misinterpreted. Andre Marziali and co-workers at UBC75,76 used force spectroscopy to study theseevents through single-molecule bond characterization. The non-uniform kinetics of DNA passage through an RHL nanopore isattributed to weak binding of DNA to amino acid residues of theprotein nanopore.77

Because of these challenges with ionic current measurements(the current created by the flow of ions through the nanopore),researchers have looked at other measurement schemes such asthe detection of tunneling current and capacitance changes(Figure 5A). In transverse tunneling current scheme, electrodesare positioned at the pore opening and the signal is detected fromsubnanometer probes.78 In capacitance measurements, voltage is

Figure 5. Nanopore DNA sequencing using electronic measurements and optical readout as detection methods. (A) In electronic nanopore schemes,signal is obtained through ionic current,73 tunneling current,78 and voltage difference79 measurements. Eachmethodmust produce a characteristic signalto differentiate the four DNAbases. Reprinted with permission from ref 83. Copyright 2008 Annual Reviews. (B) In the optical readout nanopore design,each nucleotide is converted to a preset oligonucleotide sequence and hybridized with labeledmarkers that are detected during translocation of the DNAfragment through the nanopore. Reprinted from ref 82. Copyright 2010 American Chemical Society.

Nanopore DNA sequencing using electronic measurements and optical readout as detection methods.(A)In electronic nanopore schemes, signal is obtained through ionic current,73 tunneling current, and voltage difference measurements. Each method must produce a characteristic signal to differentiate the four DNA bases. (B) In the optical readout nanopore design, each nucleotide is converted to a preset oligonucleotide sequence and hybridized with labeled markers that are detected during translocation of the DNA fragment through the nanopore.

From Niedringhaus et al. Analytical Chemistry 83: 4327. 2011.


Oxford Nanopore

From Oxford Nanopores Web Site

!83

eve161 lecture 2

Education