From Molecular Computing to Molecular Programming
Masami Hagiya (萩谷 昌己)
JSPS Project on Molecular Computing
• Funded by the Japan Society for the Promotion of Science (JSPS)
• Research for the Future Program
  – biocomputing field (生命情報), chaired by Prof. Anzai
    • molecular computing
    • artificial cell (chemical IC)
    • evolutionary computation
    • signal transduction
    • complex systems
• October 1996 - March 2001
• Project Leader
  – Masami Hagiya (Computer Science)
• Members
  – Takashi Yokomori (Computer Science)
  – Masayuki Yamamura (Computer Science)
  – Masanori Arita (Genome Informatics)
  – Akira Suyama (Biophysics)
  – Yuzuru Husimi (Biophysics)
  – Kensaku Sakamoto (Biochemistry)
  – Shigeyuki Yokoyama (Biochemistry)
Goals of Molecular Computing
• Analyses and Applications of Computational Power of Biomolecules
  – Understanding Life from the Viewpoint of Computation
    • computational mystery of life
    • origin of life, wet artificial life
  – Engineering Applications (not restricted to computation)
    • combinatorial optimization
    • gene expression analysis
    • nanotechnology, nanomachine
    • cryptography
    • medical and pharmaceutical applications in the future
• New Computational Model, New Simulation Technology
Major Achievements
• Suyama’s Dynamic Programming DNA Computers
  – reduction of molecules by breadth-first search
  – automation by robots
• Sakamoto’s Hairpin Engines
  – Whiplash PCR and SAT Engine
  – molecular computation by hairpin formation
  – autonomous molecular computation
• Theoretical Studies by Yokomori’s Group
• Nishikawa’s Simulator for DNA computations
• Arita’s New Tool for Code Design
• Husimi’s 3SR-Based Evolutionary Reactor
• Yamamura’s Aqueous Computing (with Head)
Dynamic Programming DNA Computers
Adleman-Lipton Paradigm
• Adleman (Science 1994)
  – Solving Hamiltonian Path Problem by DNA
• Lipton, et al.
  – Solving SAT Problem by DNA
• Massively Parallel Computation by Molecules
  – Mainly for Combinatorial Optimization
  – Random Generation by Self-Assembly
    • solution candidate = DNA molecule
  – Selection by Molecular Biology Experiments
Scaling Up ⇒ Efforts to increase yields and reduce errors
Robot and Chemical IC
cf. Hamiltonian Path Problem by Adleman
Dynamic Programming DNA Computer
• “counting” (Ogihara and Ray)
• “dynamic programming” (Suyama)
• Iteration of Generation and Selection
  – Generation of Candidates of Partial Solutions
  – Selection of Solutions
• The order of computational complexity does not decrease, but the number of molecules required is drastically reduced.
  – 3-SAT
Brute Force vs. Dynamic Programming DNA Computers
• Brute Force: generating the WHOLE solution space AT ONCE, followed by selection ⇒ Solution
  – large pool size
  – low reaction rate
• Dynamic Programming: generating a PARTIAL solution space STEP BY STEP, with selection at each step ⇒ Solution
  – small pool size
  – high reaction rate
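3-CNF SAT Solution on DP DNA Computer
Problem: a 3-CNF formula with 4 variables and 10 clauses (the negation marks of the individual literals are not legible in this copy)
  – clauses 1-4: over x1, x2, x3
  – clause 5: over x1, x3, x4
  – clause 6: over x1, x2, x4
  – clause 7: over x1, x3, x4
  – clauses 8-10: over x2, x3, x4
Solution: YES, {X1^T X2^T X3^F X4^F}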
Basic Operations for Dynamic Programming DNA Computers
• get (T, +s), get (T, -s)
  – get DNA molecules with a subsequence s (without s) in a tube T
• append (T, s, e)
  – append a subsequence s at the end of DNA molecules with a splint e in a tube T
• merge (T, T1, T2, …, Tn)
  – merge DNA molecules in tubes T1, T2, …, Tn into a tube T
• amplify (T, T1, T2, …, Tn)
  – amplify DNA molecules in a tube T and divide them into tubes T1, T2, …, and Tn
• detect (T)
  – detect DNA molecules in a tube T
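The five operations can be mimicked in software to make their meaning concrete. The following is only a minimal sketch, not the laboratory protocol: it assumes a tube is a multiset of strands, a strand is a tuple of sequence units such as "X1T", and the splint e is modelled as the set of end units it can bridge. The function signatures (returning new tubes instead of filling a named tube) and the choice to discard strands that the splint cannot extend are assumptions made for this illustration.

```python
from collections import Counter

# Assumed model: tube = Counter mapping strand -> copy number,
# strand = tuple of unit names such as ("X1T", "X2F").

def get(tube, s, present=True):
    """get(T, +s) when present is True, get(T, -s) when it is False."""
    return Counter({strand: n for strand, n in tube.items()
                    if (s in strand) == present})

def append(tube, s, e):
    """Append unit s to strands whose last unit is bridged by splint e.

    Assumption: strands whose end does not match the splint are discarded.
    """
    return Counter({strand + (s,): n for strand, n in tube.items()
                    if strand and strand[-1] in e})

def merge(*tubes):
    """Pool the contents of several tubes into one."""
    return sum(tubes, Counter())

def amplify(tube, n_tubes):
    """Copy the tube contents and divide them into n_tubes tubes."""
    return [Counter(tube) for _ in range(n_tubes)]

def detect(tube):
    """Report whether any DNA molecule is present."""
    return sum(tube.values()) > 0

# All four assignments to two variables, one copy each.
t2 = Counter({("X1T", "X2T"): 1, ("X1F", "X2T"): 1,
              ("X1T", "X2F"): 1, ("X1F", "X2F"): 1})
```

Copy numbers are kept only so that merge and amplify have something visible to act on; the selection logic depends solely on which strands are present.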
Implementation of Basic Operations
[Figure: laboratory steps for each basic operation]
• append (T, s, e): annealing and ligation (Taq DNA ligase), immobilization and cold wash, hot wash
• get (T, +s), get (T, -s): annealing → immobilization → cold wash (get (T, -s)) → hot wash (get (T, +s))
• amplify (T, T1, T2, …, Tn): PCR, annealing, immobilization and cold wash, hot wash and divide
DP algorithm for 3CNF-SAT on DNA Computers
function dna3sat(u_1, v_1, w_1, ..., u_m, v_m, w_m)
begin
  T_2 = {X_1^T X_2^T, X_1^F X_2^T, X_1^T X_2^F, X_1^F X_2^F};
  for k = 3 to n do
    amplify(T_{k-1}, T_w^T, T_w^F);
    for j = 1 to m do
      if w_j = x_k then T_w^F = getuvsat(T_w^F, u_j, v_j); end
      if w_j = ¬x_k then T_w^T = getuvsat(T_w^T, u_j, v_j); end
    end
    T^T = append(T_w^T, X_k^T, X_{k-1}^{T/F} X_k^T);
    T^F = append(T_w^F, X_k^F, X_{k-1}^{T/F} X_k^F);
    T_k = merge(T^T, T^F);
  end
  return detect(T_n);
end
function getuvsat(T, u, v)
begin
  T_u^T = get(T, +X_u^T); T_u^F' = get(T, -X_u^T);
  T_u^F = get(T_u^F', +X_u^F); /* can be omitted */
  T_v^T = get(T_u^F, +X_v^T);
  T^T = merge(T_u^T, T_v^T);
  return T^T;
end
Number of operations: (n - 2) × (amplify + 2 append + merge) + m × (3 get + merge)
DP algorithm for 3CNF-SAT
• k’s loop: k ranges over variable indices
  – j’s loop: j ranges over clause indices
    • if x_k is the 3rd literal of the j-th clause, then remove those assignments which satisfy neither the 1st nor the 2nd literal
  – append X_k^F to the remaining assignments (do similarly if ¬x_k is the 3rd literal)
[Figure: step k = 4, using the three clauses over x2, x3, x4]
• assignments before the step: X1^F X2^T X3^T, X1^F X2^F X3^T, X1^T X2^T X3^F, X1^T X2^F X3^F
• assignment after the step (k = 4): X1^T X2^T X3^F X4^F
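The loop structure described above can be checked with a small software simulation. This is a sketch under stated assumptions, not the authors' implementation: literals are encoded as signed integers (+k for x_k, -k for ¬x_k), each clause is a triple (u, v, w) whose literals are ordered by increasing variable index so that w is the 3rd literal, and tubes are plain Python sets of partial assignments. The example formula is hypothetical and is not the 10-clause formula from the slides.

```python
from itertools import product

def satisfies(assign, lit):
    """True if the (partial) assignment makes literal lit true."""
    return assign[abs(lit) - 1] == (lit > 0)

def getuvsat(tube, u, v):
    """Keep assignments that satisfy literal u or literal v."""
    t_u = {a for a in tube if satisfies(a, u)}    # get(T, +u)
    rest = tube - t_u                             # get(T, -u)
    t_v = {a for a in rest if satisfies(a, v)}    # get(rest, +v)
    return t_u | t_v                              # merge

def dna3sat(n, clauses):
    """Return all satisfying assignments, built one variable at a time."""
    tube = set(product((True, False), repeat=2))  # T_2: all values of x1, x2
    for k in range(3, n + 1):
        t_true, t_false = set(tube), set(tube)    # amplify into two tubes
        for u, v, w in clauses:
            if w == k:       # x_k is the 3rd literal: x_k = F needs u or v
                t_false = getuvsat(t_false, u, v)
            elif w == -k:    # not x_k is the 3rd literal: x_k = T needs u or v
                t_true = getuvsat(t_true, u, v)
        tube = ({a + (True,) for a in t_true} |
                {a + (False,) for a in t_false})  # append, then merge
    return tube

def brute_force(n, clauses):
    """Reference answer: test every one of the 2^n assignments."""
    return {a for a in product((True, False), repeat=n)
            if all(any(satisfies(a, lit) for lit in c) for c in clauses)}

# Hypothetical 4-variable formula used only for illustration.
EXAMPLE = [(1, 2, 3), (-1, 2, -3), (1, -2, 4), (-2, -3, -4)]
solutions = dna3sat(4, EXAMPLE)
```

The brute_force helper is included only as a cross-check: both functions return the same set, but dna3sat never holds more candidates than the partial assignments that have survived so far.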
Confirmation of the Solution by PCR (k = 4)
T_w^T = T_w^F = {X1^F X2^T X3^T, X1^F X2^F X3^T, X1^T X2^T X3^F, X1^T X2^F X3^F}
for j = 1 to 10 do
  j = 5 (clause over x1, x3, x4): T_w^T = getuvsat(T_w^T, u_5, v_5)
  j = 6 (clause over x1, x2, x4): T_w^F = getuvsat(T_w^F, u_6, v_6)
  j = 7 (clause over x1, x3, x4): T_w^F = getuvsat(T_w^F, u_7, v_7)
  j = 8 (clause over x2, x3, x4): T_w^F = getuvsat(T_w^F, u_8, v_8)
  j = 9 (clause over x2, x3, x4): T_w^F = getuvsat(T_w^F, u_9, v_9)
  j = 10 (clause over x2, x3, x4): T_w^F = getuvsat(T_w^F, u_10, v_10)
end
(each getuvsat call expands to {get(); get(); get(); get()} followed by merge)
T^T = append(T_w^T, X4^T, X3^{T/F} X4^T) = {}
T^F = append(T_w^F, X4^F, X3^{T/F} X4^F) = {X1^T X2^T X3^F X4^F}
[Figure: fluorescence (RFU) against elution time (min) for the PCR confirmation, with panels for the variable pairs (X2, X3) and (X1, X3)]
Amount of DNA for Computation: Brute Force vs. Dynamic Programming
100-variable 3-CNF SAT
• Adleman-Lipton’s Brute Force DNA Computers: 2×10^12 g of dsDNA (1×10^12 g of ssDNA)
• Dynamic Programming DNA Computers: 2×10^-3 g of dsDNA (1×10^-3 g of ssDNA)
[Figure: Number of Variables vs. Number of Molecules; values marked at n = 100: 4×10^16, 1×10^15, 4×10^13]
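The brute-force figure can be sanity-checked with a back-of-the-envelope calculation. The sketch below is only an order-of-magnitude estimate: the strand length of 15 nucleotides per variable and the average mass of 330 g/mol per nucleotide are assumptions introduced here, not values stated on the slides.

```python
AVOGADRO = 6.022e23      # molecules per mole
NT_MASS = 330.0          # g/mol per nucleotide of ssDNA (assumed average)
NT_PER_VARIABLE = 15     # nucleotides encoding one variable (assumed)

def ssdna_mass_grams(n_molecules, n_nucleotides):
    """Mass of n_molecules single strands of the given length."""
    return n_molecules / AVOGADRO * n_nucleotides * NT_MASS

def molecules_for_mass(grams, n_nucleotides):
    """Number of single strands of the given length in a given mass."""
    return grams / (n_nucleotides * NT_MASS) * AVOGADRO

n = 100
strand_length = n * NT_PER_VARIABLE

# Brute force: one strand for each of the 2^n candidate assignments.
brute_force_mass = ssdna_mass_grams(2 ** n, strand_length)

# Dynamic programming: how many strands does 1e-3 g of ssDNA correspond to?
dp_molecules = molecules_for_mass(1e-3, strand_length)

print(f"brute force: {brute_force_mass:.2e} g of ssDNA")
print(f"1e-3 g of ssDNA: {dp_molecules:.2e} strands")
```

Under these assumptions the brute-force pool comes out close to 10^12 g of ssDNA, the same order of magnitude as the figure quoted above.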
On Scaling Up the Size of Computations
• The size of random pools currently used:
  – 2^10 … 3^10
  – Rapidly increasing.
• The number of molecules in a test tube:
  – 10^10 … 10^12
• We will soon reach the limit on the number of molecules in a test tube. That is, we will fully utilize the parallelism of molecules in a single tube.
⇒ Multiple Tubes, Chemical IC, Cells, etc.
… But reaching the limit is the current goal.
Robot for DNA Computing Based on MAGTRATION™
Magnetic Beads in MAGTRATION™
Automation of DNA Computations
• Robot for DNA Computing Based on MAGTRATION™
Automation of the Get Command
[Figure: annealing → immobilization → cold wash (get (T, -s)) → hot wash (get (T, +s))]
Programming in DNA Computer
protocol-level
  [Instrument]
  [Reset Counter] 0
  [Home Position] 0
  [MJ-Open Lid]
  ・・・
  [Get1(0)]
  [Get2(1)]
  [Append(2)]
  ・・・
  [Exit]
script-level
  (1-1-4) [MJ-Open Lid]
  Do 2
    _SEND "LID OPEN"
    Do 10
      _SEND "LID?"
      Wait_msec 500
      _CMP_GSTR "OPEN"
      IF_Goto EQ 0 ;open
      Wait_msec 1000
    Loop
  Loop
  ; Time out
  End
  ;open
Pascal/C-level
  function dna3sat(u_1, v_1, w_1, ..., u_m, v_m, w_m)
  begin
    T_2 = {X_1^T X_2^T, X_1^F X_2^T, X_1^T X_2^F, X_1^F X_2^F};
    for k = 3 to n do
      amplify(T_{k-1}, T_w^T, T_w^F);
      for j = 1 to m do
        if w_j = x_k then T_w^F = getuvsat(T_w^F, u_j, v_j); end
        if w_j = ¬x_k then T_w^T = getuvsat(T_w^T, u_j, v_j); end
      end
      T^T = append(T_w^T, X_k^T, X_{k-1}^{T/F} X_k^T);
      T^F = append(T_w^F, X_k^F, X_{k-1}^{T/F} X_k^F);
      T_k = merge(T^T, T^F);
    end
    return detect(T_n);
  end
Hairpin Engines
Autonomous Molecular Computing
• Adleman-Lipton Paradigm
  – Generation of Candidates = Autonomous Reaction
  – Selection of Solutions = Operations from Outside
• One-Pot Reaction ⇒ Autonomous Computation
  Computation by Successive Autonomous Reactions by Molecules
  – Winfree’s DNA Tile
  – Sakamoto’s Hairpin Engines
    • Whiplash PCR and SAT Engine
cf. Winfree’s DNA Tile
Hairpin Engines
• Molecular Computation by Hairpin Formation
  – Hairpin --- Typical Secondary Structure
• Whiplash PCR
  – DNA Automaton: State Machine by DNA
  – 5 Transitions in a Control Experiment
• SAT Engine
  – Selection by Hairpin Structures of DNA
  – 3-SAT: 6-Variable 10-Clause Formula
SAT Engine
• Sakamoto et al., Science, May 19, 2000.
• Selection by Hairpin Structures of DNA
  – digestion by restriction enzyme
  – exclusive PCR
• 3-SAT
  – ssDNA consisting of literals, each selected from a clause
  – complementary literal = complementary sequence
  – detection of inconsistency ⇒ hairpin
• The essential part of the SAT computation is done by hairpin formation.
  – Autonomous Molecular Computation
[Figure: for (a∨b∨c)∧(¬d∨e∨¬f)∧ … ∧(¬c∨¬b∨a)∧ ..., a literal string containing both b and ¬b forms a hairpin and is removed by digestion by restriction enzyme and exclusive PCR]
Selection by Hairpin Structures
• Digestion by Restriction Enzyme
  – Hairpins are cut at the restriction site inserted in each literal sequence.
• Exclusive PCR
  – PCR is inefficient for hairpins.
  – In exclusive PCR, the solution is diluted in each cycle to maintain the difference in amplification.
• The number of steps is independent of the number of variables or clauses.
Generation of Random Pool
(a∨b∨c)∧(d∨e∨f)∧(g∨h∨i)∧(j∨k∨l)
[Figure: random pool generation for this formula. One literal is taken from each clause (a/b/c, d/e/f, g/h/i, j/k/l); the literal units are chemically synthesized; BstXI and BstNI restriction sites are marked on the resulting strand.]
6-Variable 10-Clause Formula
(a∨b∨!c)∧(a∨c∨d)∧(a∨!c∨!d)∧(!a∨!c∨d)∧(a∨!c∨e)∧(a∨d∨!f)∧(!a∨c∨d)∧(a∨c∨!d)∧(!a∨!c∨!d)∧(!a∨c∨!d)
! = ¬
Solution of a 6-Variable 10-Clause Formula
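The selection principle of the SAT Engine can be imitated in software on the 6-variable 10-clause formula above. This is a minimal sketch of the logic only, not of the chemistry: a literal string is modelled as a tuple with one literal chosen from each clause, and a string is assumed to form a hairpin exactly when it contains some literal together with its complement. The names CLAUSES, complement, and forms_hairpin are introduced here for illustration.

```python
from itertools import product

# The 6-variable 10-clause formula, with "!" standing for negation.
CLAUSES = [
    ("a", "b", "!c"), ("a", "c", "d"), ("a", "!c", "!d"), ("!a", "!c", "d"),
    ("a", "!c", "e"), ("a", "d", "!f"), ("!a", "c", "d"), ("a", "c", "!d"),
    ("!a", "!c", "!d"), ("!a", "c", "!d"),
]

def complement(literal):
    """Return the complementary literal: a <-> !a."""
    return literal[1:] if literal.startswith("!") else "!" + literal

def forms_hairpin(literal_string):
    """A string is inconsistent if it holds a literal and its complement."""
    literals = set(literal_string)
    return any(complement(lit) in literals for lit in literals)

# Random pool: every way of picking one literal per clause (3^10 strings).
pool = list(product(*CLAUSES))

# Selection: strings that would form a hairpin are removed.
survivors = [s for s in pool if not forms_hairpin(s)]

# Each surviving string encodes a consistent set of true literals.
assignments = {frozenset(s) for s in survivors}
```

For this formula the pool has 3^10 = 59049 literal strings, 24 of which survive, and all survivors encode the same assignment: a = F, b = T, c = T, d = F, e = T, f = F.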
Whiplash PCR
• DNA Automaton: State Machine by DNA
  – Polymerization of a Hairpin Form
  – Polymerization Stop
• Autonomous SIMD Computation of Boolean μ-formulas
• Solving NP-Complete Problems in O(1) Steps
  – e.g., vertex cover:
    • vertex cover candidate = transition table = ssDNA
    • vertex cover = transition table that reaches the final state
• 5 Transitions in a Control Experiment
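The state-machine view of Whiplash PCR can be sketched abstractly. The code below is an illustrative model only: the transition table carried by the strand is represented as a list of (current state, next state) pairs, the 3' end of the strand is taken to hold the current state, and each cycle extends the strand by the next state of the first matching transition. The state names and the five-step table are hypothetical and are not taken from the control experiment.

```python
def whiplash(transition_table, initial_state, max_cycles=20):
    """Run the abstract state machine and return the visited states.

    transition_table: list of (current_state, next_state) pairs, standing
    in for the transition units encoded on the single strand.
    """
    history = [initial_state]
    for _ in range(max_cycles):
        head = history[-1]                 # state at the 3' end
        targets = [dst for src, dst in transition_table if src == head]
        if not targets:                    # no matching unit: machine halts
            break
        history.append(targets[0])         # extension writes the new state
    return history

# Hypothetical table with five successive transitions.
CONTROL = [("s0", "s1"), ("s1", "s2"), ("s2", "s3"),
           ("s3", "s4"), ("s4", "s5")]

trace = whiplash(CONTROL, "s0")
```

Because every strand carries its own table, many different machines can run in the same tube at once; the model above follows a single strand.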
[Figure: successive Whiplash PCR steps on a strand with segments x B A x C B x and extension products a, b, c]
5 Transitions in a Control Experiment
[Figure: panel labels 0 to 7]
A Perspective on Molecular Computing
• Structure Formation = Computation
  – probabilistic process
  – governed by thermodynamics and kinetics
• Molecular Computation as
  – Probabilistic (Randomized) Computation
  – It should be analyzed by
    • complexity theory (for randomized algorithms)
    • thermodynamics and kinetics.
    • cf. Winfree
• Computational Mystery of Life
  – Why is life so efficient computationally?
    • protein folding
    • gene regulation, signal transduction, etc.
Molecular Programming
• Designing and controlling biomolecular reactions
• Biomolecules (DNA, RNA, protein) --- combinatorial complexity
• Molecular program --- two parts
  – the part encoded in molecules themselves (e.g., DNA seq.)
    • We should go beyond simple code design.
  – the part implemented by a sequence of lab. operations
• Molecular programming ...
  – controls conformational change and self-assembly by coordinating the two parts.
• Various applications
  – gene expression analysis
  – nanotechnology and nanomachine
  – combinatorial chemistry
Simple Example of Molecular Programming
• PCR (Polymerase Chain Reaction)
  – primers
  – various parameters
    • high temperature and period
    • low temperature and period
  – polymerase (enzyme)
• Molecular programming in PCR ...
  – Designing primer sequences
  – Setting various parameters
  – Selecting polymerase
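The two parts of a PCR "program" (what is encoded in the molecules and what is set in the lab) can be written down as a small data structure. This is a hedged sketch for illustration: the primer sequences, temperatures, periods, and cycle count are made-up example values, and the melting-temperature estimate uses the simple Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only a rough guide for short primers.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def wallace_tm(primer):
    """Rule-of-thumb melting temperature in degrees C for a short primer."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

@dataclass
class PcrProgram:
    # Part encoded in the molecules themselves.
    forward_primer: str
    reverse_primer: str
    # Part implemented by lab operations.
    high_temp_c: float    # high temperature
    high_period_s: int    # period at high temperature
    low_temp_c: float     # low temperature
    low_period_s: int     # period at low temperature
    cycles: int
    polymerase: str

# Hypothetical example values, not taken from the slides.
program = PcrProgram(
    forward_primer="ATGCGTACGTTAGCCTAGGA",
    reverse_primer=reverse_complement("GGATCCTTAGCATGCAAGCT"),
    high_temp_c=94.0, high_period_s=30,
    low_temp_c=55.0, low_period_s=30,
    cycles=30, polymerase="Taq",
)
```

Designing the primers changes the first two fields, while setting parameters and selecting the polymerase changes the rest; both must be chosen together for the reaction to work.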
More Sophisticated Example
• Suyama’s Universal DNA Chip
  – ENCODE: conversion from mRNA transcripts to DCN (DNA-coded number)
  – AMPLIFY: amplification of DCN by PCR
  – DECODE: detection of DCN by the universal DNA chip (DNA capillary array)
[Figure: ENCODE / AMPLIFY / DECODE pipeline. Labels: target transcript i, Ai, si, ei, biotin, SA magnetic beads, DCNi with segments SD, D1i, D2i, ED; universal DNA chip with probes D1j (j = 0, 1, …, n-1), D2k, label, P.]
More Sophisticated Example
• DNA Chip and Molecular Programming
  – ENCODE
  – AMPLIFY
  – ANALYSIS: information processing on DCN
    • Example: If G1 is expressed, G2 is not expressed, and G3 is expressed, then there is a danger of disease D1 and no danger of D2
  – DECODE
• Such rules (programs) can be represented by molecules!
  – Whiplash PCR or Sakakibara’s recent work
• Merits
  – No need for sequencing --- efficiency and confidentiality
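The ANALYSIS step in the example above amounts to evaluating a rule over an expression profile. The sketch below shows that logic in software only; how such a rule would be encoded in molecules is not modelled. The rule table format and the function name analyse are assumptions made for this illustration.

```python
# Each rule: (required expression pattern, conclusions drawn when it matches).
RULES = [
    ({"G1": True, "G2": False, "G3": True}, {"D1": True, "D2": False}),
]

def analyse(expression, rules=RULES):
    """Apply every rule whose pattern matches the expression profile.

    expression maps gene name -> True if the gene is expressed.
    Genes missing from the profile are treated as not expressed.
    """
    conclusions = {}
    for pattern, outcome in rules:
        if all(expression.get(gene, False) == wanted
               for gene, wanted in pattern.items()):
            conclusions.update(outcome)
    return conclusions

profile = {"G1": True, "G2": False, "G3": True}
result = analyse(profile)   # danger of D1, no danger of D2
```

Keeping the rules as data rather than code mirrors the point of the slide: the rule itself is something that can be stored and executed, not just the measurements.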
Plan for the Next Proposal
• Being submitted to the Ministry of Education
• Has not yet been approved.
• 4 sub-proposals
  – Theory of Molecular Programming
  – Molecular Programming for Self-assembly
  – Molecular Programming by Evolution
  – Molecular Programming in Chemical IC