Bioinformation Technology (BIT) and Biointelligence
Byoung-Tak Zhang
School of Computer Science and Engineering
Seoul National University
E-mail: [email protected]
http://scai.snu.ac.kr./~btzhang/
Outline
Introduction
Bioinformation Technology (BIT) = BT + IT
Bioinformatics, Biocomputing, Biochips
Biointelligence = BT + AI
Concept, Methodology, Technology
Applied Biointelligence
Summary
Further Information
Introduction
Biotechnology Revolution
[Figure: economic value over time through the Agricultural Age (from BC 6000), Industrial Age (from AD 1760), Information Age (from 1950), and Biotechnology Age (from 2000)]
Human Genome Project
Genome health implications: a new disease encyclopedia, new genetic fingerprints, new diagnostics, new treatments
Goals:
• Identify the approximately 100,000 genes in human DNA
• Determine the sequences of the 3 billion bases that make up human DNA
• Store this information in databases
• Develop tools for data analysis
• Address the ethical, legal and social issues that arise from genome research
Bioinformation Technology (BIT) = BT + IT
[Diagram: BT and IT]
In silico biology (e.g., bioinformatics)
In vivo informatics (e.g., biocomputing)
Bioinformation Technology: Bioinformatics, Biocomputing, Biochips
Bioinformatics
What is Bioinformatics?
Bioinformatics vs. computational biology
Bioinformatik (in German): biology-based computer science as well as bioinformatics (in English)
Informatics – computer science
Bio – molecular biology
Bioinformatics – solving problems arising from biology using methodology from computer science.
What is DNA?
AACCTGCGGAAGGATCATTACCGAGTGCGGGTCCTTTGGGCCCAACCTCCCATCCGTGTCTATTGTACCCGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGGGGCGCCTCTGCCCCCCGGGCCCGTGCCCGCCGGAGACCCCAACACGAACACTGTCTGAAAGCGTGCAGTCTGAGTTGATTGAATGCAATCAGTTAAAACTTTCAACAATGGATCTCTTGGTTCCGGCATGCAATCAGTCCCGTTGCTTCGGCACTGTCTGAAAGCGCCTTTGGGCCCAACCTCCCATCCGTGTCTATTGTACCCGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGGGGCGCCGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGCTATTGTACCCGTTGCTTCGGATCTCTTGGGGATCTCTTGGTTCCGGCATGCAATCAGTCCCGTTGCTTCGGCACTGTCTGAAAGCGCCTTTGGGCCCAACCTCCCACCGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGGGGCGGCCGCCGGGGGCACTGTCTGAAAGCTCGGCCGCC
The Structure of DNA
[Figure: sugar-phosphate backbone, hydrogen bonds, bases]
RNA consists of A, C, G, and U, where U plays the same role as T.
Watson-Crick complementary pairs: A and T (or A and U); C and G.
Hybridization: two strands of complementary DNA (or one strand of DNA and one strand of complementary RNA) stick together.
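The pairing rules above can be checked with a few lines of Python. This is a minimal sketch, assuming perfect Watson-Crick matching only (no mismatches, no thermodynamics). The helper names `reverse_complement` and `can_hybridize` are hypothetical. Because the two strands of a duplex are antiparallel, a strand written 5′→3′ hybridizes with its reverse complement. The example probe is the 5′-AGCATCCA-3′ strand used later on the DNA-computing slides.

```python
# Watson-Crick pairing for DNA; U (RNA) is treated like T before pairing.
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also written 5'->3' (hence reversed)."""
    dna = seq.upper().replace("U", "T")
    return "".join(DNA_PAIR[base] for base in reversed(dna))


def can_hybridize(strand_a: str, strand_b: str) -> bool:
    """True if two strands (both written 5'->3') are perfectly complementary.

    Either strand may be RNA: U is mapped to T so DNA/RNA pairs can be checked.
    """
    b_as_dna = strand_b.upper().replace("U", "T")
    return len(strand_a) == len(b_as_dna) and reverse_complement(strand_a) == b_as_dna


probe = "AGCATCCA"
print(reverse_complement(probe))           # TGGATGCT
print(can_hybridize(probe, "TGGATGCT"))    # True
print(can_hybridize(probe, "UGGAUGCU"))    # True (complementary RNA)
print(can_hybridize(probe, "TGGATGCA"))    # False (one mismatch)
```

Reading TGGATGCT backwards gives 3′-TCGTAGGT-5′, which is exactly how a complementary strand is drawn beneath the probe on a duplex diagram.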
Molecular Biology: Flow of Information
DNA → RNA → Protein → Function
[Figure: DNA and a protein chain of amino acids (Phe, Cys, Lys, Cys, Asp, Cys, Arg, Ser, Ala, Leu)]
DNA (gene) → RNA → Protein
[Figure: a gene with control statements (TATA, start; termination, stop) and a ribosome-binding site; transcription by RNA polymerase produces mRNA with 5′ and 3′ UTRs, and translation by the ribosome produces the protein]
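The DNA → RNA → protein flow can be sketched in Python. This is a minimal illustration, assuming the input is the coding (sense) strand, so that transcription amounts to replacing T with U. The codon table is deliberately partial: it contains only a few entries from the standard genetic code, enough for the demo. `transcribe` and `translate` are hypothetical helper names. Translation starts at the first AUG and continues one codon (three bases) at a time until a stop codon.

```python
# Partial standard genetic code (mRNA codon -> one-letter amino acid).
# Only the codons needed for the demo are listed; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",                          # Met, also the start codon
    "UUU": "F", "UUC": "F",              # Phe
    "UGU": "C", "UGC": "C",              # Cys
    "AAA": "K", "AAG": "K",              # Lys
    "GAU": "D", "GAC": "D",              # Asp
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop
}


def transcribe(coding_strand: str) -> str:
    """mRNA for a DNA coding strand: same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X: codon missing from this partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("GCATGTTTTGTAAAGATTAAGC")
print(mrna)             # GCAUGUUUUGUAAAGAUUAAGC
print(translate(mrna))  # MFCKD (Met-Phe-Cys-Lys-Asp, then the UAA stop)
```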
Nucleotide and Protein Sequence
aacctgcgga aggatcattaccgagtgcgg gtcctttgggcccaacctcc catccgtgtctattgtaccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg ggcgcctctgccccccgggc ccgtgcccgccggagacccc aacacgaacactgtctgaaa gcgtgcagtctgagttgatt gaatgcaatcagttaaaact ttcaacaatggatctcttgg ttccggctgc tattgtaccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg ggcgcctctgccccccgggc ccgtgcccgccggagacccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg cggagacccc
gcgggcccgc cgcttgtcggccgccggggg ggcgcctctgccccccgggc ccgtgcccgcaacctgcgga aggatcattaccgagtgcgg gtcctttgggcccaacctcc catccgtgtctattgtaccc tgttgcttcggcgggcccgc cgcttgtcggagttaaaact ttcaacaatggatctcttgg ttccggctgc tattgtaccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg ggcgcctctgccccccgggc ccgtgcccgccggagacccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg cggagacccc gcgggcccgc cgcttgtcggccgccggggg ggcgcctctg
cgcttgtcgg ccgccgggggccccccgggc ccgtgcccgccggagacccc aacacgaacactgtctgaaa gcgtgcagtctgagttgatt gaatgcaatcagttaaaact ttcaacaatggatctcttgg aacctgcggaccgagtgcgg gtcctttgggcccaacctcc catccgtgtctattgtaccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg ggcgcctctgagttaaaact ttcaacaatggatctcttgg ttccggctgc tattgtaccc tgttgcttcggcgggcccgc cgcttgtcggccgccggggg ggcgcctctgccccccgggc ccgtgcccgccggagacccc tgttgcttcg
SQ sequence 1344 BP; 291 A; 374 C; 401 G; 278 T; 0 other
DNA (Nucleotide) Sequence
CG2B_MARGL Length: 388 April 2, 1997 14:55 Type: P Check: 9613 ..
MLNGENVDSR IMGKVATRAS SKGVKSTLGT RGALENISNV ARNNLQAGAK KELVKAKRGM TKSKATSSLQ SVMGLNVEPM EKAKPQSPEP MDMSEINSAL EAFSQNLLEG VEDIDKNDFD NPQLCSEFVN DIYQYMRKLE REFKVRTDYM TIQEITERMR SILIDWLVQV HLRFHLLQET LFLTIQILDR YLEVQPVSKN
KLQLVGVTSM LIAAKYEEMY PPEIGDFVYI TDNAYTKAQI RSMECNILRR LDFSLGKPLC IHFLRRNSKA GGVDGQKHTM AKYLMELTLP EYAFVPYDPS EIAAAALCLS SKILEPDMEW GTTLVHYSAY SEDHLMPIVQ KMALVLKNAP TAKFQAVRKK YSSAKFMNVS TISALTSSTV MDLADQMC
Protein (Amino Acid) Sequence
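The SQ line on this slide summarises the base composition of the nucleotide sequence, as in an EMBL flat-file entry. A minimal sketch of how such a summary can be computed, assuming lowercase or uppercase input with the display spaces removed. `sq_line` is a hypothetical helper, and its spacing imitates the EMBL style rather than following a formal specification.

```python
from collections import Counter


def sq_line(seq: str) -> str:
    """Summarise base composition in the style of an EMBL 'SQ' line."""
    counts = Counter(seq.upper())
    acgt = {base: counts.get(base, 0) for base in "ACGT"}
    other = len(seq) - sum(acgt.values())  # anything that is not A, C, G or T
    return (f"SQ   Sequence {len(seq)} BP; {acgt['A']} A; {acgt['C']} C; "
            f"{acgt['G']} G; {acgt['T']} T; {other} other;")


# First 20 bases of the sequence above, with the display spaces removed.
print(sq_line("aacctgcgga aggatcatta".replace(" ", "")))
# SQ   Sequence 20 BP; 7 A; 4 C; 5 G; 4 T; 0 other;
```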
Some Facts
• 10^14 cells in the human body.
• 3 × 10^9 letters in the DNA code in every cell in your body.
• DNA differs between humans by 0.2% (1 in 500 bases).
• Human DNA is 98% identical to that of chimpanzees.
• 97% of DNA in the human genome has no known function.
EMBL Database Growth
[Figure: total number of EMBL records (millions, 0 to 10) by year, 1982 to 2000]
Bioinformatics Is About:
Elicitation of DNA sequences from genetic material
Sequence annotation (e.g. with information from experiments)
Understanding the control of gene expression (i.e., under what circumstances genes are transcribed and translated into proteins)
The relationship between the amino acid sequence of proteins and their structure.
Background of Bioinformatics
Biological information infrastructure:
• Biological information management systems
• Analysis software tools
• Communication networks for biological research
Massive biological databases:
• DNA/RNA sequences
• Protein sequences
• Genetic map linkage data
• Biochemical reactions and pathways
Need to integrate these resources to model biological reality and exploit the biological knowledge that is being gathered
Extension of Bioinformatics Concept
Genomics: functional genomics, structural genomics
Proteomics: large-scale analysis of the proteins of an organism
Pharmacogenomics: developing new drugs that will target a particular disease
Microarray: DNA chip, protein chip
Applications of Bioinformatics
• Drug design
• Identification of genetic risk factors
• Gene therapy
• Genetic modification of food crops and animals
• Biological warfare, crime, etc.
Personal Medicine? E-Doctor?
SNP (Single Nucleotide Polymorphism)
Finding single nucleotide changes at specific regions of genes
Uses: diagnosis of hereditary diseases; personalized drugs; finding more effective drugs and treatments
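Once two sequences are aligned, finding single-nucleotide changes reduces to a position-by-position comparison. The following is a naive sketch, assuming the reference and sample are already aligned and of equal length. Real SNP calling also has to align sequencing reads and model sequencing errors statistically, and this sketch does none of that. `find_snps` is a hypothetical helper name.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, sample base) for every mismatch.

    Assumes both sequences are already aligned to the same length, so every
    difference is a single-nucleotide substitution (no insertions/deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()))
        if ref != alt
    ]


print(find_snps("AACCTGCGGA", "AACCTACGGA"))  # [(6, 'G', 'A')]
```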
Problems in Bioinformatics
Structure analysis: protein structure comparison, protein structure prediction, RNA structure modeling
Pathway analysis: metabolic pathways, regulatory networks
Sequence analysis: sequence alignment, structure and function prediction, gene finding
Expression analysis: gene expression analysis, gene clustering
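Sequence alignment, the first sequence-analysis problem listed, is classically solved by dynamic programming. Below is a minimal sketch of the Needleman-Wunsch global alignment score with a linear gap penalty. The scoring values (+1 match, -1 mismatch, -2 gap) are illustrative choices, not values from the slides, and `global_alignment_score` is a hypothetical name. Each cell takes the best of three moves: align two bases, or put a gap in one of the two sequences.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))  # 1 (three matches, one gap)
```

Only the score is returned. A traceback through `score` would recover the alignment itself.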
The Complete Microarray Bioinformatics Solution
[Diagram: data management, databases, statistical analysis, image processing, automation, data mining, cluster analysis]
Bioinformatics as Information Technology
[Diagram: bioinformatics linked to information retrieval, databases (GenBank, SWISS-PROT), hardware, supercomputing, agents, machine learning, and algorithms; example tasks: information filtering, monitoring agents, clustering, rule discovery, pattern recognition, sequence alignment, biomedical text analysis]
Bioinformatics on the Web
[Diagram: the experimental process (sample, array, hybridization, scanner) feeds data management (relational database, web interface), which supports data analysis and interpretation (image analysis results and summaries, links to other information resources, download of data to other applications)]
Biocomputing
Biocomputing vs. Bioinformatics
[Diagram: bioinformatics and biocomputing at the interface of BT and IT]
Traveling Salesman Problem
The traveling salesman problem: as the number of cities grows, even supercomputers have difficulty finding the shortest route that visits every city.
[Figure: example graph with cities 0 to 6]
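The difficulty comes from the number of candidate tours. With the starting city fixed, there are (n-1)! orderings. For 20 cities that is 19! ≈ 1.2 × 10^17 tours. A minimal brute-force sketch that checks every ordering is shown below. The four city coordinates are hypothetical, and `tour_length` and `brute_force_tsp` are illustrative names.

```python
import math
from itertools import permutations


def tour_length(tour, coords):
    """Length of a closed tour that returns to its starting city."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def brute_force_tsp(coords):
    """Try every ordering of the cities, with city 0 fixed as the start."""
    others = range(1, len(coords))
    best = min(((0, *rest) for rest in permutations(others)),
               key=lambda tour: tour_length(tour, coords))
    return best, tour_length(best, coords)


# Hypothetical cities at the corners of a unit square.
square = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(brute_force_tsp(square))  # ((0, 1, 2, 3), 4.0)
```

Adleman's molecular computer, described next, attacks the same combinatorial explosion by generating all candidate paths in parallel as DNA strands instead of enumerating them one at a time.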
Adleman’s Molecular Computer: A Brute Force Method
Each city (vertex) is represented by a different sequence of nucleotides (6 here, but Adleman used 20).
A DNA linker (edge) joins two city (vertex) strands.
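Example encoding: Vertex 1 = AGCTTAGG, Vertex 2 = ATGGCATG, Edge 12 (linker) = ATCCTACC
Step 1: Hybridization. The edge linker pairs with the end of vertex 1 and the start of vertex 2.
Step 2: Ligation. Adjacent vertex strands are joined into candidate paths (e.g., AGCTTAGGATGGCATG).
Step 3: PCR. Paths that start at vertex 1 and end at vertex 4 are amplified.
Step 4: Gel electrophoresis. Strands are separated by length (e.g., 32 bp vs. 16 bp).
Step 5: Magnetic bead affinity separation. Only strands containing every vertex are kept (e.g., a bead for vertex 1 captures AGCTTAGGATGGCATGGAATCCGA…TCGAATCC).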
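The Edge 12 linker in this example can be derived from the two vertex sequences. A minimal sketch follows. It assumes, as the example sequences suggest, that the linker is written 3′→5′, so it is the plain base-by-base complement of the last half of vertex 1 followed by the first half of vertex 2. `complement` and `edge_linker` are hypothetical helper names.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq: str) -> str:
    """Base-by-base complement, kept in the same reading direction (3'->5')."""
    return "".join(COMPLEMENT[base] for base in seq)


def edge_linker(u: str, v: str) -> str:
    """Linker that splints vertex strand u to vertex strand v.

    It pairs with the 3' half of u and the 5' half of v, so after
    hybridization the two vertex strands lie end to end and can be ligated.
    """
    return complement(u[len(u) // 2:] + v[:len(v) // 2])


vertex_1, vertex_2 = "AGCTTAGG", "ATGGCATG"
print(edge_linker(vertex_1, vertex_2))  # ATCCTACC, the Edge 12 linker above
print(vertex_1 + vertex_2)              # AGCTTAGGATGGCATG, the ligated path 1 -> 2
```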
Molecular Operators for DNA Computing
• Hybridization: complementary pairing of two single-stranded polynucleotides
5’- AGCATCCA –3’
3’- TCGTAGGT –5’
+5’- AGCATCCA –3’3’- TGCTAGGT –5’
• Ligation: attaching sticky ends to a blunt-ended molecule
TGACTACGACTG
ATGCATGCTACG
+ ATGCATGCTGACTACGTACGTGAC
sticky end
DNA finds a solution!
A Hamiltonian path with all vertices included is isolated and recovered
Why DNA Computing?
• 6.022 × 10^23 molecules/mole: immense, brute-force search of all possibilities
• Desktop: 10^9 operations/sec; supercomputer: 10^12 operations/sec; 1 mol of DNA: 10^26 reactions
• Favorable energetics: Gibbs free energy ΔG = -8 kcal mol^-1
• 1 J for 2 × 10^19 operations
• Storage capacity: 1 bit per cubic nanometer
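The "1 J for 2 × 10^19 operations" figure follows from the free energy on this slide. Below is a back-of-the-envelope sketch. It assumes that each operation (for example, one ligation) corresponds to about 8 kcal per mole of reactions, and it uses 4184 J per kcal. The variable names are illustrative.

```python
AVOGADRO = 6.022e23          # molecules (operations) per mole
DELTA_G_KCAL_PER_MOL = 8.0   # magnitude of the free energy per operation (slide: -8 kcal/mol)
JOULES_PER_KCAL = 4184.0     # thermochemical kilocalorie

# Energy for one mole of operations, then operations obtainable per joule.
joules_per_mole = DELTA_G_KCAL_PER_MOL * JOULES_PER_KCAL  # 33,472 J
operations_per_joule = AVOGADRO / joules_per_mole
print(f"{operations_per_joule:.2e} operations per joule")  # 1.80e+19
```

Rounded to one significant figure, this gives the slide's 2 × 10^19 operations per joule.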
DNA Computers vs. Conventional Computers
DNA-based computers | Microchip-based computers
slow at individual operations | fast at individual operations
can do billions of operations simultaneously | can do substantially fewer operations simultaneously
can provide huge memory in small space | smaller memory
setting up a problem may involve considerable preparations | setting up only requires keyboard input
DNA is sensitive to chemical deterioration | electronic data are vulnerable but can be backed up easily
Research Groups
• MIT, Caltech, Princeton University, Bell Labs
• EMCC (European Molecular Computing Consortium), composed of national groups from 11 European countries
• BioMIP Institute (BioMolecular Information Processing) at the German National Research Center for Information Technology (GMD)
• Molecular Computer Project (MCP) in Japan
• Leiden Center for Natural Computation (LCNC)
Applications of Biomolecular Computing
• Massively parallel problem solving
• Combinatorial optimization
• Molecular nano-memory with fast associative search
• AI problem solving
• Medical diagnosis
• Cryptography
• Drug discovery
Further impact in biology and medicine:
• Wet biological databases
• Processing of DNA labeled with digital data
• Sequence comparison
• Fingerprinting
Biochips
DNA Chip
DNA Chip Technology
Classification of DNA Chip Technology
Photolithography
Inkjetting
Mechanical micro-spotting
How DNA Chips Are Made
Photolithography Chip
Light-directed oligonucleotide synthesis
Microarray Robot
DNA Chip Applications
• Gene discovery: gene/mutated gene; growth, behavior, homeostasis …
• Disease diagnosis
• Drug discovery: pharmacogenomics
• Toxicological research: toxicogenomics
Protein Chips
A new paradigm in protein molecular mapping strategies
Bioelectronic Devices
[Figure: bio-memory device; a patterned bio-film of cytochrome c (Cyt c) and GFP on Au-coated glass, with an electron sensitizer and an electron acceptor]
History of Lab-on-a-Chip
Lab-on-a-chip Technology
Integrates sample handling, separation, detection, and data analysis for DNA, RNA, and protein solutions using LabChip technology.
Biointelligence
Concept and History, Methodology, Technology, Applications
Concept and History
Biointelligence (BI)
Study of artificial intelligence based on biotechnology
• Biointelligence as a new technology: solving AI problems using biotechnology (BT) or BIT; using BT to solve AI problems
• Biointelligence as a new application: using AI techniques to solve BT problems
• Biointelligence as a new research field:
  Biochemistry = Biology + Chemistry
  Bioinformatics = Biology + Informatics
  Biointelligence (BI) = Biology (BT) + Intelligence (AI)
Relationships to Existing Research Areas
[Diagram: Information Technology (IT), AI, Bioinformation Technology (BIT), Biotechnology (BT), and Biointelligence (BI)]
Related Research Fields
[Diagram: Artificial Intelligence, Biointelligence, Bioinformatics, Biocomputing, Biochips, Bioinformation Technology]
Biological AI: History
Symbolic AI
• 1943: Production rules
• 1956: “Artificial Intelligence”
• 1958: LISP AI language
• 1965: Resolution theorem proving
• 1970: PROLOG language
• 1971: STRIPS planner
• 1973: MYCIN expert system
• 1982-92: Fifth generation computer systems project
• 1986: Society of mind
• 1994: Intelligent agents
Biological AI
• 1943: McCulloch-Pitts neurons
• 1959: Perceptron
• 1965: Cybernetics
• 1966: Simulated evolution
• 1966: Self-reproducing automata
• 1975: Genetic algorithm
• 1982: Neural networks
• 1986: Connectionism
• 1987: Artificial life
• 1992: Genetic programming
• 1994: DNA computing
Paradigm Shift in AI Research
Symbolic → Subsymbolic
Knowledge-based → Learning-based
Deduction → Induction
Model-driven → Data-driven
Top-down → Bottom-up
High-level → Low-level
Reflective → Reflexive
Individual → Collective
Deep-thought → Reactive behavior
Syntactic → Semantic
Discrete → Continuous
Deterministic → Stochastic
Logic → Probabilistic
Computers and Biosystems (Moravec, 1988)
Biointelligence Methodology
Four Levels of Biointelligence
Molecular Intelligence
Cellular Intelligence
Organismic Intelligence
Ecological Intelligence
<= Focus of classical AI
Comparison of Biointelligence Technologies
Aspect | Molecular Intelligence | Cellular Intelligence | Organismic Intelligence | Ecological Intelligence
Basic unit | molecules | cells | organism | population
Biology | molecular biology | cell biology | neurobiology | ecology
Phenomenon | self-assembly | development | learning | evolution
Time (typical) | seconds | days | months | years
Communication | lock-key mechanism | electrochemical signals | neurotransmitters | audiovisual, symbolic
Basic operation | ligation, hybridization | cell division, differentiation | excitation, inhibition | cooperation, competition
Computational models | DNA/molecular computing | cellular automata, immune nets | neural nets, semantic nets | evolutionary algorithms
Chips | DNA chips, protein chips | embryonic chips, lab-on-a-chip | neurochips | evolvable hardware
Biomolecular Information Processing
DNA sequence → (transcription) → mRNA sequence → (translation) → protein sequence → (folding) → folded protein
Features
• Stochastic (vs. deterministic)
• Massively parallel (vs. sequential)
• Self-assembly (vs. programming)
• Liquid rather than solid-state
• Biochemical (vs. electronic)
• Biomolecule-based (vs. silicon-based)
Principles and Theoretical Tools for Biointelligence Research
Principles: self-assembly, self-reproduction, uncertainty principle, Occam’s razor principle
Theoretical tools: information theory, probability theory, thermodynamics, statistical physics
Biology-Based AI Models: Existing Examples
• Evolutionary computation: computational method simulating natural selection
• DNA computing: information processing based on biomolecules
• Neural networks: computational model imitating brain structure
Neural Computation: The Brain as Computer
Brain:
1. 10^11 neurons with 10^14 synapses
2. Speed: 10^-3 sec
3. Distributed processing
4. Nonlinear processing
5. Parallel processing
Conventional computer:
1. A single processor with complex circuits
2. Speed: 10^-9 sec
3. Central processing
4. Arithmetic operation (linearity)
5. Sequential processing
From Biological Neurons to Artificial Neurons
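An artificial neuron keeps only the core of its biological model: a weighted sum of inputs (synapses) compared against a threshold (firing). The code below is a minimal sketch of such a threshold unit, trained with Rosenblatt's perceptron rule on the Boolean AND function. The AND data, the learning rate, and the names `step`, `predict`, and `train_perceptron` are illustrative choices, not taken from the slides.

```python
def step(x: float) -> int:
    """Threshold activation: the neuron fires (1) when its net input reaches 0."""
    return 1 if x >= 0 else 0


def predict(weights, bias, inputs) -> int:
    """Artificial neuron: weighted sum of the inputs plus a bias, then a threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: w <- w + lr * (target - output) * x, likewise for the bias."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND_DATA)
print([predict(w, b, x) for x, _ in AND_DATA])  # [0, 0, 0, 1]
```

A single unit like this can learn only linearly separable functions such as AND. The multilayer perceptrons that appear later in the deck stack such units to lift that restriction.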
Evolutionary Computation: Nature as Computer
“Owing to this struggle for life, any variation, however slight and from whatever cause proceeding, if it be in any degree profitable to an individual of any species, in its infinitely complex relations to other organic beings and to external nature, will tend to the preservation of that individual, and will generally be inherited by its offspring.”
Charles Darwin, Origin of Species (1859)
Variation and Selection: The Principle
[Figure: solutions are encoded as chromosomes (bit strings such as 1100101010); crossover and mutation create new variants; evaluation (fitness computation) and roulette-wheel selection form the new population]
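The variation-and-selection loop (encoding, crossover, mutation, fitness evaluation, roulette-wheel selection) can be sketched as a small genetic algorithm. The sketch makes several assumptions. It uses the toy "OneMax" objective (count the 1 bits) as a stand-in fitness function. Population size, mutation rate, and number of generations are arbitrary, and there is no elitism. All function names are hypothetical.

```python
import random


def fitness(chromosome: str) -> int:
    """Toy OneMax objective (assumed for illustration): number of 1 bits."""
    return chromosome.count("1")


def roulette_select(population, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    weights = [fitness(c) + 1e-9 for c in population]  # avoid an all-zero wheel
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(a: str, b: str, rng) -> str:
    """One-point crossover: prefix of parent a joined to the suffix of parent b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome: str, rate: float, rng) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(("1" if bit == "0" else "0") if rng.random() < rate else bit
                   for bit in chromosome)


def genetic_algorithm(length=20, pop_size=30, generations=60, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection, then variation, produces the new population.
        population = [
            mutate(crossover(roulette_select(population, rng),
                             roulette_select(population, rng), rng), rate, rng)
            for _ in range(pop_size)
        ]
    return max(population, key=fitness)


best = genetic_algorithm()
print(best, fitness(best))
```

Roulette-wheel selection gives fitter chromosomes a proportionally larger slice of the wheel. Because the slices are proportional rather than winner-take-all, weaker solutions still have a chance to contribute genetic material.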
DNA Computing: Biomolecules as Computer
011001101010001 ATGCTCGAAGCT
Flow of DNA Computing
HPP (Hamiltonian path problem)
[Figure: a graph with nodes 0 to 6 is solved by encoding, ligation, PCR (polymerase chain reaction), gel electrophoresis, affinity column, and decoding; example strands include ACGCGAGCATAAATGTGCCGT and ACGCGAGCATAAATGTGCACGCGT]
Node 0: ACG    Node 1: CGA    Node 2: GCA    Node 3: TAA
Node 4: ATG    Node 5: TGC    Node 6: CGT
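The decoding step can be imitated in software using the node codes listed on this slide. This is a minimal sketch with two assumptions. First, a strand is read as consecutive 3-mers. Second, the path must start at node 0 and end at node 6. The three checks mirror PCR (start and end), gel electrophoresis (length), and the affinity column (every node present). Edge validity is assumed to be guaranteed by ligation and is not re-checked. `decode` and `is_hamiltonian_candidate` are hypothetical names.

```python
# Node codes as listed on the slide (3-mers).
NODE_CODE = {"ACG": 0, "CGA": 1, "GCA": 2, "TAA": 3, "ATG": 4, "TGC": 5, "CGT": 6}


def decode(strand: str, k: int = 3) -> list[int]:
    """Split a strand into k-mers and map each one back to its node number."""
    if len(strand) % k:
        raise ValueError("strand length is not a multiple of the code length")
    return [NODE_CODE[strand[i:i + k]] for i in range(0, len(strand), k)]


def is_hamiltonian_candidate(path: list[int], start: int = 0, end: int = 6,
                             n_nodes: int = 7) -> bool:
    """Right start/end (PCR), right length (gel), every node present (affinity)."""
    return (bool(path) and path[0] == start and path[-1] == end
            and len(path) == n_nodes and set(path) == set(range(n_nodes)))


path = decode("ACGCGAGCATAAATGTGCCGT")
print(path, is_hamiltonian_candidate(path))  # [0, 1, 2, 3, 4, 5, 6] True

detour = decode("ACGCGAGCATAAATGTGCACGCGT")
print(detour, is_hamiltonian_candidate(detour))  # [0, 1, 2, 3, 4, 5, 0, 6] False
```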
Biointelligence Technology
Biointelligence on a Chip?
• Computing models: the limits of conventional computing models
• Computing devices: the limits of silicon semiconductor technology
[Diagram: information technology, biotechnology, bioinformation technology, molecular electronics, and the biological computer leading to a biointelligence chip]
Intelligent Biomolecular Information Processing
[Diagram: bio-memory (GFP, cytochrome c), biocomputing, and theoretical models; a bio-processor with input, controller, reaction chamber (calculating), and output]
Molecular Computer Model
Bio-diode device
• Single-electron device
• Bio-transistor construction
• Bio-memory
Bio-logic gate device
• Single-electron device
• Serial processor
• THz-class processing speed
One-chip application
Molecular computing device
• Parallel processor
• THz-class processing speed (CPU)
Evolvable Biomolecular Hardware
Sequence programmable and evolvable molecular systems have been constructed as cell-free chemical systems using biomolecules such as DNA and proteins.
Molecular Storage for Massively Parallel Information Retrieval
Trillions of DNA strands
Phone book (example):
Name | Phone number | Address
Hong Gil-dong (홍길동) | 419-1332 | 211 Jamsilbon-dong, Songpa-gu, Seoul
Song Seung-heon (송승헌) | 352-4730 | 23-1 Juan 5-dong, Nam-gu, Incheon
Won Bin (원빈) | 648-7921 | 246-2 Acheon-dong, Guri-si, Gyeonggi-do
Song Hye-kyo (송혜교) | 418-9362 | 11 Singil 2-dong, Yeongdeungpo-gu, Seoul
…
The ‘Knight Problem’
Given an n × n chess board, on which squares can knights be placed so that no knight can attack another?
An example of SAT; NP-complete for infinite boards.
Example: 3 × 3 board
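The SAT view of the knight problem treats each square as a Boolean variable (knight or no knight). Each pair of squares a knight's move apart contributes the clause "not both". A minimal brute-force sketch for small boards is shown below. It counts every assignment that satisfies all the clauses, including the empty board. `attacks` and `count_safe_placements` are hypothetical names.

```python
from itertools import product

KNIGHT_MOVES = {(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)}


def attacks(a, b) -> bool:
    """True if a knight on square a attacks square b (the relation is symmetric)."""
    return (b[0] - a[0], b[1] - a[1]) in KNIGHT_MOVES


def count_safe_placements(n: int) -> int:
    """Count knight placements on an n x n board in which no two knights attack.

    Brute force over all 2**(n*n) assignments of the square variables.
    """
    squares = [(r, c) for r in range(n) for c in range(n)]
    count = 0
    for occupied in product((False, True), repeat=len(squares)):
        knights = [sq for sq, on in zip(squares, occupied) if on]
        if not any(attacks(a, b) for i, a in enumerate(knights) for b in knights[i + 1:]):
            count += 1
    return count


print(count_safe_placements(3))  # 94, counting the empty board
```

On the 3 × 3 board the centre square is attacked by nothing, and the other eight squares form a single cycle of knight moves. This keeps the count small enough to check by hand. The exponential 2^(n·n) loop, however, is exactly the kind of search that a molecular computer tries to parallelize.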
Three Solutions to the ‘Knight Problem’
Problem solved: 3 of the 31 solutions to the knight conundrum found by the RNA-based machine
Solving Logic Problems by Molecular Computing
Satisfiability problem: find Boolean values for the variables that make the given formula true.
3-SAT problem: every NP problem can be seen as the search for a solution that simultaneously satisfies a number of logical clauses, each composed of three literals (variables or their negations).
[Formula: example clauses over variables x1 to x6]
DNA Chips for DNA Computing
I. Make: oligomer synthesis
II. Attach (immobilize): 5’HS-C6-T15-CCTTvvvvvvvvTTCG-3’
III. Mark: hybridization
IV. Destroy: enzyme reaction (e.g., EcoRI)
V. Unmark
* All strands that do not satisfy the problem are removed.
VI. Readout: if solutions remain after the last of the N cycles, amplify them by PCR and confirm!
Variable Sequences and the Encoding Scheme
Three-dimensional Plot and Histogram of the Fluorescence
S3: w=0, x=0, y=1, z=1
S7: w=0, x=1, y=1, z=1
S8: w=1, x=0, y=0, z=0
S9 : w=1, x=0, y=0, z=1
For S3: y=1 satisfies (w ∨ x ∨ y); z=1 satisfies (w ∨ ¬y ∨ z); x=0 or y=1 satisfies (¬x ∨ y); w=0 satisfies (¬w ∨ ¬y).
Four spots with high fluorescence intensity correspond to the four expected solutions.
DNA sequences identified in the readout step via addressed array hybridization.
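The mark / destroy / unmark cycle on the chip can be simulated on all 16 assignments of w, x, y, z. This minimal sketch rests on two assumptions. First, the formula is taken to be (w ∨ x ∨ y) ∧ (w ∨ ¬y ∨ z) ∧ (¬x ∨ y) ∧ (¬w ∨ ¬y); the negations are inferred from the S3 notes about which values satisfy each clause. Second, strand S<k> is taken to encode the bits w x y z of k in binary (S3 = 0011, S9 = 1001). The chemistry is abstracted to set operations, and the names are hypothetical.

```python
from itertools import product

VARS = ("w", "x", "y", "z")
# Assumed formula, as lists of (variable, value that satisfies the literal):
# (w or x or y) and (w or not y or z) and (not x or y) and (not w or not y)
CLAUSES = [
    [("w", 1), ("x", 1), ("y", 1)],
    [("w", 1), ("y", 0), ("z", 1)],
    [("x", 0), ("y", 1)],
    [("w", 0), ("y", 0)],
]


def surface_sat(clauses):
    """Simulate surface-based DNA computing: one cycle per clause."""
    # Make/Attach: one immobilized strand per assignment; key k is S<k>.
    strands = {k: dict(zip(VARS, bits)) for k, bits in enumerate(product((0, 1), repeat=4))}
    for clause in clauses:
        # Mark: strands satisfying at least one literal get a hybridized complement.
        marked = {k for k, a in strands.items() if any(a[v] == val for v, val in clause)}
        # Destroy unmarked strands; Unmark the survivors for the next cycle.
        strands = {k: a for k, a in strands.items() if k in marked}
    # Readout: the surviving strands are the solutions.
    return sorted(strands)


print(surface_sat(CLAUSES))  # [3, 7, 8, 9], i.e. S3, S7, S8, S9
```

Under these assumptions, the four survivors match the four high-fluorescence spots reported above.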
Applied Biointelligence
Bio-based AI Methods for Solving Bio-problems
Spillover of Biointelligence
Understanding information flow in biological construction
Healthcare, Drugs, Foods
Analysis, modeling and management tools
Multilayer Perceptrons for Gene Finding and Prediction
[Figure: a multilayer perceptron takes sequence features (coding potential value, GC composition, length, donor, acceptor, intron vocabulary, discrete bases) and outputs an exon score between 0 and 1]
Self-Organizing Maps for DNA Microarray Data Analysis
[Figure: an input is connected through a bundle of synaptic connections to a two-dimensional array of postsynaptic neurons; winning neurons are highlighted]
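A self-organizing map lets neurons on a 2-D grid compete for each input. The winning neuron and its grid neighbours move their weight vectors toward the input, so similar expression profiles end up on nearby neurons. The following is a minimal pure-Python sketch of a Kohonen SOM, with several assumptions. The four "expression profiles" are hypothetical. The 3 × 3 grid, the linear decay schedules, and the Gaussian neighbourhood are illustrative choices. `train_som` and `best_matching_unit` are hypothetical names.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)))


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small Kohonen self-organizing map on equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per neuron on a rows x cols grid.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)               # learning rate decays toward 0
        sigma = sigma0 * (1 - frac) + 1e-3  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):  # inputs in random order
            winner = best_matching_unit(weights, x)
            for unit, w in weights.items():
                d2 = (unit[0] - winner[0]) ** 2 + (unit[1] - winner[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical expression profiles (3 conditions): two "up-early" genes, two "up-late" genes.
profiles = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.0], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
som = train_som(profiles)
print([best_matching_unit(som, p) for p in profiles])
```

The learning rate and neighbourhood radius both shrink over training. Early updates therefore arrange the whole map, and late updates fine-tune individual neurons, which is the usual design for SOMs.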
Biological Information Extraction
[Diagram: text data pass through information extraction (data analysis and field identification, data classification and field extraction, field property identification and learning) and database template filling into DB records with fields such as location and date]
Medical Biointelligence
Key aspects addressed: automation of genome expression analysis; integration of molecular data; inference and modeling systems
Goals: molecular classification of cancer; diagnosis systems; organism modeling; drug design
E-Doctor
[Diagram: diagnosis expert system, self-diagnosis, pharmacy, hospital, personal medicine]
Biorobotics
• Robot = Mechanical + Electronic (+ Biological)
• Biorobot = Biological + (Mechanical + Electronic)
• Biological robots with biointelligence: self-reproduction, evolution, learning
Conclusions
IT is becoming increasingly important to the advancement of BT (e.g., bioinformatics).
IT can benefit greatly from BT (e.g., biocomputing and biochips).
Bioinformation technology (BIT) is essential as a next-generation information technology.
From the AI point of view, biosystems are existing proofs of intelligent systems.
Biointelligence, defined as the study of artificial intelligence based on biotechnology, is a new technology and application area at the intersection of BT and IT.
Biological AI technologies can provide a shortcut for building AI machines.
“The interface between biological systems and computational systems will become blurred, allowing powerful computational control of biological systems and implantation of computer interfaces into the human brain. Biology will become the dominant metaphor for computer science, providing a framework for understanding and constructing complex computations.”
- Mark Gerstein
Further Information
Journals & Conferences
Journals:
• Biological Cybernetics (Springer)
• BioSystems (Elsevier)
• Artificial Intelligence in Medicine
• Bioinformatics (Oxford University Press)
• Computer Applications in the Biosciences (Oxford University Press)
• Computers in Biology and Medicine (Elsevier)
• IEEE Transactions on Biomedical Engineering
• IEEE Transactions on Evolutionary Computation
Conferences:
• International Conference on Intelligent Systems for Molecular Biology (ISMB)
• Pacific Symposium on Biocomputing (PSB)
• International Conference on Research in Computational Molecular Biology (RECOMB)
• IBC’s Annual Conference on Biochip Technologies
• International Meeting on DNA Based Computers
• IEEE Bioinformatics and Bioengineering Symposium (BIBE)
• International Symposium on Medical Data Analysis (ISMDA)
Web Resources: Bioinformatics
ANGIS - The Australian National Genomic Information Service: http://morgan.angis.su.oz.au/
Australian National University (ANU) Bioinformatics: http://life.anu.edu.au/
BioMolecular Engineering Research Center (BMERC): http://bmerc-www.bu.edu/
Brutlag bioinformatics group: http://motif.stanford.edu/
Columbia University Bioinformatics Center (CUBIC): http://cubic.bioc.columbia.edu/
European Bioinformatics Institute (EBI): http://www.ebi.ac.uk/
European Molecular Biology Laboratory (EMBL): http://www.embl-heidelberg.de/
Genetic Information Research Institute: http://www.girinst.org/
GMD-SCAI: http://www.gmd.de/SCAI/scai_home.html
Harvard Biological Laboratories: http://golgi.harvard.edu/
Laurence H. Baker Center for Bioinformatics and Biological Statistics: http://www.bioinformatics.iastate.edu/
NASA Center for Bioinformatics: http://biocomp.arc.nasa.gov/
NCSA Computational Biology: http://www.ncsa.uiuc.edu/Apps/CB/
Stockholm Bioinformatics Center: http://www.sbc.su.se/
USC Computational Biology: http://www-hto.usc.edu/
W. M. Keck Center for Computational Biology: http://www-bioc.rice.edu/
Web Resources: Biocomputing
European Molecular Computing Consortium (EMCC): http://www.csc.liv.ac.uk/~emcc/
BioMolecular Information Processing (BioMip): http://www.gmd.de/BIOMIP
Leiden Center for Natural Computation (LCNC): http://www.wi.leidenuniv.nl/~lcnc/
Biomolecular Computation (BMC): http://bmc.cs.duke.edu/
DNA Computing and Informatics at Surfaces: http://www.corninfo.chem.wisc.edu/writings/DNAcomputing.html
SNU Molecular Evolutionary Computing (MEC) Project: http://scai.snu.ac.kr/Research/
Web Resources: Biochips
DNA Microarray (Genome Chip): http://www.gene-chips.com/
Large-Scale Gene Expression and Microarray Link and Resources: http://industry.ebi.ac.uk/~alan/MicroArray/
The Microarray Centre at The Ontario Cancer Institute: http://www.oci.utoronto.ca/services/microarray/
Lab-on-a-Chip resources: http://www.lab-on-a-chip.com/
Mailing List: [email protected]
Books: Bioinformatics
Cynthia Gibas and Per Jambeck, Developing Bioinformatics Computer Skills, O’Reilly, 2001.
Peter Clote and Rolf Backofen, Computational Molecular Biology: An Introduction, John Wiley & Sons, 2000.
Arun Jagota, Data Analysis and Classification for Bioinformatics, 2000.
Hooman H. Rashidi and Lukas K. Buehler, Bioinformatics Basics: Applications in Biological Science and Medicine, 1999.
Pierre Baldi and Soren Brunak, Bioinformatics: The Machine Learning Approach, MIT Press, 1998.
Andreas Baxevanis and B. F. Francis Ouellette, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, John Wiley & Sons, 1998.
Books: Biocomputing
Cristian S. Calude and Gheorghe Paun, Computing with Cells and Atoms: An Introduction to Quantum, DNA and Membrane Computing, Taylor & Francis, 2001.
Gheorghe Paun (Ed.), Computing with Bio-Molecules: Theory and Experiments, Springer, 1999.
Gheorghe Paun, Grzegorz Rozenberg and Arto Salomaa, DNA Computing: New Computing Paradigms, Springer, 1998.
C. S. Calude, J. Casti and M. J. Dinneen, Unconventional Models of Computation, Springer, 1998.
Tino Gramss, Stefan Bornholdt, Michael Gross, Melanie Mitchell and Thomas Pellizzari, Non-Standard Computation: Molecular Computation, Cellular Automata, Evolutionary Algorithms, Quantum Computers, Wiley-VCH, 1997.
For more information:
http://scai.snu.ac.kr/