DNA Computing-Based Implementation of Decision Tree
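DNA Computing-Based Implementation of Decision Tree
Advanced AI
Yeni Lim (임예니), School of Computer Science and Engineering
Eunseok Lee (이은석), Interdisciplinary Program in Cognitive Science
Sungbum Cho (조성범), Interdisciplinary Program in Bioinformatics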
Decision Tree using DNA computing
• Input strand organization: for each attribute, the instance value and the class label are coupled.
  After hybridization, the length of a strand indicates the number of instances.
Patient   Gene 1   Class   Gene 2   Class
1         0        1       0        0
2         1        0       0        0
3         1        0       0        0
4         1        1       0        1
Patient   Gene A   Class   Gene B   Class   Gene C   Class
1         0        0       0        0       1        0
2         0        1       0        1       0        1
3         1        1       1        0       1        1
4         0        1       0        1       1        1
5         0        1       0        1       0        1
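The coupling of instance value and class label can be illustrated digitally by tallying, for each gene, how often each (value, class) pair occurs in the five-patient table. This is only a sketch of the bookkeeping; the function name `pair_counts` and the row layout are assumptions made for illustration, not part of the original slides.

```python
from collections import Counter

# Rows copied from the five-patient table:
# (gene A, class, gene B, class, gene C, class)
patients = [
    (0, 0, 0, 0, 1, 0),
    (0, 1, 0, 1, 0, 1),
    (1, 1, 1, 0, 1, 1),
    (0, 1, 0, 1, 1, 1),
    (0, 1, 0, 1, 0, 1),
]
genes = ["A", "B", "C"]


def pair_counts(rows, names):
    """Count how often each (value, class) pair occurs for every gene."""
    counts = {name: Counter() for name in names}
    for row in rows:
        for i, name in enumerate(names):
            value, label = row[2 * i], row[2 * i + 1]
            counts[name][(value, label)] += 1
    return counts


counts = pair_counts(patients, genes)
for name in genes:
    print(name, dict(counts[name]))
```

In the DNA version, these tallies would presumably correspond to how many strands of each pair type are present, rather than to integers in a dictionary.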
[Figure: input strand with a Cy5 label, laid out as 5' sticky end, one of {(00), (01), (10), (11)}, sticky end 3'. The four possible pairs are (0,0), (0,1), (1,0) and (1,1).]
Calculation of Information Gain
• Information Gain(S,A) ≡ Entropy(S) - ∑(|Sv|/|S|)*Entropy(Sv)
• In gene expression data, all attribute values are encoded in binary mode, so
  ∑(|Sv|/|S|)*Entropy(Sv) = (|S0|/|S|)*Entropy(S0) + (|S1|/|S|)*Entropy(S1)
• Each term is approximated by a count:
  (|S0|/|S|)*Entropy(S0) ≈ (|S0|/|S|)*(n1/|S0|) = n1/|S|
  (|S1|/|S|)*Entropy(S1) ≈ (|S1|/|S|)*(n2/|S1|) = n2/|S|
• Therefore
  ∑(|Sv|/|S|)*Entropy(Sv) ≈ n1/|S| + n2/|S| = (n1+n2)/|S| ∝ n1+n2
  Entropy(S) and |S| are the same for every attribute at a node, so the attribute with the smallest n1+n2 has the largest approximate gain.
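To make the approximation concrete, the sketch below compares the exact weighted entropy with the count n1+n2 on two made-up attributes. The slides do not define n1 and n2; this example assumes they are the minority-class counts in S0 and S1. The data and the function names are hypothetical.

```python
import math


def entropy(n_class0, n_class1):
    """Binary entropy (in bits) of a subset with the given class counts."""
    total = n_class0 + n_class1
    h = 0.0
    for n in (n_class0, n_class1):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h


def weighted_entropy(counts):
    """Exact sum over v of (|Sv|/|S|) * Entropy(Sv); counts[(v, c)] are tallies."""
    total = sum(counts.values())
    result = 0.0
    for v in (0, 1):
        n0, n1 = counts[(v, 0)], counts[(v, 1)]
        if n0 + n1:
            result += (n0 + n1) / total * entropy(n0, n1)
    return result


def minority_score(counts):
    """Assumed reading of n1 + n2: minority-class count in S0 plus that in S1."""
    return sum(min(counts[(v, 0)], counts[(v, 1)]) for v in (0, 1))


# Hypothetical attributes: one separates the classes well, one does not.
sharp = {(0, 0): 9, (0, 1): 1, (1, 0): 1, (1, 1): 9}
mixed = {(0, 0): 5, (0, 1): 5, (1, 0): 5, (1, 1): 5}

for name, c in (("sharp", sharp), ("mixed", mixed)):
    print(name, round(weighted_entropy(c), 3), minority_score(c))
```

On this toy data both measures rank the two attributes the same way. That is the property the approximation relies on, although the two measures need not agree on every data set.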
         36822   1894   34836   1915   38982
(0,0)    13      39     24      46     54
(0,1)    49      6      55      14     24
(1,0)    47      21     36      14     6
(1,1)    11      54     5       46     36
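The root attribute can be picked from these counts with a few lines of code. This is a sketch, assuming that n1 and n2 are the minority-class counts for value 0 and value 1 of each gene; the helper name `misclassified` is hypothetical.

```python
# Pair counts from the table above: counts[gene][(value, class)]
counts = {
    "36822": {(0, 0): 13, (0, 1): 49, (1, 0): 47, (1, 1): 11},
    "1894":  {(0, 0): 39, (0, 1): 6,  (1, 0): 21, (1, 1): 54},
    "34836": {(0, 0): 24, (0, 1): 55, (1, 0): 36, (1, 1): 5},
    "1915":  {(0, 0): 46, (0, 1): 14, (1, 0): 14, (1, 1): 46},
    "38982": {(0, 0): 54, (0, 1): 24, (1, 0): 6,  (1, 1): 36},
}


def misclassified(c):
    """n1 + n2 under the assumption that each is a minority-class count."""
    return min(c[(0, 0)], c[(0, 1)]) + min(c[(1, 0)], c[(1, 1)])


scores = {gene: misclassified(c) for gene, c in counts.items()}
root = min(scores, key=scores.get)  # smallest n1 + n2 = largest approximate gain
print(scores)
print("root:", root)
```

Under this assumption the scores are 24, 27, 29, 28 and 30, so gene 36822 is chosen. That matches the first split in the rules extracted later in the slides.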
36822=0
         1894   34836   1915   38982
(0,0)    11     7       13     13
(0,1)    6      46      13     21
(1,0)    2      6       0      0
(1,1)    43     3       36     28
1894=0
         34836   1915   38982
(0,0)    5       11     11
(0,1)    6       1      1
(1,0)    0       0      1
(1,1)    6       5      4
DNA computing vs. digital computing
• Rules from DNA computing:
  36822=0
    1894=0
      1915=0: 0
      1915=1: 1
    1894=1: 1
• Identical to the result of the conventional decision tree algorithm
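Read as nested conditions, the rules can be written as a small classifier. The nesting below is an assumption inferred from the order of the rules. The slides list no rule for 36822=1, so the sketch returns `None` there instead of guessing. The function name `classify` is hypothetical.

```python
def classify(sample):
    """Apply the listed rules; sample maps a gene ID to its binary value."""
    if sample["36822"] == 0:
        if sample["1894"] == 0:
            if sample["1915"] == 0:
                return 0          # 36822=0, 1894=0, 1915=0 -> class 0
            return 1              # 36822=0, 1894=0, 1915=1 -> class 1
        return 1                  # 36822=0, 1894=1 -> class 1
    return None                   # no rule is listed for 36822=1


print(classify({"36822": 0, "1894": 0, "1915": 0}))
print(classify({"36822": 0, "1894": 1, "1915": 0}))
```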
Input sequences for <00>, <01>, <10>, <11>

Input   5' sticky end   Sequence                   Sticky end 3'
<00>    GCATAG          GAAATGAGTT   CTTTACTCAA    CGTATC
<01>    ATAGGC          TGATGCTACA   ACTACGATGT    TATCCG
<10>    AGGCAT          GGTTGTGGCG   CCAACACCGC    TCCGTA
<11>    ATAGGA          CAGTTATTTC   GTCAATAAAG    TATCCT
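A quick check shows how the four segments of each input sequence relate to one another. In the sketch below, the second 10-mer is compared with the base-wise complement of the first, and the 3' sticky end with the complement of the 5' sticky end. The assignment of rows to the labels <00> to <11> follows their listed order and is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# (5' sticky end, first 10-mer, second 10-mer, 3' sticky end)
strands = {
    "<00>": ("GCATAG", "GAAATGAGTT", "CTTTACTCAA", "CGTATC"),
    "<01>": ("ATAGGC", "TGATGCTACA", "ACTACGATGT", "TATCCG"),
    "<10>": ("AGGCAT", "GGTTGTGGCG", "CCAACACCGC", "TCCGTA"),
    "<11>": ("ATAGGA", "CAGTTATTTC", "GTCAATAAAG", "TATCCT"),
}


def complement(seq):
    """Base-wise Watson-Crick complement (not reversed)."""
    return seq.translate(COMPLEMENT)


checks = {
    label: (complement(body) == body_pair, complement(left) == right)
    for label, (left, body, body_pair, right) in strands.items()
}
for label, (body_ok, ends_ok) in checks.items():
    print(label, "body pairs:", body_ok, "sticky ends pair:", ends_ok)
```

Every check passes. This suggests that each input is a double-stranded body with complementary single-stranded overhangs, so that copies of the same input can join end to end.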
Implementation steps
1. Rule-representing sequences
2. Hybridization
3. Construction of random paths
4. Fluorescence detection: check whether a specific rule appears consecutively
5. Repeat steps 3-4
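The steps above can be mimicked with a toy Monte Carlo sketch. Everything here is an assumption made for illustration: paths are built by drawing rule strands at random in proportion to their copy number, and "detection" simply counts the paths in which a rule appears at least `run_len` times in a row. It does not model real hybridization chemistry, and the parameter values are hypothetical.

```python
import random


def longest_run(path, rule):
    """Length of the longest consecutive run of `rule` in `path`."""
    best = current = 0
    for item in path:
        current = current + 1 if item == rule else 0
        best = max(best, current)
    return best


def simulate(copies, n_paths, path_len, run_len, seed=0):
    """Count, per rule, the random paths that contain a run of >= run_len."""
    rng = random.Random(seed)
    rules = list(copies)
    weights = [copies[r] for r in rules]
    hits = {r: 0 for r in rules}
    for _ in range(n_paths):
        # Step 3: build one random path (weighted draw with replacement).
        path = rng.choices(rules, weights=weights, k=path_len)
        # Step 4: "detect" rules that appear consecutively.
        for r in rules:
            if longest_run(path, r) >= run_len:
                hits[r] += 1
    return hits


# Hypothetical parameters for a small run.
copies = {"<00>": 1000, "<01>": 900, "<10>": 800, "<11>": 700}
hits = simulate(copies, n_paths=1000, path_len=10, run_len=2)
print(hits)
```

With this model, rules with more copies tend to be detected more often. This is one simple way in which copy numbers could influence the observed counts.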
Simulation Results
• 1st run: number of copies of each rule sequence: 1000, 900, 800, 700; number of hybridizations: 1000
[Bar chart "1st": number of consecutively appearing sequences (axis 0 to 900) for <00>, <01>, <10>, <11>]
Simulation Results
• 2nd run
[Bar chart "2nd": number of consecutively appearing sequences (axis 0 to 1000) for <00>, <01>, <10>, <11>]
Simulation Results
• 3rd run
[Bar chart "3rd": number of consecutively appearing sequences (axis 0 to 700) for <00>, <01>, <10>, <11>]
Simulation Results
• 4th run
[Bar chart "4th": number of consecutively appearing sequences (axis 0 to 1000) for <00>, <01>, <10>, <11>]
(0,0): 0.85
(0,1): 0.91
(1,0): 0.62
(1,1): 0.87
Summary of Simulation Results
         36822       1894        34836       1915        38982
(0,0)    13:11.05    39:33.15    24:20.4     46:39.1     54:45.9
(0,1)    49:44.59    6:5.46      55:50.05    14:12.74    24:21.84
(1,0)    47:29.14    21:13.02    36:22.32    14:8.68     6:3.72
(1,1)    11:9.57     54:46.98    5:4.35      46:40.02    36:31.3
Calculation of Root Node with Simulation Results
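The calculation itself is not preserved in the transcript, so the following is only a sketch of how it might go. It assumes that each original count is multiplied by the per-pair ratio reported after the simulation runs (0.85, 0.91, 0.62, 0.87), and that the root is the gene with the smallest sum of minority-class values, as in the approximation of information gain.

```python
# Original pair counts: counts[gene][(value, class)]
counts = {
    "36822": {(0, 0): 13, (0, 1): 49, (1, 0): 47, (1, 1): 11},
    "1894":  {(0, 0): 39, (0, 1): 6,  (1, 0): 21, (1, 1): 54},
    "34836": {(0, 0): 24, (0, 1): 55, (1, 0): 36, (1, 1): 5},
    "1915":  {(0, 0): 46, (0, 1): 14, (1, 0): 14, (1, 1): 46},
    "38982": {(0, 0): 54, (0, 1): 24, (1, 0): 6,  (1, 1): 36},
}
# Ratios per (value, class) pair as listed after the simulation runs.
ratios = {(0, 0): 0.85, (0, 1): 0.91, (1, 0): 0.62, (1, 1): 0.87}

# Assumed model: simulated value = original count * ratio for that pair.
simulated = {
    gene: {pair: n * ratios[pair] for pair, n in c.items()}
    for gene, c in counts.items()
}


def minority_sum(c):
    """Sum of the smaller class value for attribute value 0 and for value 1."""
    return min(c[(0, 0)], c[(0, 1)]) + min(c[(1, 0)], c[(1, 1)])


scores = {gene: minority_sum(c) for gene, c in simulated.items()}
root = min(scores, key=scores.get)
for gene, s in scores.items():
    print(gene, round(s, 2))
print("root with simulated values:", root)
```

Under these assumptions the sketch selects gene 1894 (score about 18.48) rather than 36822 (about 20.62). This would be consistent with the remark in the Discussion that the simulation results differ from the calculation.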
Validation of decision trees resulting from DNA computing and digital computing

Number of genes   Digital computing   DNA computing
3                 82%                 70%
5                 90%                 75%
Discussion
• Because of nonspecific hybridization, the simulation results differed from the calculated values
• The method lacks a pruning process
• Cost
• A more specific hybridization process is needed