DNA Computing-Based Implementation of a Decision Tree

Advanced AI

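School of Computer Science and Engineering: 임예니

Interdisciplinary Program in Cognitive Science: 이은석

Interdisciplinary Program in Bioinformatics: 조성범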

Decision Tree using DNA computing

• Input strand organization

For each attribute, the instance's value and its class label are coupled into one unit.

After hybridization, the length of a strand indicates the number of instances.

            Gene 1   Class   Gene 2   Class
Patient 1     0        1       0        0
Patient 2     1        0       0        0
Patient 3     1        0       0        0
Patient 4     1        1       0        1

            Gene A   Class   Gene B   Class   Gene C   Class
Patient 1     0        0       0        0       1        0
Patient 2     0        1       0        1       0        1
Patient 3     1        1       1        0       1        1
Patient 4     0        1       0        1       1        1
Patient 5     0        1       0        1       0        1

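[Figure: Cy5-labeled input strand laid out as 5' sticky end, one of {(00), (01), (10), (11)}, sticky end 3'; one strand type for each (value, class) pair: (0,0), (0,1), (1,0), (1,1)]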
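The coupling of attribute value and class label described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' encoding: the `patients` data, the gene names, and the helper names `encode_units` and `count_pairs` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data (not taken from the slides): each entry is one patient,
# given as ({gene: binary expression value}, class label).
patients = [
    ({"gene_a": 0, "gene_b": 1}, 0),
    ({"gene_a": 0, "gene_b": 0}, 1),
    ({"gene_a": 1, "gene_b": 1}, 1),
    ({"gene_a": 0, "gene_b": 1}, 1),
]


def encode_units(rows):
    """Couple every attribute value with the class label of its instance."""
    units = []
    for values, label in rows:
        for gene, value in values.items():
            units.append((gene, value, label))
    return units


def count_pairs(units):
    """Count how often each (value, class) pair occurs for each gene."""
    counts = {}
    for gene, value, label in units:
        counts.setdefault(gene, Counter())[(value, label)] += 1
    return counts


pair_counts = count_pairs(encode_units(patients))
```

Each gene ends up with at most four counters, one per pair type (0,0), (0,1), (1,0), (1,1), which is the form in which the count tables later in the deck are given.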

Calculation of Information Gain

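• Information gain:

$$\mathrm{Gain}(S, A) \equiv \mathrm{Entropy}(S) - \sum_{v} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

In gene expression data, all attribute values are encoded in binary, so the sum has only two terms:

$$\sum_{v} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v) = \frac{|S_0|}{|S|}\,\mathrm{Entropy}(S_0) + \frac{|S_1|}{|S|}\,\mathrm{Entropy}(S_1)$$

Each term is approximated as follows:

$$\frac{|S_0|}{|S|}\,\mathrm{Entropy}(S_0) \approx \frac{|S_0|}{|S|} \cdot \frac{n_1}{|S_0|} = \frac{n_1}{|S|}$$

$$\frac{|S_1|}{|S|}\,\mathrm{Entropy}(S_1) \approx \frac{|S_1|}{|S|} \cdot \frac{n_2}{|S_1|} = \frac{n_2}{|S|}$$

Because |S| is the same for every attribute, the sum is proportional to n1 + n2:

$$\sum_{v} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v) \approx \frac{n_1}{|S|} + \frac{n_2}{|S|} = \frac{n_1 + n_2}{|S|} \propto n_1 + n_2$$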
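For reference, the exact (unapproximated) information gain of a binary attribute with a binary class can be computed from four counts. The sketch below is an illustration only; the function names and the argument convention `n_vc` (count of instances with attribute value v and class c) are assumptions, not notation from the slides.

```python
from math import log2


def entropy(n_class0, n_class1):
    """Binary entropy (in bits) of a set with the given class counts."""
    total = n_class0 + n_class1
    result = 0.0
    for n in (n_class0, n_class1):
        if n > 0:  # an empty class contributes nothing
            p = n / total
            result -= p * log2(p)
    return result


def information_gain(n00, n01, n10, n11):
    """Gain(S, A), where n_vc counts instances with value v and class c."""
    size_s0, size_s1 = n00 + n01, n10 + n11
    size_s = size_s0 + size_s1
    remainder = ((size_s0 / size_s) * entropy(n00, n01)
                 + (size_s1 / size_s) * entropy(n10, n11))
    return entropy(n00 + n10, n01 + n11) - remainder


# An attribute that separates the classes perfectly gains one full bit;
# an attribute that is independent of the class gains nothing.
perfect = information_gain(5, 0, 0, 5)
useless = information_gain(5, 5, 5, 5)
```

Maximizing this gain is equivalent to minimizing the weighted entropy sum, which is the quantity the slide approximates.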

Number of instances for each (value, class) pair, by gene:

(value, class)   36822   1894   34836   1915   38982
(0,0)              13     39     24      46     54
(0,1)              49      6     55      14     24
(1,0)              47     21     36      14      6
(1,1)              11     54      5      46     36

Subset with 36822 = 0:

(value, class)   1894   34836   1915   38982
(0,0)             11      7      13     13
(0,1)              6     46      13     21
(1,0)              2      6       0      0
(1,1)             43      3      36     28

Subset with 1894 = 0:

(value, class)   34836   1915   38982
(0,0)              5      11     11
(0,1)              6       1      1
(1,0)              0       0      1
(1,1)              6       5      4
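The node selection implied by the three count tables above can be sketched as follows. The slides do not define n1 and n2; this sketch assumes they are the minority-class counts within S0 and S1, and the names `split_score` and `best_gene` are hypothetical. The gene with the smallest n1 + n2 is chosen at each level.

```python
# Counts per (value, class) pair, copied from the three tables above.
ROOT = {
    "36822": {(0, 0): 13, (0, 1): 49, (1, 0): 47, (1, 1): 11},
    "1894": {(0, 0): 39, (0, 1): 6, (1, 0): 21, (1, 1): 54},
    "34836": {(0, 0): 24, (0, 1): 55, (1, 0): 36, (1, 1): 5},
    "1915": {(0, 0): 46, (0, 1): 14, (1, 0): 14, (1, 1): 46},
    "38982": {(0, 0): 54, (0, 1): 24, (1, 0): 6, (1, 1): 36},
}
AFTER_36822_IS_0 = {
    "1894": {(0, 0): 11, (0, 1): 6, (1, 0): 2, (1, 1): 43},
    "34836": {(0, 0): 7, (0, 1): 46, (1, 0): 6, (1, 1): 3},
    "1915": {(0, 0): 13, (0, 1): 13, (1, 0): 0, (1, 1): 36},
    "38982": {(0, 0): 13, (0, 1): 21, (1, 0): 0, (1, 1): 28},
}
AFTER_1894_IS_0 = {
    "34836": {(0, 0): 5, (0, 1): 6, (1, 0): 0, (1, 1): 6},
    "1915": {(0, 0): 11, (0, 1): 1, (1, 0): 0, (1, 1): 5},
    "38982": {(0, 0): 11, (0, 1): 1, (1, 0): 1, (1, 1): 4},
}


def split_score(counts):
    """n1 + n2, ASSUMING n1 and n2 are the minority-class counts in S0 and S1."""
    n1 = min(counts[(0, 0)], counts[(0, 1)])
    n2 = min(counts[(1, 0)], counts[(1, 1)])
    return n1 + n2


def best_gene(table):
    """Gene whose split leaves the fewest minority-class instances."""
    return min(table, key=lambda gene: split_score(table[gene]))


chosen = [best_gene(ROOT), best_gene(AFTER_36822_IS_0), best_gene(AFTER_1894_IS_0)]
```

Under this reading, the genes chosen at the three levels are 36822, 1894, and 1915, which is the order in which the subset tables are conditioned.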

DNA Computing vs. Digital Computing

• Rules from DNA computing

36822 = 0
    1894 = 0
        1915 = 0: 0
        1915 = 1: 1
    1894 = 1: 1

The rules are identical to those produced by the conventional decision tree algorithm.
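Read as nested tests, the rules can be applied to a new sample as in the sketch below. It assumes that the leading dashes in the rule list mark nesting depth; the function name `classify` and the dictionary input format are hypothetical.

```python
def classify(expression):
    """Apply the listed rules; expression maps gene ID -> binary value (0 or 1)."""
    if expression["36822"] == 0:
        if expression["1894"] == 0:
            # Innermost test: the class equals the value of gene 1915.
            return 0 if expression["1915"] == 0 else 1
        return 1
    # The slide lists no rule for 36822 = 1, so that branch is left undefined.
    return None


sample = {"36822": 0, "1894": 0, "1915": 1}
predicted = classify(sample)
```

Returning `None` for the unlisted branch keeps the sketch from inventing a rule that the deck does not show.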

Input sequences <00> / <01> / <10> / <11>

Label   5' sticky end                             Sticky end 3'
<00>    GCATAG   GAAATGAGTT   CTTTACTCAA   CGTATC
<01>    ATAGGC   TGATGCTACA   ACTACGATGT   TATCCG
<10>    AGGCAT   GGTTGTGGCG   CCAACACCGC   TCCGTA
<11>    ATAGGA   CAGTTATTTC   GTCAATAAAG   TATCCT
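A quick consistency check on these sequences is sketched below. It assumes that the four rows correspond to <00>, <01>, <10>, <11> in slide order, and it only tests base-by-base Watson-Crick complementarity between segments and the Hamming distance between 5' sticky ends; it does not model strand orientation or hybridization energetics. All helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# (5' sticky end, segment 1, segment 2, 3' sticky end), in slide order.
SEQUENCES = {
    "<00>": ("GCATAG", "GAAATGAGTT", "CTTTACTCAA", "CGTATC"),
    "<01>": ("ATAGGC", "TGATGCTACA", "ACTACGATGT", "TATCCG"),
    "<10>": ("AGGCAT", "GGTTGTGGCG", "CCAACACCGC", "TCCGTA"),
    "<11>": ("ATAGGA", "CAGTTATTTC", "GTCAATAAAG", "TATCCT"),
}


def complement(seq):
    """Base-by-base complement (A<->T, C<->G), without reversing."""
    return seq.translate(COMPLEMENT)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def segments_pair(label):
    """True if the two sticky ends and the two inner segments are complements."""
    left, seg1, seg2, right = SEQUENCES[label]
    return complement(left) == right and complement(seg1) == seg2


def closest_sticky_ends():
    """Smallest Hamming distance between the 5' sticky ends of two labels."""
    labels = list(SEQUENCES)
    pairs = [
        (hamming(SEQUENCES[a][0], SEQUENCES[b][0]), a, b)
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    ]
    return min(pairs)


all_pair = all(segments_pair(label) for label in SEQUENCES)
closest = closest_sticky_ends()
```

With the sequences as listed, every row passes the complement check, and the 5' sticky ends of the second and fourth rows differ in a single base, which is the kind of similarity that can allow mismatched hybridization.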

Implementation steps

1. Rule-representing sequences
2. Hybridization
3. Construction of random paths
4. Fluorescence detection: check whether a specific rule appears consecutively
5. Repeat from step 3
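The steps above can be mimicked in software with a very rough sketch. It rests on several assumptions that the slides do not confirm: a random path is modeled as a random ordering of strands drawn from the pool, one hybridization is one junction between neighboring strands, and a rule "appears consecutively" when both strands at a junction carry the same label. The pool sizes 1000, 900, 800, 700 and the hybridization count 1000 are those reported for the first simulation run; assigning them to <00>, <01>, <10>, <11> in that order is also an assumption, as is the function name `simulate`.

```python
import random
from collections import Counter


def simulate(pool_sizes, n_hybridizations, seed=0):
    """Count junctions in a random path where the same label occurs twice in a row."""
    rng = random.Random(seed)  # fixed seed so that a run is reproducible
    pool = [label for label, n in pool_sizes.items() for _ in range(n)]
    rng.shuffle(pool)
    # A path with n junctions needs n + 1 strands.
    path = pool[: n_hybridizations + 1]
    consecutive = Counter()
    for left, right in zip(path, path[1:]):
        if left == right:
            consecutive[left] += 1
    return consecutive


pool_sizes = {"<00>": 1000, "<01>": 900, "<10>": 800, "<11>": 700}
result = simulate(pool_sizes, n_hybridizations=1000)
```

Repeating the call with different seeds corresponds to step 5; the counts per label play the role of the bar heights in the result charts.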

Simulation Results

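• 1st run: 1000, 900, 800, and 700 copies of the rule sequences; number of hybridizations: 1000

[Figure: bar chart, 1st run. x-axis: input sequence (<00>, <01>, <10>, <11>); y-axis: number of consecutively appearing sequences (0 to 900)]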

Simulation Results

• 2nd run:

[Figure: bar chart, 2nd run. x-axis: input sequence (<00>, <01>, <10>, <11>); y-axis: number of consecutively appearing sequences (0 to 1000)]

Simulation Results

• 3rd run:

[Figure: bar chart, 3rd run. x-axis: input sequence (<00>, <01>, <10>, <11>); y-axis: number of consecutively appearing sequences (0 to 700)]

Simulation Results

• 4th run:

[Figure: bar chart, 4th run. x-axis: input sequence (<00>, <01>, <10>, <11>); y-axis: number of consecutively appearing sequences (0 to 1000)]

(0,0): 0.85

(0,1): 0.91

(1,0): 0.62

(1,1): 0.87

Summary of Simulation Results

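Each cell shows calculated count : simulated count, where the simulated count is the calculated count multiplied by the ratio for that row ((0,0): 0.85, (0,1): 0.91, (1,0): 0.62, (1,1): 0.87).

(value, class)   36822      1894       34836      1915       38982
(0,0)            13:11.05   39:33.15   24:20.4    46:39.1    54:45.9
(0,1)            49:44.59   6:5.46     55:50.05   14:12.74   24:21.84
(1,0)            47:29.14   21:13.02   36:22.32   14:8.68    6:3.72
(1,1)            11:9.57    54:46.98   5:4.35     46:40.02   36:31.3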

Calculation of Root Node with Simulation Results
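One way to redo the root-node calculation with simulated counts is sketched below. It makes two assumptions that the slides do not state: the simulated count of each (value, class) pair is the calculated count multiplied by the ratio reported for that pair (0.85, 0.91, 0.62, 0.87), and the split criterion is n1 + n2 with n1 and n2 taken as the minority-class counts in S0 and S1. The names `scaled`, `split_score`, and `root` are hypothetical.

```python
# Calculated counts per (value, class) pair, from the root-level table in this deck.
COUNTS = {
    "36822": {(0, 0): 13, (0, 1): 49, (1, 0): 47, (1, 1): 11},
    "1894": {(0, 0): 39, (0, 1): 6, (1, 0): 21, (1, 1): 54},
    "34836": {(0, 0): 24, (0, 1): 55, (1, 0): 36, (1, 1): 5},
    "1915": {(0, 0): 46, (0, 1): 14, (1, 0): 14, (1, 1): 46},
    "38982": {(0, 0): 54, (0, 1): 24, (1, 0): 6, (1, 1): 36},
}
# Ratios reported with the simulation results, one per pair type.
RATIOS = {(0, 0): 0.85, (0, 1): 0.91, (1, 0): 0.62, (1, 1): 0.87}


def scaled(counts):
    """Simulated counts, ASSUMED to be calculated count * ratio of the pair type."""
    return {pair: n * RATIOS[pair] for pair, n in counts.items()}


def split_score(counts):
    """n1 + n2, ASSUMING n1 and n2 are the minority-class counts in S0 and S1."""
    return (min(counts[(0, 0)], counts[(0, 1)])
            + min(counts[(1, 0)], counts[(1, 1)]))


def root(table):
    """Gene with the smallest split score."""
    return min(table, key=lambda gene: split_score(table[gene]))


calculated_root = root(COUNTS)
simulated_root = root({gene: scaled(c) for gene, c in COUNTS.items()})
```

Because the four pair types are scaled by different ratios, the two selections need not coincide, which is one way that uneven hybridization could change the resulting tree.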

Validation of Decision Trees Resulting from DNA Computing and Digital Computing

Number of genes   Digital computing   DNA computing
3                 82%                 70%
5                 90%                 75%

Discussion

• Because of nonspecific hybridization, the simulation results differed from the calculated results

• No pruning process

• Cost

• A more specific hybridization process is needed
