Shared Ensemble Learning using Multi-trees
Department of Electrical, Electronic and Computer Engineering, G201249003 Kim Young-je, Database Lab

Uploaded by walter-weaver on 16-Dec-2015


TRANSCRIPT

Page 1: Shared Ensemble Learning using Multi-trees (Department of Electrical, Electronic and Computer Engineering, G201249003 Kim Young-je, Database Lab)

Shared Ensemble Learning using Multi-trees
Department of Electrical, Electronic and Computer Engineering
G201249003 Kim Young-je
Database Lab

Page 2: Shared Ensemble Learning using Multi-trees

Introduction
• What is a decision tree?
  • Each node in the tree specifies a test on some attribute of the instance
  • Each branch corresponds to an attribute value
  • Each leaf node assigns a classification

Decision Tree for PlayTennis
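The PlayTennis tree referenced above can be sketched as plain nested tests. This is a minimal sketch of Mitchell's classic textbook example; the attribute and value names follow that example, not this deck.

```python
# Classic PlayTennis decision tree (Mitchell's textbook example):
# each inner node tests an attribute, each branch is an attribute value,
# and each leaf assigns a classification.
def play_tennis(outlook, humidity, wind):
    if outlook == "Sunny":            # root test: Outlook
        return "No" if humidity == "High" else "Yes"
    elif outlook == "Overcast":
        return "Yes"                  # leaf: always play
    else:                             # Outlook == "Rain"
        return "No" if wind == "Strong" else "Yes"

print(play_tennis("Sunny", "High", "Weak"))   # -> No
print(play_tennis("Rain", "High", "Weak"))    # -> Yes
```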

Page 3: Shared Ensemble Learning using Multi-trees

Cost Associated with Machine Learning

• Generation costs
  • Computational costs
    • i.e. computer resource consumption
  • The goal is to give better solutions for the provided resources

Page 4: Shared Ensemble Learning using Multi-trees

Cost Associated with Machine Learning
• Application costs
• First
  • Models are accurate on average
  • but this does not mean they are uniformly reliable
  • A model can be highly accurate for frequent cases
  • yet extremely inaccurate for infrequent, critical situations
    • e.g. diagnosis, fault detection

Page 5: Shared Ensemble Learning using Multi-trees

Cost Associated with Machine Learning
• Application costs
• Second
  • Even accurate models can be useless
  • if the purpose is to obtain new knowledge but the model is
    • not expressed in the form of rules, or
    • the number of rules is too high
  • The interpretation of the results then carries significant costs
    • it may even be impossible

Page 6: Shared Ensemble Learning using Multi-trees

Construction of Decision Tree
• Tree construction
  • Driven by a splitting criterion that selects the best split
  • The selected split is applied to generate new branches
  • The rest of the splits are discarded
• The algorithm stops when
  • the examples that fall into a branch all belong to the same class

Page 7: Shared Ensemble Learning using Multi-trees

Construction of Decision Tree
• Pruning
  • Removal of parts of the tree that are not useful, in order to avoid overfitting

• Pre-pruning• performed during the construction of the tree

• Post-pruning• performed by analyzing the leaves once the tree has been built
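A minimal post-pruning sketch, under an assumed nested-dict tree representation. The names and the accuracy-based collapse criterion are illustrative assumptions; the paper's actual pruning criterion may differ.

```python
from collections import Counter

# Assumed tree representation: a leaf is a class label (str); an inner node is
# {"attr": attribute_name, "branches": {value: subtree}}.
def predict(tree, row):
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attr"]]]
    return tree

def prune(tree, rows, labels):
    """Post-pruning sketch: bottom-up, replace a subtree by its majority-class
    leaf whenever that does not lower accuracy on the pruning set."""
    if not isinstance(tree, dict) or not rows:
        return tree
    # prune the children first, routing the pruning set down each branch
    for value, sub in tree["branches"].items():
        idx = [i for i, r in enumerate(rows) if r[tree["attr"]] == value]
        tree["branches"][value] = prune(sub, [rows[i] for i in idx],
                                        [labels[i] for i in idx])
    # then try collapsing this whole node into a single leaf
    leaf = Counter(labels).most_common(1)[0][0]
    acc_tree = sum(predict(tree, r) == l for r, l in zip(rows, labels))
    acc_leaf = sum(l == leaf for l in labels)
    return leaf if acc_leaf >= acc_tree else tree

# The y-subtree predicts "b" on both branches, so pruning collapses it.
tree = {"attr": "x", "branches": {0: "a",
        1: {"attr": "y", "branches": {0: "b", 1: "b"}}}}
pruned = prune(tree, [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 1, "y": 1}],
               ["a", "b", "b"])
print(pruned)   # -> {'attr': 'x', 'branches': {0: 'a', 1: 'b'}}
```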

Page 8: Shared Ensemble Learning using Multi-trees

Merit and Demerit of Decision Tree
• Merit
  • Allows the quick construction of a model
  • because decision trees are built in a greedy (eager) way
• Demerit
  • It may produce bad models because of bad early decisions

Page 9: Shared Ensemble Learning using Multi-trees

Multi-tree Structure
• Rejected splits are not removed
  • but stored as suspended nodes
• Two new criteria are required beyond those of single-decision-tree construction
  • Suspended-node selection
    • to populate the multi-tree, a criterion must select which suspended node to resume
  • Model selection
    • select one or more comprehensible models according to a selection criterion
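The idea can be sketched with a toy data structure. All names here are illustrative assumptions; the suspended-node selection shown simply resumes the highest-scoring candidate, which is one possible criterion, not necessarily the paper's.

```python
# Toy sketch of a multi-tree node (all names illustrative): losing splits are
# suspended rather than discarded, so they can later be resumed to grow
# alternative trees that share everything above them.
class MultiTreeNode:
    def __init__(self, split, score):
        self.split = split        # the test this node applies
        self.score = score        # value of the splitting criterion
        self.children = []        # explored branches
        self.suspended = []       # rejected-but-kept alternative splits

def expand(node, candidates):
    """Apply the best candidate split; suspend the rest instead of discarding."""
    best = max(candidates, key=lambda s: s[1])
    node.children.append(MultiTreeNode(*best))
    node.suspended.extend(MultiTreeNode(*s) for s in candidates if s is not best)
    return node

def wake_best_suspended(node):
    """One possible suspended-node selection criterion: resume the
    highest-scoring suspended split to populate the multi-tree further."""
    if not node.suspended:
        return None
    chosen = max(node.suspended, key=lambda n: n.score)
    node.suspended.remove(chosen)
    node.children.append(chosen)
    return chosen

root = MultiTreeNode("root", 0.0)
expand(root, [("Outlook", 0.9), ("Humidity", 0.5), ("Wind", 0.7)])
print(root.children[0].split)            # -> Outlook
print(wake_best_suspended(root).split)   # -> Wind
```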

Page 10: Shared Ensemble Learning using Multi-trees

Multi-tree Structure

[figure: multi-tree structure diagram, not included in the transcript]

Page 11: Shared Ensemble Learning using Multi-trees

Shared Ensembles Combination
• Combining a set of classifiers improves the accuracy of simple classifiers
• Combination methods
  • Boosting, Bagging, Randomization, Stacking, Windowing
• Ensembles normally require a large amount of memory to store
  • with the multi-tree, the common parts of the components of the ensemble are shared

Page 12: Shared Ensemble Learning using Multi-trees

Shared Ensembles Combination
• Each node stores a class vector v = (v_1, ..., v_c), where c is the number of classes
• At a leaf, v_j is the number of training cases of class j that have fallen into the leaf
• Several fusion strategies convert the class vectors v^1, ..., v^n of n leaves into one combined vector f, component-wise:
  • sum: f_j = sum_i v^i_j
  • arithmetic mean: f_j = (1/n) * sum_i v^i_j
  • product: f_j = prod_i v^i_j
  • geometric mean: f_j = (prod_i v^i_j)^(1/n)
  • maximum: f_j = max_i v^i_j
  • minimum: f_j = min_i v^i_j
  • median: f_j = median_i v^i_j

Page 13: Shared Ensemble Learning using Multi-trees

Shared Ensembles Combination

Example: combining four class vectors over three classes
• sum: {52, 28, 44}
• arithmetic mean: {13, 7, 11}
• product: {0, 1200, 900}
• geometric mean: {0, 5.89, 5.48}
• maximum: {40, 10, 30}
• minimum: {0, 2, 1}
• median: {6, 8, 6.5}
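These numbers can be reproduced with component-wise fusion functions. The four leaf vectors below are an assumption: the slide does not list them, but this particular choice is consistent with every combined vector shown.

```python
from statistics import median

# Four hypothetical leaf class vectors (three classes each). The slide does not
# list them; this choice is consistent with all the combined vectors it shows.
vectors = [[0, 2, 1], [5, 6, 3], [7, 10, 10], [40, 10, 30]]

def fuse(vectors, op):
    """Apply a fusion operator component-wise (per class) across the vectors."""
    return [op([v[j] for v in vectors]) for j in range(len(vectors[0]))]

def product(xs):
    p = 1
    for x in xs:
        p *= x
    return p

print(fuse(vectors, sum))                                      # [52, 28, 44]
print(fuse(vectors, lambda xs: sum(xs) / len(xs)))             # [13.0, 7.0, 11.0]
print(fuse(vectors, product))                                  # [0, 1200, 900]
print(fuse(vectors, lambda xs: product(xs) ** (1 / len(xs))))  # geomean, approx {0, 5.89, 5.48}
print(fuse(vectors, max))                                      # [40, 10, 30]
print(fuse(vectors, min))                                      # [0, 2, 1]
print(fuse(vectors, median))                                   # [6.0, 8.0, 6.5]
```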

Page 14: Shared Ensemble Learning using Multi-trees

Shared Ensembles Combination
• Some transformations can be applied to the original vectors at the leaves before their propagation (w denotes the winning class, the one with the maximum value):
  • good loser: f_w = sum_j v_j, and 0 otherwise
  • bad loser: f_w = v_w, and 0 otherwise
  • majority: f_w = 1, and 0 otherwise
  • difference: f_j = v_j - sum_{k != j} v_k

Original Good loser Bad loser Majority Difference

{40, 10, 30} {80, 0, 0} {40, 0, 0} {1, 0, 0} {0, -60, -20}

{7, 2, 10} {0, 0, 19} {0, 0, 10} {0, 0, 1} {-5, -15, 1}
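The four transformations can be written directly from the example rows above (w is the index of the winning, i.e. maximum, class). The closed forms below are reconstructed from the table, not quoted from the paper.

```python
# Vector transformations before propagation, reconstructed from the example
# table (w = index of the winning class, the one with the maximum value).
def good_loser(v):
    w = v.index(max(v))                 # winner takes the whole vector mass
    return [sum(v) if j == w else 0 for j in range(len(v))]

def bad_loser(v):
    w = v.index(max(v))                 # winner keeps only its own count
    return [v[j] if j == w else 0 for j in range(len(v))]

def majority(v):
    w = v.index(max(v))                 # winner gets a plain majority vote of 1
    return [1 if j == w else 0 for j in range(len(v))]

def difference(v):
    s = sum(v)                          # each class minus the sum of the others
    return [2 * x - s for x in v]

for v in ([40, 10, 30], [7, 2, 10]):
    print(good_loser(v), bad_loser(v), majority(v), difference(v))
# [80, 0, 0] [40, 0, 0] [1, 0, 0] [0, -60, -20]
# [0, 0, 19] [0, 0, 10] [0, 0, 1] [-5, -15, 1]
```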

Page 15: Shared Ensemble Learning using Multi-trees

Experiments

#   Dataset         Size  Classes  Nom. Attr.  Num. Attr.
1   Balance-scale    625     3         0           4
2   Cars            1728     4         5           0
3   Dermatology      358     6        33           1
4   Ecoli            336     8         0           7
5   Iris             150     3         0           4
6   House-votes      435     2        16           0
7   Monks1           566     2         6           0
8   Monks2           601     2         6           0
9   Monks3           554     2         6           0
10  New-thyroid      215     3         0           5
11  Post-operative    87     3         7           1
12  Soybean-small     35     4        35           0
13  Tae              151     3         2           3
14  Tic-tac          958     2         8           0
15  Wine             178     3         0          13

Information about datasets used in the experiments.

Page 16: Shared Ensemble Learning using Multi-trees

Experiments

         Arit.          Sum.           Prod.          Max.           Min.
#        Acc.   Dev.    Acc.   Dev.    Acc.   Dev.    Acc.   Dev.    Acc.   Dev.
1        80.69  5.01    81.24  4.66    76.61  5.04    83.02  4.76    76.61  5.04
2        91.22  2.25    91.25  2.26    83.38  3.65    90.9   2.09    83.38  3.65
3        94.17  4.06    94.34  3.87    89.06  5.19    94     4.05    89.06  5.19
4        80.09  6.26    79.91  6.13    76.97  7.14    80.09  6.11    76.97  7.14
5        95.63  3.19    95.77  3.18    93.28  3.71    95.93  2.81    93.28  3.71
6        94.53  5.39    94.2   5.66    94     5.34    94.47  5.45    94.4   5.34
7        99.67  1.3     99.71  1.18    81     8.6     99.89  0.51    81     8.6
8        73.35  5.86    73.73  5.82    74.53  5.25    77.15  5.88    74.53  5.25
9        97.87  2       97.91  1.8     97.58  2.45    97.62  1.93    97.58  2.45
10       94.52  4.25    93.76  5.1     92.05  5.71    92.57  5.43    92.05  5.71
11       62.5   16.76   63.25  16.93   61.63  17.61   67.13  14.61   61.63  17.61
12       97.5   8.33    97.5   9.06    97.75  8.02    94.75  11.94   97.75  8.02
13       63.6   12.59   64.33  11.74   62     12.26   63.93  12.03   62     12.26
14       81.73  3.82    82.04  3.78    78.93  3.73    82.68  3.97    78.93  3.73
15       94.06  6       93.88  6.42    91.47  7.11    92.53  6.99    91.47  7.11

Geomean  85.83  4.72    85.99  4.71    82.53  5.93    86.4   4.52    82.55  5.93

Comparison between fusion techniques

Page 17: Shared Ensemble Learning using Multi-trees

Experiments

       Max+Orig       Max+Good       Max+Bad        Max+Majo.      Max+Diff.
#      Acc.   Dev.    Acc.   Dev.    Acc.   Dev.    Acc.   Dev.    Acc.   Dev.
1      83.02  4.76    83.02  4.76    83.02  4.76    67.84  6.61    83.02  4.76
2      90.9   2.09    90.9   2.09    90.9   2.09    81.48  3.22    90.9   2.09
3      94     4.05    94     4.05    94     4.05    79.97  7.98    94     4.05
4      80.09  6.11    80.09  6.11    80.09  6.11    78.21  6.07    80.09  6.11
5      95.93  2.81    95.93  2.81    95.93  2.81    89.44  4.84    95.93  2.81
6      94.47  5.45    94.47  5.45    94.47  5.45    91.47  6.9     94.47  5.45
7      99.89  0.51    99.89  0.51    99.89  0.51    77.58  6.29    99.89  0.51
8      77.15  5.88    77.15  5.88    77.15  5.88    83.42  5.06    77.15  5.88
9      97.62  1.93    97.62  1.93    97.62  1.93    90.4   4.02    97.62  1.93
10     92.57  5.43    92.57  5.43    92.57  5.43    89.14  6.74    92.57  5.43
11     67.13  14.61   67.13  14.61   67.13  14.61   68.25  15.33   67     14.6
12     94.75  11.94   94.75  11.94   94.75  11.94   50.75  28.08   94.75  11.94
13     63.93  12.03   63.87  12.14   63.93  12.03   60.93  11.45   65.13  12.53
14     82.68  3.97    82.68  3.97    82.68  3.97    68.26  4.35    82.68  3.97
15     92.53  6.99    92.53  6.99    92.53  6.99    78.41  11.25   92.53  6.99

Gmean  86.4   4.52    86.39  4.53    86.4   4.52    76.11  7.19    86.49  4.54

Comparison between vector transformation methods

Page 18: Shared Ensemble Learning using Multi-trees

Experiments

       1              10             100            1000
#      Acc.   Dev.    Acc.   Dev.    Acc.   Dev.    Acc.   Dev.
1      76.82  4.99    77.89  5.18    83.02  4.76    87.68  4.14
2      89.01  2.02    89.34  2.2     90.9   2.09    91.53  2.08
3      90     4.72    91.43  4.67    94     4.05    94     4.05
4      77.55  6.96    78.58  6.84    80.09  6.11    80.09  6.11
5      93.63  3.57    94.56  3.41    95.93  2.81    95.56  2.83
6      94.67  5.84    94.27  5.69    94.47  5.45    95     5.14
7      92.25  6.27    96.45  4.15    99.89  0.51    100    0.01
8      74.83  5.17    75.33  5.11    77.15  5.88    82.4   4.52
9      97.55  1.89    97.84  1.86    97.62  1.93    97.75  1.92
10     92.62  5.22    93.43  5.05    92.57  5.43    90.76  5.89
11     60.88  17.91   63     15.88   67     14.6    68.13  15.11
12     97.25  9.33    96     10.49   94.75  11.94   95.5   10.88
13     62.93  12.51   65     12.19   65.13  12.53   65.33  12.92
14     78.22  4.25    79.23  4.03    82.68  3.97    84.65  3.34
15     93.12  6.95    93.29  6.31    92.53  6.99    92.99  5

Gmean  83.88  5.52    84.91  5.3     86.49  4.54    87.47  4.47

Influence of the size of the multi-tree

Page 19: Shared Ensemble Learning using Multi-trees

Experiments

[figure: experimental results chart, not included in the transcript]

Page 20: Shared Ensemble Learning using Multi-trees

Experiments

[figure: experimental results chart, not included in the transcript]

Page 21: Shared Ensemble Learning using Multi-trees

References
• V. Estruch, C. Ferri, J. Hernandez-Orallo, M.J. Ramirez-Quintana, "Shared Ensemble Learning using Multi-trees"
  • http://www.lsi.us.es/iberamia2002/confman/SUBMISSIONS/254-escicucrri.pdf
• Wikipedia
• http://ai-times.tistory.com/77

Page 22: Shared Ensemble Learning using Multi-trees

Thank you for listening