A fast ensemble pruning algorithm based on pattern mining process

17 July 2009. Springer Science+Business Media, LLC 2009
69821514 洪佳瑜, 69821516 蔣克欽



Page 1: A fast ensemble pruning algorithm based on pattern mining process

A fast ensemble pruning algorithm based on pattern mining process

17 July 2009
Springer Science+Business Media, LLC 2009

69821514 洪佳瑜
69821516 蔣克欽

Page 2: A fast ensemble pruning algorithm based on pattern mining process

Outline

Motive
Introduction
Method
Experiment
Conclusion

Page 3: A fast ensemble pruning algorithm based on pattern mining process

Motive

Most ensemble pruning methods in the literature need a long pruning time and are mainly applied in domains where time can be sacrificed to improve accuracy. This makes them unsuitable for applications that require a fast learning process, such as on-line network intrusion detection.

Page 4: A fast ensemble pruning algorithm based on pattern mining process

Introduction

Pattern mining based ensemble pruning (PMEP)

The algorithm converts an ensemble pruning problem into a special pattern mining problem, which enables an FP-Tree to store the prediction results of all base classifiers. It then uses a new pattern mining method to select base classifiers.

The final output of our PMEP approach is the pruned ensemble with the highest correct value.

Page 5: A fast ensemble pruning algorithm based on pattern mining process

Properties of PMEP (1/2)

Firstly, it uses a transaction database to represent the prediction results of all base classifiers. This representation enables an FP-Tree to compact the results, and the ensemble pruning process becomes a pattern mining problem.

Secondly, PMEP uses the majority voting principle to decide the candidate classifiers before the pattern mining process. For a given k, PMEP only considers the paths of length ⌊k/2⌋ + 1 in the FP-Tree, which is the smallest majority of k voters.

Page 6: A fast ensemble pruning algorithm based on pattern mining process

Properties of PMEP (2/2)

Thirdly, the pattern mining method greedily selects a set of classifiers in each iteration, instead of a single one, which further reduces pruning time.

Page 7: A fast ensemble pruning algorithm based on pattern mining process

Method (1/7)

CID   Itemset                          Num
X1    h1, h2, h3, h4, h5, h6, h7, h8   8
X2    h2, h3, h4, h5, h7               5
X3    h2, h5, h6                       3
X4    (none)                           0
X5    h1, h2, h6, h8                   4
X6    h1, h2, h3, h4, h5, h6, h7, h8   8
X7    h5, h6, h7                       3
X8    h2, h5, h7                       3
X9    h3, h4, h5, h7                   4
X10   h1, h2, h5, h6                   4
X11   h2, h5, h6                       3
X12   h1, h2, h4, h6, h7               5

Page 8: A fast ensemble pruning algorithm based on pattern mining process

Method (2/7)

CID   Itemset              Sorted Itemset
X2    h2, h3, h4, h5, h7   h2, h5, h7, h4, h3
X3    h2, h5, h6           h2, h5, h6
X5    h1, h2, h6, h8       h2, h6, h1, h8
X7    h5, h6, h7           h5, h6, h7
X8    h2, h5, h7           h2, h5, h7
X9    h3, h4, h5, h7       h5, h7, h4, h3
X10   h1, h2, h5, h6       h2, h5, h6, h1
X11   h2, h5, h6           h2, h5, h6
X12   h1, h2, h4, h6, h7   h2, h6, h7, h1, h4

For any i (1 ≤ i ≤ n), if Li = L or Li = 0, we delete the corresponding row from table T to reduce computational cost. Here this removes X1 and X6 (all 8 classifiers listed) and X4 (none listed).
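The row deletion and the sorted column can be reproduced with a short sketch. It assumes that an item h_j in row X_i means classifier h_j predicted example X_i correctly, that the sort is by descending item frequency, and that ties are broken by classifier index. The last two are inferred from the table, not stated on the slide. The names `TABLE`, `prune_rows` and `sort_by_frequency` are illustrative only.

```python
from collections import Counter

# Transaction table copied from the Method (1/7) slide.
# Assumption: each row lists the classifiers that predict that example correctly.
TABLE = {
    "X1": ["h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8"],
    "X2": ["h2", "h3", "h4", "h5", "h7"],
    "X3": ["h2", "h5", "h6"],
    "X4": [],
    "X5": ["h1", "h2", "h6", "h8"],
    "X6": ["h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8"],
    "X7": ["h5", "h6", "h7"],
    "X8": ["h2", "h5", "h7"],
    "X9": ["h3", "h4", "h5", "h7"],
    "X10": ["h1", "h2", "h5", "h6"],
    "X11": ["h2", "h5", "h6"],
    "X12": ["h1", "h2", "h4", "h6", "h7"],
}
NUM_CLASSIFIERS = 8  # L on the slide


def prune_rows(table, total):
    """Drop rows where every classifier (Li = L) or no classifier (Li = 0) is listed."""
    return {cid: items for cid, items in table.items() if 0 < len(items) < total}


def sort_by_frequency(table):
    """Sort each itemset by descending item frequency (assumed tie-break: index)."""
    freq = Counter(item for items in table.values() for item in items)

    def key(item):
        return (-freq[item], int(item[1:]))

    return {cid: sorted(items, key=key) for cid, items in table.items()}


pruned = prune_rows(TABLE, NUM_CLASSIFIERS)
sorted_table = sort_by_frequency(pruned)
for cid, items in sorted_table.items():
    print(cid, ", ".join(items))
```

Under these assumptions the printed rows match the Sorted Itemset column above.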

Page 9: A fast ensemble pruning algorithm based on pattern mining process

Method (3/7): FP-Tree
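As a rough illustration of how such a tree could be built, the sketch below inserts the sorted itemsets from Method (2/7) into a prefix tree with per-node counts. It is a simplified FP-Tree (no header table or node links), and `FPNode`, `build_fp_tree` and `render` are hypothetical names, not taken from the slides.

```python
class FPNode:
    """One node of a simplified FP-Tree: an item, a count, and child nodes."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}  # item -> FPNode, kept in insertion order


def build_fp_tree(transactions):
    """Insert each sorted transaction as a path, sharing common prefixes."""
    root = FPNode()
    for items in transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item)
            child.count += 1
            node = child
    return root


def render(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(render(child, depth + 1))
    return lines


# Sorted itemsets copied from the Method (2/7) table (X2 ... X12).
SORTED_ITEMSETS = [
    ["h2", "h5", "h7", "h4", "h3"],
    ["h2", "h5", "h6"],
    ["h2", "h6", "h1", "h8"],
    ["h5", "h6", "h7"],
    ["h2", "h5", "h7"],
    ["h5", "h7", "h4", "h3"],
    ["h2", "h5", "h6", "h1"],
    ["h2", "h5", "h6"],
    ["h2", "h6", "h7", "h1", "h4"],
]

tree = build_fp_tree(SORTED_ITEMSETS)
print("\n".join(render(tree)))
```

Because the itemsets share a global item order, rows with the same leading classifiers collapse into one path, which is what lets the tree compact the prediction results.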

Page 10: A fast ensemble pruning algorithm based on pattern mining process

Method (4/7)

Suppose k = 5; we have:

Page 11: A fast ensemble pruning algorithm based on pattern mining process

Method (5/7)

One entry has the largest count value, and its classifier set is {h2, h5, h6}. We add these three classifiers into S.set, and set S.correct=3.

Then delete

S.set={h2, h5, h6}, S.correct=3.
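A minimal sketch of this first selection step for k = 5. It assumes the candidate paths are the prefixes of length ⌊k/2⌋ + 1 = 3 of the sorted itemsets, which is equivalent to reading the depth-3 nodes of the FP-Tree with their counts. The function name `candidate_paths` is illustrative, and the layout of the actual Path-Table may differ.

```python
from collections import Counter

# Sorted itemsets copied from the Method (2/7) table (X2 ... X12).
SORTED_ITEMSETS = [
    ["h2", "h5", "h7", "h4", "h3"],
    ["h2", "h5", "h6"],
    ["h2", "h6", "h1", "h8"],
    ["h5", "h6", "h7"],
    ["h2", "h5", "h7"],
    ["h5", "h7", "h4", "h3"],
    ["h2", "h5", "h6", "h1"],
    ["h2", "h5", "h6"],
    ["h2", "h6", "h7", "h1", "h4"],
]


def candidate_paths(itemsets, k):
    """Count the length-(k // 2 + 1) prefixes; shorter itemsets cannot form a majority."""
    m = k // 2 + 1
    return Counter(tuple(items[:m]) for items in itemsets if len(items) >= m)


paths = candidate_paths(SORTED_ITEMSETS, k=5)
best_path, best_count = max(paths.items(), key=lambda kv: kv[1])
print(best_path, best_count)
```

With the table data this selects the path (h2, h5, h6) with count 3, matching S.set and S.correct on the slide.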

Page 12: A fast ensemble pruning algorithm based on pattern mining process

Method (6/7)

The first row has the maximum count value, so the base classifier h7 is selected.

S.set={h2, h5, h6, h7}, S.correct=7.
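The jump from S.correct=3 to S.correct=7 can be checked directly against the transaction table. The sketch assumes S.correct counts the retained examples on which at least ⌊k/2⌋ + 1 of the selected classifiers are correct. This reading is an assumption, although it agrees with the slide values. `ROWS` and `majority_correct` are illustrative names.

```python
# Retained rows of the transaction table (X1, X4 and X6 already removed).
# Assumption: each set holds the classifiers that predict that example correctly.
ROWS = {
    "X2": {"h2", "h3", "h4", "h5", "h7"},
    "X3": {"h2", "h5", "h6"},
    "X5": {"h1", "h2", "h6", "h8"},
    "X7": {"h5", "h6", "h7"},
    "X8": {"h2", "h5", "h7"},
    "X9": {"h3", "h4", "h5", "h7"},
    "X10": {"h1", "h2", "h5", "h6"},
    "X11": {"h2", "h5", "h6"},
    "X12": {"h1", "h2", "h4", "h6", "h7"},
}


def majority_correct(selected, k):
    """Count examples on which at least k // 2 + 1 selected classifiers are correct."""
    threshold = k // 2 + 1
    return sum(1 for items in ROWS.values() if len(items & selected) >= threshold)


before = majority_correct({"h2", "h5", "h6"}, k=5)
after = majority_correct({"h2", "h5", "h6", "h7"}, k=5)
print(before, after)  # 3 7
```

Adding h7 completes a three-vote majority on X2, X7, X8 and X12, which accounts for the four extra correct examples.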

Page 13: A fast ensemble pruning algorithm based on pattern mining process

Method (7/7)

The classifier sets {h1} and {h4} have the same count value. Since the path of h1 is constructed earlier in the Path-Table than that of h4, we add h1 into S.set.

S.set={h2, h5, h6, h7, h1}, and S.correct=8.
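The three steps above can be strung together as a greedy loop. The following is one possible reading of the walkthrough, not the authors' implementation. It assumes candidate paths are the length-3 prefixes of the sorted itemsets, that each iteration picks the set of still-missing classifiers with the largest total count, and that ties go to the path seen first. `greedy_select` is a hypothetical name.

```python
from collections import Counter

# Sorted itemsets copied from the Method (2/7) table (X2 ... X12).
SORTED_ITEMSETS = [
    ["h2", "h5", "h7", "h4", "h3"],
    ["h2", "h5", "h6"],
    ["h2", "h6", "h1", "h8"],
    ["h5", "h6", "h7"],
    ["h2", "h5", "h7"],
    ["h5", "h7", "h4", "h3"],
    ["h2", "h5", "h6", "h1"],
    ["h2", "h5", "h6"],
    ["h2", "h6", "h7", "h1", "h4"],
]


def greedy_select(itemsets, k):
    """Return the (S.set, S.correct) pairs after each greedy iteration."""
    m = k // 2 + 1
    # Candidate paths with counts, kept in first-seen order.
    paths = dict(Counter(tuple(i[:m]) for i in itemsets if len(i) >= m))
    selected, correct, trace = [], 0, []
    while paths and len(selected) < k:
        # Group the remaining paths by the classifiers they still need.
        needed = Counter()
        for path, count in paths.items():
            missing = tuple(h for h in path if h not in selected)
            if len(selected) + len(missing) <= k:
                needed[missing] += count
        if not needed:
            break
        best = max(needed, key=needed.get)  # the first-seen set wins ties
        selected.extend(best)
        # Paths fully covered by the selection now count as correct.
        covered = [p for p in paths if set(p) <= set(selected)]
        correct += sum(paths[p] for p in covered)
        for p in covered:
            del paths[p]
        trace.append((list(selected), correct))
    return trace


trace = greedy_select(SORTED_ITEMSETS, k=5)
for s_set, s_correct in trace:
    print(s_set, s_correct)
```

On the slide data this reproduces the sequence S.correct = 3, 7, 8 and the final S.set = {h2, h5, h6, h7, h1}.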

Page 14: A fast ensemble pruning algorithm based on pattern mining process

Advantages

Classifiers with a negative effect on the ensemble have a low probability of being selected because of their low count values.

The selected classifiers come from multiple paths, which gives them low error correlation.

Page 15: A fast ensemble pruning algorithm based on pattern mining process

Experiment

We compared the performance of our approach, PMEP, against Bagging (Breiman 1996), GASEN (Zhou et al. 2002), and Forward Selection (FS) (Caruana et al. 2004) in our empirical study.

Test platform:
AMD 4000+, 2 GB RAM
C programming language
Linux operating system

Page 16: A fast ensemble pruning algorithm based on pattern mining process

All the tests are performed on 15 data sets from the UCI machine learning repository

Page 17: A fast ensemble pruning algorithm based on pattern mining process

Results of prediction accuracy

Page 18: A fast ensemble pruning algorithm based on pattern mining process

Sizes of pruned ensembles for each data set; the last one is the average result of all 15 data sets

Avg: 20 7.43 3.77 5.70

Page 19: A fast ensemble pruning algorithm based on pattern mining process

Results of pruning time (s)

Page 20: A fast ensemble pruning algorithm based on pattern mining process

Conclusion

The experimental results have shown that the proposed PMEP achieves the highest prediction accuracy, and costs much less pruning time than GASEN and forward selection.

The design of our PMEP algorithm is aimed at the majority voting method. How to extend the algorithm to other combination strategies is another direction of our work.

Page 21: A fast ensemble pruning algorithm based on pattern mining process

Thank you
