
Page 1: Comparing the Parallel Automatic Composition of Inductive Applications with Stacking Methods

Hidenao Abe & Takahira Yamaguchi

Shizuoka University, JAPAN

{hidenao,yamaguti}@ks.cs.inf.shizuoka.ac.jp

Page 2: Contents

• Introduction

• Constructive meta-learning based on method repositories

• Case study with common data sets

• Conclusion

Page 3: Overview of meta-learning scheme

[Diagram] A data set and several learning algorithms are fed into a meta-learning executor, which is guided by a meta-learning algorithm and meta knowledge, in order to obtain a better result on the given data set than that achieved by each single learning algorithm.

Page 4: Selective meta-learning scheme

• Integrating base-level classifiers learned from different training data sets, generated by:
  – "Bootstrap Sampling" (bagging)
  – weighting ill-classified instances (boosting)
• Integrating base-level classifiers learned with different algorithms, by:
  – simple voting (voting)
  – constructing a meta-classifier from a meta data set and a meta-level learning algorithm (stacking, cascading)

These schemes do not work well when no base-level learning algorithm works well on the given data set, because they do not de-compose the base-level learning algorithms.
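As an illustration of the "integration" step that these selective schemes share, here is a minimal, hypothetical Python sketch of simple voting over base-level predictions (not part of the original slides; the function name and data layout are invented for illustration):

```python
from collections import Counter

def majority_vote(predictions_per_classifier):
    """Combine base-level predictions by simple (unweighted) voting.

    predictions_per_classifier: list of prediction lists, one per
    base-level classifier, all aligned on the same test instances.
    """
    combined = []
    for votes in zip(*predictions_per_classifier):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Example: three base-level classifiers voting on four test instances.
preds = [
    ["yes", "no",  "yes", "no"],   # classifier 1
    ["yes", "yes", "no",  "no"],   # classifier 2
    ["no",  "no",  "yes", "no"],   # classifier 3
]
print(majority_vote(preds))  # ['yes', 'no', 'yes', 'no']
```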

Page 5: Contents

• Introduction
• Constructive meta-learning based on method repositories
  – Basic idea of our constructive meta-learning
  – An implementation of the constructive meta-learning called CAMLET
  – Parallel execution and search for CAMLET
• Case study with common data sets
• Conclusion

Page 6: Basic Idea of our Constructive Meta-Learning

[Diagram] Representative inductive learning algorithms (e.g. C4.5, Version Space, Classifier Systems, AQ15) are analyzed, then de-composed and organized into a repository; new inductive applications are built from it by search and composition.

Page 7: Basic Idea of Constructive Meta-Learning

[Diagram, continued] De-composition & organization of the analyzed algorithms (e.g. CS, AQ15), followed by search & composition.

Page 8: Basic Idea of Constructive Meta-Learning

[Diagram, continued] De-composition & organization: organizing inductive learning methods, control structures and the objects they treat. Search & composition: automatic composition of inductive applications from the organized parts.

Page 9: Analysis of Representative Inductive Learning Algorithms

We have analyzed the following 8 learning algorithms:
• Version Space
• AQ15
• ID3
• C4.5
• Boosted C4.5
• Bagged C4.5
• Neural Network with back propagation
• Classifier Systems

We have identified 22 specific inductive learning methods.

Page 10: Inductive Learning Method Repository

[Diagram] The repository organizing the identified inductive learning methods.

Page 11: Data Type Hierarchy

Organization of input/output/reference data types for inductive learning methods.

[Diagram] Objects are organized into a hierarchy:
• data-set: training data set, validation data set, test data set
• classifier-set: If-Then rules, tree (Decision Tree), network (Neural Net)
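To make the hierarchy concrete, here is a minimal, hypothetical sketch of the organization above as Python classes; the class names mirror the slide's labels, but the structure is illustrative rather than CAMLET's actual data model:

```python
# Root of the hierarchy ("Objects") and its two branches, as on the slide.

class DataObject:
    pass

class DataSet(DataObject):          # "data-set" branch
    pass

class TrainingDataSet(DataSet): pass
class ValidationDataSet(DataSet): pass
class TestDataSet(DataSet): pass

class ClassifierSet(DataObject):    # "classifier-set" branch
    pass

class IfThenRules(ClassifierSet): pass
class DecisionTree(ClassifierSet): pass    # "tree (Decision Tree)"
class NeuralNetwork(ClassifierSet): pass   # "network (Neural Net)"

# A repository method could then declare its I/O types, e.g. a
# decision-tree induction method consumes a TrainingDataSet and
# produces a DecisionTree.
```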

Page 12: Identifying Inductive Learning Methods and Control Structures

[Diagram] Control structures built from the identified methods:
• Generating training & validation data sets
• Generating a classifier set
• Evaluating a classifier set
• Modifying training and validation data sets
• Modifying a classifier set
• Evaluating classifier sets on the test data set
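The boxes above can be read as a loop over repository methods. The following is a minimal, hypothetical Python sketch under that reading; the callables and dictionary keys are placeholders, not CAMLET's real interface:

```python
# Generic control structure: generate data sets and a classifier set,
# then alternate evaluation and modification until the goal is met,
# finishing with an evaluation on the test data set.

def run_inductive_application(data, methods, goal=0.9, max_iterations=10):
    train, valid, test = methods["generate_train_valid"](data)
    classifiers = methods["generate_classifier_set"](train)
    for _ in range(max_iterations):
        score = methods["evaluate_classifier_set"](classifiers, valid)
        if score >= goal:
            break
        # Depending on the composed control structure, either the data
        # sets or the classifier set (or both) are modified.
        train, valid = methods["modify_data_sets"](train, valid, classifiers)
        classifiers = methods["modify_classifier_set"](classifiers, train)
    return methods["evaluate_on_test"](classifiers, test)
```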

Page 13: CAMLET

CAMLET: a Computer Aided Machine Learning Engineering Tool

[Diagram] The user supplies data sets and a goal accuracy. Using the method repository, the data type hierarchy and the control structures, CAMLET performs Construction → Instantiation → Compile → Go & Test, then checks "Reached the goal accuracy?" If no, the specification is refined and the loop repeats; if yes, the inductive application (etc.) is returned to the user.
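A minimal, hypothetical sketch of this composition loop in Python follows; all operations are passed in as callables, since the slide does not specify their signatures:

```python
def camlet_composition_loop(data_sets, goal_accuracy,
                            construct, instantiate, compile_spec,
                            go_and_test, refine, max_rounds=100):
    """Construct a spec, instantiate, compile, execute ("Go & Test")
    and refine it until the goal accuracy is reached."""
    spec = construct()
    best = None
    for _ in range(max_rounds):
        executable = compile_spec(instantiate(spec))
        accuracy = go_and_test(executable, data_sets)
        if best is None or accuracy > best[1]:
            best = (spec, accuracy)
        if accuracy >= goal_accuracy:       # "Reached the goal accuracy?"
            break
        spec = refine(spec, accuracy)       # Refinement
    return best

# Toy usage with stand-in callables (illustration only):
result = camlet_composition_loop(
    data_sets=None, goal_accuracy=0.9,
    construct=lambda: {"bias": 0.5},
    instantiate=lambda spec: spec,
    compile_spec=lambda app: app,
    go_and_test=lambda exe, data: exe["bias"],
    refine=lambda spec, acc: {"bias": min(1.0, spec["bias"] + 0.25)},
)
print(result)  # ({'bias': 1.0}, 1.0)
```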

Page 14: CAMLET

CAMLET with a parallel environment

[Diagram] The same loop as on the previous slide, split into two levels: a Composition Level (Construction, Instantiation, Refinement and the goal-accuracy check, using the method repository, data type hierarchy and control structures) and an Execution Level (Compile, Go & Test). The composition level sends specifications and data sets to the execution level and receives results back; the user supplies data sets and a goal accuracy and receives the resulting inductive application, etc.

Page 15: Parallel Executions of Inductive Applications

[Diagram] The Composition Level requires exactly one process; the Execution Level requires one or more processing elements (PEs). Timeline (R: receiving, Ex: executing, S: sending):
1. Initializing: the composition level constructs and sends specs; the PEs receive them (R R R R).
2. Waiting / Executing: the composition level waits while the PEs execute (Ex Ex Ex Ex).
3. Receiving: one PE sends back a result (Ex Ex S Ex) and the composition level receives it.
4. Refining & Sending: the composition level refines the spec and sends the refined one back to the idle PE (Ex Ex Ex R).
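The timeline above is the classic master-worker pattern. Below is a minimal, hypothetical Python sketch of it using multiprocessing queues; it illustrates the pattern only and is not CAMLET's actual implementation:

```python
import multiprocessing as mp

def execution_level(spec_queue, result_queue):
    """One execution-level PE: receive a spec, execute it, send the result."""
    while True:
        spec = spec_queue.get()              # R: receiving
        if spec is None:                     # shutdown signal
            break
        accuracy = sum(spec) / len(spec)     # Ex: stand-in for "Go & Test"
        result_queue.put((spec, accuracy))   # S: sending

if __name__ == "__main__":
    spec_q, result_q = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=execution_level, args=(spec_q, result_q))
               for _ in range(4)]
    for w in workers:
        w.start()
    # Composition level: send specs, then collect results.
    for spec in ([0.1, 0.2], [0.5, 0.7], [0.9, 0.8], [0.3, 0.6]):
        spec_q.put(spec)
    for _ in range(4):
        print(result_q.get())
    for _ in workers:
        spec_q.put(None)
    for w in workers:
        w.join()
```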

Page 16: Refinement of Inductive Applications with GA

[Diagram] Generation t consists of executed specs and their results; selection chooses parents, crossover and mutation produce children (generation t+1), which are executed and added.

1. Transform executed specifications into chromosomes.
2. Select parents with the tournament method.
3. Cross over the parents and mutate one of their children.
4. Execute the children's specifications, transforming the chromosomes back into specs.

* If CAMLET can execute more than two inductive applications at the same time, some slow inductive applications will be added to generation t+2 or later.
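A minimal, hypothetical Python sketch of this GA refinement step (tournament selection, crossover, mutation over chromosome-encoded specifications) is shown below; the chromosome encoding and the fitness function are stand-ins, not CAMLET's actual ones:

```python
import random

def tournament_select(population, fitness, k=2):
    """Pick the fitter of k randomly chosen individuals."""
    return max(random.sample(population, k), key=fitness)

def crossover(a, b):
    """One-point crossover of two parent chromosomes."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, n_choices):
    """Replace one randomly chosen gene (method index)."""
    c = list(chromosome)
    c[random.randrange(len(c))] = random.randrange(n_choices)
    return c

def next_generation(population, fitness, n_choices=4):
    children = []
    while len(children) < len(population):
        p1 = tournament_select(population, fitness)
        p2 = tournament_select(population, fitness)
        c1, c2 = crossover(p1, p2)
        children.extend([mutate(c1, n_choices), c2])  # mutate one child
    return children[:len(population)]

# Example: chromosomes are lists of method indices; fitness stands in for
# the accuracy obtained by executing the corresponding specification.
random.seed(0)
pop = [[random.randrange(4) for _ in range(6)] for _ in range(8)]
print(next_generation(pop, fitness=sum))
```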

Page 17: CAMLET

CAMLET on a parallel GA refinement

[Diagram] The same two-level architecture as on page 14, with the GA of the previous slide driving the Composition Level's Refinement step while the Execution Level compiles and runs ("Go & Test") the children's specifications in parallel.

Page 18: Contents

• Introduction
• Constructive meta-learning based on method repositories
• Case study with common data sets
  – Accuracy comparison with common data sets
  – Evaluation of parallel efficiencies in CAMLET
• Conclusion

Page 19: Set-up of the accuracy comparison with common data sets

• We have used two kinds of common data sets:
  – 10 Statlog data sets
  – 32 UCI data sets (distributed with WEKA)
• We have input a goal accuracy for each data set as the criterion.
• CAMLET has output just one specification of the best inductive application for each data set, together with its performance.
  – CAMLET has searched a space of about 6,000 inductive applications for the best one, executing up to one hundred inductive applications.
• 10 "Execution Level" PEs have been used for each data set.
• The stacking methods are those implemented in the WEKA data mining suite.

Page 20: Statlog common data sets

Dataset       #Class  #Att. (Nom:Num)  #Instances       Evaluation
australian    2       14 (6:8)         690              10CV
diabetes      2       8 (0:8)          768              10CV
dna           3       180 (180:0)      2,000 / 1,186    Test
german        2       20 (17:3)        1,000            10CV
heart         2       13 (6:7)         270              10CV
letter-liacc  26      16 (0:16)        16,000 / 4,000   Test
satimage      6       36 (0:36)        4,435 / 2,000    Test
segment       7       18 (0:18)        2,317            10CV
shuttle       7       9 (0:9)          43,500 / 14,500  Test
vehicle       4       18 (0:18)        846              10CV

To generate the 10-fold cross-validation data set pairs, we have used the SplitDatasetFilter implemented in WEKA.

Page 21: Setting up of stacking methods for the Statlog common data sets

Base-level learning algorithms:
• J4.8 (with pruning, C=0.25)
• IBk (k=5)
• Bagged J4.8 (sampling rate=55%, 5 iterations)
• Bagged J4.8 (sampling rate=55%, 10 iterations)
• Boosted J4.8 (5 iterations)
• Boosted J4.8 (10 iterations)
• Part (with pruning, C=0.25)
• Naïve Bayes

Meta-level learning algorithms:
• J4.8 (with pruning, C=0.25)
• Naïve Bayes
• Classification with Linear Regression (with all of the features)

Page 22: Evaluation of accuracies on the Statlog common data sets

[Chart] Accuracies (scale 70–100%) of J4.8, NB, LR (goal) and CAMLET on each of the ten Statlog data sets (australian, diabetes, dna, german, heart, letter-liacc, satimage, segment, shuttle, vehicle).

Inductive applications composed by CAMLET have shown performance as good as that of stacking with Classification via Linear Regression.

Algorithm   Average (%)
J4.8        87.0
NB          86.6
LR          87.6
CAMLET      87.4

Page 23: Setting up of stacking methods for the UCI common data sets

Base-level learning algorithms:
• J4.8 (with pruning, C=0.25)
• IBk (k=5)
• Bagged J4.8 (sampling rate=55%, 10 iterations)
• Boosted J4.8 (10 iterations)
• Part (with pruning, C=0.25)
• Naïve Bayes

Meta-level learning algorithms:
• J4.8 (with pruning, C=0.25)
• Naïve Bayes
• Linear Regression (with all of the features)

Page 24: Evaluation of accuracies on the UCI common data sets

[Chart] Accuracies (scale 30–100%) of J4.8, NB, LR, the maximum of the 3 stacking methods (goal) and CAMLET on the 32 UCI data sets (anneal, audiology, autos, balance-scale, breast-cancer, breast-w, colic, credit-a, credit-g, diabetes, glass, heart-c, heart-h, heart-statlog, hepatitis, hypothyroid, ionosphere, iris, kr-vs-kp, labor, letter, lymph, mushroom, primary-tumor, segment, sick, sonar, soybean, splice, vehicle, vote, vowel).

Algorithm   J4.8    NB      LR      Max. of 3  CAMLET
Average     81.92   81.97   81.07   83.79      83.23

Page 25: Analysis of the parallel efficiency of CAMLET

• We have analyzed the parallel efficiencies of the executions in the case study with the "Statlog common data sets".
  – We have used 10 processing elements to execute each inductive application.
• Parallel efficiency is calculated with the following formula:

  \mathrm{ParallelEfficiency} = \frac{\sum_{i=1}^{x}\sum_{j=1}^{n} e_{ij}}{\mathrm{ActualExecutionTime}}

  where 1 ≤ x < 100 is the number of executed inductive applications, n is the number of folds, and e_{ij} is the execution time on each PE.
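Under this reading of the formula, a minimal Python sketch of the computation would be:

```python
# Total sequential work (sum of all per-PE execution times) divided by
# the actual wall-clock execution time of the parallel run.

def parallel_efficiency(per_pe_times, actual_execution_time):
    """per_pe_times[i][j]: execution time of inductive application i on fold j."""
    total_work = sum(sum(fold_times) for fold_times in per_pe_times)
    return total_work / actual_execution_time

# Illustrative numbers only: 3 inductive applications x 10 folds, each
# taking 2 seconds, run on 10 PEs in 8 seconds of wall-clock time.
e = [[2.0] * 10 for _ in range(3)]
print(parallel_efficiency(e, actual_execution_time=8.0))  # 7.5
```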

Page 26: Parallel efficiency of the case study with the Statlog data sets

[Charts] Parallel efficiencies (scale 5–10) of the 10-fold CV data sets (australian, diabetes, german, heart, segment, vehicle) and of the training/test data sets (dna, letter, satimage, shuttle).

Page 27: Why was the parallel efficiency of the "satimage" data set so low?

[Chart] Execution time (sec., 0–2,500) against the number of executed inductive applications (0–100).

Because:
• CAMLET could not find a satisfactory inductive application for this data set within 100 executions of inductive applications.
• At the end of the search, inductive applications with a large computational cost badly affected the parallel efficiency.

Page 28: Contents

• Introduction

• Constructive meta-learning based on method repositories

• Case Study with common data sets

• Conclusion

Page 29: Conclusion

• CAMLET has been implemented as a tool for the "Constructive Meta-Learning" scheme based on method repositories.
• CAMLET shows significant performance as a meta-learning scheme.
  – We are extending the method repository to construct data mining applications covering the whole data mining process.
• The parallel efficiency of CAMLET has reached more than 5.93 times.
  – It should be improved:
    • with methods to equalize each execution time, such as load balancing and/or a three-layered architecture;
    • with a more efficient search method to find a satisfactory inductive application faster.

Page 30: Thank you!

Page 31: Stacking: training and prediction

[Diagram, training phase] The training set is split (repeatedly, with cross-validation) into an internal training set and an internal test set. Base-level classifiers are learned on the internal training set; their predictions on the internal test set are transformed into a meta-level training set, from which the meta-level classifier is learned by meta-level learning.

[Diagram, prediction phase] Base-level classifiers are learned on the full training set; their predictions on the test set are transformed into a meta-level test set, and the meta-level classifier predicts from it to produce the results of prediction.

Page 32: How to transform base-level attributes to meta-level attributes

Base-level data set:

Att. A  Att. B  Att. C  Class
a1      0.3     c3      0
a2      5       c2      1
…       …       …       …

Learning with Algorithm 1 yields a classifier, which produces:

Prob. of predicting "0"  Prob. of predicting "1"  Class
0.99                     0.01                     0
0.4                      0.6                      1
…                        …                        …

1. Get the prediction probability for each class value from the classifier.
2. Add the base-level class value of each base-level instance.

With A algorithms and C class values, #Meta-level attributes = A × C.
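A minimal, hypothetical Python sketch of this transformation follows; in practice the probabilities come from cross-validated base-level classifiers, as shown on page 31. The function name, data layout and toy classifiers are invented for illustration:

```python
# Each base-level instance becomes a meta-level instance whose attributes
# are the class-probability estimates of every base-level classifier
# (A algorithms x C class values), plus the original class label.

def to_meta_level(instances, classifiers, class_values):
    """instances: list of (attributes, label); classifiers: list of
    callables returning a {class_value: probability} dict."""
    meta = []
    for attributes, label in instances:
        row = []
        for clf in classifiers:                         # A algorithms ...
            probs = clf(attributes)
            row.extend(probs[c] for c in class_values)  # ... x C class values
        meta.append((row, label))                       # keep base-level class
    return meta

# Toy stand-ins for the classifiers produced by Algorithm 1, Algorithm 2, ...
clf1 = lambda x: {"0": 0.99, "1": 0.01} if x[0] == "a1" else {"0": 0.4, "1": 0.6}
clf2 = lambda x: {"0": 0.7, "1": 0.3}
data = [(("a1", 0.3, "c3"), "0"), (("a2", 5, "c2"), "1")]
print(to_meta_level(data, [clf1, clf2], class_values=["0", "1"]))
```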

Page 33: Meta-level classifiers (example)

[Diagram] An example meta-level classifier: a decision tree whose internal nodes test the prediction probabilities produced by the base-level classifiers (e.g. "prediction probability of '0' with the classifier by algorithm 1, > 0.1 / ≤ 0.1", "prediction probability of '1' with the classifier by algorithm 2, > 0.95 / ≤ 0.95", "prediction probability of '1' with the classifier by algorithm 1, > 0.9 / ≤ 0.9", "prediction probability of '0' with the classifier by algorithm 3, …") and whose leaves predict the class ("0" or "1").

Page 34: Accuracies of the 8 base-level learning algorithms on the Statlog data sets

Stacking: base-level algorithms (accuracy %, avg. over 10 data sets)
J4.8          84.84
IBk(5)        84.32
Part          84.10
NaiveBayes    76.53
Bagging(5)    85.29
Bagging(10)   86.17
Boosting(5)   85.56
Boosting(10)  85.94
Max           87.58

CAMLET: base-level algorithms (accuracy %, avg. over 10 data sets)
C4.5 DT            81.07
ID3 DT             81.76
NeuralNetwork      63.09
ClassifierSystems  64.85
Bagging(5)         84.28
Bagging(10)        85.17
Boosting(5)        84.52
Boosting(10)       85.59
Max                87.04

Page 35: C4.5 Decision Tree without pruning (reproduction by CAMLET)

Specification:
• Generating training and validation data sets: with a void validation set
• Generating a classifier set (decision tree): with entropy + information ratio
• Evaluating a classifier set: with set evaluation
• Evaluating classifier sets to the test data set: with single classifier set evaluation

Procedure:
1. Select just one control structure.
2. Fill each method slot with specific methods.
3. Instantiate the spec.
4. Compile the spec.
5. Execute its executable code.
6. Refine the spec (if needed).
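As a minimal, hypothetical illustration, such a specification could be written down as plain data before steps 3–6 are applied; the keys and values below mirror the slide's wording but are not CAMLET's actual specification format:

```python
# A C4.5-without-pruning reproduction expressed as a data structure:
# one control structure (step 1) with its method slots filled (step 2).
c45_without_pruning_spec = {
    "control_structure": "single classifier set",   # step 1 (illustrative name)
    "methods": {                                     # step 2
        "generate_train_valid": "void validation set",
        "generate_classifier_set": "decision tree (entropy + information ratio)",
        "evaluate_classifier_set": "set evaluation",
        "evaluate_on_test": "single classifier set evaluation",
    },
}
# Steps 3-6 (instantiate, compile, execute, refine) would then be carried
# out over this specification by the composition loop sketched on page 13.
print(c45_without_pruning_spec)
```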