Experimental Data Processing: Data Mining (Ch. 9 - Ch. 10) Using Weka


Page 1

Experimental Data Processing

Data mining (Ch. 9 - Ch. 10)

Using Weka

Major: Interdisciplinary program of the integrated biotechnology

Graduate school of bio- & information technology

Youngil Lim (N110), Lab. FACS
phone: +82 31 670 5200 (secretary), +82 31 670 5207 (direct)

Fax: +82 31 670 5445, mobile phone: +82 10 7665 5207
Email: [email protected], homepage: http://facs.maru.net

Page 2

Overview of this lecture

- Machine learning = acquisition of structural descriptions automatically or semi-automatically (similar to how the brain develops by repeating experiences)
- Weka is written in Java (an object-oriented programming language); Java is OS-independent, and its calculations are 2-3 times slower than C, C++, and Fortran
- The Java compiler produces byte code, which the Java virtual machine translates into machine code

[Diagram] Information (data, database) → input (Ch. 2) → Data mining (extraction of useful information) → output (Ch. 3) → Knowledge (understanding, application, prediction)

Relationships? Modeling, structural patterns. Technical tool: machine learning

Page 3

Outline of this lecture

Part I. Machine learning tools and techniques

- Level 1: Ch 1. Applications, common problems

Ch 2. Input, concepts, instances and attributes

Ch 3. Output, knowledge representation

- Level 2: Ch 4. Numerical algorithms, the basic methods

- Level 3: Ch 5-6 (advanced topics)

Part II. Weka manual (ftp://facs/lim/lecture_related/weka3.4.exe)

- Level 1: Ch 9. Introduction of Weka

Ch 10. Explorer

- Level 2: Ch 11-15 (advanced options in Weka)

However, you need to read those chapters to write a paper on data mining.

Page 4

Ch. 9. Introduction to Weka

- No single ML scheme is appropriate for all DM problems

- DM is an experimental science.
- Weka is a collection of state-of-the-art ML algorithms.
- Weka includes:

1) input data preparation (ARFF)
2) evaluation of various learning algorithms
3) input data visualization
4) visualization of ML results

Introduction

Page 5

- Weka workbench includes methods for DM

1) regression (numerical prediction)

2) classification

3) clustering

4) association rules

5) attribute selection

9.1 What’s in Weka

Page 6

- classifier: learning methods (or algorithms)

- object editor: adjustment of tunable parameters of the classifier

- filter: tools for data preparation (filtering algorithm)

- 4 graphical user interfaces (GUI) of Weka

1) Explorer (for small/medium data sets): the main GUI (Ch. 10)

2) Knowledge Flow (for large data sets): design of configurations for streamed data processing and incremental learning (Ch. 11)

3) Experimenter: automatic running of classifiers and filters with different parameter settings; parallel computing (Ch. 12)

4) Command-line interface (CLI) in Java (Ch. 13)

Terminology and components

9.2 How do you use it?
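For example, the CLI trains a classifier with a single command (a standard Weka invocation; the dataset path is an assumption):

  java weka.classifiers.trees.J48 -t data/weather.arff

With only the -t (training file) option, Weka prints the decision tree along with training and 10-fold cross-validation statistics.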

Page 7

Weka GUI chooser

9.2 How do you use it?

4 graphical user interfaces (GUI) of Weka

1) Explorer (for small/medium data sets): the main GUI (Ch. 10)

2) Knowledge Flow (for large data sets): design of configurations for streamed data processing and incremental learning (Ch. 11)

3) Experimenter: automatic running of classifiers and filters with different parameter settings; parallel computing (Ch. 12)

4) CLI in Java (Ch. 13)

Page 8

Ch. 10. The Explorer

10.1 Getting started

10.2 Exploring the explorer

10.3 Filtering algorithms

10.4 Learning algorithms

10.5 Meta-learning algorithms

10.6 Clustering algorithms

10.7 Association-rule learners

10.8 Attribute selection

Outline

This lecture does not cover Sections 10.6 and 10.7.

Page 9

Ch. 10. The Explorer

Build a decision tree from the data:
• Prepare the data (comma-separated value format)
• Fire up Weka
• Load the data
• Select a decision tree construction method
• Build a tree
• Interpret the output

Procedure of using Explorer

Page 10

- ARFF by default (data rows are in comma-separated value format)

- tags:

1) @relation (data title)

2) @attribute (variables)

3) @data (instances)

10.1 Getting started

(1) Preparing the data
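For reference, a minimal ARFF sketch in the style of Weka's example weather data (first few rows only):

  @relation weather

  @attribute outlook {sunny, overcast, rainy}
  @attribute temperature numeric
  @attribute humidity numeric
  @attribute windy {TRUE, FALSE}
  @attribute play {yes, no}

  @data
  sunny,85,85,FALSE,no
  overcast,83,86,FALSE,yes
  rainy,70,96,FALSE,yes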

Page 11

10.1 Getting started

(1) Preparing the data

Open <weather.arff> using MS Word or another text editor

Page 12

10.1.1 Loading the data into the Explorer

Page 13

10.1.1 Loading the data into the Explorer

Page 14

10.1.2 Building a decision tree

Class attribute (dependent variable)

Page 15

10.1.3 Test options

The result of applying the chosen classifier will be tested according to the options that are set by clicking in the Test options box. There are four test modes:

1) Use training set. The classifier is evaluated on how well it predicts the class of the instances it was trained on.

2) Supplied test set. The classifier is evaluated on how well it predicts the class of a set of instances loaded from a file. Clicking the Set... button brings up a dialog allowing you to choose the file to test on.

3) Cross-validation. The classifier is evaluated by cross-validation, using the number of folds that are entered in the Folds text field.

4) Percentage split. The classifier is evaluated on how well it predicts a certain percentage of the data which is held out for testing. The amount of data held out depends on the value entered in the % field.
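The same steps can be scripted against the Weka Java API. A minimal sketch (standard Weka classes; the file path is an assumption):

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.util.Random;
  import weka.classifiers.Evaluation;
  import weka.classifiers.trees.J48;
  import weka.core.Instances;

  public class WeatherJ48 {
      public static void main(String[] args) throws Exception {
          // Load the ARFF file (path is an example)
          Instances data = new Instances(new BufferedReader(new FileReader("data/weather.arff")));
          data.setClassIndex(data.numAttributes() - 1);  // last attribute is the class
          J48 tree = new J48();                          // the C4.5 decision tree learner
          Evaluation eval = new Evaluation(data);
          eval.crossValidateModel(tree, data, 10, new Random(1)); // test mode 3: 10-fold CV
          System.out.println(eval.toSummaryString());
          tree.buildClassifier(data);                    // build the final tree on all the data
          System.out.println(tree);
      }
  }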

Page 16

10.1.4 Cross-validation

- Cross-validation (repeated holdout):
1) fold: the number of partitions of the data
2) 10-fold cross-validation is generally used for a single, fixed dataset
3) randomly divide the data into 10 parts
4) 9 parts are used for training and 1 part for testing
5) measure the error rate
6) repeat the cross-validation 10 times on different training sets
7) the overall error is the average of the 10 error rates

[Diagram] dataset → Training dataset (9/10 of the data) + Validation dataset (testing, 1/10 of the data); Prediction dataset (new data). Random sampling is required! (stratification)

Witten & Frank (2005), Data Mining, pp. 149-151
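A minimal sketch of how the fold splitting above maps onto the Weka Java API (assumes 'data' is already loaded with a nominal class attribute set):

  import java.util.Random;
  import weka.core.Instances;

  public class CrossValidationSketch {
      static void tenFold(Instances data) {
          Random rand = new Random(1);
          Instances shuffled = new Instances(data);
          shuffled.randomize(rand);   // step 3: shuffle the data randomly
          shuffled.stratify(10);      // stratification: keep class proportions per fold
          for (int i = 0; i < 10; i++) {
              Instances train = shuffled.trainCV(10, i, rand); // 9/10 for training
              Instances test = shuffled.testCV(10, i);         // 1/10 for testing
              // steps 4-7: train a classifier on 'train', measure its error on 'test',
              // and average the 10 error rates at the end
          }
      }
  }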

Page 17

10.1.5 Examining the output

Page 18

10.1.6 Doing it again

Page 19

10.1.7 Working with models

Exercise: analyze the iris dataset!

- Load iris data into Weka

- Find the classification rule

- Visualize the decision tree

- Visualize threshold curve

Page 20

10.1.8 When things go wrong

To see the error message, click the Log button.

What's going on? (memory available)

Page 21

10.2 Exploring the Explorer

- The Explorer has six tabs:
1) Preprocess: choose the dataset and modify it.
2) Classify: train learning schemes that perform classification or regression, and evaluate them.
3) Cluster: learn clusters for the dataset.
4) Associate: learn association rules for the data and evaluate them.
5) Select attributes: select the most relevant attributes in the dataset.
6) Visualize: view different two-dimensional plots of the data and interact with them.

Page 22

10.2 Exploring the Explorer

- The bird (the weka) dances when Weka is active.
- The number next to the bird shows how many concurrent processes are running.
- The bird sits when Weka is inactive.
- If the bird is standing but stops moving, it is sick: something has gone wrong, and the Explorer should be restarted.

Page 23

10.2.1 Converting files to ARFF

- 3 file converters to ARFF:

1) .csv (comma-separated values) → .arff

2) .names and .data (C4.5's native format) → .arff

3) .bsi (binary serialized instances) → .arff
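A minimal Java sketch of the first conversion using Weka's converter classes (file names are assumptions):

  import java.io.File;
  import weka.core.Instances;
  import weka.core.converters.ArffSaver;
  import weka.core.converters.CSVLoader;

  public class CsvToArff {
      public static void main(String[] args) throws Exception {
          CSVLoader loader = new CSVLoader();
          loader.setSource(new File("weather.csv"));  // input CSV; header row names the attributes
          Instances data = loader.getDataSet();
          ArffSaver saver = new ArffSaver();
          saver.setInstances(data);
          saver.setFile(new File("weather.arff"));    // output ARFF
          saver.writeBatch();
      }
  }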

Page 24

10.2.2 Using filters

Unsupervised attribute filter

Page 25

10.2.3 Training and testing learning schemes

Open file: cpu.with.vendor.arff

Page 26

10.2.3 Training and testing learning schemes

Classifiers>trees>M5P

Page 27

10.2.3 Training and testing learning schemes

How many leaves? How many nodes?

Page 28

10.2.3 Training and testing learning schemes

It gives a single linear regression model rather than the two linear models of trees>M5P

Classifiers>functions>LinearRegression

Page 29

10.2.4 Visualizing error

Which is better: M5P or linear regression?

Page 30

10.2.5 Do it yourself: the user classifier

- Open data>segment-challenge.arff

- Segment the visual image data into classes (grass, sky, cement …)

Classifiers>Trees>UserClassifier

data>segment-test.arff

Page 31

10.2.5 Do it yourself: the user classifier

The goal is to find a combination that separates the classes as clearly as possible.

Change the X and Y axes!

Page 32

10.2.5 Do it yourself: the user classifier

Specify a region in the graph:
1. Select Instance
2. Rectangle
3. Polygon
4. Polyline

- Clear: clears the selection
- Save: saves the instances in the current tree node as an ARFF file

Page 33

10.2.5 Do it yourself: the user classifier

Accept the tree (right-click on any blank space).

Building trees manually is very tedious.

Correctly classified instances: 40%; incorrectly classified: 60%

Page 34

10.2.6 Using a metalearner

A metalearner takes simple classifiers and turns them into more powerful learners.

Boosting decision stumps up to 10 times

Adaptive boosting
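A minimal sketch of this configuration in the Weka API (assumes 'data' is loaded with its class attribute set):

  import weka.classifiers.meta.AdaBoostM1;
  import weka.classifiers.trees.DecisionStump;
  import weka.core.Instances;

  public class BoostingSketch {
      static AdaBoostM1 boostedStumps(Instances data) throws Exception {
          AdaBoostM1 boost = new AdaBoostM1();       // adaptive boosting
          boost.setClassifier(new DecisionStump()); // the weak base learner
          boost.setNumIterations(10);               // boost up to 10 times
          boost.buildClassifier(data);
          return boost;
      }
  }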

Page 35

10.2.6 Using a metalearner

A metalearner takes simple classifiers and turns them into more powerful learners.

Page 36

10.2.7 Clustering and association rules

We skip clustering and association rules, so Sections 10.6 and 10.7 are also skipped.

In Ch. 4, Sections 4.5 and 4.8 are also skipped.

Page 37

10.2.8 Attribute selection

We will learn more in Section 10.8.

Page 38

10.2.9 Visualization of data

2D scatter plots of every pair of attributes

Page 39

10.3 Filtering algorithms

- Filtering of data (data = attributes + instances)
- All filters transform the input dataset in some way.
- Two kinds of filters:

1) supervised (Section 7.2): to be used carefully
2) unsupervised

- Each kind is further divided into attribute filters and instance filters:
1) attribute filter: works on the attributes of the data
2) instance filter: works on the instances of the data

- See Section 7.3:
1) PCA (principal component analysis)
2) random projections
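Programmatically, every filter follows the same pattern; a minimal sketch with the unsupervised attribute filter Normalize (assumes 'data' is already loaded):

  import weka.core.Instances;
  import weka.filters.Filter;
  import weka.filters.unsupervised.attribute.Normalize;

  public class FilterSketch {
      static Instances normalize(Instances data) throws Exception {
          Normalize norm = new Normalize();    // an unsupervised attribute filter
          norm.setInputFormat(data);           // configure the filter from the input data
          return Filter.useFilter(data, norm); // apply it to every instance
      }
  }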

Page 40

10.3.1 Adding and removing attributes

- Add: inserts a new attribute whose values are all empty.
- Copy: copies existing attributes and their values.
- Remove: the same as the <remove> tab.
- RemoveType: removes all attributes of a given type, such as nominal, numeric, string, or date.
- AddCluster: applies a clustering algorithm to the data before filtering it (see Section 10.6).
- AddExpression: creates a new attribute by applying a mathematical function to numeric attributes, e.g., a1^2*a5/log(4*a7).
- NumericTransform: performs an arbitrary transformation by applying a given Java function to selected numeric attributes.
- Normalize: scales all numeric values to lie between 0 and 1.
- Standardize: transforms all numeric values to have zero mean and unit variance.
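For reference, the standard definitions behind the last two filters: Normalize maps each value x to (x - min)/(max - min), and Standardize maps x to (x - mean)/(standard deviation).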

Page 41

10.3.2 Changing values

- SwapValues: just change the position of two values of a nominal attribute (it does not affect learning at all).

- MergeTwoValues: merge values of a nominal attribute into a single category.

- ReplaceMissingValues: replaces each missing value with the mean for numeric attributes and the mode for nominal ones. If a class is set, missing values of that attribute are not replaced.

Page 42

10.3.3 Conversions

- Discretize: changes a numeric attribute into a nominal one (Section 7.2):
1) equal-width binning
2) equal-frequency binning

- PKIDiscretize: discretizes numeric attributes using equal-frequency binning, where the number of bins is the square root of the number of values (e.g., 83 instances without missing values are binned into 9 bins, since √83 ≈ 9.1).

- MakeIndicator: converts a nominal attribute into a binary indicator attribute; necessary when a numeric attribute is required by an ML scheme.

- NominalToBinary: transforms all multivalued nominal attributes in a dataset into binary ones (a k-value attribute becomes k binary attributes).

- NumericToBinary: converts all numeric attributes into binary ones (if the numeric value is 0, the binary value is 0; otherwise, the binary value is 1).

- FirstOrder: takes the difference between two attribute values.

Page 43

10.3.4 String conversions

- StringToNominal: converts a string attribute to a nominal one.
- StringToWordVector: converts a string attribute into a set of attributes representing word occurrences.

Page 44

10.3.5 Time series

For time-series data,

- TimeSeriesTranslate: replace attribute values in the current instance with the equivalent attribute values of some previous (or future) instance.

- TimeSeriesDelta: replace attribute values in the current instance with the difference between the current value and the equivalent attribute value of some previous (or future) instance.

Page 45

10.3.6 Randomizing

These filters change values of the data.

- AddNoise: introduces noise into the data; it takes a nominal attribute and changes a given percentage of its values to other values.

- Obfuscate: renames attributes and anonymizes the data.
- RandomProjection: see Section 7.3.

Page 46

10.3.7 Unsupervised instance filters

- Attribute filters: affect all values of an attribute (a column of the data)

- Instance filters: affect all values of an instance (a row of the data)

Page 47

10.3.8 Randomizing and subsampling

- Randomize: the order of the instances is randomized.
- Normalize: all numeric attributes are treated as a vector and normalized to a given length.
- Resample: produces a random sample by sampling with replacement.
- RemoveFolds: first splits the data into a given number of cross-validation folds and then reduces it to just one of them; if a random number seed is provided, the dataset is shuffled before the subset is extracted.
- RemovePercentage: removes a given percentage of instances.
- RemoveRange: removes a certain range of instance numbers.
- RemoveWithValues: removes all instances whose values lie above or below a certain threshold.

Page 48

10.3.9 Sparse instances

- NonSparseToSparse: converts instances to the sparse representation.
- SparseToNonSparse: converts sparse instances to the standard representation.

Page 49

10.3.10 Supervised filters

- Supervised filters are affected by the class attribute.
- There are two categories of supervised filters:

1) attribute
2) instance

- You need to be careful with them because they are not really preprocessing operations.

- Discretize: see Section 7.2.
- NominalToBinary: see Section 6.5.
- ClassOrder: changes the ordering of the class values.

- Resample: like the unsupervised instance filter, except that it maintains the class distribution in the subsample.

Page 50

10.4 Learning algorithms

- We have 7 categories of classifiers:
1) Bayesian: document classification (e.g., Google search)
2) Trees: decision trees, divide-and-conquer (stump, node, leaf, model tree)
3) Rules: the covering approach (excluding instances); a decision tree converted to a set of logical expressions
4) Functions: linear and nonlinear models
5) Lazy (instance-based learning): distance functions
6) Metalearning algorithms: more powerful learners
7) Miscellaneous: diverse others

- Ch. 4 and Ch. 6 cover these algorithms.

Page 51

10.4.2 Decision trees

- J48 (see Sections 6.1 and 6.2): a reimplementation of C4.5, the outcome of more than 20 years of work by Quinlan (1993).

Confidence threshold for pruning

Minimum number of instances permissible at a leaf

Size of the pruning set: the data is divided equally into that number of parts and the last one is used for pruning

This algorithm is valid for a nominal class attribute

Page 52

10.4.2 Decision trees

- Id3 (see Ch. 4): the basic divide-and-conquer decision tree algorithm; it works with nominal classes.

- DecisionStump: designed for boosting; it builds one-level binary decision trees.

- RandomTree: constructs a tree that considers a given number of random features at each node, performing no pruning.

- RandomForest (see Section 7.5): constructs random forests by bagging ensembles of random trees.

- REPTree (see Section 6.2): builds a decision or regression tree using information gain/variance reduction.

- NBTree: a hybrid between decision trees and Naïve Bayes.
- M5P (see Section 6.5; Quinlan, 1992): a model tree learner with linear models.
- LMT (see Section 7.5): builds logistic model trees for nominal classes.
- ADTree (see Section 7.5): alternating decision trees using boosting.

Page 53

10.4.3 Classification rules

- ConjunctiveRule
- DecisionTable
- JRip
- M5Rules
- NNge
- OneR
- PART
- Prism
- Ridor
- ZeroR: predicts the test data's majority class (if nominal) or average value (if numeric).

Page 54

10.4.4 Functions

- Bayesian methods have a simple mathematical formulation.
- Decision trees and rules can contain linear regression models.
- Functions give us more complicated mathematical models.

We focus on:

1) linear regression models

2) the support vector machine algorithm

3) neural networks

Page 55

10.4.4 Functions

- SimpleLinearRegression: learns a linear regression model based on a single attribute.

- LinearRegression: a linear regression model over all numeric attributes.
- LeastMedSq: implements least median squared linear regression, utilizing the existing Weka LinearRegression class to form predictions (the solution has the smallest median squared error).

- SMO (see Section 6.3): implements the sequential minimal optimization algorithm for training a support vector classifier.

- SMOreg: the support vector machine algorithm for a numeric class attribute.

- Linear models: a truly linear relationship between attributes.
- Support vector machines: linear models for nonlinear class boundaries.

Page 56

10.4.4 Functions

- VotedPerceptron (see Section 6.3): the voted perceptron algorithm; it globally replaces all missing values and transforms nominal attributes into binary ones.

- Winnow (see Section 4.6): modifies the basic perceptron to use multiplicative updates.

- PaceRegression: builds linear regression models using the new technique of pace regression (Wang & Witten, 2002).

- SimpleLogistic (see Section 4.6): builds logistic regression models.

- RBFNetwork (see Section 6.3): implements a Gaussian radial basis function network (a linear regression model on Gaussian basis functions).

Page 57

10.4.5 Functions: artificial neural network (ANN)

Najjar, Y.M., I.A. Basheer and M.N. Hajmeer (1997), Computational neural networks for predictive microbiology: I. methodology, Int. Journal of Food Microbiology, 34, 27-49.

Page 58

10.4.5 Functions: neural networks

[Figure] A two-layer feed-forward network: inputs p1, p2, ..., pr enter a hidden-layer neuron through weights w1,1 ... w1,r with bias b1; the hidden-layer output a1 feeds the output-layer neuron through weight w2,1 with bias b2. Each unit is a neuron; the purely linear output layer corresponds to a linear regression model.

a1 = f_sigmoid(w1 · p + b1)
a2 = f_purelin(w2,1 · a1 + b2)

Page 59

10.4.5 Functions: neural networks

a1 = f_sigmoid(w1 · p + b1), where f_sigmoid(n) = 1/(1 + e^(-n)) rises from 0 to 1 (a = 0.5 at n = 0)

a2 = f_purelin(w2,1 · a1 + b2), where f_purelin(n) = n

Worked example: for p = [1, 1, 2, 3], w = [0, 0.2, 0.3, 0.5], b = [2.5]:

n1 = Σ (i=1..4) w1,i pi + b1 = -1.4

a1 = 1/(1 + e^(1.4)) = 0.1978

Exercise: for p = [1, 0, 2], w = [0.2, 0.6, 0.2], b = [0.5], a1 = ?
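A plain-Java sketch (not the Weka API) of the single-neuron computation, using the exercise values and the formula n = Σ wi pi + b from above:

  public class NeuronSketch {
      static double sigmoid(double n) { return 1.0 / (1.0 + Math.exp(-n)); }

      public static void main(String[] args) {
          double[] p = {1, 0, 2};       // inputs (exercise values)
          double[] w = {0.2, 0.6, 0.2}; // weights
          double b = 0.5;               // bias
          double n = b;
          for (int i = 0; i < p.length; i++) {
              n += w[i] * p[i];         // weighted sum of the inputs
          }
          System.out.println("a1 = " + sigmoid(n)); // sigmoid activation output
      }
  }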

Page 60

10.4.5 Functions: neural networks

- MultilayerPerceptron (see Section 6.3): trains by back-propagation and validates (or calculates) feed-forward; a nonlinear algorithm.

Set GUI to True to see the ANN

Adds and connects up hidden layers in the network.

Filtering the data

The amount the weights are updated.

Momentum applied to the weights during updating.

Page 61

10.4.5 Functions: neural networks

- MultilayerPerceptron (see Section 6.3): trains by back-propagation and validates (or calculates) feed-forward; a nonlinear algorithm.

Define the structure and number of nodes of the hidden layers: e.g., "4,5" gives two hidden layers with 4 and 5 neurons, respectively

The number of epochs to train through

The percentage size of the validation set

Used to terminate validation testing. The value here dictates how many times in a row the validation set error can get worse before training is terminated.

Page 62

10.4.5 Functions: neural networks - CPU problem

Using two hidden layers with 4 and 5 neurons, find the correlation coefficient of this ANN.

Options: 10-fold cross-validation, 500 epochs. Open file: CPU.arff
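A minimal sketch of the same experiment in the Weka API (assumes CPU.arff is loaded into 'data' with the class attribute set):

  import java.util.Random;
  import weka.classifiers.Evaluation;
  import weka.classifiers.functions.MultilayerPerceptron;
  import weka.core.Instances;

  public class CpuAnnSketch {
      static void run(Instances data) throws Exception {
          MultilayerPerceptron mlp = new MultilayerPerceptron();
          mlp.setHiddenLayers("4,5");  // two hidden layers with 4 and 5 neurons
          mlp.setTrainingTime(500);    // 500 epochs
          Evaluation eval = new Evaluation(data);
          eval.crossValidateModel(mlp, data, 10, new Random(1)); // 10-fold cross-validation
          System.out.println("Correlation coefficient = " + eval.correlationCoefficient());
      }
  }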

Page 63

10.4.5 Functions: neural networks - CPU problem

Using two hidden layers with 4 and 5 neurons, find the correlation coefficient of this ANN.

Options: 10-fold cross-validation, 500 epochs

Classify: Functions>multilayer perceptron

10-fold cross validation

Page 64

10.4.5 Functions: neural networks - CPU problem

Using two hidden layers with 4 and 5 neurons, find the correlation coefficient of this ANN.

Options: 10-fold cross-validation, 500 epochs

options

Page 65

10.4.5 Functions: neural networks - CPU problem

Using two hidden layers with 4 and 5 neurons, find the correlation coefficient of this ANN.

Options: 10-fold cross-validation, 500 epochs

After clicking Start in the main Weka window, you can see the ANN GUI.

1. Click "Accept"
2. Click "Start"
3. Repeat until the 10 folds are done

Page 66

10.4.5 Functions: neural networks - CPU problem

Using two hidden layers with 4 and 5 neurons, find the correlation coefficient of this ANN.

Options: 10-fold cross-validation, 500 epochs

You can see the cross-validation count increase.

The bird will dance until the 10-fold validation is over.

Page 67

10.4.5 Functions: neural networks - CPU problem

Using two hidden layers with 4 and 5 neurons, find the correlation coefficient of this ANN.

Options: 10-fold cross-validation, 500 epochs

After 10-fold validation, the results appear in the Weka main window.

You can now analyze the results

Page 68

10.5 Meta-learning algorithms

To use ML powerfully, please read this section!

We skip Sections 10.6 and 10.7.

Page 69

10.8 Attribute selection

It works by selecting one of the attribute evaluators together with one of the search methods.

Page 70

10.8.1 Attribute subset evaluators

Subset evaluators take a subset of attributes and return a numeric measure that guides the search.

- CfsSubsetEval (see Section 7.1): finds attributes that correlate highly with the class but have low inter-correlation among themselves; a filter method.

- ConsistencySubsetEval: a filter method.

- ClassifierSubsetEval: a wrapper method; it uses a classifier to evaluate sets of attributes on the training data.

Page 71

10.8.2 Single-attribute evaluators

Single-attribute evaluators should be used with the Ranker search method.

- InfoGainAttributeEval: it evaluates attributes by measuring their information gain w.r.t. the class.

- ChiSquaredAttributeEval: it evaluates attributes by computing the chi-squared statistic w.r.t. the class.

- SVMAttributeEval: evaluation with a linear support vector machine.

- PrincipalComponents (see section 7.3): PCA

Page 72

10.8.3 Search methods

- Search methods are optimization algorithms to find a good subset.

- Subset evaluators are the objective functions to be optimized

- GreedyStepwise: greedy forward or backward search through the space of attribute subsets.

- GeneticSearch: searches using a simple genetic algorithm.

- RankSearch: ranks attributes with a single-attribute evaluator and then evaluates subsets of increasing size.
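A minimal sketch combining an evaluator and a search method through the Weka API (assumes 'data' is loaded with its class attribute set):

  import weka.attributeSelection.AttributeSelection;
  import weka.attributeSelection.CfsSubsetEval;
  import weka.attributeSelection.GreedyStepwise;
  import weka.core.Instances;

  public class AttSelSketch {
      static int[] select(Instances data) throws Exception {
          AttributeSelection sel = new AttributeSelection();
          sel.setEvaluator(new CfsSubsetEval()); // subset evaluator = objective function
          sel.setSearch(new GreedyStepwise());   // search method = optimizer
          sel.SelectAttributes(data);            // run the search
          return sel.selectedAttributes();       // indices of the selected attributes
      }
  }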