data mining feature selection - university of sistan and...

69
م عل م ت بي ه ا دData Mining & Feature Selection M.M. Pedram pedram@tmu.ac.ir Faculty of Engineering, Tarbiat Moallem University The 11 th Iranian Confernce on Fuzzy systems, 5-7 July, 2011

Upload: others

Post on 04-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

دااگشنه رتبيت معلم

Data Mining

&

Feature Selection

M.M. Pedram

[email protected]

Faculty of Engineering, Tarbiat Moallem University

The 11th Iranian Confernce on Fuzzy systems, 5-7 July, 2011

Page 2: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Contents

History

KDD Data Mining

Classification

Estimation (Regression)

Clustering

Market Basket Analysis

Association Rule Mining

Sequence Mining

Feature Selection

Filter

Wrapper

Rough set Theory

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 2

Page 3: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

A Taxnomy

wisdom

knowledge

information

data

ignorance

“Data is not information; information is not knowledge; knowledge is not

wisdom.” Gary Flake, Principal Scientist & Head of Yahoo! Research Labs, July 2004.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 3

Page 4: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

History

The birth of Data Mining/KDD:

1989 IJCAI Workshop on Knowledge Discovery in Databases.

1991-1994 Workshops on Knowledge Discovery in Databases

1995 – date: International Conferences on Knowledge Discovery in

Databases and Data Mining (KDD)

2001 – date: IEEE ICDM

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 4

Page 5: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

KDD Process

Data Cleaning

Data Integration

Databases

Data Warehouse

Task-relevant Data

Selection

Data Mining

Pattern Evaluation Data mining: the core of

knowledge discovery

process

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 5

Page 6: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

KDD Process

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 6

Page 7: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Data Mining Definition

The search for interesting patterns and models,

in large data collections,

using statistical and machine learning methods,

and high-performance computational infrastructure.

Data Knowledge DM

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 7

Page 8: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Why mine data – commercial viewpoint

Lots of data is being collected and warehoused

Web data, e-commerce

purchases at department/grocery stores

Bank/Credit Card transactions

Computers have become cheaper and more powerful

Competitive Pressure is Strong

Provide better, customized services (e.g. in Customer Relationship

Management)

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 8

Page 9: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Why mine data – scientific viewpoint

Data collected and stored at enormous speeds (GB/hour)

remote sensors on a satellite

telescopes scanning the skies

microarrays generating gene expression data

scientific simulations generating terabytes of data

Traditional techniques infeasible for raw data

Data mining may help scientists

in classifying and segmenting data

in Hypothesis Formation

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 9

Page 10: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Data Mining, A Multidisciplinary Field

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 10

Page 11: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Data Mining Techniques

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 11

Page 12: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Data Mining Techniques

Prediction Methods

Use some variables to predict unknown or future values of other variables.

Description Methods

Find human-interpretable patterns that describe the data.

Data Mining Techniques

Descriptive Predictive

Clustering

Market Basket Analysis

Classification

Regression

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 12

Page 13: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Data Mining Techniques

Classification

Estimation (Regression)

Clustering

Market Basket Analysis:

Association Rule Discovery (Mining)

Sequential Pattern Discovery (Mining)

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 13

Page 14: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Model

Variable

numeric

categorical

Model

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 14

Page 15: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Classification

Decision Tree

Multi Layer Perceptron & other Neural Networks

Production System Rules/Fuzzy Extensions

Model

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 15

Page 16: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Estimation

Output variables are numeric.

Rules

Neural Networks

Time Series

Regression/Fuzzy Extensions

Model

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 16

Page 17: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Prediction

Classification or Estimation for future

Decision Trees

Neural Networks

Rules

Regression/Fuzzy Extension

Time Series

Model

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 17

Page 18: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Clustering

A number of similar individuals that occur together, for example

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 18

Page 19: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Clustering

Customer database segmentation based on similar buying

patterns.

Identify similar Web usage patterns

Group houses in a town into neighborhoods based on similar

features.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 19

Page 20: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Market Basket Data

Extracting the pattern of Items that frequently happened together:

Uses: Placement

Advertising

Objective: increase sales and reduce costs

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 20

Page 21: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Association Rule:

implication : A B where AB = {}

Support of A B:

Percentage of transactions that contain AB

Confidence of A B:

( )( ) ( | ) ; ( )

( )

5000( | ) 0.5 50%

10000

A B ASupp A B

Conf A B P B A supp AA supp A X

P B A

( ) ( )

50000.357 35.7%

14000

A BSupp A B P A B

D

Association Rule Definitions

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 21

Page 22: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Association Rules Example دااگشنه رتبيت معلم

A B Supp Conf

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 22

Page 23: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Sequences دااگشنه رتبيت معلم

<GAATTCTCTGTAACCTCA>

< (ABC) (DE) F >

Element interval

< ( , , ) ( , ) > Example of time interval:

A sequence is an ordered list of symbols.

like

Time interval

Location interval

Segment

(Gap)

interval interval

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 23

Page 24: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Sequences دااگشنه رتبيت معلم

The best breakfast

And maximum energy <( , , , , , , , , )>

<( , , , ) >

<( , , , , , )>

<( , , , , ) >

It provides about 80% of

our energy

It provides about 30%

of our energy

It provides about 50%

of our energy

Sequence 1:

Sequence 2:

Sequence 3:

Example

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 24

Page 25: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Sequential Pattern Mining

Discovery of frequent sequential patterns (subsequences) in a

database of sequences.

< 1 2 3 >

< 1 3 4 >

< 1 2 3 4 > < 1 5 6 3 4 > < 7 1 2 3 > < 1 8 2 3 >

min_sup ≥ 50%

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 25

Page 26: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

What Is Sequential Pattern Mining?

A sequence database

Given min_sup =2, <(ab)c> is a sequential pattern.

SID sequence

10 <a(abc)(ac)d(cf)>

20 <(ad)c(bc)(ae)>

30 <(ef)(ab)(df)cb>

40 <eg(af)cbc>

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 26

Page 27: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Example (newspaper articles) دااگشنه رتبيت معلم

< >

typhoon flood,

landslide typhoon flood,

landslide

A series of daily newspaper articles

<typhoon (flood, landslide)>

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 27

Page 28: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Feature Selection

The 11th Iranian Conference on Fuzzy systems, 5-7 July,

2011

M.M. Pedram, [email protected]

Page 29: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Feature Selection

Finding the most compact and informative set of features, to

improve the efficiency or data storage and processing.

Data Cleaning

Data Integration

Data Warehouse

Task-relevant Data

Selection

Data Mining

Pattern Evaluation

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 29

Page 30: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Classify Fruits

(Øx, Øy, Øx/Øy, curvature, color, hardness, weight, smell, …)

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 30

Page 31: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Risk Functional

A function of the parameters of the learning machine, assessing

how much it is expected to fail on a given task.

Classification: the error rate

Regression: the mean square error

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 31

Page 32: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Overfitting

Fit / Robustness Tradeoff

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 32

Page 33: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Ockham’s Razor

Principle proposed by William of Ockham in

the fourteenth century.

Of two theories providing similarly good

predictions, prefer the simplest one.

Shave off unnecessary parameters of your

models.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 33

Page 34: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Why Feature Selection?

Lot of inputs Lots of parameters & Large input

space

Curse of dimensionality and risks of overfitting !

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 34

Page 35: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

FS Methods Classification

Unsupervised Linear Nonlinear

Selection Correlation between inputs

Mutual information between inputs

Projection PCA Kohonen maps

Supervised Linear Nonlinear

Selection

Correlation between inputs &

Outputs

Mutual information between inputs

and outputs, Greedy algorithms,

Genetic algorithms

Projection Linear Discriminant Analysis,

Partial Least Squares Projection pursuit

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 35

Page 36: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

FS Methods Classification

Selection: choosing among the original features

easy (+)

interpretability of the features (+)

Projection: creating new features from the original ones

more general →possibly more efficient (+)

more difficult (-)

features not interpretable (-)

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 36

Page 37: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Supervised selection: filter versus wrapper

Supervising does not necessarily mean to use the model!

Filter

Note the final quality criterion.

Wrapper

Many (non)linear models are designed.

Selection

(Filter)

Relevance

criterion

-

All

features Feature

Subset

z

Selection Model

-

All

features Feature

Subset

y

ŷ

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 37

Page 38: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Important Questions in FS

Question1 : Subset relevance assessment

Among all 2d-1 possible subsets, which is the best one?

Question2 : Optimal Subset search

How not to consider all 2d-1 possible subsets?

There are two feature qualities that must be considered by FS

methods: relevancy and redundancy.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 38

Page 39: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Subset relevance assessment

Relevance is difficult to define!

Filter approach (model free):

a variable (or set of ) is relevant if it is statistically dependent on y.

P(y|xi) P(y)

Wrapper approach (uses model f):

a variable (or set of ) is relevant if the model built on it shows good

performances.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 39

Page 40: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Subset relevance assessment

Correlation

Mutual information

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 40

Page 41: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Correlation

Correlation, a linear filter

Measures linear dependencies (between -1 and +1)

0 indicates no linear relation.

Definition:

correlation between random variable Xand random variable Y

Suppose a dataset {x

j,y

j}

ov( , ) ( ) ( ) ( )

( ) ( ) ( ) ( )

C x y E xy E x E y

Var x Var y Var x Var y

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 41

Page 42: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Correlation

Strong correlation

Weak correlation

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 42

Page 43: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Correlation

Note: Correlation does not measure nonlinear relations!

x

x2

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 43

Page 44: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Correlation

Correlation:

is linear.

is parametric (it makes the hypothesis of a ...linear model).

does not explain causality.

is almost impossible to define between more than 2 variables.

is sensitive to outliers.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 44

Page 45: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Mutual information

Mutual information between random variable x and random

variable y measures how the uncertainty on y is reduced when x is

known (and vice versa).

Relevance of a subset XS: mutual information MI(XS; y) between

this subset and the target variable y.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 45

Page 46: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Mutual information

Some properties:

If x and y are independent, MI(y;x) = 0

MI(y;y) = H(y)

MI(y;x) is always non negative and less than min(H(y), H(x))

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 46

Page 47: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Mutual information

Nonlinear:

MI(X, Y) = P(X,Y) log dX dY

= KL( P(X,Y) || P(X)P(Y) )

P(X,Y)

P(X)P(Y)

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 47

Page 48: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Taxonomy of FS approaches

Feature Selection

Transformation Based

Linear

PCA PP MDS

Nonlinear

Isomap LLE

Selection Based

Filter

Rough Set

Fuzzy

Rough Set

Wrapper Hybrid

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 48

Page 49: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Rough Set

The 11th Iranian Conference on Fuzzy systems, 5-7 July,

2011

M.M. Pedram, [email protected]

Page 50: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Rough Sets

Modeling imperfect knowledge.

It can be employed to reduce the dimensionality of datasets.

Zdzislaw Pawlak: Rough Set Theory (1982)

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 50

Page 51: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Rough Sets

A rough set is itself the approximation of a vague concept (set) by

a pair of precise concepts, called lower and upper approximations,

that are a classification of the domain of interest into disjoint

categories.

It works by exploring and exploiting the granularity structure of

the data only.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 51

Page 52: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Information and Decision Systems

Information system : a table of data, consisting of objects (rows in

the table) and attributes (columns) = database

Decision system : an information system may be extended by the

inclusion of decision attributes.

A decision system is consistent if for every set of objects whose

attribute values are the same, the corresponding decision

attributes are identical.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 52

Page 53: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Example #1

Decision system : consists of 4 conditional features (a, b, c, d), a

decision feature (e), and eight objects.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 53

Page 54: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Nomenclature

I = (U,A) : an information system,

U : a nonempty set of finite objects (the universe of discourse),

A : a nonempty finite set of attributes

a : U → Va : an attribute, a ∈ A.

Va : the set of values that attribute a may take.

A = {C ∪ D}: decision systems,

P : Feature subset, P ⊆ A,

C : the set of input features,

D : the set of class indexes.

d ∈ D : a class index, itself a variable

d : U → {0, 1} such that for x ∈ U, d(x) = 1 if x has class d and

d(x) = 0 otherwise.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 54

Page 55: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Indiscernibility

With any P ⊆ A there is an associated equivalence relation IND(P ):

IND(P) = {(x, y) ∈ U×U | ∀a∈ P, a(x) = a(y)}

Note that this relation corresponds to the equivalence relation for

which two objects are equivalent if and only if they have the same

vectors of attribute values for the attributes in P.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 55

Page 56: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Indiscernibility

The partition of U, determined by IND(P), is denoted U/IND(P) or

U/P, which is simply the set of equivalence classes generated by

IND(P):

U/IND(P) = ⊗{ U/IND({a}) | a ∈ P }

Where

A ⊗ B = {X ∩ Y |X ∈ A, Y∈B, X∩ Y∅ }

The equivalence classes of the indiscernibility relation with respect

to P are denoted [x]P , x ∈ U.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 56

Page 57: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Indiscernibility

Example: if P = {b, c}, then objects 1, 6, and 7 are indiscernible, as

are objects 0 and 4.

IND(P) creates the following partition of U:

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 57

Page 58: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Lower and Upper Approximations

Let X ⊆ U. X can be approximated using only the information

contained within P by constructing the P-lower and P-upper

approximations of the classical crisp set X:

The tuple is called a rough set.

,PX PX

| [ ]

| [ ]

P

P

PX x x X

PX x x X

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 58

Page 59: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Positive, Negative, and Boundary Regions

Let P and Q be sets of attributes inducing equivalence relations

over U, then the positive, negative, and boundary regions are

defined as:

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 59

Page 60: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Positive, Negative, and Boundary Regions دااگشنه رتبيت معلم

. x

Upper Approximation PX

Set X

Lower Approximation PX

Equivalence Class: [x]P (Granules)

[x]P : the set of all points which are indiscernible with point x in terms of feature

subset P. (set of all points belonging to the same granule as of the point x in

feature space P)

P U

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 60

Page 61: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Positive, Negative, and Boundary Regions

POSP (Q) : The positive region comprises all objects of U that can

be classified to classes of U/Q using the information contained

within attributes P.

BNDP (Q) : The boundary region is the set of objects that can

possibly, but not certainly, be classified in this way.

NEGP (Q) : The negative region is the set of objects that cannot be

classified to classes of U/Q.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 61

Page 62: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Example #1

Let P = {b,c} and Q = {e}, then

S R S T T T S S

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 62

Page 63: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Feature Dependency

A set of attributes Q depends totally on a set of attributes P,

denoted P ⇒ Q, if all attribute values from Q are uniquely

determined by values of attributes from P.

If there exists a functional dependency between values of Q and P,

then Q depends totally on P.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 63

Page 64: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Feature Dependency and Significance

Dependency In rough set theory:

For P, Q ⊂ A, it is said that Q depends on P in a degree k (0≤ k ≤1),

denoted P ⇒k Q, if

|.| : the cardinality.

If k = 1, Q depends totally on P,

if 0 < k < 1, Q depends partially (in a degree k) on P,

if k = 0, Q does not depend on P.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 64

Page 65: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Feature Dependency and Significance

Example: the degree of dependency of attribute {e} on the

attributes {b,c} is:

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 65

Page 66: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Feature Dependency and Significance

Given P, Q and a feature a ∈ P, the significance of feature a upon Q

is defined by:

If the significance of feature a is 0, then the feature a is dispensible.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 66

Page 67: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Feature Dependency and Significance

For example #1, if P = {a,b,c} and Q = e, then

And calculating the significance of the three attributes gives

Thus, the attribute a is indispensable, but attributes b and c can be

dispensed.

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 67

Page 68: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

Reducts

For many application problems, it is often necessary to maintain a

concise form of the information system.

One way to implement this is to search for a minimal

representation of the original dataset.

For this, the concept of a reduct is introduced and defined as a

minimal subset R of the initial attribute set C such that for a given set

of attributes D, γR(D) = γC(D).

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 68

Page 69: Data Mining Feature Selection - University of Sistan and ...seminars.usb.ac.ir/Files/ifs11/fa-ir/Document/Pkargah/pedram.pdf · ملعمتيبتر هنشگااد Data Mining & Feature

?

The 11th Iranian Conference on Fuzzy systems, 5-7 July, 2011 M.M. Pedram, [email protected] 69