grc2011(m1 大木基至)_11.11.08

Decision Rule Visualization for Knowledge Discovery by Means of Rough Set Approach Motoyuki Ohki, Masahiro Inuiguchi, Toshinobu Harada Graduate School of Engineering Science, Osaka University Faculty of Systems Engineering, Wakayama University 11/09/2011 GrC2011

Upload: ntt-communications

Post on 24-Jul-2015


Page 1: GrC2011(M1 大木基至)_11.11.08

Decision Rule Visualization for Knowledge Discovery by Means of Rough Set Approach

Motoyuki Ohki, Masahiro Inuiguchi, Toshinobu Harada

Graduate School of Engineering Science, Osaka University

Faculty of Systems Engineering, Wakayama University

11/09/2011 GrC2011

Page 2: GrC2011(M1 大木基至)_11.11.08

00. Outline

01. Background and Purpose

02. Algorithm for Decision Rule Visualization

03. Visualization System

04. Evaluation Experiment

05. Summary and Future Work

1 / 25

Page 3: GrC2011(M1 大木基至)_11.11.08

01. Background

Rough Set Approach

- Attribute Reduction

- Induce Decision Rules

Application to various fields

2 / 25

Page 4: GrC2011(M1 大木基至)_11.11.08

01. Background

A Decision Table

Sample | Color (a)       | Shape (b)    | Number of doors (c) | Type (d)      | Preference
car1   | colored (a1)    | nature (b1)  | two (c1)            | personal (d1) | I'd like to buy (1)
car2   | colored (a1)    | rounded (b2) | four (c2)           | sporty (d2)   | I don't know (2)
car3   | monochrome (a2) | rounded (b2) | four (c2)           | formal (d3)   | I don't know (2)
car4   | monochrome (a2) | nature (b1)  | four (c2)           | personal (d1) | I'd like to buy (1)
car5   | monochrome (a2) | rounded (b2) | two (c1)            | personal (d1) | I don't know (2)
car6   | colored (a1)    | rounded (b2) | two (c1)            | sporty (d2)   | I'd like to buy (1)

Decision rule: If “b1” then “1”

Decision rule: If “a1 and d2” then “1”

We select useful decision rules among many rules.

We apply the rules to actual problems.

3 / 25

Page 5: GrC2011(M1 大木基至)_11.11.08

01. Background

Technical issue

- Difficulty of interpretation

- Dependence on the analyst

An example of inducing decision rules [1]

Difficulty of finding useful decision rules ...

[1] HOLON CREATE, Rough Sets Analysis Program, http://www.holon.com/program.html

4 / 25

Page 6: GrC2011(M1 大木基至)_11.11.08

01. Purpose

Proposing an algorithm for visualizing decision rules in the rough set approach

Supporting the discovery of useful decision rules

Examples of visual data mining [1,2,3]

[1] SOM (Self-Organizing Maps) http://www.mindware-jp.com/Viscovery/self-organizing-maps.html

[2] Purple Insight MineSet http://journal.mycom.co.jp/news/2006/06/28/347.html

[3] Natto View http://www.holon.com/program.html

5 / 25

Page 7: GrC2011(M1 大木基至)_11.11.08

02. Methods used in the proposed visualization

Three Methods

(i) The decision matrix-based rule induction

(ii) Calculation of Co-occurrence Rates

(iii) Hayashi’s Quantification Method Ⅳ

We evaluate the dependencies between attribute values and conclusions quantitatively.

6 / 25

Page 8: GrC2011(M1 大木基至)_11.11.08

02. Co-occurrence Rate

Definition

- Degree of dependency “between attribute values” and “between attribute values and the conclusion”

Formula

- Jaccard coefficient: $J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$, where $|X|$ is the cardinality of set $X$

7 / 25

Page 9: GrC2011(M1 大木基至)_11.11.08

02. Co-occurrence Rate

Calculation Example

Sample | Color (a)       | Shape (b)    | Number of doors (c) | Type (d)      | Preference
car1   | colored (a1)    | nature (b1)  | two (c1)            | personal (d1) | I'd like to buy (1)
car2   | colored (a1)    | rounded (b2) | four (c2)           | sporty (d2)   | I don't know (2)
car3   | monochrome (a2) | rounded (b2) | four (c2)           | formal (d3)   | I don't know (2)
car4   | monochrome (a2) | nature (b1)  | four (c2)           | personal (d1) | I'd like to buy (1)
car5   | monochrome (a2) | rounded (b2) | two (c1)            | personal (d1) | I don't know (2)
car6   | colored (a1)    | rounded (b2) | two (c1)            | sporty (d2)   | I'd like to buy (1)

The rate between “a1” and “b1”
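The calculation can be sketched in Python as follows. This is a minimal illustration, assuming that each attribute value is represented by the set of samples that take it; the names `DECISION_TABLE`, `objects_with`, and `jaccard` are hypothetical and do not come from the slides.

```python
# Car decision table from the slides: sample -> (condition attribute values, conclusion).
DECISION_TABLE = {
    "car1": ({"a1", "b1", "c1", "d1"}, 1),
    "car2": ({"a1", "b2", "c2", "d2"}, 2),
    "car3": ({"a2", "b2", "c2", "d3"}, 2),
    "car4": ({"a2", "b1", "c2", "d1"}, 1),
    "car5": ({"a2", "b2", "c1", "d1"}, 2),
    "car6": ({"a1", "b2", "c1", "d2"}, 1),
}


def objects_with(value):
    """Return the set of samples whose condition part contains the attribute value."""
    return {name for name, (values, _) in DECISION_TABLE.items() if value in values}


def jaccard(x, y):
    """Jaccard coefficient |X ∩ Y| / |X ∪ Y| (0.0 when both sets are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


# a1 is taken by car1, car2, car6; b1 is taken by car1, car4.
rate_a1_b1 = jaccard(objects_with("a1"), objects_with("b1"))
print(rate_a1_b1)  # 0.25 (one shared sample out of four in the union)
```

Only car1 has both a1 and b1, while four cars have at least one of them, so under this reading the rate is 1/4.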

8 / 25

Page 10: GrC2011(M1 大木基至)_11.11.08

02. Hayashi’s Quantification Method Ⅳ

Definition

- A kind of multidimensional scaling

- Plots all objects in a two-dimensional coordinate system

Algorithm

:
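A common textbook formulation of Hayashi's quantification method IV is given below as a sketch; it is not necessarily the exact variant used in this work. The symbols $e_{ij}$, $x_i$, and $H$ are assumptions introduced for illustration, with $e_{ij}$ taken to be the similarity between objects $i$ and $j$ (here, presumably the Jaccard coefficient).

```latex
% e_{ij}: similarity between objects i and j (assumed: the Jaccard coefficient)
% x_i   : coordinate assigned to object i on one axis
\begin{aligned}
\text{maximize}\quad & Q = -\sum_{i \neq j} e_{ij}\,(x_i - x_j)^2
  \quad \text{subject to}\quad \sum_i x_i = 0,\ \ \sum_i x_i^2 = 1, \\
\text{equivalently}\quad & Q = \mathbf{x}^{\top} H \mathbf{x},\qquad
  h_{ij} = e_{ij} + e_{ji}\ (i \neq j),\qquad
  h_{ii} = -\sum_{j \neq i} \left(e_{ij} + e_{ji}\right), \\
\text{solved by}\quad & H \mathbf{x} = \lambda \mathbf{x}.
\end{aligned}
```

Under this formulation, a two-dimensional plot would use the eigenvectors belonging to the two largest eigenvalues, after excluding the constant eigenvector (eigenvalue 0) that the constraint $\sum_i x_i = 0$ rules out. Similar objects then tend to be placed close together.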

9 / 25

Page 11: GrC2011(M1 大木基至)_11.11.08

02. Flow of the Decision Rule Visualization

Input: A decision table

Analysis: Attribute values → Calculate Jaccard coefficients between attribute values → Apply Hayashi’s quantification method

Output: 1. We obtain the locations of attribute values in the X-Y plane.

10 / 25

Page 12: GrC2011(M1 大木基至)_11.11.08

02. Flow of the Decision Rule Visualization

Input: A decision table

Analysis: Calculate Jaccard coefficients between attribute values and the conclusion

Output: 2. We obtain the Z coordinates of the attribute values.
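The Z coordinate can be sketched as below for the car decision table shown earlier, taking conclusion 1 (“I'd like to buy”) as the target. This is a minimal illustration under the assumption that the Z value of an attribute value is the Jaccard coefficient between the samples taking that value and the samples with the conclusion; `ROWS` and `z_value` are hypothetical names.

```python
# Car decision table from the slides: (sample, condition attribute values, conclusion).
ROWS = [
    ("car1", {"a1", "b1", "c1", "d1"}, 1),
    ("car2", {"a1", "b2", "c2", "d2"}, 2),
    ("car3", {"a2", "b2", "c2", "d3"}, 2),
    ("car4", {"a2", "b1", "c2", "d1"}, 1),
    ("car5", {"a2", "b2", "c1", "d1"}, 2),
    ("car6", {"a1", "b2", "c1", "d2"}, 1),
]


def z_value(value, conclusion, rows=ROWS):
    """Jaccard coefficient between an attribute value and a conclusion (assumed Z coordinate)."""
    with_value = {name for name, values, _ in rows if value in values}
    with_conclusion = {name for name, _, decision in rows if decision == conclusion}
    union = with_value | with_conclusion
    return len(with_value & with_conclusion) / len(union) if union else 0.0


all_values = sorted(set().union(*(values for _, values, _ in ROWS)))
z = {value: z_value(value, conclusion=1) for value in all_values}
for value, height in z.items():
    print(f"{value}: {height:.3f}")
```

In this small table, b1 gets the highest value (2/3) because both cars with b1 are in conclusion 1, and d3 gets 0 because the only car with d3 is in conclusion 2.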

11 / 25

Page 13: GrC2011(M1 大木基至)_11.11.08

02. Flow of the Decision Rule Visualization

Input: A decision table

Analysis: Induce decision rules by the rough set approach → Calculate C.I values

Output: 3. Decision rules are represented as links (e.g., decision rule a1b2 is drawn as a link between a1 and b2).
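The slides do not define the C.I value. The sketch below assumes it is a covering index, that is, the number of samples in the concluded class that match the rule divided by the number of samples in that class. This definition, the function name `covering_index`, and the second example rule are assumptions for illustration only.

```python
# Car decision table from the slides: (condition attribute values, conclusion).
ROWS = [
    ({"a1", "b1", "c1", "d1"}, 1),  # car1
    ({"a1", "b2", "c2", "d2"}, 2),  # car2
    ({"a2", "b2", "c2", "d3"}, 2),  # car3
    ({"a2", "b1", "c2", "d1"}, 1),  # car4
    ({"a2", "b2", "c1", "d1"}, 2),  # car5
    ({"a1", "b2", "c1", "d2"}, 1),  # car6
]


def covering_index(conditions, conclusion, rows=ROWS):
    """Assumed C.I: share of the concluded class whose samples satisfy all rule conditions."""
    in_class = [values for values, decision in rows if decision == conclusion]
    covered = [values for values in in_class if conditions <= values]
    return len(covered) / len(in_class) if in_class else 0.0


# Rule from the slides: If "b1" then "1".
ci_b1 = covering_index({"b1"}, 1)
# Hypothetical two-value rule used only to illustrate a rule drawn as one link.
ci_a1c1 = covering_index({"a1", "c1"}, 1)
print(round(ci_b1, 3), round(ci_a1c1, 3))
```

Under this assumption, the rule If “b1” then “1” covers car1 and car4 out of the three cars in conclusion 1.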

12 / 25

Page 14: GrC2011(M1 大木基至)_11.11.08

03. Visualization System

Decision table

- Attribute values : 16

- Induced decision rules : 31

c1 0.500

Strongly dependent on the conclusion

Decision rule : c1d3

Candidate for a useful decision rule

13 / 25

Page 15: GrC2011(M1 大木基至)_11.11.08

04. Evaluation Experiment

Two evaluation experiments

- We check the efficiency and usefulness of the visualization method.

[1] Product evaluation experiment

- To check the advantage of the visualization method

[2] Numerical experiment

- To check the usefulness of the decision rules selected by examinees using the visualization system

14 / 25

Page 16: GrC2011(M1 大木基至)_11.11.08

04. Product Evaluation Experiment

Procedure 1

Samples and attribute values

- 24 digital cameras as samples

- 7 condition attributes

e.g., face shape, position of lens, etc.

Procedure 2

We asked three examinees about their motivation to buy these digital cameras.

- conclusion 1 : “I want to buy it”

- conclusion 2 : “I will not buy it”

15 / 25

Page 17: GrC2011(M1 大木基至)_11.11.08

04. Product Evaluation Experiment

Procedure 3

We compare how well useful decision rules can be selected with the following two methods.

Comparison

- one : Proposed Visualization Method

- the other : Commercial Software provided by HOLON [1]

[1] HOLON CREATE, Rough Sets Analysis Program, http://www.holon.com/program.html

16 / 25

Page 18: GrC2011(M1 大木基至)_11.11.08

04. Product Evaluation Experiment

Evaluation of Commercial Software

List of decision rules with C.I values

Difficulty in finding the useful decision rules

The selected decision rules differ among examinees.

Decision rules and C.I values induced by commercial software

Decision Rules | C.I value
e2f3           | 0.167
b2f2           | 0.167
a2d2           | 0.167
c1f1g2         | 0.167
b1c1f1         | 0.167
a2f2g1         | 0.167
a2b1e2         | 0.167
b2e1           | 0.083
d2f3           | 0.083
a1d3           | 0.083

17 / 25

Page 19: GrC2011(M1 大木基至)_11.11.08

04. Product Evaluation Experiment

Evaluation of Visualization System

1. The strength of dependencies is easy to understand at a glance.

Examples

- e2 (no dial, Z-value = 0.450)

- c1 (shape of face is a straight line, Z-value = 0.429)

- g2 (shape of edge strip is rounded, Z-value = 0.412)

18 / 25

Page 20: GrC2011(M1 大木基至)_11.11.08

04. Product Evaluation Experiment

Evaluation of Visualization System

2. We can find weakly related condition attribute values.

Examples

- f1, f2, and f3 are located in lower positions

- “f” (location of flash) is not very influential for this examinee’s preference.

19 / 25

Page 21: GrC2011(M1 大木基至)_11.11.08

04. Product Evaluation Experiment

Evaluation of Visualization System

3. The length of links can express the imbalanced influence of attribute values.

Examples

- “e2f3” : long link → unreliable decision rule

- “a2b1e2” : short link → reliable decision rule

[Figure: two panels, “Decision rules composed of two attribute values” and “Decision rules composed of three attribute values”, with node labels f3, e2, b1, a2]

20 / 25

Page 22: GrC2011(M1 大木基至)_11.11.08

04. Numerical Experiment

Procedure 1

Partition the “car” data set into ten subsets randomly

- “car” data set : obtained from the UCI web site [1]

[Figure: subsets labelled 1, 2, 3, ..., 10]

[1] UCI Machine Learning Repository http://archive.ics.uci.edu/ml/
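A random partition into ten subsets can be sketched as follows. The slides do not say how the split was drawn, so the shuffle-then-deal scheme, the fixed seed, and the function name `partition` are assumptions; the 25 integers stand in for the rows of the data set.

```python
import random


def partition(items, k=10, seed=0):
    """Shuffle the items and deal them into k disjoint subsets of nearly equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local generator keeps the split reproducible
    return [shuffled[i::k] for i in range(k)]


# Placeholder row indices; in practice these would be the rows of the "car" data set.
subsets = partition(range(25), k=10)
for number, subset in enumerate(subsets, start=1):
    print(number, sorted(subset))
```

Dealing every k-th element after shuffling keeps the subset sizes within one of each other, and each row lands in exactly one subset.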

21 / 25

Page 23: GrC2011(M1 大木基至)_11.11.08

04. Numerical Experiment

Procedure 2

Ask each of the three examinees to select three decision rules for each subset of the “car” data set

[Figure: decision rules selected by the examinees; labels shown: a1c3, a1c2, d2b1; b1d2, a1c3, d2b1; a1d2, a1c3, b1d2]

22 / 25

Page 24: GrC2011(M1 大木基至)_11.11.08

04. Numerical Experiment

Procedure 3

Compare the three selected decision rules (Rule Set 1) with non-selected decision rules (Rule Set 2) that have the same C.I values.

Rule Set 1: a1d2, a1c3, b1d2

Rule Set 2: c2d2, b3c3 …

Calculation of Average Accuracy

[Figure: the average accuracy of each rule set is calculated over subsets labelled 1, 2, ..., 9]
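The slides do not give the accuracy formula. The sketch below assumes that the accuracy of a rule on a subset is the share of matching samples whose conclusion agrees with the rule, averaged over all rule and subset pairs where the rule matches at least one sample. The two tiny subsets are hypothetical and are not taken from the “car” data set.

```python
def rule_accuracy(rule, subset):
    """Share of samples matching the rule's conditions that also have its conclusion."""
    conditions, conclusion = rule
    matched = [decision for values, decision in subset if conditions <= values]
    if not matched:
        return None  # the rule does not fire on this subset
    return sum(decision == conclusion for decision in matched) / len(matched)


def average_accuracy(rules, subsets):
    """Mean accuracy over every (rule, subset) pair in which the rule fires."""
    scores = []
    for rule in rules:
        for subset in subsets:
            score = rule_accuracy(rule, subset)
            if score is not None:
                scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical evaluation subsets: (condition attribute values, conclusion).
subset_a = [({"a1", "d2"}, 1), ({"a1", "d2"}, 2), ({"b1", "d2"}, 1)]
subset_b = [({"a1", "d2"}, 1), ({"b1", "d2"}, 1)]

rule_set = [({"a1", "d2"}, 1), ({"b1", "d2"}, 1)]
result = average_accuracy(rule_set, [subset_a, subset_b])
print(result)  # 0.875 for this toy data
```

The same function would be applied to Rule Set 1 and Rule Set 2 on the same evaluation subsets so that the two averages are comparable.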

23 / 25

Page 25: GrC2011(M1 大木基至)_11.11.08

04. Numerical Experiment

Results of Average Accuracy

By a paired t-test at significance level α = 0.05, we confirmed the advantage of Rule Set 1 over Rule Set 2.

We confirmed the usefulness of the proposed method.
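The paired t statistic can be computed as sketched below. The accuracy values are hypothetical placeholders, not the experimental results, and the function name `paired_t` is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired samples of equal length."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)  # sample standard deviation of differences
    return mean(differences) / standard_error, n - 1


# Hypothetical average accuracies, paired by evaluation run.
rule_set_1 = [0.80, 0.75, 0.90, 0.85]
rule_set_2 = [0.70, 0.70, 0.80, 0.80]

t_statistic, degrees_of_freedom = paired_t(rule_set_1, rule_set_2)
print(round(t_statistic, 3), degrees_of_freedom)
```

The statistic is then compared with the critical value of the t distribution for the given degrees of freedom at α = 0.05; a larger value supports a difference between the two rule sets.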

24 / 25

Page 26: GrC2011(M1 大木基至)_11.11.08

05. Summary and Future Work

Summary

1. We proposed a method of visualizing decision rules.

2. We developed a visualization system based on the proposed method.

3. We conducted two experiments and confirmed the effectiveness and usefulness of the visualization system.

Future Work

1. To conduct more experiments with many different decision tables.

2. To improve the system in order to enhance the precision of the analysis method.

25 / 25

Page 27: GrC2011(M1 大木基至)_11.11.08

Thank you for listening!

Motoyuki Ohki

Graduate School of Engineering Science, Osaka University

E-Mail : [email protected]

Page 28: GrC2011(M1 大木基至)_11.11.08

Appendix

Page 29: GrC2011(M1 大木基至)_11.11.08

00. Samples and Attributes

24 digital cameras

7 attribute values

Page 30: GrC2011(M1 大木基至)_11.11.08

00. Conventional Research

Multi-valued decision diagrams [1]

- This method uses a multi-valued decision diagram.

Hierarchical visualization method [2]

- This method uses a hierarchical graph structure.

[1] Y. Tomoto, T. Ohira, T. Nakamura, M. Kanoh, and H. Itoh, “Applying Multi-valued Decision Diagram to Visualization of If-Then Rules,” Kansei Engineering International Journal, vol. 9, no. 2, 2010, pp. 259-267.

[2] A. Ito, T. Yoshikawa, T. Furuhashi, and S. Mitsumatsu, “Profiling by Association Analysis using Hierarchical Visualization Method,” Kansei Engineering International Journal, vol. 10, no. 2, 2011, pp. 205-212.

Page 31: GrC2011(M1 大木基至)_11.11.08

00. Co-occurrence Rate

Reason for selecting the Jaccard coefficient

- Attribute value X and attribute value Y

For example

(1) |X| = 100, |Y| = 1, |X∩Y| = 1, |X∪Y| = 100

Jaccard = 1/100, Simpson = 1, Cosine = 1/10, Dice = 2/101

(2) |X| = 100, |Y| = 100, |X∩Y| = 50, |X∪Y| = 150

Jaccard = 1/3, Simpson = 1/2, Cosine = 1/2, Dice = 1/2

30 / 14
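The two examples can be checked with the usual definitions of the four coefficients, sketched below. The function name `coefficients` is hypothetical, and the formulas for Simpson, Cosine, and Dice are the standard ones, which reproduce the values on the slide.

```python
from fractions import Fraction
from math import sqrt


def coefficients(size_x, size_y, size_intersection):
    """Set-similarity coefficients computed from |X|, |Y| and |X ∩ Y|."""
    size_union = size_x + size_y - size_intersection
    return {
        "jaccard": Fraction(size_intersection, size_union),
        "simpson": Fraction(size_intersection, min(size_x, size_y)),
        "dice": Fraction(2 * size_intersection, size_x + size_y),
        "cosine": size_intersection / sqrt(size_x * size_y),  # float, may be irrational
    }


case_1 = coefficients(100, 1, 1)      # example (1)
case_2 = coefficients(100, 100, 50)   # example (2)
print(case_1)
print(case_2)
```

In example (1) the Simpson coefficient reports a full match even though Y covers only one of the 100 samples in X, whereas the Jaccard coefficient stays at 1/100. This appears to be the behaviour the slide points to as the reason for choosing Jaccard.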