grc2011(m1 大木基至)_11.11.08
TRANSCRIPT
Decision Rule Visualization for Knowledge Discovery by Means of Rough Set Approach
Motoyuki Ohki, Masahiro Inuiguchi, Toshinobu Harada
Graduate School of Engineering Science, Osaka University
Faculty of Systems Engineering, Wakayama University
11/09/2011 GrC2011
00. Outline
01. Background and Purpose
02. Algorithm for Decision Rule Visualization
03. Visualization System
04. Evaluation Experiment
05. Summary and Future Work
1 / 25
01. Background
Rough Set Approach
- Attribute Reduction
- Induce Decision Rules
Application to various fields
2 / 25
01. Background 3 / 25
Sample  Color (a)        Shape (b)     Number of doors (c)  Type (d)       Preference
car1    colored (a1)     nature (b1)   two (c1)             personal (d1)  I'd like to buy (1)
car2    colored (a1)     rounded (b2)  four (c2)            sporty (d2)    I don't know (2)
car3    monochrome (a2)  rounded (b2)  four (c2)            formal (d3)    I don't know (2)
car4    monochrome (a2)  nature (b1)   four (c2)            personal (d1)  I'd like to buy (1)
car5    monochrome (a2)  rounded (b2)  two (c1)             personal (d1)  I don't know (2)
car6    colored (a1)     rounded (b2)  two (c1)             sporty (d2)    I'd like to buy (1)
A Decision Table
Decision rule: If “b1” then “1”
Decision rule: If “a1 and d2” then “1”
We select useful decision rules among many rules.
We apply the rules to actual problems.
01. Background
Technical issue
- Difficulty of interpretation
- Depending on analysts
An example of inducing decision rules[1]
Difficulty of finding
useful decision rules ...
[1] HOLON CREATE, Rough Sets Analysis Program, http://www.holon.com/program.html
4 / 25
01. Purpose
Proposing an algorithm for visualizing
decision rules in the rough set approach
Supporting the discovery of useful decision rules
[1] SOM Self-organization maps http://www.mindware-jp.com/Viscovery/self-organizing-maps.html
[2] Purple Insight MineSet http://journal.mycom.co.jp/news/2006/06/28/347.html
[3] Natto View http://www.holon.com/program.html
Examples of visual data mining [1,2,3]
5 / 25
02. Methods used in the proposed visualization
Three Methods
(i) The decision matrix-based rule induction
(ii) Calculation of Co-occurrence Rates
(iii) Hayashi’s Quantification Method Ⅳ
We evaluate the dependencies between attribute values and conclusions quantitatively.
6 / 25
02. Co-occurrence Rate
Definition
- Degrees of the dependencies “between attribute values”
and “between attribute values and conclusion”
- Jaccard coefficient
Formula
Jaccard(X, Y) = |X∩Y| / |X∪Y|, |X| : cardinality of set X
7 / 25
02. Co-occurrence Rate
Calculation Example
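the rate between “a1” and “b1”
- a1 appears in car1, car2, car6; b1 appears in car1, car4
- |{car1}| / |{car1, car2, car4, car6}| = 1/4 = 0.25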
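A minimal sketch of this calculation, assuming each attribute value is represented by the set of samples in which it appears (the names `TABLE`, `objects_with`, and `jaccard` are introduced here for illustration only):

```python
# Decision table from the slides: sample -> (attribute values, conclusion).
TABLE = {
    "car1": ({"a1", "b1", "c1", "d1"}, 1),
    "car2": ({"a1", "b2", "c2", "d2"}, 2),
    "car3": ({"a2", "b2", "c2", "d3"}, 2),
    "car4": ({"a2", "b1", "c2", "d1"}, 1),
    "car5": ({"a2", "b2", "c1", "d1"}, 2),
    "car6": ({"a1", "b2", "c1", "d2"}, 1),
}


def objects_with(value):
    """Return the set of samples whose row contains the given attribute value."""
    return {name for name, (values, _) in TABLE.items() if value in values}


def jaccard(x, y):
    """Jaccard coefficient |X∩Y| / |X∪Y| of two sets (0.0 if both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


rate = jaccard(objects_with("a1"), objects_with("b1"))
print(rate)  # 0.25
```

Only car1 has both a1 and b1, while four cars have at least one of them, which gives 1/4.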
8 / 25
02. Hayashi’s Quantification Method Ⅳ
Definition
- A kind of multi-dimensional scaling
- Plot all objects in a two-dimensional coordinate system
Algorithm
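A hedged sketch of the standard formulation of Hayashi's quantification method Ⅳ, assuming the similarity e_ij between objects i and j is the co-occurrence rate. The symbols e_ij, x_i, H, and λ are introduced here for illustration and are not necessarily the notation used in the talk:

```latex
\begin{aligned}
&\text{maximize } Q = -\sum_{i \neq j} e_{ij}\,(x_i - x_j)^2
\quad \text{subject to } \sum_i x_i = 0,\ \sum_i x_i^2 = 1, \\
&Q = \mathbf{x}^{\top} H \mathbf{x}, \qquad
h_{ij} = e_{ij} + e_{ji}\ (i \neq j), \qquad
h_{ii} = -\sum_{j \neq i} h_{ij}, \\
&H\mathbf{x} = \lambda\,\mathbf{x}.
\end{aligned}
```

Objects with a large similarity are pulled close together. Every row of H sums to zero, so the constant vector is a trivial solution with λ = 0. The eigenvectors of the two largest remaining eigenvalues can serve as the two plot coordinates.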
9 / 25
02. Flow of the Decision Rule Visualization
Input
- A decision table
Analysis
- Calculate Jaccard coefficients between attribute values
- Apply Hayashi’s quantification method Ⅳ
Output
- Attribute values located in the X-Y coordinates
1. We obtain the locations of attribute values in the X-Y coordinates.
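The step above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each attribute value is represented by the set of samples containing it, the Jaccard matrix is used as the similarity input, and the eigenvectors are approximated with a shifted power iteration (a technique swapped in here to avoid external libraries). All function names are hypothetical, and at least three attribute values are assumed.

```python
import math

# Decision table from the slides: sample -> attribute values.
TABLE = {
    "car1": {"a1", "b1", "c1", "d1"},
    "car2": {"a1", "b2", "c2", "d2"},
    "car3": {"a2", "b2", "c2", "d3"},
    "car4": {"a2", "b1", "c2", "d1"},
    "car5": {"a2", "b2", "c1", "d1"},
    "car6": {"a1", "b2", "c1", "d2"},
}
VALUES = sorted(set().union(*TABLE.values()))


def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def similarity_matrix():
    """Jaccard coefficient between every pair of attribute values."""
    members = [{s for s, vals in TABLE.items() if v in vals} for v in VALUES]
    return [[jaccard(a, b) for b in members] for a in members]


def orthonormalize(x, basis):
    """Remove the components of x along each basis vector, then normalize."""
    for b in basis:
        dot = sum(xi * bi for xi, bi in zip(x, b))
        x = [xi - dot * bi for xi, bi in zip(x, b)]
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]


def quantification_iv(sim, dims=2, iters=2000):
    """Approximate coordinates from a non-negative similarity matrix."""
    n = len(sim)
    # h_ij = e_ij + e_ji off the diagonal; the diagonal makes each row sum to zero.
    h = [[(sim[i][j] + sim[j][i]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        h[i][i] = -sum(h[i])
    # Gershgorin bound: adding this shift makes every eigenvalue positive,
    # so power iteration favors the largest eigenvalues of h.
    shift = 2.0 * max(abs(h[i][i]) for i in range(n)) + 1.0
    basis = [[1.0 / math.sqrt(n)] * n]  # trivial constant eigenvector, excluded
    axes = []
    for k in range(dims):
        x = [float((i + 1) ** (k + 1)) for i in range(n)]  # deterministic start
        for _ in range(iters):
            x = orthonormalize(x, basis)
            x = [sum(h[i][j] * x[j] for j in range(n)) + shift * x[i]
                 for i in range(n)]
        x = orthonormalize(x, basis)
        basis.append(x)
        axes.append(x)
    return [tuple(axis[i] for axis in axes) for i in range(n)]


xy = dict(zip(VALUES, quantification_iv(similarity_matrix())))
for value, (x, y) in xy.items():
    print(f"{value}: x={x:+.3f}, y={y:+.3f}")
```

Each axis is centered at zero and has unit length, so only the relative positions of the attribute values are meaningful. The sign of an axis is arbitrary.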
10 / 25
02. Flow of the Decision Rule Visualization
Input
- A decision table
Analysis
- Calculate Jaccard coefficients between attribute values and conclusion
Output
- Attribute values located in the Z coordinate
2. We obtain the locations of attribute values in the Z coordinate.
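A minimal sketch of this step, applied to the car decision table from the Background slide. It assumes the Z coordinate of an attribute value is its Jaccard coefficient with the set of samples having one chosen conclusion (here conclusion 1, “I'd like to buy”). The names `TABLE` and `z_coordinates` are illustrative only.

```python
# Decision table from the slides: sample -> (attribute values, conclusion).
TABLE = {
    "car1": ({"a1", "b1", "c1", "d1"}, 1),
    "car2": ({"a1", "b2", "c2", "d2"}, 2),
    "car3": ({"a2", "b2", "c2", "d3"}, 2),
    "car4": ({"a2", "b1", "c2", "d1"}, 1),
    "car5": ({"a2", "b2", "c1", "d1"}, 2),
    "car6": ({"a1", "b2", "c1", "d2"}, 1),
}


def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def z_coordinates(conclusion):
    """Jaccard coefficient between each attribute value and the conclusion."""
    target = {s for s, (_, c) in TABLE.items() if c == conclusion}
    values = sorted(set().union(*(vals for vals, _ in TABLE.values())))
    return {
        v: jaccard({s for s, (vals, _) in TABLE.items() if v in vals}, target)
        for v in values
    }


z = z_coordinates(1)
for value, height in sorted(z.items(), key=lambda item: -item[1]):
    print(f"{value}: {height:.3f}")
```

With this table, b1 gets the largest value (2/3) and d3 gets 0, so b1 would be drawn highest and d3 lowest.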
11 / 25
02. Flow of the Decision Rule Visualization
Input
- A decision table
Analysis
- Induce decision rules by the rough set approach
- Calculate C.I values
Output
- A link between the attribute values of each decision rule (e.g., decision rule a1b2 links a1 and b2)
3. Decision rules are represented as links.
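The slides do not define the C.I value. The sketch below assumes it denotes the covering index often used with rough set rule induction: the fraction of samples in a decision class that satisfy the rule's condition part. That definition and the name `covering_index` are assumptions made for illustration, using the car decision table from the Background slide.

```python
# Decision table from the slides: sample -> (attribute values, conclusion).
TABLE = {
    "car1": ({"a1", "b1", "c1", "d1"}, 1),
    "car2": ({"a1", "b2", "c2", "d2"}, 2),
    "car3": ({"a2", "b2", "c2", "d3"}, 2),
    "car4": ({"a2", "b1", "c2", "d1"}, 1),
    "car5": ({"a2", "b2", "c1", "d1"}, 2),
    "car6": ({"a1", "b2", "c1", "d2"}, 1),
}


def covering_index(premise, conclusion):
    """Assumed definition: share of the decision class covered by the premise."""
    in_class = [vals for vals, concl in TABLE.values() if concl == conclusion]
    covered = [vals for vals in in_class if premise <= vals]
    return len(covered) / len(in_class)


ci = covering_index({"b1"}, 1)
print(round(ci, 3))  # 0.667
```

Under this assumption, the rule If “b1” then “1” covers car1 and car4 out of the three cars with conclusion 1.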
12 / 25
03. Visualization System
c1 0.500
Decision table
- Attribute values : 16
- Induced decision rules : 31
Decision rule : c1d3
13 / 25
Strongly dependent on the conclusion
Candidates for
useful decision rules
Two evaluation experiments
- We check the efficiency and usefulness of the visualization method.
[1] Product evaluation experiment
- To check the advantage of the visualization method
[2] Numerical experiment
- To check the usefulness of decision rules selected by
examinees using the visualization system
04. Evaluation Experiment 14 / 25
Procedure 1
Samples and attribute values
- 24 digital cameras as samples
- 7 condition attributes
e.g., Face shape, Position of lens, … etc.
Procedure 2
We ask three examinees about their motivation to buy these
digital cameras.
- conclusion 1 : “I want to buy it”
- conclusion 2 : “I will not buy it”
04. Product Evaluation Experiment 15 / 25
Procedure 3
We compare the advantages of the following two methods
for selecting decision rules.
- one : Proposed Visualization Method
- the other : Commercial Software provided by HOLON[1]
04. Product Evaluation Experiment
Comparison
[1] HOLON CREATE Rough Sets Analysis Program http://www.holon.com/program.html
16 / 25
Evaluation of Commercial Software
List of decision rules with C.I values
Difficulty in finding useful
decision rules
The selected decision rules differ
among examinees.
04. Product Evaluation Experiment
Decision rules and C.I values induced by commercial software
Decision Rules C.I value
e2f3 0.167
b2f2 0.167
a2d2 0.167
c1f1g2 0.167
b1c1f1 0.167
a2f2g1 0.167
a2b1e2 0.167
b2e1 0.083
d2f3 0.083
a1d3 0.083
17 / 25
Evaluation of Visualization System
1. It is easy to understand the
strength of dependencies
at a glance.
Examples
- e2 (no dial, Z-value = 0.450)
- c1 (shape of face is straight line,
Z-value = 0.429)
- g2 (shape of edge strip is rounded,
Z-value = 0.412)
04. Product Evaluation Experiment 18 / 25
Evaluation of Visualization System
2. We can find weakly related
condition attribute values.
Examples
- f1, f2, and f3 are located
at lower positions
- “f” (location of flash) is not very
influential for this examinee’s
preference.
04. Product Evaluation Experiment 19 / 25
Evaluation of Visualization System
3. The length of links can
express imbalanced influence
of attribute values.
Examples
- “e2f3” : long link
→ unreliable decision rule
- “a2b1e2” : short link
→ reliable decision rule
04. Product Evaluation Experiment
Decision rules composed of two attribute values
Decision rules composed of three attribute values
f3
e2
b1 a2
20 / 25
04. Numerical Experiment
Procedure 1
Partition the “car” data set into ten subsets randomly
- “car” data set : obtained from the UCI web site [1]
Subsets: 1, 2, 3, …, 10
[1] UCI Machine Learning Repository http://archive.ics.uci.edu/ml/
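A minimal sketch of such a random partition, assuming the “car” data set is the UCI Car Evaluation data set with 1728 instances. The function name `partition` and the fixed seed are illustrative choices, not details from the talk.

```python
import random


def partition(n_samples, n_subsets=10, seed=0):
    """Shuffle sample indices and deal them into n_subsets disjoint subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_subsets] for k in range(n_subsets)]


subsets = partition(1728)
print([len(s) for s in subsets])
```

Dealing the shuffled indices in round-robin order keeps the subset sizes within one sample of each other.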
21 / 25
Decision rules shown in the figure: a1c3, a1c2, d2b1, b1d2
04. Numerical Experiment
Procedure 2
Ask each of the three examinees to select three decision
rules for each subset of the “car” data set
Example of selected decision rules: a1d2, a1c3, b1d2
22 / 25
04. Numerical Experiment
Procedure 3
Compare the three selected decision rules (Rule Set 1) with non-selected decision rules (Rule Set 2) having the same C.I values
Rule Set 1: a1d2, a1c3, b1d2
Rule Set 2: c2d2, b3c3 …
Calculation of Average Accuracy for each rule set (subsets 1, 2, …, 9)
23 / 25
04. Numerical Experiment
Results of Average Accuracy
By the paired t-test with significance level α = 0.05, we confirmed the
advantage of Rule Set 1
over Rule Set 2.
We confirmed the
usefulness of the
proposed method.
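A minimal sketch of how a paired t statistic is computed from two lists of accuracies measured on the same subsets. The accuracy values below are hypothetical placeholders, NOT the results of this experiment, and `paired_t` is an illustrative name.

```python
import math


def paired_t(first, second):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1


# Hypothetical accuracies for illustration only (not from the slides).
rule_set_1 = [0.80, 0.75, 0.82, 0.78, 0.85]
rule_set_2 = [0.70, 0.72, 0.75, 0.74, 0.79]

t, df = paired_t(rule_set_1, rule_set_2)
print(f"t = {t:.3f}, df = {df}")
```

The statistic is then compared with the critical value of the t distribution for the given degrees of freedom at α = 0.05.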
24 / 25
Summary
1. We proposed a method of visualizing decision rules.
2. We developed a visualization system based on the proposed method.
3. We conducted two experiments and confirmed the effectiveness and usefulness of the visualization system.
Future Work
1. To conduct more experiments with many different decision tables.
2. To improve the system in order to enhance the precision of the analysis method.
05. Summary and Future Work 25 / 25
Thank you for listening !
Motoyuki Ohki
Graduate School of Engineering Science, Osaka University
Appendix
24 digital cameras
00. Samples and Attribute
7 condition attributes
00. Conventional Research
[1] Y. Tomoto, T. Ohira, T. Nakamura, M. Kanoh, and H. Itoh, “Applying Multi-valued Decision Diagram to Visualization of If-Then Rules,” Kansei Engineering International Journal, vol. 9, no. 2, 2010, pp. 259-267.
[2] A. Ito, T. Yoshikawa, T. Furuhashi, and S. Mitsumatsu, “Profiling by Association Analysis using Hierarchical Visualization Method,” Kansei Engineering International Journal, vol. 10, no. 2, 2011, pp. 205-212.
Multi-valued decision diagrams [1]
- This method uses a multi-valued
decision diagram.
Hierarchical visualization method[2]
- This method uses a hierarchical
graph structure.
The reason for selecting the Jaccard coefficient
- Attribute value X and attribute value Y
For example
(1) |X| = 100, |Y| = 1, |X∩Y| = 1, |X∪Y| = 100
Jaccard = 1/100 Simpson = 1
Cosine = 1/10 Dice = 2/101
(2) |X| = 100, |Y| = 100, |X∩Y| = 50, |X∪Y| = 150
Jaccard = 1/3 Simpson = 1/2
Cosine = 1/2 Dice = 1/2
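The two cases above can be reproduced with a short sketch. The standard set-based definitions of the four coefficients are assumed, and the function name `coefficients` is illustrative.

```python
import math


def coefficients(size_x, size_y, size_both):
    """Similarity coefficients from |X|, |Y|, and |X∩Y|."""
    size_union = size_x + size_y - size_both
    return {
        "jaccard": size_both / size_union,
        "simpson": size_both / min(size_x, size_y),
        "cosine": size_both / math.sqrt(size_x * size_y),
        "dice": 2 * size_both / (size_x + size_y),
    }


case1 = coefficients(100, 1, 1)     # (1) |X| = 100, |Y| = 1, |X∩Y| = 1
case2 = coefficients(100, 100, 50)  # (2) |X| = 100, |Y| = 100, |X∩Y| = 50

for name in ("jaccard", "simpson", "cosine", "dice"):
    print(f"{name}: case 1 = {case1[name]:.4f}, case 2 = {case2[name]:.4f}")
```

In case (1) the Simpson coefficient reaches 1 even though Y contains a single object, while the Jaccard coefficient stays at 1/100. This presumably motivates the choice: Jaccard does not overrate a rarely occurring attribute value.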
00. Co-occurrence Rate 30 / 14