intelligent database systems lab advisor : dr. hsu graduate : chien-ming hsiao author : bing...

22
Intelligent Database Systems Lab Advisor Dr. Hsu Graduate Chien-Ming Hsiao Author Bing Liu Yiyuan Xia Philp S. Yu 國國國國國國國國 National Yunlin University of Science and Technology Clustering Through Decision Tree Construction Pattern Recognition, 35 (2002), pp. 2783-2790.

Upload: audra-craig

Post on 04-Jan-2016

228 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Advisor: Dr. Hsu

Graduate: Chien-Ming Hsiao

Author: Bing Liu

Yiyuan Xia

Philp S. Yu

國立雲林科技大學National Yunlin University of Science and Technology

Clustering Through Decision Tree Construction

Pattern Recognition, 35 (2002), pp. 2783-2790.

Page 2: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Outline

Motivation Objective Introduction Building Cluster Trees Experimental results Conclusion Personal Opinion

N.Y.U.S.T.

I.M.

Page 3: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Motivation

Decision tree algorithm uses a purity function to partition the data space into different class regions but the technique is not directly applicable to clustering.

N.Y.U.S.T.

I.M.

Page 4: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Objective

Propose a novel clustering technique, which is based on a supervised learning method called decision tree construction. Called CLTree (CLustering based on decision Trees)

N.Y.U.S.T.

I.M.

Page 5: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Introduction (1/3)

Decision tree Uses a purity function to partition the data space

into different class regions.

The technique is not directly applicable to clustering.

N.Y.U.S.T.

I.M.

Page 6: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Introduction (2/3)

If there are clusters in the data, the data points cannot be uniformly distributed in the entire space

Page 7: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Introduction (3/3)

The CLTree technique consists of two steps:

Cluster tree construction Use a modified decision tree algorithm with a new purity functi

on

Cluster tree pruning To find meaningful/useful clusters.

Page 8: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Building Cluster Trees

Page 9: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Decision TreeN.Y.U.S.T.

I.M.

Page 10: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Introducing N points (1/2)

We do not physically add these N points to the original data, but only assume their existence.

We now determine how many N points to add.

N.Y.U.S.T.

I.M.

Page 11: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Introducing N points (2/2)

The reason that we increase the number of N points of a node if it has more inherited Y points than N points

To avoid the situation where there may be too few N points left after some cuts or splits.

Page 12: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Two modifications to the Decision Tree Algorithm Compute the number of N points on the fly Evaluate on both sides of data points

The space has 25 data (Y) points and N points

N = 25 * 4/10 = 10

N = 15-10

Page 13: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

The New Criterion

There are two main problems with the gain criterion The cut with the best information gain tends to cut into clusters.

This results in severe loss of data points in clusters.

The gain criterion does not look ahead in deciding the best cut.

Page 14: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

The New Criterion

Find the initial cuts Look ahead to find better cuts Select the overall best cut

Page 15: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Page 16: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

User-Oriented Pruning of Clustering Trees Browsing

The user simply explores the tree him/himself to find meaningful clusters.

User-oriented pruning The tree is pruned using two user-specify parameters

min_y, min_rd

Page 17: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Page 18: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Merging adjacent Y regions and simplifying clusters

Page 19: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Experiments

Page 20: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Page 21: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

Conclusion & Personal Opinion

The author proposed a novel clustering technique, called CLTree, which is based on decision trees in classification research. Partitioning the data space into dense and sparse regions

The results show that it is both effective and efficient.

N.Y.U.S.T.

I.M.

Page 22: Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Ming Hsiao Author : Bing Liu Yiyuan Xia Philp S. Yu 國立雲林科技大學 National Yunlin University

Intelligent Database Systems Lab

reviewN.Y.U.S.T.

I.M.