

A Fuzzy k-Modes Algorithm for Clustering Categorical Data

Advisor : Dr. Hsu

Graduate : Chien-Ming Hsiao

Author : Zhexue Huang and Michael K. Ng

國立雲林科技大學 National Yunlin University of Science and Technology


Intelligent Database Systems Lab

Outline

Motivation

Objective

Introduction

Notation

Hard and fuzzy k-means algorithms

Hard and fuzzy k-Modes algorithms

Experimental Results

Conclusions

Personal Opinion


Motivation

Working only on numeric data limits the use of k-means-type algorithms in data mining.

Most algorithms for clustering categorical data suffer from a common efficiency problem when applied to massive categorical-only data sets.


Objective

To tackle the problem of clustering large categorical data sets in data mining


Introduction

Fuzzy versions of the k-means algorithm

Each pattern is allowed to have membership in all clusters, rather than belonging to exactly one.

Working only on numeric data limits the use of these k-means-type algorithms in areas such as data mining.


Introduction

Methods for clustering categorical data

the conceptual k-means algorithm [Ralambondrainy, 1995]

hierarchical clustering methods [Gower, 1991]

the PAM algorithm [Kaufman et al., 1990]

the fuzzy-statistical algorithms [Woodbury, 1974]

the conceptual clustering methods [Michalski, 1983]


Notation

The set of objects to be clustered is stored in a database table T defined by a set of attributes $A_1, A_2, \ldots, A_m$.

Let $X = \{X_1, X_2, \ldots, X_n\}$ be a set of $n$ objects.

Object $X_i$ is represented as $[x_{i,1}, x_{i,2}, \ldots, x_{i,m}]$.


Hard and fuzzy k-means algorithms

Let X be a set of n objects described by m numeric attributes.
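The formula on this slide did not survive extraction. As a hedged reconstruction, the standard fuzzy k-means problem can be written with the symbols $F$, $W$ and $Z$ that the later slides use. The weighting exponent $\alpha$ and the distance $d$ are the conventional names and are assumed here rather than read from the slide.

```latex
\min_{W,Z}\; F(W,Z) = \sum_{l=1}^{k}\sum_{i=1}^{n} w_{li}^{\alpha}\, d(Z_l, X_i)
\quad\text{subject to}\quad
0 \le w_{li} \le 1,\qquad
\sum_{l=1}^{k} w_{li} = 1 \;\; (1 \le i \le n),\qquad
0 < \sum_{i=1}^{n} w_{li} < n \;\; (1 \le l \le k)
```

Here $W = [w_{li}]$ is the $k \times n$ partition matrix, $Z = \{Z_1, \ldots, Z_k\}$ is the set of cluster centers, and $\alpha \ge 1$. Restricting $w_{li}$ to $\{0, 1\}$ with $\alpha = 1$ gives the hard k-means problem.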


Hard and fuzzy k-means algorithms

The usual method toward optimization of F is to use partial optimization for Z and W:

Fix Z and find necessary conditions on W to minimize F

Fix W and minimize F with respect to Z
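The two partial-optimization steps can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes squared Euclidean distance, a weighting exponent `alpha > 1`, and caller-supplied initial centers `Z0`; all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def update_memberships(X, Z, alpha):
    """Fix Z, update W. W[l][i] is the membership of object i in cluster l."""
    k, n = len(Z), len(X)
    W = [[0.0] * n for _ in range(k)]
    for i, x in enumerate(X):
        d = [sq_dist(z, x) for z in Z]
        zero = [l for l in range(k) if d[l] == 0.0]
        if zero:
            # The object coincides with a center: full membership there.
            W[zero[0]][i] = 1.0
            continue
        for l in range(k):
            W[l][i] = 1.0 / sum((d[l] / d[h]) ** (1.0 / (alpha - 1.0))
                                for h in range(k))
    return W

def update_centers(X, W, alpha):
    """Fix W, update Z. Each center is a membership-weighted mean.
    Assumes every cluster has non-zero total membership."""
    m = len(X[0])
    Z = []
    for row in W:
        weights = [w ** alpha for w in row]
        total = sum(weights)
        Z.append([sum(wt * x[j] for wt, x in zip(weights, X)) / total
                  for j in range(m)])
    return Z

def objective(X, W, Z, alpha):
    """Value of F(W, Z) for the current partition and centers."""
    return sum(W[l][i] ** alpha * sq_dist(Z[l], X[i])
               for l in range(len(Z)) for i in range(len(X)))

def fuzzy_k_means(X, Z0, alpha=2.0, max_iter=100, tol=1e-9):
    """Alternate the two steps until F stops decreasing by more than tol."""
    Z = [list(z) for z in Z0]
    prev = float("inf")
    for _ in range(max_iter):
        W = update_memberships(X, Z, alpha)   # fix Z, update W
        Z = update_centers(X, W, alpha)       # fix W, update Z
        cur = objective(X, W, Z, alpha)
        if prev - cur < tol:
            break
        prev = cur
    return W, Z
```

Each pass visits every cluster, object and attribute once, so the work per iteration grows with the product of the number of clusters, objects and attributes.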


Hard and fuzzy k-means algorithms

Theorem 1: Let $Z$ be fixed and consider Problem (P1), minimizing F with respect to W.
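The statement of the theorem is missing from the transcript. As a hedged sketch, the standard fuzzy k-means result for fixed centers and $\alpha > 1$ gives the minimizing memberships below; the symbols $\alpha$ and $d$ follow the usual convention and are assumed rather than taken from the slide.

```latex
\hat{w}_{li} =
\begin{cases}
1, & \text{if } X_i = Z_l,\\[2pt]
0, & \text{if } X_i = Z_h \text{ for some } h \neq l,\\[2pt]
\dfrac{1}{\sum_{h=1}^{k} \left[ \dfrac{d(Z_l, X_i)}{d(Z_h, X_i)} \right]^{1/(\alpha-1)}}, & \text{otherwise,}
\end{cases}
\qquad 1 \le l \le k,\; 1 \le i \le n
```

An object that is much closer to $Z_l$ than to any other center makes every ratio with $h \neq l$ small, so its membership $\hat{w}_{li}$ approaches 1.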


Hard and fuzzy k-means algorithms

Theorem 2: Let $W$ be fixed and consider Problem (P2), minimizing F with respect to Z.
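The formula for this theorem is also missing. Assuming the distance $d$ is squared Euclidean, which is the usual choice for numeric data, the standard minimizer for fixed memberships is the weighted mean sketched below.

```latex
\hat{z}_{l,j} = \frac{\sum_{i=1}^{n} w_{li}^{\alpha}\, x_{i,j}}{\sum_{i=1}^{n} w_{li}^{\alpha}},
\qquad 1 \le l \le k,\; 1 \le j \le m
```

With a hard partition ($w_{li} \in \{0, 1\}$) this reduces to the ordinary mean of the objects assigned to cluster $l$.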


Hard and fuzzy k-means algorithms

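The time complexity of the algorithm is O(tkmn), where t is the number of iterations, k the number of clusters, m the number of attributes, and n the number of objects.

The space requirement of the algorithm is O(n(m+k) + km), which covers the n objects with m attributes, the k × n partition matrix W, and the k cluster centers Z.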


Hard and fuzzy k-Modes algorithms

Using a simple matching dissimilarity measure for categorical objects

Replacing the means of clusters with the modes

Using a frequency-based method to find the modes


Hard and fuzzy k-Modes algorithms

Let X and Y be two categorical objects described by m categorical attributes, $X = [x_1, x_2, \ldots, x_m]$ and $Y = [y_1, y_2, \ldots, y_m]$.

The simple matching dissimilarity measure between X and Y is defined as follows:
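The formula itself is missing from the transcript. Simple matching is conventionally the number of attributes on which the two objects disagree, $d_c(X, Y) = \sum_{j=1}^{m} \delta(x_j, y_j)$ with $\delta(x_j, y_j) = 0$ if $x_j = y_j$ and 1 otherwise. A minimal Python sketch, with a hypothetical function name and made-up example objects:

```python
def simple_matching(x, y):
    """Count the attributes on which two categorical objects disagree."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(1 for a, b in zip(x, y) if a != b)

# Two objects that differ only in their second attribute.
x = ["red", "small", "round"]
y = ["red", "large", "round"]
print(simple_matching(x, y))  # 1
```

The measure ignores how "far apart" two categories are; any mismatch costs exactly 1.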


Hard and fuzzy k-Modes algorithms

Using a frequency-based method to update Z

The Hard k-modes Update Method

The Fuzzy k-modes Update Method


Hard and fuzzy k-Modes algorithms

Theorem 3: The Hard k-modes Update Method

The category of attribute $A_j$ of the cluster mode $Z_l$ is determined by the mode of the categories of attribute $A_j$ in the set of objects belonging to cluster $l$.

This choice minimizes the quantity $F(W, Z)$ for a fixed hard partition $W$.
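The hard update can be sketched as a per-attribute frequency count. The function name is hypothetical, and ties are broken by whichever category is seen first, which is an assumption of this sketch rather than something the slide specifies.

```python
from collections import Counter

def hard_mode(objects):
    """Return the mode of a cluster: the most frequent category of each attribute."""
    m = len(objects[0])
    return [Counter(obj[j] for obj in objects).most_common(1)[0][0]
            for j in range(m)]

# Objects currently assigned to one cluster.
cluster = [["a", "x"], ["a", "y"], ["b", "y"]]
print(hard_mode(cluster))  # ['a', 'y']
```

Note that the resulting mode need not be one of the objects in the cluster, as the example shows.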


Hard and fuzzy k-Modes algorithms

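Theorem 4: The Fuzzy k-modes Update Method

The category of attribute $A_j$ of the cluster mode $Z_l$ is given by the category that achieves the maximum of the summation of $w_{li}^{\alpha}$ (the memberships raised to the fuzziness exponent $\alpha$) to cluster $l$ over all categories.

This choice minimizes the quantity $F(W, Z)$ for a fixed fuzzy partition $W$.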
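A minimal sketch of the fuzzy update for a single cluster: each category of an attribute accumulates the memberships, raised to an exponent `alpha`, of the objects that carry it, and the category with the largest total wins. The function name, the example data and the value of `alpha` are illustrative assumptions.

```python
from collections import defaultdict

def fuzzy_mode(X, memberships, alpha=2.0):
    """Mode of one cluster; memberships[i] is the membership of X[i] in it."""
    m = len(X[0])
    mode = []
    for j in range(m):
        score = defaultdict(float)
        for obj, w in zip(X, memberships):
            score[obj[j]] += w ** alpha   # weight of this object's category
        mode.append(max(score, key=score.get))
    return mode

X = [["a", "x"], ["b", "x"], ["b", "y"]]
w = [0.9, 0.3, 0.2]
print(fuzzy_mode(X, w))  # ['a', 'x']
```

In this example category "b" is more frequent for the first attribute, yet "a" is chosen because the one object carrying it has a much higher membership. This is the difference from the hard, count-based update.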


Hard and fuzzy k-Modes algorithms

Theorem 5

Let $\alpha > 1$. The fuzzy k-modes algorithm converges in a finite number of iterations.
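The intuition behind finite convergence is that there are only finitely many possible sets of modes, and each update step cannot increase the cost. The sketch below runs the full alternating loop on categorical data and records the cost after each pass. It is an illustration under stated assumptions (simple matching dissimilarity, `alpha = 1.1`, caller-supplied initial modes `Z0`, hypothetical function names), not the authors' implementation.

```python
from collections import defaultdict

def dissim(x, y):
    """Simple matching dissimilarity: number of mismatched attributes."""
    return sum(a != b for a, b in zip(x, y))

def memberships(X, Z, alpha):
    """Fix Z, update W. W[l][i] is the membership of object i in cluster l."""
    k = len(Z)
    W = [[0.0] * len(X) for _ in range(k)]
    for i, x in enumerate(X):
        d = [dissim(z, x) for z in Z]
        if 0 in d:
            W[d.index(0)][i] = 1.0        # object equals a mode
            continue
        for l in range(k):
            W[l][i] = 1.0 / sum((d[l] / d[h]) ** (1.0 / (alpha - 1.0))
                                for h in range(k))
    return W

def modes(X, W, alpha):
    """Fix W, update Z with the membership-weighted frequency rule."""
    Z = []
    for row in W:
        mode = []
        for j in range(len(X[0])):
            score = defaultdict(float)
            for x, w in zip(X, row):
                score[x[j]] += w ** alpha
            mode.append(max(score, key=score.get))
        Z.append(mode)
    return Z

def cost(X, W, Z, alpha):
    """Objective value for the current partition and modes."""
    return sum(w ** alpha * dissim(z, x)
               for z, row in zip(Z, W) for x, w in zip(X, row))

def fuzzy_k_modes(X, Z0, alpha=1.1, max_iter=100):
    """Alternate mode and membership updates until the cost stops decreasing."""
    Z = [list(z) for z in Z0]
    W = memberships(X, Z, alpha)
    history = [cost(X, W, Z, alpha)]
    for _ in range(max_iter):
        Z = modes(X, W, alpha)
        W = memberships(X, Z, alpha)
        history.append(cost(X, W, Z, alpha))
        if history[-1] >= history[-2]:    # no further improvement
            break
    return W, Z, history
```

The returned `history` is non-increasing up to floating-point error, and the loop stops at the first pass that fails to lower the cost.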


Experimental Results

To evaluate the performance and efficiency of the fuzzy k-modes algorithm

To compare the fuzzy k-modes algorithm with the conceptual k-means algorithm and the hard k-modes algorithm

Use real and artificial data

Soybean disease data set


Conclusions

The paper introduces the fuzzy k-modes algorithm for clustering categorical objects, based on extensions to the fuzzy k-means algorithm.

The key consequence of Theorem 4 is that the k-means paradigm can be used to generate the fuzzy partition matrix from categorical data.


Personal Opinion

The fuzzy partition matrix provides more information to help the user determine the final clustering and identify the boundary objects.
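One way to read boundary objects off a fuzzy partition matrix is to flag objects whose largest membership is not clearly dominant. The sketch below does this with a cutoff; the function name and the `threshold` value of 0.6 are hypothetical choices for illustration and do not come from the paper.

```python
def boundary_objects(W, threshold=0.6):
    """Return indices of objects whose largest membership is below threshold.

    W[l][i] is the membership of object i in cluster l.
    """
    n = len(W[0])
    return [i for i in range(n) if max(row[i] for row in W) < threshold]

# Two clusters, three objects; the middle object sits between the clusters.
W = [[0.9, 0.55, 0.1],
     [0.1, 0.45, 0.9]]
print(boundary_objects(W))  # [1]
```

A hard partition would assign the middle object to the first cluster and discard the fact that it is nearly as close to the second.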