Presenter: Lin, Shu-Han. Authors: Jeen-Shing Wang, Jen-Chieh Chiang

Post on 22-Feb-2016


DESCRIPTION

A cluster validity measure with a hybrid parameter search method for the support vector clustering algorithm. Presenter: Lin, Shu-Han. Authors: Jeen-Shing Wang, Jen-Chieh Chiang. PR (2008). Outline: Introduction of SVC, Motivation, Objective, Methodology, Experiments.

TRANSCRIPT

Intelligent Database Systems Lab

N.Y.U.S.T. I.M.

A cluster validity measure with a hybrid parameter search method

for the support vector clustering algorithm

Presenter: Lin, Shu-Han
Authors: Jeen-Shing Wang, Jen-Chieh Chiang

PR (2008)


Outline

Introduction of SVC
Motivation
Objective
Methodology
Experiments
Conclusion
Comments

SVC

SVC is derived from SVMs
SVMs are a supervised classification technique
  Fast convergence
  Good generalization performance
  Robustness to noise

SVC is an unsupervised approach
1. Data points are mapped to a high-dimensional (HD) feature space using a Gaussian kernel.

2. Look for the smallest sphere that encloses the data.

3. Map the sphere back to data space to form a set of contours.

4. Contours are treated as the cluster boundaries.

SVC - Sphere Analysis

To find the minimal enclosing sphere with a soft margin:

To solve this problem, introduce the Lagrangian function:
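The equations on this slide did not survive extraction. As a hedged reconstruction, assuming the slides follow the standard SVC formulation from the literature, the problem and its Lagrangian can be written as below. The symbols are assumptions, not taken from the slide: Φ is the kernel-induced mapping, R the sphere radius, a the sphere center, ξ_j the slack variables, C the soft-margin constant, and β_j, μ_j ≥ 0 the Lagrange multipliers.

```latex
\min_{R,\,a,\,\xi}\; R^2 + C \sum_j \xi_j
\quad \text{s.t.} \quad
\|\Phi(x_j) - a\|^2 \le R^2 + \xi_j,\qquad \xi_j \ge 0

L = R^2
  - \sum_j \bigl(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\bigr)\,\beta_j
  - \sum_j \xi_j\,\mu_j
  + C \sum_j \xi_j
```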


Karush-Kuhn-Tucker complementarity:
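The conditions themselves are missing from the transcript. A sketch of what they look like in the standard SVC formulation, assuming the usual notation (Φ the kernel mapping, R the radius, a the center, ξ_j the slacks, β_j and μ_j the Lagrange multipliers, none of which is confirmed by the slide):

```latex
% Stationarity of the Lagrangian with respect to R, a and \xi_j
\sum_j \beta_j = 1, \qquad
a = \sum_j \beta_j\,\Phi(x_j), \qquad
\beta_j = C - \mu_j

% Karush-Kuhn-Tucker complementarity
\xi_j\,\mu_j = 0, \qquad
\bigl(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\bigr)\,\beta_j = 0
```

Under these conditions a point with ξ_j > 0 lies outside the sphere and has β_j = C, while a point with 0 < β_j < C lies on the sphere surface.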

C: regulates how many outliers are allowed

Bounded SV (BSV): outlier

The minimal enclosing sphere with a soft margin is found by solving the Wolfe dual optimization problem.
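The dual itself is not preserved in the transcript. In the standard SVC formulation it takes the following form, where K is the kernel and β_j are the Lagrange multipliers (notation assumed here rather than read from the slide):

```latex
\max_{\beta}\; W = \sum_j \beta_j\,K(x_j, x_j) - \sum_{i,j} \beta_i\,\beta_j\,K(x_i, x_j)
\quad \text{s.t.} \quad
0 \le \beta_j \le C,\qquad \sum_j \beta_j = 1
```

Points with 0 < β_j < C are support vectors on the sphere surface, points with β_j = C are bounded support vectors (outliers), and points with β_j = 0 lie inside the sphere.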

The distance (similarity) between x and the sphere center a:

q: controls the number of clusters and the smoothness/tightness of the cluster boundaries.

Mercer kernel: Gaussian

Gaussian function:
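The formulas for the kernel and the distance are missing from the transcript. The sketch below assumes the Gaussian kernel commonly used with SVC, K(x, y) = exp(-q ||x - y||^2), and the kernel expansion of the squared distance to the center, R^2(x) = K(x, x) - 2 Σ_j β_j K(x_j, x) + Σ_{i,j} β_i β_j K(x_i, x_j). The function names and the example multipliers `beta` are hypothetical; in practice `beta` would come from solving the dual problem.

```python
import math


def gaussian_kernel(x, y, q):
    """Gaussian kernel K(x, y) = exp(-q * ||x - y||^2); q is the width parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-q * sq_dist)


def squared_distance_to_center(x, points, beta, q):
    """Squared feature-space distance from x to the sphere center a.

    Assumes a = sum_j beta_j * Phi(x_j), so that
    R^2(x) = K(x,x) - 2 sum_j beta_j K(x_j,x) + sum_ij beta_i beta_j K(x_i,x_j).
    """
    k_xx = gaussian_kernel(x, x, q)  # equals 1 for a Gaussian kernel
    cross = sum(b * gaussian_kernel(p, x, q) for p, b in zip(points, beta))
    quad = sum(
        bi * bj * gaussian_kernel(pi, pj, q)
        for pi, bi in zip(points, beta)
        for pj, bj in zip(points, beta)
    )
    return k_xx - 2.0 * cross + quad


# Toy data with hypothetical multipliers (they must sum to 1).
points = [(0.0, 0.0), (1.0, 0.0)]
beta = [0.5, 0.5]
near = squared_distance_to_center((0.5, 0.0), points, beta, q=1.0)
far = squared_distance_to_center((5.0, 0.0), points, beta, q=1.0)
```

A point between the two data points ends up closer to the center than a point far from both, which is the behavior the contours in data space rely on.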

Motivation

Drawbacks of existing cluster validity measures

Compactness
  Problematic for clusters of different densities or sizes
  Decreases monotonically as the number of clusters increases

Separation
  Problematic for irregular cluster structures


The authors' previous study can handle clusters of
  Different sizes
  Different densities
  Arbitrary shapes

But…


Objectives – A cluster validity method and a parameter search algorithm for SVC

Automatically determine the two parameters:
  q: increasing q leads to an increasing number of clusters
  C: regulates the existence of outliers and overlapping clusters

Identify the optimal cluster structure

Methodology - Idea

q is related to the densities of the clusters
Each cluster structure corresponds to an interval of q
Identifying the optimal structure is equivalent to finding the largest interval

Example: N = 64, max # of clusters = √N = 8
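The "largest interval" idea can be illustrated with a minimal sketch. It assumes the cluster structure is summarized by the number of clusters only (a simplification, since two different structures can share a count) and that SVC has already been run on a sorted grid of q values. The function name and the sample values are hypothetical.

```python
def largest_constant_interval(samples):
    """Return (q_start, q_end, n_clusters) for the widest run of equal cluster counts.

    samples: list of (q, n_clusters) pairs sorted by increasing q.
    Returns None for an empty list.
    """
    best = None
    start = 0
    for i in range(1, len(samples) + 1):
        # Close the current run at the end of the list or when the count changes.
        if i == len(samples) or samples[i][1] != samples[start][1]:
            q_lo, q_hi = samples[start][0], samples[i - 1][0]
            if best is None or (q_hi - q_lo) > (best[1] - best[0]):
                best = (q_lo, q_hi, samples[start][1])
            start = i
    return best


# Hypothetical (q, number of clusters) observations.
samples = [(0.1, 1), (0.2, 1), (0.5, 2), (1.0, 2), (2.0, 2), (2.5, 4), (3.0, 8)]
result = largest_constant_interval(samples)
```

Here the two-cluster structure persists over the widest range of q, so it would be the candidate for the optimal structure under this simplified criterion.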

Methodology - Problem

How to locate the overall search range of q
How to detect outliers/noise
How to identify the largest interval

Methodology – Locate range of q

Lower bound

Upper bound: employ K-means to obtain clusters, and compute the variance v_i of each cluster

Sort the clusters in ascending order of cluster size

n = 3: use the variances of the 3 largest clusters
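The bookkeeping described above (per-cluster variance, ascending sort by size, keep the n largest clusters) can be sketched as follows. This is only an illustration: it assumes K-means labels are already available, defines variance as the mean squared distance to the cluster centroid (an assumption, since the slide does not give the definition), and does not reproduce the formula that turns these variances into the upper bound of q, which is not preserved in the transcript. The function name is hypothetical.

```python
from collections import defaultdict


def largest_cluster_variances(points, labels, n=3):
    """Return [(size, variance), ...] for the n largest clusters, ascending by size.

    Variance is taken as the mean squared Euclidean distance to the centroid
    (an assumed definition).
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    stats = []
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        variance = sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        ) / len(members)
        stats.append((len(members), variance))

    stats.sort(key=lambda s: s[0])  # ascending order of cluster size
    return stats[-n:]               # the n largest clusters


# Toy data with hypothetical K-means labels.
points = [(0, 0), (2, 0), (10, 10), (10, 12), (10, 14), (50, 50)]
labels = [0, 0, 1, 1, 1, 2]
top = largest_cluster_variances(points, labels, n=2)
```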

Methodology – Outlier Detection

Set q = q_max, the tightest setting of q

Figure labels: outlier, singleton

This yields C_opt; then remove these outliers
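One way to picture the removal step is the sketch below. It assumes, as the "outlier" and "singleton" labels on the slide suggest, that points ending up alone in a cluster at q = q_max are the ones flagged as outliers. It takes cluster labels as input and does not show how C_opt is chosen, which the transcript does not preserve. The function name is hypothetical.

```python
from collections import Counter


def split_singletons(labels):
    """Split point indices into (outliers, kept).

    A point is treated as an outlier when it is the only member of its cluster
    (an assumed criterion based on the slide's labels).
    """
    counts = Counter(labels)
    outliers = [i for i, label in enumerate(labels) if counts[label] == 1]
    kept = [i for i, label in enumerate(labels) if counts[label] > 1]
    return outliers, kept


# Hypothetical cluster labels obtained at q = q_max.
labels = [0, 0, 1, 2, 2, 3]
outliers, kept = split_singletons(labels)
```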

Methodology – the largest interval

q_opt


Fibonacci search: locate the interval where the cluster structure is the same

Bisection search

n: number of iterations
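The bisection part can be sketched as below; the Fibonacci search is not covered here. The sketch assumes that the structure observed at the left end of the range persists on one contiguous interval, and it stands in for a full SVC run with a hypothetical callback `count_clusters(q)`. It is an illustration of bisection on a step function, not the authors' exact procedure.

```python
def bisect_structure_boundary(count_clusters, q_lo, q_hi, n_iter=20):
    """Narrow [q_lo, q_hi] around the q where the structure first changes.

    count_clusters: hypothetical callback returning the number of clusters at q.
    Returns (q_lo, q_hi) with the structure at q_lo unchanged and the structure
    at q_hi different, or (q_hi, q_hi) if no change occurs within the range.
    """
    base = count_clusters(q_lo)
    if count_clusters(q_hi) == base:
        return q_hi, q_hi
    for _ in range(n_iter):
        mid = 0.5 * (q_lo + q_hi)
        if count_clusters(mid) == base:
            q_lo = mid  # structure still the same: boundary is to the right
        else:
            q_hi = mid  # structure changed: boundary is to the left
    return q_lo, q_hi


def toy_count(q):
    """Toy stand-in for SVC: one cluster below q = 0.7, two clusters above."""
    return 1 if q < 0.7 else 2


lo, hi = bisect_structure_boundary(toy_count, 0.0, 2.0, n_iter=30)
```

Each iteration halves the bracket, so after n iterations the boundary is known to within (q_hi - q_lo) / 2^n of the initial range.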


Methodology – Overview

1. Locate range of q

2. Outlier detection

3. Find the largest interval


Experiments - Benchmark and Artificial Examples


Experiments - Outlier

C_opt

Conclusions

A new measure: inspired by observations of q

Determines the optimal cluster structure with its corresponding range of q and C


Comments

Advantage: inspired by observation of the parameters

Drawback: …

Application
  SVC
  DBSCAN: MinPts / Eps