Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Intelligent Database Systems Lab, National Yunlin University of Science and Technology. Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps. Kadim Taşdemir and Erzsébet Merényi, TNN, Vol. 20, No. 4, 2009, pp. 549-562. Presenter: Wei-Shen Tai, 2009/5/21.


Page 1: Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps

Kadim Taşdemir and Erzsébet Merényi

TNN, Vol. 20, No. 4, 2009, pp. 549-562.

Presenter : Wei-Shen Tai

2009/5/21

Page 2

N.Y.U.S.T.

I. M.

Outline

- Introduction
- Previous work on visualization of SOM knowledge
- Topology visualization through connectivity matrix of SOM prototypes
- Clustering through CONNvis
- Discussion and conclusions
- Comments

Page 3

Motivation

- Exploit an underutilized component of the SOM's knowledge: the data topology.
- Including the data topology in the SOM visualization provides more sophisticated clues to the cluster structure than existing SOM visualization approaches.

Page 4

Objective

- Integrate the data topology into the visualization of the SOM.
- This can improve cluster extraction from the SOM via a "connectivity matrix" and its specific rendering over the SOM.

Page 5

Visualization for SOM

- The SOM is a topology-preserving mapping: ideally, prototypes (neurons) that are neighbors in the SOM lattice are also neighbors in data space (centroids of neighboring Voronoi polyhedra), and vice versa.
- Growing SOM: it appears less robust than the Kohonen SOM because of the large number of parameters that need adjustment.
- ViSOM: it requires a relatively large number of prototypes even for small data sets.

Page 6

Topology visualization through connectivity matrix of SOM prototypes

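- Induced Delaunay triangulation: it can be determined from the relationship between the best matching unit (BMU) and the second BMU of each data vector; two prototypes are connected if they are the BMU and the second BMU of at least one data vector. Its binary adjacency matrix is denoted A.
- CONN: a weighted analog of A, where the weights indicate the density distribution of the input data among the prototypes that are adjacent in the data manifold M:

$$\mathrm{CONN}(i,j) = |RF_{ij}| + |RF_{ji}|, \qquad RF_i = \bigcup_{j=1}^{N} RF_{ij}$$

where RF_ij is the set of data vectors for which w_i is the BMU and w_j is the second BMU, RF_i is the receptive field of prototype w_i, and N is the number of prototypes.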
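The CONN definition can be computed directly from a trained SOM. The sketch below is a minimal NumPy illustration, assuming Euclidean distance for BMU selection and prototypes already trained by a conventional SOM; the function name `connectivity_matrix` is hypothetical, not taken from the paper. It counts |RF_ij| for every data vector's (BMU, second BMU) pair, symmetrizes the counts into CONN, and reads the binary adjacency A of the induced Delaunay triangulation off the nonzero entries.

```python
import numpy as np


def connectivity_matrix(data, prototypes):
    """Sketch of CONN(i, j) = |RF_ij| + |RF_ji| for SOM prototypes.

    data:       (n_samples, dim) array of data vectors
    prototypes: (N, dim) array of prototype vectors w_1..w_N (N >= 2)
    Assumption: BMUs are chosen by squared Euclidean distance.
    """
    n_protos = prototypes.shape[0]
    # Squared Euclidean distance from every data vector to every prototype.
    dists = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # Column 0 holds the BMU index, column 1 the second-BMU index.
    order = np.argsort(dists, axis=1)
    bmu, second = order[:, 0], order[:, 1]

    # rf_counts[i, j] = |RF_ij|: vectors with BMU w_i and second BMU w_j.
    # np.add.at accumulates correctly when an (i, j) pair repeats.
    rf_counts = np.zeros((n_protos, n_protos), dtype=int)
    np.add.at(rf_counts, (bmu, second), 1)

    # CONN is symmetric: |RF_ij| + |RF_ji|.
    conn = rf_counts + rf_counts.T
    # Binary adjacency A of the induced Delaunay triangulation.
    adjacency = (conn > 0).astype(int)
    return conn, adjacency
```

For example, with 1-D data `[[0.0], [0.4], [0.6], [1.6], [2.4]]` and prototypes `[[0.0], [1.0], [2.0]]`, the first and third prototypes never appear as a (BMU, second BMU) pair, so CONN(1, 3) = 0 and they are not adjacent in A.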

Page 7

CONNvis: visualization of the connectivity matrix

- Line width: shows the strength of the connection, CONN(i,j), and thus reflects the density distribution among the connected units.
- Line color: shows the rank of the connection among the connectivity strengths of w_i, revealing the most-to-least dense regions local to w_i in data space.
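The two visual attributes can be derived from CONN as follows. This is a sketch only: the rank computation follows the slide (rank of CONN(i, j) among w_i's connections, 1 = strongest), while the width quantization into `n_levels` equal bins is a hypothetical choice, not the paper's exact width scheme. The function names `connection_ranks` and `line_widths` are likewise hypothetical.

```python
import numpy as np


def connection_ranks(conn):
    """rank[i, j] = position of CONN(i, j) among w_i's connections
    (1 = strongest); 0 where w_i and w_j are unconnected."""
    n = conn.shape[0]
    rank = np.zeros_like(conn, dtype=int)
    for i in range(n):
        neighbors = np.flatnonzero(conn[i] > 0)
        # Order w_i's neighbors by decreasing connection strength;
        # a stable sort keeps ties in index order.
        ordered = neighbors[np.argsort(-conn[i, neighbors], kind="stable")]
        rank[i, ordered] = np.arange(1, len(ordered) + 1)
    return rank


def line_widths(conn, n_levels=4):
    """Hypothetical width quantization: split (0, max CONN] into
    n_levels equal bins; width 0 means no line is drawn."""
    max_conn = conn.max()
    if max_conn == 0:
        return np.zeros_like(conn, dtype=int)
    return np.ceil(conn / max_conn * n_levels).astype(int)
```

Note that `rank` is not symmetric: the same connection can be w_i's strongest but only w_j's second strongest, which is why the color is defined locally to w_i.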

Page 8

Assessment of topology preservation with CONNvis

Topology violations:
- forward topology violations: connected neural units that are not immediate neighbors in the map;
- backward topology violations: unconnected neural units that are immediate neighbors in the map.
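Both kinds of violation can be listed by comparing CONN with the lattice geometry. The sketch below assumes a rectangular SOM lattice on which two units are immediate neighbors when their Chebyshev distance is 1 (an 8-neighborhood); that neighborhood definition and the function name `topology_violations` are assumptions for illustration, not taken from the paper.

```python
import numpy as np


def topology_violations(conn, grid_pos):
    """List forward and backward topology violations.

    conn:     (N, N) symmetric connectivity matrix (CONN)
    grid_pos: (N, 2) integer (row, col) lattice coordinates of each unit
    Assumption: immediate lattice neighbors have Chebyshev distance 1.
    """
    n = conn.shape[0]
    forward, backward = [], []
    for i in range(n):
        for j in range(i + 1, n):
            lattice_dist = np.abs(grid_pos[i] - grid_pos[j]).max()
            connected = conn[i, j] > 0
            if connected and lattice_dist > 1:
                # Adjacent in data space but not neighbors on the map.
                forward.append((i, j))
            elif not connected and lattice_dist == 1:
                # Neighbors on the map but not adjacent in data space.
                backward.append((i, j))
    return forward, backward
```

For three units in a single row at columns 0, 1 and 2, a connection between the two end units is a forward violation, and a missing connection between the middle and right units is a backward violation.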

Page 9

Clustering through CONNvis

Weak connections that link two coarse clusters X and Y at their boundary are removed, one boundary prototype at a time:

Step 1) Remove all weak connections of the prototype to cluster X if it has fewer weak connections to X than to the other cluster Y.

Step 2) Remove the weakest connection if the prototype's connections to the two clusters have different widths.

Step 3) Remove the lowest-ranking connection if the prototype has the same number of weak connections to both clusters and all connections at the boundary of these clusters are weak.

Step 4) Repeat Steps 1)–3) until the prototype has been disconnected from one of the clusters.

Step 5) Repeat Steps 1)–4) for all prototypes at this boundary.
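The boundary steps above rely on judgments of line width and rank made on the CONNvis rendering. The sketch below is deliberately much simpler and is not the paper's procedure: it replaces Steps 1)–5) with a single hypothetical global cutoff `weak_threshold` on CONN, drops every connection below it, and labels the connected components of the remaining graph as clusters. It only illustrates the end effect the steps aim for, namely that once the weak boundary connections are gone, clusters become disconnected groups of prototypes. The function name `clusters_after_pruning` is hypothetical.

```python
import numpy as np


def clusters_after_pruning(conn, weak_threshold):
    """Simplified stand-in: keep connections with CONN(i, j) >= weak_threshold
    (a hypothetical global cutoff) and label the connected components."""
    n = conn.shape[0]
    kept = (conn > 0) & (conn >= weak_threshold)
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over the kept connections.
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(kept[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

With a global threshold, a connection that is weak in absolute terms but important locally is removed everywhere, which is exactly what the per-prototype, rank-aware Steps 1)–3) are designed to avoid.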

Page 10

A Real-Data Application

A real remote-sensing spectral image of Ocean City.

Page 11

Comparison with U-matrix and ISOMAP

Page 12

Discussion and conclusions

CONNvis:
- integrates the data distribution into the customary Delaunay triangulation;
- shows both forward and backward topology violations on the SOM grid;
- makes cluster extraction more efficient.

Page 13

Comments

Advantage
- The proposed method improves the visualization of the SOM by combining the induced Delaunay triangulation with connection strength.
- It adopts the training process of the conventional SOM, but renders the resulting map through the connections between neurons after removing weak connections and boundary neurons.

Drawback
- Much of the terminology in this paper differs from that generally used in the SOM literature (for example, "data vectors").
- If a connection between two neurons in the same cluster crosses over an unrelated neuron, the proposed method does not remove it (because the neurons involved are not boundary neurons of this cluster), which may confuse the user about the relationship among these three neurons.

Application
- Data clustering.