Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps
Kadim Taşdemir and Erzsébet Merényi. TNN, Vol. 20, No. 4, 2009, pp. 549-562. Presenter: Wei-Shen Tai, 2009/5/21.
Intelligent Database Systems Lab
國立雲林科技大學 (National Yunlin University of Science and Technology)
Exploiting Data Topology in Visualization and Clustering of Self-Organizing Maps
Kadim Taşdemir and Erzsébet Merényi
TNN, Vol.20, No. 4, 2009, pp. 549-562.
Presenter : Wei-Shen Tai
2009/5/21
N.Y.U.S.T.
I. M.
Outline
  Introduction
  Previous work on visualization of SOM knowledge
  Topology visualization through connectivity matrix of SOM prototypes
  Clustering through CONNvis
  Discussions and conclusion
  Comments
Motivation
  Exploit an underutilized component of the SOM's knowledge: data topology.
  Including data topology in the SOM visualization provides more sophisticated clues to cluster structure than existing SOM visualization approaches.
Objective
  Integrate data topology into the visualization of the SOM.
  This can improve cluster extraction from the SOM map via a "connectivity matrix" and its specific rendering over the SOM.
Visualization for SOM
SOM is a topology-preserving mapping
  Ideally, prototypes (neurons) that are neighbors in the SOM map are also neighbors (centroids of neighboring Voronoi polyhedra) in data space, and vice versa.
Growing SOM
  It appears less robust than the Kohonen SOM because of the large number of parameters needing adjustment.
ViSOM
  It requires a relatively large number of prototypes even for small data sets.
Topology visualization through connectivity matrix of SOM prototypes
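Induced Delaunay triangulation
  It can be determined from the relationships between the best matching units (BMUs) and the second BMUs.
CONN
  It is a weighted analog of A (the binary adjacency matrix of the induced Delaunay triangulation), where the weights indicate the density distribution of the input data among the prototypes adjacent in the data manifold M.
  $CONN(i, j) = |RF_{ij}| + |RF_{ji}|$
  $|RF_i| = \sum_{j=1}^{N} |RF_{ij}|$
  where $RF_{ij}$ is the set of data vectors for which $w_i$ is the BMU and $w_j$ is the second BMU, and N is the number of prototypes.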
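
A minimal sketch of how the CONN matrix could be computed from trained prototypes, assuming Euclidean distance for BMU selection. The function name `conn_matrix` and the toy data are illustrative and not taken from the paper.

```python
import numpy as np

def conn_matrix(data, prototypes):
    """Return the symmetric CONN matrix for data (n x d) and prototypes (N x d)."""
    # Squared Euclidean distance from every data vector to every prototype.
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # The two nearest prototypes per data vector: BMU and second BMU.
    order = np.argsort(d2, axis=1)
    bmu, second = order[:, 0], order[:, 1]
    n_protos = prototypes.shape[0]
    rf = np.zeros((n_protos, n_protos), dtype=int)
    # rf[i, j] = |RF_ij|: number of vectors with BMU w_i and second BMU w_j.
    np.add.at(rf, (bmu, second), 1)
    # CONN(i, j) = |RF_ij| + |RF_ji|
    return rf + rf.T

# Toy example (made-up values): three prototypes on a line, four data vectors.
prototypes = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
data = np.array([[0.1, 0.0], [0.4, 0.0], [0.9, 0.0], [4.0, 0.0]])
conn = conn_matrix(data, prototypes)
```

Every data vector contributes to exactly one pair of prototypes, so the entries of `conn` sum to twice the number of data vectors.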
CONNvis: visualization of the connectivity matrix
Line width
  Shows the strength of the connection and reflects the density distribution among the connected units.
Line colors
  Show a ranking of the connectivity strengths of w_i.
  Reveal the most-to-least dense regions local to w_i in data space.
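
The two visual cues above can be sketched as follows, assuming a precomputed symmetric CONN matrix. The helper names, the linear width scaling, and the `max_width` parameter are hypothetical choices for illustration; the paper's exact rendering scheme is not reproduced here.

```python
import numpy as np

def rank_connections(conn):
    """For each prototype i, list its connected neighbors from strongest to weakest."""
    ranks = {}
    for i, row in enumerate(conn):
        neighbors = np.flatnonzero(row)
        # Stable sort on negated strength puts the strongest connection first.
        ordered = neighbors[np.argsort(-row[neighbors], kind="stable")]
        ranks[i] = [int(j) for j in ordered]
    return ranks

def line_widths(conn, max_width=6.0):
    """Scale connection strengths linearly to drawing widths (assumed scaling)."""
    peak = conn.max()
    return conn * (max_width / peak) if peak > 0 else conn.astype(float)

# Made-up CONN values for three prototypes.
conn = np.array([[0, 3, 1],
                 [3, 0, 2],
                 [1, 2, 0]])
ranks = rank_connections(conn)   # local ranking, would drive line color
widths = line_widths(conn)       # global strength, would drive line width
```

The ranking is local to each prototype while the width is global, so a connection ranked first for a prototype in a sparse region can still be drawn thin.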
Assessment of topology preservation with CONNvis
Topology violations
  Forward topology violations: connected neural units that are not immediate neighbors in the map.
  Backward topology violations: unconnected neural units that are immediate neighbors in the map.
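
A minimal sketch of detecting both kinds of violation from a CONN matrix. It assumes a rectangular map with row-major unit indexing and treats the 8 surrounding grid cells as "immediate neighbors"; both are assumptions of this sketch, as is the function name.

```python
import numpy as np

def topology_violations(conn, rows, cols):
    """Return (forward, backward) lists of prototype index pairs (i, j) with i < j."""
    forward, backward = [], []
    n = rows * cols
    for i in range(n):
        for j in range(i + 1, n):
            # Grid coordinates under row-major indexing.
            ri, ci = divmod(i, cols)
            rj, cj = divmod(j, cols)
            adjacent = max(abs(ri - rj), abs(ci - cj)) == 1
            connected = conn[i, j] > 0
            if connected and not adjacent:
                forward.append((i, j))    # connected, but not map neighbors
            elif adjacent and not connected:
                backward.append((i, j))   # map neighbors, but not connected
    return forward, backward

# Made-up 1 x 4 map: units 0-1-2-3 in a row.
conn = np.zeros((4, 4), dtype=int)
for i, j, strength in [(0, 1, 5), (1, 2, 4), (0, 3, 1)]:
    conn[i, j] = conn[j, i] = strength
forward, backward = topology_violations(conn, rows=1, cols=4)
```

Here units 0 and 3 are connected although they sit at opposite ends of the map, and units 2 and 3 are map neighbors without any connection.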
Clustering through CONNvis
Remove weak connections that link any two coarse clusters X and Y at their boundary:
Step 1) Remove all weak connections to cluster X if the number of weak connections to X is less than the number of weak connections to the other cluster Y.
Step 2) Remove the weakest connection if the connections of the prototype to the two clusters have different widths.
Step 3) Remove the lowest ranking connection if the number of weak connections to both clusters is the same and all connections at the boundary of these clusters are weak.
Step 4) Repeat Steps 1)–3) until this prototype has been disconnected from one of the clusters.
Step 5) Repeat Steps 1)–4) for all prototypes at this boundary.
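
One possible reading of Steps 1) to 4) for a single boundary prototype is sketched below. Several points are assumptions of this sketch rather than the paper's definitions: a connection counts as "weak" when its CONN value falls below a user-chosen `weak_threshold`, Steps 2) and 3) are collapsed into "remove the single weakest remaining connection", and the function name and toy values are hypothetical.

```python
import numpy as np

def detach_boundary_prototype(conn, p, cluster_x, cluster_y, weak_threshold):
    """Cut connections of prototype p until it links to only one of two clusters."""
    conn = conn.copy()

    def links(cluster):
        return [q for q in cluster if conn[p, q] > 0]

    def cut(q):
        conn[p, q] = conn[q, p] = 0

    # Step 4: repeat until p is disconnected from one of the clusters.
    while links(cluster_x) and links(cluster_y):
        weak_x = [q for q in links(cluster_x) if conn[p, q] < weak_threshold]
        weak_y = [q for q in links(cluster_y) if conn[p, q] < weak_threshold]
        fewer = weak_x if len(weak_x) < len(weak_y) else weak_y
        if len(weak_x) != len(weak_y) and fewer:
            # Step 1: drop all weak connections to the cluster that has fewer of them.
            for q in fewer:
                cut(q)
        else:
            # Steps 2-3 (collapsed, assumed): drop the weakest remaining connection.
            cut(min(links(cluster_x) + links(cluster_y), key=lambda q: conn[p, q]))
    return conn

# Made-up example: prototype 0 sits between clusters X = {1, 2} and Y = {3, 4}.
conn = np.zeros((5, 5))
for q, strength in [(1, 10.0), (2, 1.0), (3, 2.0), (4, 1.5)]:
    conn[0, q] = conn[q, 0] = strength
pruned = detach_boundary_prototype(conn, p=0, cluster_x=[1, 2],
                                   cluster_y=[3, 4], weak_threshold=3.0)
```

Step 5) would call this function for every prototype at the boundary. In this toy case prototype 0 keeps only its strong connection to unit 1 and therefore ends up in cluster X.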
A Real-Data Application
A real remote sensing spectral image of Ocean City
Comparison with U-matrix and ISOMAP
Discussion and conclusions
CONNvis
  Integrates data distribution into the customary Delaunay triangulation.
  Shows both forward and backward topology violations on the SOM grid.
  Makes cluster extraction more efficient.
Comments
Advantage
  The proposed method improves SOM visualization by combining the induced Delaunay triangulation with connection strength.
  It adopts the training process of the conventional SOM, but renders the resulting map via the connections between neurons after removing weak connections and boundary neurons.
Drawback
  Much of the terminology in this paper, such as "data vectors", differs from that generally used for SOM.
  If a connection between two neurons in the same cluster crosses over an unrelated neuron (which the proposed method does not remove, because it is not a boundary neuron for this cluster), the user may be confused about the relation among these three neurons.
Application
  Data clustering.