Intelligent Database Systems Lab
National Yunlin University of Science and Technology
Dynamic hierarchical algorithms for document clustering
Presenter: Wei-Hao Huang
Authors: Reynaldo Gil-García, Aurora Pons-Porrata
PRL, 2010
N.Y.U.S.T.
I. M.
Outline
· Motivation
· Objectives
· Hierarchical clustering
· Methodology
· Experiments
· Conclusions
· Comments
Motivation
· The World Wide Web and the number of text documents managed in organizational intranets continue to grow at an amazing speed.
· In dynamic information environments, it is usually desirable to apply adaptive methods for document organization, such as clustering.
Objectives
· Static clustering methods mainly rely on having the whole collection ready before applying the algorithm.
· Dynamic algorithms should be able to update the clustering without performing a complete reclustering.
· The result should be independent of the data order.
Hierarchical clustering
· Agglomerative and divisive
· Provides views of the data at different levels
Methodology
· Dynamic hierarchical agglomerative framework
· Specific algorithms:
─ Dynamic hierarchical compact (DHC): creates disjoint hierarchies of clusters
─ Dynamic hierarchical star (DHS): produces overlapped hierarchies
Dynamic hierarchical agglomerative framework
· β is the minimum similarity threshold.
· β-similarity: clusters i and j are β-similar if their similarity is >= β.
· Cluster i is β-isolated if its similarity with every other cluster is < β.
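The two definitions above can be illustrated with a minimal Python sketch. It assumes a precomputed symmetric similarity matrix given as nested lists; the function names `beta_similar_pairs` and `beta_isolated` and the example values are hypothetical and not taken from the paper.

```python
def beta_similar_pairs(sim, beta):
    """Return all index pairs (i, j), i < j, whose similarity is >= beta."""
    n = len(sim)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sim[i][j] >= beta]


def beta_isolated(sim, beta):
    """Return indices whose similarity with every other item is < beta."""
    n = len(sim)
    return [i for i in range(n)
            if all(sim[i][j] < beta for j in range(n) if j != i)]


# Hypothetical similarities between three clusters.
sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

pairs = beta_similar_pairs(sim, beta=0.5)
isolated = beta_isolated(sim, beta=0.5)
```

With β = 0.5, only clusters 0 and 1 are β-similar, and cluster 2 is β-isolated.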
Dynamic hierarchical agglomerative framework
Updating of the max-S graph
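The slide does not define the max-S graph. The sketch below assumes it is the subgraph of the β-similarity graph in which each vertex is linked only to its most similar β-similar vertex (or vertices, in case of ties). It rebuilds the graph from scratch rather than updating it incrementally, and the function name `max_s_graph` is hypothetical.

```python
def max_s_graph(sim, beta):
    """Build the edge set of an (assumed) max-S graph.

    Each vertex i is linked to the vertices j != i that have the highest
    similarity to i, provided that similarity is >= beta.
    Edges are undirected and stored as (smaller index, larger index).
    """
    n = len(sim)
    edges = set()
    for i in range(n):
        candidates = [(sim[i][j], j) for j in range(n)
                      if j != i and sim[i][j] >= beta]
        if not candidates:
            continue  # i is beta-isolated and stays without edges
        best = max(s for s, _ in candidates)
        for s, j in candidates:
            if s == best:
                edges.add((min(i, j), max(i, j)))
    return edges


# Hypothetical similarities between four clusters.
sim = [
    [1.0, 0.9, 0.6, 0.1],
    [0.9, 1.0, 0.7, 0.1],
    [0.6, 0.7, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]

edges = max_s_graph(sim, beta=0.5)
```

A dynamic algorithm would only recompute the edges of vertices affected by an insertion or deletion; that bookkeeping is omitted here.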
Dynamic hierarchical compact: Connected component cover
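As a rough illustration of a connected component cover, the sketch below groups the vertices of an undirected graph into its connected components, which yields disjoint clusters. It is a generic depth-first search, not the incremental procedure of DHC; the function name and the example graph are assumptions.

```python
def connected_components(n, edges):
    """Return the connected components of an undirected graph.

    n     : number of vertices, labelled 0 .. n-1
    edges : iterable of (a, b) pairs
    Each component is a sorted list of vertices; components never overlap.
    """
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search from an unvisited vertex.
        stack = [start]
        seen.add(start)
        comp = []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(comp))
    return components


# Hypothetical graph: vertices 0, 1, 2 are chained; vertex 3 is isolated.
clusters = connected_components(4, [(0, 1), (1, 2)])
```

Each vertex ends up in exactly one cluster, which matches the disjoint hierarchies attributed to DHC.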
Dynamic hierarchical star: Star cover updating
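To show why a star cover can produce overlapping clusters, here is a minimal sketch of the classic greedy star cover: the uncovered vertex with the highest degree becomes a star center, and its neighbors become its satellites. This is a static, generic version and not the updating procedure used by DHS; the function name and the example graph are assumptions.

```python
def greedy_star_cover(n, edges):
    """Cover an undirected graph with stars, greedily by vertex degree.

    Returns a list of (center, satellites) tuples. A vertex adjacent to
    several centers appears in several stars, so clusters may overlap.
    """
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    covered = set()
    stars = []
    # Visit vertices by decreasing degree; ties are broken by vertex index.
    for v in sorted(range(n), key=lambda v: (-len(adj[v]), v)):
        if v in covered:
            continue
        stars.append((v, sorted(adj[v])))
        covered.add(v)
        covered |= adj[v]
    return stars


# Hypothetical path graph 0 - 1 - 2 - 3 - 4.
stars = greedy_star_cover(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
```

In this example vertex 2 is a satellite of both centers (1 and 3), so the two resulting clusters overlap.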
Experiments
· 15 benchmark text collections were used.
· Clustering quality
· Sensitivity to parameters
· Balance
· Efficiency
Clustering quality: Overall F1 measure
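The slide gives no formula for this measure. The sketch below uses the commonly used definition of the overall F1 measure: each reference class is matched with the cluster that maximizes its F1, and these best values are averaged with weights proportional to class size. Whether the paper applies exactly this variant is an assumption, and the function name and example data are hypothetical.

```python
def overall_f1(classes, clusters):
    """Overall F1 of a clustering against reference classes.

    classes, clusters : lists of sets of document ids.
    For each class, take the best F1 over all clusters, then average
    the best values weighted by class size.
    """
    total = sum(len(cls) for cls in classes)
    score = 0.0
    for cls in classes:
        best = 0.0
        for clu in clusters:
            common = len(cls & clu)
            if common == 0:
                continue
            precision = common / len(clu)
            recall = common / len(cls)
            best = max(best, 2 * precision * recall / (precision + recall))
        score += len(cls) / total * best
    return score


# Hypothetical example: five documents, two classes, two clusters.
classes = [{0, 1, 2}, {3, 4}]
clusters = [{0, 1}, {2, 3, 4}]
value = overall_f1(classes, clusters)
```

For a hierarchy, the list of clusters would typically contain the clusters from every level, so each class can match its best cluster anywhere in the tree.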
Clustering quality: FCubed measure
Clustering quality: HF1
Sensitivity to parameters
Depth and width of the hierarchies
Efficiency
Conclusions
· The proposed methods are suitable for producing hierarchical clustering solutions in dynamic environments effectively and efficiently.
· They achieve a better balance between the depth and width of the hierarchies.
· They offer hierarchies that are easier to browse than those of traditional algorithms.
Comments
· Advantages
─ Deals with dynamic data sets.
─ Effective and efficient clustering.
· Applications
─ Hierarchical clustering