Dynamic hierarchical algorithms for document clustering



Page 1: Dynamic hierarchical algorithms for document clustering

Intelligent Database Systems Lab

國立雲林科技大學 (National Yunlin University of Science and Technology)

Dynamic hierarchical algorithms for document clustering

Presenter: Wei-Hao Huang
Authors: Reynaldo Gil-García, Aurora Pons-Porrata

Pattern Recognition Letters (PRL), 2010

Page 2: Dynamic hierarchical algorithms for document clustering

Outline

· Motivation
· Objectives
· Hierarchical clustering
· Methodology
· Experiments
· Conclusions
· Comments

Page 3: Dynamic hierarchical algorithms for document clustering

Motivation

· The World Wide Web and the number of text documents managed in organizational intranets continue to grow rapidly.

· In dynamic information environments, it is usually desirable to apply adaptive methods for document organization, such as clustering.

Page 4: Dynamic hierarchical algorithms for document clustering

Objectives

· Static clustering methods rely on having the whole collection available before the algorithm is applied.

· Dynamic algorithms should be able to update the clustering without performing a complete reclustering.

· The result should be independent of the data order.

Page 5: Dynamic hierarchical algorithms for document clustering

Hierarchical clustering

· Agglomerative and divisive approaches.

· Provides views of the data at different levels.

Page 6: Dynamic hierarchical algorithms for document clustering

Methodology

· Dynamic hierarchical agglomerative framework

· Specific algorithms:

─ Dynamic hierarchical compact (DHC): creates disjoint hierarchies of clusters.

─ Dynamic hierarchical star (DHS): produces overlapped hierarchies.

Page 7: Dynamic hierarchical algorithms for document clustering

Dynamic hierarchical agglomerative framework

· β is the minimum similarity threshold.

· β-similarity: clusters i and j are β-similar if their similarity is ≥ β.

· Cluster i is β-isolated if its similarity with every other cluster is < β.
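The two definitions above can be illustrated with a minimal sketch. The slide does not say which similarity function is used; cosine similarity over sparse term-weight dictionaries is assumed here, and the names `cosine`, `is_beta_similar`, and `is_beta_isolated` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts).
    Assumption: the slide does not name the similarity measure."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_beta_similar(a, b, beta):
    """True if the similarity of a and b reaches the threshold beta."""
    return cosine(a, b) >= beta

def is_beta_isolated(i, vectors, beta):
    """True if item i is below the threshold against every other item."""
    return all(
        cosine(vectors[i], v) < beta
        for j, v in enumerate(vectors)
        if j != i
    )

# Toy data: three single-document clusters.
docs = [
    {"cluster": 2, "graph": 1},
    {"cluster": 1, "graph": 2},
    {"football": 3},
]
beta = 0.5
similar_01 = is_beta_similar(docs[0], docs[1], beta)
isolated_2 = is_beta_isolated(2, docs, beta)
isolated_0 = is_beta_isolated(0, docs, beta)
```

In this toy data the first two documents share terms and are β-similar at β = 0.5, while the third shares no terms with the others and is β-isolated.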

Page 8: Dynamic hierarchical algorithms for document clustering

Dynamic hierarchical agglomerative framework

Page 9: Dynamic hierarchical algorithms for document clustering

Updating of the max-S graph
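The slide gives only the name of this step, so the following is a rough sketch under a stated assumption: the max-S graph is read here as the graph that links each cluster to the cluster(s) most similar to it, provided that similarity is at least β. The sketch rebuilds the graph from scratch from a similarity matrix; it does not reproduce the incremental update the slide refers to. The function name `max_s_graph` is hypothetical.

```python
def max_s_graph(sim, beta):
    """Build an undirected graph as a set of (u, v) edges with u < v.

    Assumption (not from the slide): vertex i is linked to every vertex j
    that attains i's maximum similarity, as long as that similarity >= beta.
    """
    n = len(sim)
    edges = set()
    for i in range(n):
        candidates = [
            (sim[i][j], j) for j in range(n) if j != i and sim[i][j] >= beta
        ]
        if not candidates:
            continue  # i is beta-isolated and gets no edge of its own
        best = max(s for s, _ in candidates)
        for s, j in candidates:
            if s == best:
                edges.add((min(i, j), max(i, j)))
    return edges

# Symmetric toy similarity matrix for four clusters.
sim = [
    [1.0, 0.8, 0.3, 0.0],
    [0.8, 1.0, 0.6, 0.0],
    [0.3, 0.6, 1.0, 0.1],
    [0.0, 0.0, 0.1, 1.0],
]
edges = max_s_graph(sim, beta=0.5)
```

Rebuilding the graph on every insertion costs a full pass over the similarity matrix; a dynamic algorithm would presumably touch only the vertices whose best neighbour changes.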

Page 10: Dynamic hierarchical algorithms for document clustering

Dynamic hierarchical compact: Connected component cover
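A connected component cover groups vertices so that each group is one connected component of the graph, which yields disjoint clusters. The sketch below computes the components of a small static graph with a depth-first search; how DHC maintains them as documents arrive is not shown on the slide and is not attempted here. The function name and the edge-set input format are assumptions made for illustration.

```python
from collections import defaultdict

def connected_components(n, edges):
    """Return the connected components of an undirected graph on
    vertices 0..n-1 as a list of sets, ordered by smallest vertex."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited vertex.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            u = stack.pop()
            component.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components

components = connected_components(4, {(0, 1), (1, 2)})
```

Every vertex lands in exactly one component, so the resulting cover is a partition.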

Page 11: Dynamic hierarchical algorithms for document clustering

Dynamic hierarchical star: star cover updating
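A star cover describes a graph as a set of stars, each made of a center vertex and its adjacent satellites; a satellite may belong to several stars, which is how overlapping clusters arise. The sketch below is the well-known static greedy construction (highest-degree uncovered vertex first), offered as a simplification: the incremental updating used by DHS is not described on the slide and is not reproduced. The name `star_cover` and the tie-breaking rule are assumptions.

```python
def star_cover(n, edges):
    """Greedy star cover of an undirected graph on vertices 0..n-1.

    Returns a list of (center, satellites) pairs. Vertices are visited by
    decreasing degree (ties broken by vertex id, an arbitrary choice);
    each still-uncovered vertex becomes a center with all of its
    neighbours as satellites.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    covered = set()
    stars = []
    for v in sorted(range(n), key=lambda x: (-len(adj[x]), x)):
        if v in covered:
            continue
        stars.append((v, set(adj[v])))
        covered.add(v)
        covered |= adj[v]
    return stars

# Vertex 2 is adjacent to both centers, so it appears in both stars.
stars = star_cover(5, {(0, 1), (0, 2), (2, 3), (3, 4)})
```

In the example, vertex 2 is a satellite of both centers, so the two clusters overlap.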

Page 12: Dynamic hierarchical algorithms for document clustering

Experiments

· 15 benchmark text collections were used.

· Clustering quality

· Sensitivity to parameters

· Balance

· Efficiency

Page 13: Dynamic hierarchical algorithms for document clustering

Clustering quality: overall F1 measure
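For reference, a common definition of the overall F1 measure matches each true class to the cluster that gives it the best F1 score and averages those scores weighted by class size. The sketch below implements that definition for a flat, non-overlapping clustering; whether the paper uses exactly this variant, and how it is applied to hierarchies, is an assumption. The function name `overall_f1` is hypothetical.

```python
from collections import Counter

def overall_f1(labels, clusters):
    """Overall F1 of a flat clustering.

    labels[i]   -- true class of document i
    clusters[i] -- cluster assigned to document i
    """
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))  # documents per (class, cluster)

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = joint[(cls, clu)]
            if n_both == 0:
                continue
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best  # weight by class size
    return total

labels = ["a", "a", "a", "b", "b", "b"]
clusters = [0, 0, 1, 1, 1, 1]
score = overall_f1(labels, clusters)
```

A clustering that reproduces the classes exactly scores 1.0; the toy clustering above misplaces one document and scores lower.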

Page 14: Dynamic hierarchical algorithms for document clustering

Clustering quality: FCubed measure
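FCubed is taken here to mean the F-measure of BCubed precision and recall; that reading is an assumption, since the slide gives only the name. The sketch covers the basic, non-overlapping case: for each document, precision is the fraction of its cluster that shares its class, and recall is the fraction of its class that shares its cluster. Overlapping clusterings, such as those DHS produces, need the extended BCubed definitions, which are not shown. The function name `bcubed_f` is hypothetical.

```python
from collections import Counter

def bcubed_f(labels, clusters):
    """F-measure of BCubed precision and recall for a flat,
    non-overlapping clustering (one class and one cluster per item)."""
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))  # items per (class, cluster)

    # Per-item scores averaged over all items.
    precision = sum(
        joint[(l, c)] / cluster_sizes[c] for l, c in zip(labels, clusters)
    ) / n
    recall = sum(
        joint[(l, c)] / class_sizes[l] for l, c in zip(labels, clusters)
    ) / n
    return 2 * precision * recall / (precision + recall)

labels = ["a", "a", "a", "b", "b", "b"]
clusters = [0, 0, 1, 1, 1, 1]
score = bcubed_f(labels, clusters)
```

Because every item counts itself, both averages are strictly positive and the harmonic mean is always defined.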

Page 15: Dynamic hierarchical algorithms for document clustering

Clustering quality: HF1

Page 16: Dynamic hierarchical algorithms for document clustering

Sensitivity to parameters

Page 17: Dynamic hierarchical algorithms for document clustering

Depth and width of the hierarchies

Page 18: Dynamic hierarchical algorithms for document clustering

Efficiency

Page 19: Dynamic hierarchical algorithms for document clustering

Conclusions

· The methods are suitable for producing hierarchical clustering solutions in dynamic environments effectively and efficiently.

· They achieve a better balance between depth and width.

· They offer hierarchies that are easier to browse than those of traditional algorithms.

Page 20: Dynamic hierarchical algorithms for document clustering

Comments

· Advantages

─ Deals with dynamic data sets.

─ Effectiveness and efficiency of the clustering.

· Applications

─ Hierarchical clustering