Dynamic hierarchical algorithms for document clustering

Post on 18-Feb-2016

DESCRIPTION

Dynamic hierarchical algorithms for document clustering. Presenter: Wei-Hao Huang. Authors: Reynaldo Gil-García, Aurora Pons-Porrata. PRL, 2010. Outline: Motivation, Objectives, Hierarchical clustering, Methodology, Experiments, Conclusions, Comments.

TRANSCRIPT

Intelligent Database Systems Lab

National Yunlin University of Science and Technology (國立雲林科技大學)

Dynamic hierarchical algorithms for document clustering

Presenter : Wei-Hao Huang  Authors : Reynaldo Gil-García, Aurora Pons-Porrata

PRL, 2010

Outline

· Motivation

· Objectives

· Hierarchical clustering

· Methodology

· Experiments

· Conclusions

· Comments

Motivation

· The World Wide Web and the number of text documents managed in organizational intranets continue to grow at an amazing speed.

· In dynamic information environments, it is usually desirable to apply adaptive methods for document organization, such as clustering.

Objectives

· Static clustering methods mainly rely on having the whole collection available before the algorithm is applied.

· Dynamic algorithms are able to update the clustering without performing a complete reclustering.

· Independent of the data order.

Hierarchical clustering

· Agglomerative and divisive

· Provide data views at different levels

Methodology

Dynamic hierarchical agglomerative framework

Specific algorithms:

Dynamic hierarchical compact (DHC)

Create disjoint hierarchies of clusters

Dynamic hierarchical star (DHS)

Produce overlapped hierarchies

Dynamic hierarchical agglomerative framework

· β is the minimum similarity threshold.

· β-similarity: clusters i and j are β-similar if their similarity is ≥ β.

· i is a β-isolated cluster if its similarity with every other cluster is < β.
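The two definitions above can be illustrated with a minimal sketch. It makes assumptions the slides do not state: each cluster is a single document held as a sparse term-weight dictionary, and the similarity is cosine. The names `cosine`, `is_beta_similar` and `beta_isolated` are hypothetical helpers, not taken from the paper.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts (assumed measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def is_beta_similar(a, b, beta, sim=cosine):
    """True if the similarity of a and b reaches the threshold beta."""
    return sim(a, b) >= beta


def beta_isolated(items, beta, sim=cosine):
    """Indices of items whose similarity to every other item is below beta."""
    return [
        i
        for i, a in enumerate(items)
        if all(sim(a, b) < beta for j, b in enumerate(items) if j != i)
    ]


docs = [{"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 1.0}, {"c": 1.0}]
print(is_beta_similar(docs[0], docs[1], 0.5))
print(beta_isolated(docs, 0.5))
```

Passing the similarity as a parameter keeps the threshold logic separate from the choice of measure, which the slides leave open.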

Dynamic hierarchical agglomerative framework

Updating of the max-S graph

Dynamic hierarchical compact: Connected component cover
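The slide only names the cover, so the following is a generic sketch of a connected component cover, not the authors' incremental update procedure. It assumes a full pairwise similarity matrix `sim` and builds the graph by thresholding at β; the graph actually used by DHC may differ. The function name is hypothetical.

```python
def connected_component_cover(sim, beta):
    """Group vertices into the connected components of the graph whose
    edges join pairs with similarity >= beta. Components are disjoint."""
    n = len(sim)
    seen = [False] * n
    cover = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, component = [start], []
        while stack:  # iterative depth-first search
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j != i and not seen[j] and sim[i][j] >= beta:
                    seen[j] = True
                    stack.append(j)
        cover.append(sorted(component))
    return cover


sim = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.6, 0.0],
    [0.1, 0.6, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(connected_component_cover(sim, 0.5))  # [[0, 1, 2], [3]]
```

Because every vertex lands in exactly one component, a cover of this kind yields disjoint clusters, which matches the disjoint hierarchies attributed to DHC.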

Dynamic hierarchical star: Star cover updating
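To show why a star cover can produce overlapping clusters, here is a sketch of the classic greedy star cover on a thresholded similarity graph. This is a generic technique swapped in for illustration: the slides do not give the star definition or the update rule used by DHS, and the function name and matrix input are assumptions.

```python
def greedy_star_cover(sim, beta):
    """Greedy star cover: repeatedly pick the uncovered vertex with the most
    neighbours as a star centre; its neighbours become satellites.
    Returns a list of (centre, satellites) pairs."""
    n = len(sim)
    neighbours = [
        {j for j in range(n) if j != i and sim[i][j] >= beta} for i in range(n)
    ]
    # Highest degree first; ties broken by vertex index for determinism.
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))
    covered = set()
    stars = []
    for centre in order:
        if centre in covered:
            continue
        stars.append((centre, sorted(neighbours[centre])))
        covered.add(centre)
        covered |= neighbours[centre]
    return stars


# Path graph 0-1-2-3-4: adjacent vertices are similar, all others are not.
path = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(5)] for i in range(5)]
print(greedy_star_cover(path, 0.5))  # [(1, [0, 2]), (3, [2, 4])]
```

In the example, vertex 2 is a satellite of both stars, which is the kind of overlap the slides attribute to DHS hierarchies.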

Experiments

· Using 15 benchmark text collections.

· Clustering quality

· Sensitivity to parameters

· Balance

· Efficiency

Clustering quality: Overall F1 measure
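The slide does not give the formula, so the sketch below uses one commonly used definition of an overall F1 score for clusterings: each reference class is matched to the cluster with the highest F1, and the per-class scores are averaged with weights proportional to class size. It may differ in detail from the measure used by the authors, and the function name is hypothetical.

```python
def overall_f1(classes, clusters):
    """Size-weighted average, over reference classes, of the best F1
    achieved by any cluster for that class (assumed definition)."""
    classes = [set(c) for c in classes]
    clusters = [set(c) for c in clusters]
    total = sum(len(c) for c in classes)
    if total == 0:
        return 0.0
    score = 0.0
    for cls in classes:
        best = 0.0
        for clu in clusters:
            common = len(cls & clu)
            if common:
                # F1 = 2PR / (P + R), with P = common/|clu| and R = common/|cls|
                best = max(best, 2.0 * common / (len(cls) + len(clu)))
        score += len(cls) / total * best
    return score


reference = [[0, 1, 2], [3, 4]]
print(overall_f1(reference, [[0, 1, 2], [3, 4]]))
print(overall_f1(reference, [[0, 1, 2, 3, 4]]))
```

A clustering that reproduces the reference classes scores 1, while merging everything into one cluster is penalized through precision.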

Clustering quality: FCubed measure

Clustering quality: HF1

Sensitivity to parameters

Depth and width of the hierarchies

Efficiency

Conclusions

· The methods are suitable for producing hierarchical clustering solutions in dynamic environments effectively and efficiently.

· Better balance between depth and width.

· They offer hierarchies that are easier to browse than those of traditional algorithms.

Comments

· Advantages

─ Deals with dynamic data sets.

─ Effectiveness and efficiency of the clustering.

· Applications

─ Hierarchical clustering
