Multilingual document mining and navigation using self-organizing maps

Posted on 29-Jan-2016


Intelligent Database Systems Lab

National Yunlin University of Science and Technology (國立雲林科技大學)


Presenter: Keng-Yu Lin
Authors: Hsin-Chang Yang, Han-Wei Hsiao, Chung-Hong Lee

IPM (Information Processing & Management), 2011


Outline
· Motivation
· Objectives
· Methodology
· Experiments
· Conclusions
· Comments

Motivation

· A monolingual interface may limit access for users who are unfamiliar with its language.


Objectives

· To propose an approach that automatically organizes multilingual Web pages into a multilingual Web directory, breaking down language barriers in Web navigation.

Methodology

· Preprocessing: word segmentation, stopword elimination, stemming, and keyword selection.

· Encoding: all keywords of all documents are collected to build a vocabulary V_E. Each document is then encoded as a binary vector with one component per keyword in V_E, set to 1 if that keyword occurs in the document and 0 otherwise.

Example: X_i = [0, 1, 1, 0, 1, 0, 1, 1]
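The preprocessing and encoding steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles only English-like text, uses a tiny hypothetical stopword list, replaces a real stemmer (such as Porter's) with a crude suffix-stripping rule, and assumes keyword selection keeps terms that occur in at least `min_df` documents. Word segmentation for Chinese, which needs a dictionary or a dedicated segmenter, is not shown. All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "or", "the", "to", "with"}


def segment(text):
    """Split text into lowercase alphabetic tokens (a stand-in for word segmentation)."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Strip one common English suffix; a rough stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Segmentation, stopword elimination, and stemming for one document."""
    return [crude_stem(w) for w in segment(text) if w not in STOPWORDS]


def build_vocabulary(docs_tokens, min_df=2):
    """Keyword selection (assumed rule): keep terms that occur in at least min_df documents."""
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    return sorted(term for term, count in doc_freq.items() if count >= min_df)


def encode(tokens, vocabulary):
    """Binary vector: component j is 1 if vocabulary[j] occurs in the document, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


docs = [
    "Clustering multilingual documents with maps",
    "Self-organizing maps cluster documents",
    "Navigation of multilingual Web directories",
]
tokens = [preprocess(d) for d in docs]
vocab = build_vocabulary(tokens)              # ['cluster', 'document', 'map', 'multilingual']
vectors = [encode(t, vocab) for t in tokens]  # [[1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 1]]
```

Each row of `vectors` has the same form as the slide's X_i example. Note how stemming maps "Clustering" and "cluster" to the same keyword, which is what lets the first two documents share that component.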

Methodology

· The SOM algorithm produces a document cluster map (DCM) and a keyword cluster map (KCM).
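A compact sketch of this SOM step, in plain Python with no dependencies, is shown below. It assumes a standard online SOM (Gaussian neighborhood, linearly decaying learning rate and radius) on a small rectangular grid; the grid size, schedules, and labeling rules are assumptions, and the function names are hypothetical. Documents are labeled to their best-matching neuron to form the DCM. For the KCM, each keyword is labeled to the neuron whose weight for that keyword is largest, which is one common way to read a keyword map off the same trained weights; the slide does not spell out the rule.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the neuron whose weight vector is closest to x (squared Euclidean distance)."""
    return min(weights, key=lambda n: sum((wk - xk) ** 2 for wk, xk in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rows x cols SOM with online updates; returns {(r, c): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # present documents in random order each epoch
        for i in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
            x = data[i]
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighborhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])  # pull weights toward the input
            step += 1
    return weights


def document_cluster_map(weights, data):
    """DCM: neuron -> indices of the documents for which it is the best-matching unit."""
    dcm = {}
    for i, x in enumerate(data):
        dcm.setdefault(best_matching_unit(weights, x), []).append(i)
    return dcm


def keyword_cluster_map(weights, vocabulary):
    """KCM (assumed rule): keyword j -> neuron with the largest j-th weight component."""
    kcm = {}
    for j, term in enumerate(vocabulary):
        node = max(weights, key=lambda n: weights[n][j])
        kcm.setdefault(node, []).append(term)
    return kcm


# Usage on toy binary document vectors.
vocab = ["cluster", "document", "map", "multilingual"]
vectors = [[1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 1]]
som = train_som(vectors, rows=2, cols=2)
dcm = document_cluster_map(som, vectors)
kcm = keyword_cluster_map(som, vocab)
```

Because both maps are derived from the same trained weights, documents and keywords that land on the same neuron can be read together as one cluster; this is the design choice that makes a single SOM serve both maps in this sketch.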

Methodology

· Algorithm for determining dominating clusters

Methodology

· Evaluating the quality of the generated hierarchies

(C1, C3) = 4, (C3, C5) = 3, (C1, C5) = 3

$$\mathrm{PK} = \frac{4 + 3 + 3}{3} \approx 3.33$$
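The slide's worked example is the mean of three pairwise values, (C1, C3) = 4, (C3, C5) = 3 and (C1, C5) = 3. The slide does not say what each pairwise value measures, so this sketch simply takes the values as given inputs; the function name `pairwise_mean` is hypothetical. Using `Fraction` keeps the result exact (10/3) before rounding for display.

```python
from fractions import Fraction


def pairwise_mean(pair_values):
    """Average of the given pairwise values, returned exactly as a Fraction."""
    return Fraction(sum(pair_values.values()), len(pair_values))


pairs = {("C1", "C3"): 4, ("C3", "C5"): 3, ("C1", "C5"): 3}
pk = pairwise_mean(pairs)
print(pk, round(float(pk), 2))  # prints "10/3 3.33"
```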

Methodology

· Multilingual Web directory generation: semantic similarity and structural similarity

Experiments

Conclusions

· The approach is fully automated and requires no human intervention.

· The alignment results can be applied to tasks such as multilingual information retrieval.

Comments

· Advantage: the research results can help people break down language barriers.

· Applications: multilingual information retrieval.
