Unsupervised word sense disambiguation for Korean through the acyclic weighted digraph using corpus and dictionary

Presenter: Chun-Ping Wu
Authors: Yeohoon Yoon, Choong-Nyoung Seon, Songwook Lee, Jungyun Seo
IPM 2007

Intelligent Database Systems Lab, N.Y.U.S.T. I.M.
國立雲林科技大學 National Yunlin University of Science and Technology



Page 1: Title slide

Intelligent Database Systems Lab, N.Y.U.S.T. I.M.

Unsupervised word sense disambiguation for Korean through the acyclic weighted digraph using corpus and dictionary

Presenter: Chun-Ping Wu
Authors: Yeohoon Yoon, Choong-Nyoung Seon, Songwook Lee, Jungyun Seo

IPM 2007

國立雲林科技大學 National Yunlin University of Science and Technology

Page 2: Outline

Motivation
Objective
Methodology
Experiments
Conclusion
Comments

Page 3: Motivation

Word sense disambiguation (WSD) is a common problem in natural language processing.

Traditional approaches consider only co-occurrence probability.

Sample: I deposit some money in the bank.

Options:
  bank = financial institution (銀行)?
  bank = embankment or shore (堤; 岸)?
  bank = a row or a set of things ((一) 排; (一) 組)?

Page 4: Objective

To construct a WSD system that is easy to implement, learns all polysemous words at once, and covers every polysemous word listed in the machine-readable dictionary (MRD).

To consider the relation between each sense of the context words and the sense of the target word.

Sample: I deposit some money in the bank.

Answer: bank = financial institution (銀行)

Page 5: Methodology

Learning step
  Similarity matrix
  Word vector
  Vector representations of sense definitions in MRD

Disambiguation step
  The definition of the acyclic weighted digraph
  Selecting context words
  Constructing the acyclic weighted digraph
  Searching the optimal path on the acyclic weighted digraph

Pages 6 to 8: Methodology, learning step

Learning step
  Similarity matrix
  Word vector
  Vector representations of sense definitions in MRD
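The learning step listed above can be sketched as follows. The transcript does not preserve the formulas from these slides, so everything in this sketch is an assumption made for illustration and not the authors' method: the toy corpus, sentence-level co-occurrence counts as word vectors, cosine similarity as the similarity measure, and the hypothetical helper names build_word_vectors, cosine, and sense_vector. A sense vector is taken here to be the sum of the word vectors of the words in that sense's dictionary definition.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_word_vectors(sentences):
    """Count sentence-level co-occurrences; each word's row is its vector."""
    vectors = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            vectors[a][b] += 1.0
            vectors[b][a] += 1.0
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sense_vector(definition_words, vectors):
    """Assumed representation: sum of the vectors of the definition words."""
    total = defaultdict(float)
    for word in definition_words:
        for k, x in vectors.get(word, {}).items():
            total[k] += x
    return total


# Toy corpus (hypothetical), one tokenised sentence per list.
corpus = [
    ["deposit", "money", "bank"],
    ["bank", "loan", "money"],
    ["river", "water", "bank"],
    ["river", "water", "shore"],
]
vectors = build_word_vectors(corpus)

# Similarity matrix over all word pairs, stored as a dict keyed by (a, b).
words = sorted(vectors)
similarity = {(a, b): cosine(vectors[a], vectors[b]) for a in words for b in words}

# Two made-up sense definitions for "bank".
financial = sense_vector(["money", "loan"], vectors)
riverside = sense_vector(["river", "shore"], vectors)

print(cosine(vectors["deposit"], financial), cosine(vectors["deposit"], riverside))
```

In this toy setup the context word "deposit" scores higher against the financial sense than against the riverside sense, which is the kind of signal the disambiguation step would consume.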

Pages 9 to 12: Methodology, disambiguation step

Disambiguation step
  The definition of the acyclic weighted digraph
  Selecting context words
  Constructing the acyclic weighted digraph
  Searching the optimal path on the acyclic weighted digraph
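The disambiguation step can be sketched as below. The transcript does not preserve the graph definition, so this sketch assumes a layered graph: one layer per word (context words and the target word, in sentence order), one node per candidate sense, and an edge from every sense in one layer to every sense in the next, weighted by a sense-to-sense similarity. The function name best_sense_path, the sim callback, and the toy scores are hypothetical. The optimal path is found with a Viterbi-style dynamic program that maximises the sum of edge weights.

```python
def best_sense_path(layers, sim):
    """Find the maximum-weight path through a layered acyclic digraph.

    layers: list of lists of sense labels, one list per word in order.
    sim(a, b): weight of the edge from sense a to sense b (assumed given).
    Returns (score, path), where path holds one sense per layer.
    """
    # best[s] = (score of the best path ending at sense s, that path)
    best = {s: (0.0, [s]) for s in layers[0]}
    for layer in layers[1:]:
        new_best = {}
        for s in layer:
            # Keep only the best predecessor for each sense in this layer.
            prev = max(best, key=lambda p: best[p][0] + sim(p, s))
            score, path = best[prev]
            new_best[s] = (score + sim(prev, s), path + [s])
        best = new_best
    end = max(best, key=lambda s: best[s][0])
    return best[end]


# Toy example (all senses and weights are made up for illustration).
layers = [
    ["deposit/payment", "deposit/sediment"],
    ["money/currency"],
    ["bank/financial", "bank/river"],
]
scores = {
    ("deposit/payment", "money/currency"): 0.9,
    ("deposit/sediment", "money/currency"): 0.1,
    ("money/currency", "bank/financial"): 0.8,
    ("money/currency", "bank/river"): 0.1,
}


def sim(a, b):
    return scores.get((a, b), 0.0)


score, path = best_sense_path(layers, sim)
print(score, path)
```

Because each layer keeps only the best path into each of its senses, the chosen sense for the target word is read off the last element of the returned path.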

Page 13: Experiments

System results

Page 14: Experiments

Experiment on English

The accuracy of the system is 30.7% on average. This result is very low, for the following reasons:

  The selected context words are not appropriate, although context words are very important because they decide which sense of the target word is the best.

  Mapping English senses to Korean through an English-Korean dictionary leads to some loss of information.

  Errors in the stemming process prevented the system from finding the correct root of the verb in the MRD.

Page 15: Conclusion

The method considers the relationship between each sense of the context words and the sense of the target word.

It uses the Viterbi algorithm to reduce computational complexity.

The system performed poorly on English (30.7% accuracy), but it achieved suitable performance, 76.4% accuracy, on semantically ambiguous Korean words.

The method can be applied to other languages by studying their language characteristics.
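To make the complexity point concrete, the following sketch compares the number of complete sense combinations that exhaustive scoring would enumerate with the number of edge evaluations in a layer-by-layer Viterbi search. It assumes a layered graph in which every sense of one word is linked to every sense of the next word; the function names and the sense counts are made up for illustration and do not come from the paper.

```python
from math import prod


def exhaustive_paths(sense_counts):
    """Number of complete sense combinations (one sense per word)."""
    return prod(sense_counts)


def viterbi_edge_evaluations(sense_counts):
    """Number of edges examined when each layer is linked only to the next."""
    return sum(a * b for a, b in zip(sense_counts, sense_counts[1:]))


sense_counts = [3, 4, 2, 5, 3]  # hypothetical number of senses per word
print(exhaustive_paths(sense_counts), viterbi_edge_evaluations(sense_counts))
```

The first count grows as a product over the words in the context window, while the second grows as a sum over adjacent pairs, which is why the dynamic program stays cheap as more context words are added.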

Page 16: Comments

Advantage
  Considers the relationship between each sense of the context words and the sense of the target word.
  Uses the Viterbi algorithm to reduce computational complexity.

Drawback
  The system performs well only on Korean (76.4% accuracy); on English its accuracy drops to 30.7%.

Application
  Word sense disambiguation