
Latent Semantic Transliteration using Dirichlet Mixture

Masato Hagiwara, Satoshi Sekine
Rakuten Institute of Technology, New York

NEWS 2012, July 12, 2012

Background

• Transliteration
  – Phonetic translation between languages with different writing systems
    e.g., flextime / furekkusutaimu フレックスタイム
  – A major way of importing words into other languages
• Transliteration models
  – Phonetic-based rewriting models (Knight and Graehl 1998)
  – Spelling-based supervised models (Li et al. 2004) (Finch and Sumita 2008)

Alpha-Beta Model [Brill and Moore 2000]

• Edit distance: substitution, insertion, deletion = cost 1
• Alpha-Beta model: a generalization of edit distance to string-to-string substitutions α→β
  e.g., flextime → furekkusutaimu フレックスタイム

  P(flextime→furekkusutaimu)
    = P(f→fu)×P(le→re)×P(x→kkusu)×P(ti→tai)×P(me→mu)

• Transliteration probability = product of "transliteration unit (TU)" probabilities
• Take the maximum rewriting probability over all possible partitions
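In equation form (a reconstruction from the bullets above, writing Part(s, t) for the set of ways to co-segment the pair into TU substitutions α_i → β_i):

```latex
P_{AB}(s \to t) = \max_{(\alpha,\, \beta) \,\in\, \mathrm{Part}(s,\, t)} \; \prod_{i} P(\alpha_i \to \beta_i)
```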


Joint Source Channel Model [Li et al. 05]

P(flextime→furekkusutaimu)
  = P(f→fu|BOW)×P(le→re|f→fu)×P(x→kkusu|le→re)× …

• Transliteration probability = product of TU n-gram probabilities

TU probability estimation:
• Start from a random initial alignment of the training corpus
  e.g., fl/ext/im/e → frek/ku/suta/imu,  p/i/a/get → pi/a/j/e
• Run the EM algorithm (Viterbi alignment, then frequency → probability)
  to refine the alignments
  e.g., f/le/x/ti/me → fu/re/kkusu/tai/mu,  pi/a/get → pi/a/je
• Output: a TU probability table
  P(fl→flek|・) = XXX,  P(ext→ku|・) = YYY,  P(p→pi|・) = ZZZ
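In equation form (a reconstruction from the product above, with the pair ⟨s, t⟩ segmented into TU pairs u_1, …, u_N; the slide's example is the bigram case, conditioning each TU on its predecessor):

```latex
P(s,\, t) = \prod_{i=1}^{N} P(u_i \mid u_{i-n+1}, \ldots, u_{i-1}), \qquad u_i = \langle \alpha_i,\, \beta_i \rangle
```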

Multiple Language Origins

• Words of different origins follow different transliteration rules
  – Alexander → Yalishanda 亚历山大 (Indo-European origin → Chinese transliteration model)
  – Yamamoto 山本 → Yamamoto (Japanese origin → Japanese reading model)
  – piaget / piaje ピアジェ (French origin → French model)
  – target / taagetto ターゲット (English origin → English model)
• Class Transliteration Model (Li et al. 07): explicit language detection
  – Requires a training set annotated with language origins


Issues with the Class Transliteration Model

• Requires training sets tagged with language origins
  – Such sets are rare, especially for proper nouns
• Language origins ≠ transliteration models
  – e.g., spaghetti / supageti スパゲティ
    Italian origin, but found in English dictionaries
  – e.g., Carl Laemmle / kaaru remuri カール・レムリ
    a German immigrant, but listed as an "American" film producer
    → an English transliteration model doesn't work

→ Model source language origins as latent classes

Latent Class Transliteration (LCT) Model [Hagiwara & Sekine 11]

• Models the source language origins as latent classes
• Latent classes correspond to sets of words with similar transliteration characteristics
• Trained via the EM algorithm from transliteration pairs

• Class Transliteration [Li et al. 04]: explicit language detection
• Latent Class Transliteration [Hagiwara & Sekine 11]: latent class distribution
  s: source, t: target, z: latent class,
  K: # of latent classes (determined using dev. sets)
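The slide's equations did not survive extraction; a hedged reconstruction using the s/t/z/K legend above, with u_1, …, u_N the TU pairs spelling out ⟨s, t⟩:

```latex
% Class transliteration [Li et al. 04]: explicit origin class c
P(t \mid s) = \sum_{c} P(c \mid s)\, P(t \mid s,\, c)

% Latent class transliteration [Hagiwara & Sekine 11]
P(s,\, t) = \sum_{z=1}^{K} P(z) \prod_{i=1}^{N} P(u_i \mid z)
```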


Iterative Learning via the EM Algorithm

Training pairs: piaget → piaje, target → taagetto

• Start from a random initial alignment
  e.g., p/i/a/get → pi/a/j/e,  t/ar/get → taa/ge/tto
• M step: update the transliteration model of each latent class (Lx, Ly, Lz)
  from expected TU frequencies, e.g., P(get→je ジェ) ∝ Σ γ·f(get→je ジェ)
  Model: P(pi→pi ピ), P(ar→aa アー), P(get→je ジェ), P(get→getto ゲット), …
• E step: re-compute transliteration probabilities and alignments
  based on Viterbi search
  e.g., pi/a/get → pi/a/je,  tar/get → taa/getto
• Iterate until convergence

• Sensitive to noise in the training data because of maximum likelihood estimation
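A minimal sketch of this loop in Python, assuming TU alignments are held fixed (the actual training also re-segments each pair with a Viterbi search inside the E step, as the slide shows); train_lct and the TU-string encoding are illustrative names, not the authors' code:

```python
import random
from collections import defaultdict
from math import prod

def train_lct(pairs, K=3, iters=15, seed=0):
    """EM training of a latent-class TU unigram mixture.

    pairs: list of TU-pair sequences, e.g. [("p/pi", "ar/aa", "get/getto"), ...]
    Simplification: alignments are fixed; the full model re-aligns in the E step.
    A real implementation would also work in log space to avoid underflow.
    """
    rng = random.Random(seed)
    vocab = sorted({u for seq in pairs for u in seq})

    # Random initialization of the class prior P(z) and per-class TU model P(u|z)
    p_z = [1.0 / K] * K
    p_u = []
    for _ in range(K):
        w = {u: rng.random() + 1e-3 for u in vocab}
        total = sum(w.values())
        p_u.append({u: v / total for u, v in w.items()})

    for _ in range(iters):
        z_count = [0.0] * K
        u_count = [defaultdict(float) for _ in range(K)]
        # E step: responsibility gamma of each latent class for each pair
        for seq in pairs:
            joint = [p_z[z] * prod(p_u[z][u] for u in seq) for z in range(K)]
            norm = sum(joint)
            for z in range(K):
                gamma = joint[z] / norm
                z_count[z] += gamma
                for u in seq:
                    u_count[z][u] += gamma      # expected TU frequency
        # M step: maximum-likelihood re-estimation from expected counts
        p_z = [c / len(pairs) for c in z_count]
        for z in range(K):
            total = sum(u_count[z].values())
            p_u[z] = {u: u_count[z][u] / total for u in vocab}
    return p_z, p_u
```

The maximum-likelihood M step is exactly where the noise sensitivity noted above enters: rare, misaligned TUs receive unsmoothed point estimates.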

Transliteration Models vs. Document Topic Models

Transliteration Models                                    | Document Topic Models
----------------------------------------------------------|----------------------------------------------
Transliteration unit (atomic unit of substitution)        | Word
  e.g., pia / pia ピア,  get / je ジェ                      |
Transliteration pair (sequence of transliteration units)  | Document
  e.g., piaget / piaje ピアジェ                             |
Alpha-Beta Model                                          | Word unigram language model
Joint Source Channel Model                                | Word n-gram language model
Class Transliteration Model [Li et al. 04]                | Classification + switching LMs
Latent Class Transliteration Model [Hagiwara&Sekine 11]   | Unigram mixture [Nigam et al. 00]
Proposed                                                  | Dirichlet mixture [Yamamoto & Sadamitsu 03]

→ Introduce a Dirichlet mixture prior to alleviate overfitting

Proposed Method: Latent Semantic Transliteration using Dirichlet Mixture (DM-LST)

• Latent Class Transliteration [Hagiwara & Sekine 11]: each latent class z_k
  holds a single point-estimated multinomial P(u|z_k) over TUs
  e.g., u_1 = get/je (French-like), u_2 = get/getto (English-like)
• DM-LST (proposed): place a mixture of Dirichlet priors P_Dir(p; α_1), P_Dir(p; α_2), …
  over the TU distribution p; integrating p out yields a mixture of
  multinomial-Dirichlet (Polya) distributions [Yamamoto, Mochihashi 06]
• Estimate the Dirichlet mixture parameters via the EM algorithm [Yamamoto et al. 03]
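The slide's formula did not survive extraction; a hedged reconstruction of the resulting marginal, where each mixture component integrates to a Polya (Dirichlet-multinomial) term, with n_u the count of TU u among the pair's TUs u_1, …, u_N and |α_k| = Σ_u α_{ku}:

```latex
P(s,\, t) = \sum_{k=1}^{K} \lambda_k \int \prod_{i=1}^{N} P(u_i \mid p)\; P_{\mathrm{Dir}}(p;\, \alpha_k)\, dp
          = \sum_{k=1}^{K} \lambda_k\, \frac{\Gamma(|\alpha_k|)}{\Gamma(|\alpha_k| + N)} \prod_{u} \frac{\Gamma(\alpha_{ku} + n_u)}{\Gamma(\alpha_{ku})}
```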

Transliteration Generation Pipeline

Training corpus: (flextime, furekkusutaimu), (piaget, piaje), (target, taagetto), …

1. Generation: the JSC model + a stack decoder produce a candidate list
   Input: smith → sumisu スミス, zumisu ズミス, sumaisu スマイス, sumaizu スマイズ, …
2. Re-ranking: DM-LST (proposed) re-scores the candidates
   Output: sumisu スミス
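A minimal sketch of this generate-then-rerank step; jsc_decode and dm_lst_score are hypothetical interfaces standing in for the stack decoder and the DM-LST marginal above, not the authors' API:

```python
def transliterate(source, jsc_decode, dm_lst_score, beam=32):
    """Generate with the JSC model, then re-rank with DM-LST.

    jsc_decode(source, beam) -> list of (target, tu_sequence) candidates
    dm_lst_score(tu_sequence) -> DM-LST marginal probability of the pair
    """
    candidates = jsc_decode(source, beam)        # e.g., smith -> sumisu, zumisu, ...
    ranked = sorted(candidates, key=lambda c: dm_lst_score(c[1]), reverse=True)
    return [target for target, _ in ranked]      # best candidate first
```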

Experiments

Compared models:
• Alpha-Beta Model (AB)
• Joint Source Channel (JSC)
• Latent Class Transliteration (LCT)
• Latent Semantic Transliteration using Dirichlet Mixture (DM-LST; proposed)

Experimental Settings

• Evaluation data
  – Transliteration pairs En-Ja, En-Ch, En-Ko in NEWS 2009 [Li et al. 09]

    Set    Train   Dev.   Test
    En-Ja  23,225  1,492  1,489
    En-Ch  31,961  2,896  2,896
    En-Ko   4,785    987    989

• Evaluation metrics
  – ACC: averaged top-1 accuracy
  – MFS: mean F-score
  – MRR: mean reciprocal rank
• Parameters
  – Fixed: stack beam width B = 32, EM iterations = 15
  – Number of latent classes M: tuned on the dev. set for each data set
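As a concrete reading of ACC and MRR, a small sketch assuming a single gold reference per input (the official NEWS evaluation allows multiple references and defines MFS over the best-matching one; MFS is omitted here):

```python
def evaluate(gold, ranked):
    """ACC (top-1 accuracy) and MRR over parallel lists:
    gold[i] is the reference string, ranked[i] the system's ranked candidates."""
    acc = mrr = 0.0
    for ref, cands in zip(gold, ranked):
        if cands and cands[0] == ref:
            acc += 1.0                               # correct top-1 candidate
        if ref in cands:
            mrr += 1.0 / (cands.index(ref) + 1)      # reciprocal of gold's rank
    n = len(gold)
    return acc / n, mrr / n
```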

Results

Set    Model    ACC    MFS    MRR
En-Ja  AB       0.293  0.755  0.378
       JSC      0.326  0.770  0.428
       LCT      0.345  0.768  0.437
       DM-LST   0.349  0.776  0.444
En-Ch  AB       0.358  0.741  0.471
       JSC      0.417  0.761  0.527
       LCT      0.430  0.764  0.532
       DM-LST   0.445  0.779  0.546
En-Ko  AB       0.145  0.537  0.211
       JSC      0.151  0.543  0.221
       LCT      0.079  0.483  0.167
       DM-LST   0.174  0.556  0.237


Examples

Input                Conventional Methods               Proposed Method
dijon (En-Ja)        ☓ diyon ディヨン                    ○ dijon ディジョン
goldenberg (En-Ja)   ☓ gōrudenberugu ゴールデンベルグ    ○ gōrudenbāgu ゴールデンバーグ
covell (En-Ch)       ☓ kefuer 科夫尔                     ○ keweier 科维尔
netherwood (En-Ch)   ☓ neitehewude 内特赫伍德            ○ neisewude 内瑟伍德
darling (En-Ko)      ☓ dareuling 다르링                  ○ dalling 달링
gutheim (En-Ch)      ○ gutehaimu 古特海姆                ○ gutehaimu 古特海姆
martina (En-Ko)      ○ mareutina 마르티나                ○ mareutina 마르티나

Conclusion

• Proposed Latent Semantic Transliteration based on Dirichlet Mixture (DM-LST)
  – Formalized conventional transliteration models in terms of document topic models
  – Introduced a Dirichlet mixture prior to alleviate overfitting
  – Achieved transliteration performance superior to the conventional methods
• Future work
  – Handle transliteration unit n-grams (n ≥ 2)
  – Context-dependent transliteration
    e.g., Charles → chāruzu チャールズ or sharuru シャルル