NAIST at WAT 2014

Forest-to-String SMT for Asian Language Translation: NAIST at WAT 2014

Graham Neubig
Nara Institute of Science and Technology (NAIST)

2014-10-4

Features of ASPEC

● Translation between languages with different grammatical structures

流動 プラズマ を 正確 に 測定 する ため に 画像 を 再 構成 した 。

an image was reconstituted in order to measure flowing plasma correctly .

● We all know: Phrase-based MT is not enough

for the accurate measurement of plasma flow image was reconstructed .

Solution?: 2-step Translation Process

● Pre-ordering [Weblio, SAS_MT, NII, TMU, NICT]

我々 は 科学論文 を 翻訳する → 我々 翻訳する 科学論文 → we translate scientific papers

● RBMT+Statistical Post Editing [TOSHIBA, EIWA]

我々 は 科学論文 を 翻訳する → we translate science thesis → we translate scientific papers
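The pre-ordering pipeline on this slide can be sketched as two steps: reorder the source into target-like word order, then translate monotonically. The rule, the verb list, and the tiny dictionary below are toy assumptions written only to reproduce the slide's example; they are not taken from any of the cited systems.

```python
# Toy 2-step pre-ordering pipeline (illustrative assumptions throughout).
PARTICLES = {"は", "を"}          # case particles dropped by the toy rule
VERBS = {"翻訳する"}               # hypothetical verb list
DICTIONARY = {                    # hypothetical word-for-word lexicon
    "我々": "we",
    "翻訳する": "translate",
    "科学論文": "scientific papers",
}

def preorder(tokens):
    """Step 1: drop particles and move a sentence-final verb after the subject."""
    content = [t for t in tokens if t not in PARTICLES]
    if len(content) > 2 and content[-1] in VERBS:
        return [content[0], content[-1]] + content[1:-1]
    return content

def translate_monotone(tokens):
    """Step 2: translate word by word without any further reordering."""
    return " ".join(DICTIONARY.get(t, t) for t in tokens)

reordered = preorder("我々 は 科学論文 を 翻訳する".split())
print(" ".join(reordered))            # 我々 翻訳する 科学論文
print(translate_monotone(reordered))  # we translate scientific papers
```

Even this single hand-written rule only covers one sentence pattern, which hints at how much effort a full rule set for each language pair would take.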

This is a lot of work... :(

How do I make good Japanese-English preordering rules?!

How do I make good Japanese-Chinese preordering rules?!

What about error propagation?

What if better preordering accuracy doesn't equal better translation accuracy?

Evidence

Our Solution: Tree-to-String Translation [Liu+ 06]

友達 と ご飯 を 食べ た

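[Figure: parse tree of the source sentence, with target-side rule fragments attached to its nodes]

Tree: (VP0-5 (PP0-1 (N0 友達) (P1 と)) (VP2-5 (PP2-3 (N2 ご飯) (P3 を)) (VP4-5 (V4 食べ) (SUF5 た))))

Target-side fragments shown on the nodes: "my friend", "with", "a meal", "ate", "x1 x0", "x1 with x0"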
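A minimal sketch of how tree-to-string rules turn the parse of 友達 と ご飯 を 食べ た into an English string. As a simplification, rules here are keyed by node label, whereas real tree-to-string rules match tree fragments; the assignment of the slide's fragments to particular nodes is also an assumption, since the figure layout was lost.

```python
# Toy tree-to-string translation. The rule table is illustrative only
# and is not Travatar's rule format.
TREE = ("VP0-5",
        ("PP0-1", ("N0", "友達"), ("P1", "と")),
        ("VP2-5",
         ("PP2-3", ("N2", "ご飯"), ("P3", "を")),
         ("VP4-5", ("V4", "食べ"), ("SUF5", "た"))))

# x0, x1 stand for the translations of a node's children, left to right.
# Rules without variables translate the whole subtree at once.
RULES = {
    "VP0-5": "x1 with x0",
    "PP0-1": "my friend",
    "VP2-5": "x1 x0",
    "PP2-3": "a meal",
    "VP4-5": "ate",
}

def translate(node):
    """Recursively apply the rule for this node, filling in its variables."""
    label, *children = node
    out = []
    for tok in RULES[label].split():
        if tok.startswith("x") and tok[1:].isdigit():
            out.append(translate(children[int(tok[1:])]))
        else:
            out.append(tok)
    return " ".join(out)

print(translate(TREE))  # ate a meal with my friend
```

Reordering falls out of the rules themselves ("x1 with x0" swaps the two children), so no separate pre-ordering step is needed.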

Requirements for a Tree-to-String Model

● Parallel Corpus (e.g. "This is a test . It uses data ." / "これはテストです。 データを使用します。")

● Source Sentence Parser

● Alignments

● Rule Extraction

● Rule Scoring

● Optimization

→ Tree-to-String Model

Reducing our work load.

How do I make good Japanese-English preordering rules?!

How do I make good Japanese-Chinese preordering rules?!

What about error propagation?

What if better preordering accuracy doesn't equal better translation accuracy?

[Figure: three X marks drawn over the concerns above]

Forest-to-string Translation [Mi+ 08]

I saw a girl with a telescope

[Figure: packed parse forest for the sentence above. Nodes, labeled with their word spans: PRP0,1 VBD1,2 DT2,3 NN3,4 IN4,5 DT5,6 NN6,7 NP0,1 NP2,4 NP5,7 PP4,7 NP2,7 VP1,7 S0,7]
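A packed forest stores several parses in one structure by letting a node have more than one way of being built. The sketch below uses the node labels from the slide; the hyperedges connecting them are an assumption (the edges were lost in extraction), chosen so that "with a telescope" can attach either to the noun phrase or to the verb phrase.

```python
from functools import lru_cache

# Each node maps to a list of hyperedges; a hyperedge is a tuple of children.
# Node names come from the slide; the edges are assumed for illustration.
FOREST = {
    "S0,7":  [("NP0,1", "VP1,7")],
    "NP0,1": [("PRP0,1",)],
    "VP1,7": [("VBD1,2", "NP2,7"),            # saw [a girl with a telescope]
              ("VBD1,2", "NP2,4", "PP4,7")],  # saw [a girl] [with a telescope]
    "NP2,7": [("NP2,4", "PP4,7")],
    "NP2,4": [("DT2,3", "NN3,4")],
    "PP4,7": [("IN4,5", "NP5,7")],
    "NP5,7": [("DT5,6", "NN6,7")],
}

@lru_cache(maxsize=None)
def count_trees(node):
    """Number of distinct parse trees rooted at this node."""
    edges = FOREST.get(node)
    if edges is None:          # preterminal such as DT2,3: exactly one tree
        return 1
    total = 0
    for children in edges:
        product = 1
        for child in children:
            product *= count_trees(child)
        total += product
    return total

print(count_trees("S0,7"))  # 2
```

Both readings share every node except the attachment point, so the decoder can consider either parse without the forest doubling in size.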

Travatar Toolkit

● Forest-to-string translation toolkit

● Supports training, decoding

● Includes preprocessing scripts for parsing, etc.

● Many other features (optimization, Hiero, etc...)

Available open source! http://phontron.com/travatar

NAIST WAT System

WAT Results

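First place in all tasks!

[Figure: two bar charts over the tasks en-ja, ja-en, zh-ja, and ja-zh, with legend "Other" and "NAIST". One chart shows BLEU (axis 0 to 50), the other HUMAN evaluation (axis 0 to 60). Difference labels on the charts, in extraction order: +2.2, +2.7, +3.6, +1.8, +13.0, +15.0, +28.3, +3.8]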

System Elements

Travatar! Same as [Neubig & Duh, ACL2014]

Recurrent Neural Net Language Model

Pre/post Processing (UNK splitting, transliteration)

Dictionaries

Recurrent Neural Network LM

● Vector representation → robustness

● Recurrent architecture → longer context

I can eat an apple </s>
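To make the two bullets concrete, here is a minimal sketch of a recurrent language model's forward pass in plain Python. Words are looked up as vectors, and the hidden state is fed back at every step so the prediction depends on the whole prefix. The weights are random and untrained, and the sizes, names, and vocabulary are arbitrary assumptions; this is not the model used in the NAIST system.

```python
import math
import random

VOCAB = ["<s>", "I", "can", "eat", "an", "apple", "</s>"]
IDX = {w: i for i, w in enumerate(VOCAB)}
H = 8  # hidden size (arbitrary)

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

E = rand_matrix(len(VOCAB), H)  # word vectors (the "vector representation")
W = rand_matrix(H, H)           # recurrent weights (carry the context)
U = rand_matrix(len(VOCAB), H)  # output weights

def step(h, word):
    """One recurrent step: h_t = tanh(E[w_t] + W h_{t-1})."""
    x = E[IDX[word]]
    return [math.tanh(x[i] + sum(W[i][j] * h[j] for j in range(H)))
            for i in range(H)]

def next_word_probs(h):
    """Softmax over the vocabulary given the current hidden state."""
    scores = [sum(U[k][j] * h[j] for j in range(H)) for k in range(len(VOCAB))]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sentence_logprob(words):
    """Log-probability of a sentence, ending with the </s> symbol."""
    h = [0.0] * H
    logp = 0.0
    prev = "<s>"
    for w in words + ["</s>"]:
        h = step(h, prev)
        logp += math.log(next_word_probs(h)[IDX[w]])
        prev = w
    return logp

print(sentence_logprob("I can eat an apple".split()))
```

Because the hidden state is never reset inside a sentence, the context is not cut off after a fixed number of words as it is in an n-gram model.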

Pre/post processing

UNK segmentation (ja-en)

球内部 → 球 内部

試験 管立て → 試験 管 立て

Kanji Normalization (ja-zh, zh-ja)

イチョウ黄叶 → イチョウ黄葉

臭気鉴定师 → 臭気鑑定師

Transliteration (ja-en)

Japan インテック → Japan Intekku

Dictionary addition (ja-en)

膿瘍 → apostema

典型 → archetype
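Two of the steps above can be sketched in a few lines. The slides do not say how either step was implemented, so both functions are assumptions: kanji normalization is shown as a character table (covering only the characters in the slide's examples), and UNK segmentation as a search for the fewest in-vocabulary pieces.

```python
# Hypothetical character table: simplified Chinese form -> Japanese form.
SIMPLIFIED_TO_JAPANESE = str.maketrans({"叶": "葉", "鉴": "鑑", "师": "師"})

def normalize_kanji(text):
    """Replace each mapped character; leave everything else unchanged."""
    return text.translate(SIMPLIFIED_TO_JAPANESE)

def split_unknown(token, vocab):
    """Split token into the fewest in-vocabulary pieces.

    Returns [token] unchanged when no full segmentation exists.
    """
    best = {0: []}  # best[i] = shortest known segmentation of token[:i]
    for end in range(1, len(token) + 1):
        for start in range(end):
            piece = token[start:end]
            if start in best and piece in vocab:
                candidate = best[start] + [piece]
                if end not in best or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best.get(len(token), [token])

print(normalize_kanji("イチョウ黄叶"))                        # イチョウ黄葉
print(split_unknown("球内部", {"球", "内部", "内", "部"}))     # ['球', '内部']
print(split_unknown("管立て", {"管", "立て"}))                # ['管', '立て']
```

Preferring the fewest pieces keeps longer known words such as 内部 intact instead of breaking them into single characters.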

Conclusion

Future Work

LOSE at next year's WAT. (Make Travatar so easy to use that others can use it to make really good MT systems for Asian languages.)

Starting soon! Training scripts to be available: http://phontron.com/project/wat2014