kyoto-u: syntactical ebmt system for ntcir-7 patent translation task kyoto university toshiaki...

32
Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task Kyoto University Toshiaki Nakazawa Sadao Kurohashi

Upload: adele-farmer

Post on 13-Dec-2015

217 views

Category:

Documents


1 download

TRANSCRIPT

Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent

Translation Task

Kyoto University

Toshiaki Nakazawa     Sadao Kurohashi

Overview of Kyoto-U SystemTranslation Examples

J: 図書館で新聞を読むE: I read a newspaper in the library

J: 政治の本が売れ残っているE: A book in politics was left on the shelf

・・・・・

・・・・・           ・・・・・

本 が売れ残って いる

政治 の a book

in politicswas left

on the shelf

図書館 で新聞 を

読む

I

read

a newspaper

in the library

library in

newspaper ACC

read

politics in

book NOM

left unsold

Overview of Kyoto-U SystemTranslation Examples

Input:図書館で政治の本を読む。

Output:I read a book in politicsin the library

本 が売れ残って いる

政治 の a book

in politicswas left

on the shelf

図書館 で新聞 を

読む

I

read

a newspaper

in the library

・・・・・           ・・・・・

図書館 で

本 を読む

政治 の

read

book ACC

politics in

library in

a book

in politics

in the library

I

read

Overview of Kyoto-U SystemTranslation Examples

Alignment

Alignment

J: 交差点で、突然あの車が

飛び出して来たのです。

E : The car came at me from

the side at the intersection.

Alignment

交差

点 で 、

突然

あの車 が

飛び出して 来た のです

the car

came

at me

from the side

at the intersection

1. Transformation into dependency structure

J: JUMAN/KNPE: Charniak’s nlparser → Dependency tree

Alignment

交差

点 で 、

突然

あの車 が

飛び出して 来た のです

the car

came

at me

from the side

at the intersection

1. Transformation into dependency structure

2. Detection of word(s) correspondences

Finding Correspondences• Bilingual dictionaries (500K entries)

• Substring co-occurrence (Cromieres 2006)

• Numeral normalization

二百十六万 →  2,160,000 ← 2.16 million

• Transliteration (Katakana words, NEs)  ローズワイン → rosuwain ⇔ rose wine

(similarity:0.78)新宿 → shinjuku ⇔ shinjuku (similarity:1.0)

)()(

),(

ecountjcount

ejcount

Alignment

交差

点 で 、

突然

あの車 が

飛び出して 来た のです

the car

came

at me

from the side

at the intersection

1. Transformation into dependency structure

2. Detection of word(s) correspondences

3. Disambiguation of correspondences

Alignment

交差

点 で 、

突然

あの車 が

飛び出して 来た のです

the car

came

at me

from the side

at the intersection

1. Transformation into dependency structure

2. Detection of word(s) correspondences

3. Disambiguation of correspondences

4. Handling of remaining phrasesExtension to leaf-nodes

Alignment

交差

点 で 、

突然

あの車 が

飛び出して 来た のです

the car

came

at me

from the side

at the intersection

1. Transformation into dependency structure

2. Detection of word(s) correspondences

3. Disambiguation of correspondences

4. Handling of remaining phrases

5. Registration to translation example database

Alignment Ambiguities

you

will have to file

insurance

an claim

insurance

with the office

in Japan

日本 で

保険

会社 に 対して

保険

請求 の

申し立て が

可能です よ

[in Japan]

[insurance]

[insurance]

[of claim]

[to the company]

[file]

[be able to]

Alignment: Consistency

Near

Far

• For each pair of candidates ai and aj

calculate the J-side distance dJ and

the E-side distance dE

• Give a consistency score to the pair based on dJ and dE

• Calculate consistency scores for all the pairs in a possible set of alignment candidates

2/)1(

),(),,(maxarg 1 1

nn

aadaadcsn

i

n

ij jiEjiJ

alignment

Baseline

Distance of Each Branch: 1

  Consistency Score:

EJEJ dd

ddcs11

,

1/1+1/2=1.5…

……

Consistency Score• The frequency of distance pair in gold-standard

alignment data (Mainichi newspaper 40K sentence pairs) [Uchimoto04]

Frequency (log)

Dist of J-Side Dist of E-Side

Distance based on Dependency Type

you

will have to file

insurance

an claim

insurance

with the office

in Japan

日本 で

保険

会社 に 対して

保険

請求 の

申し立て が

可能です よ

デ格

文節内

連用

文節内

ノ格

ガ格

NP

NP

NN

PP

NN

PP

[in Japan]

[insurance]

[insurance]

[of claim]

[to the company]

[file]

[be able to]

you

will have to file

insurance

an claim

insurance

with the office

in Japan

日本 で

保険

会社 に 対して

保険

請求 の

申し立て が

可能です よ

デ格

文節内

連用

文節内

ノ格

ガ格

NP

NP

NN

PP

NN

PP

[in Japan]

[insurance]

[insurance]

[of claim]

[to the company]

[file]

[be able to]

Distance based on Dependency Type

you

will have to file

insurance

an claim

insurance

with the office

in Japan

日本 で

保険

会社 に 対して

保険

請求 の

申し立て が

可能です よ

デ格

文節内

連用

文節内

ノ格

ガ格

NP

NP

NN

PP

NN

PP

[in Japan]

[insurance]

[insurance]

[of claim]

[to the company]

[file]

[be able to]

Distance based on Dependency Type

Example of Alignment Improvement

Proposed model Word-base alignment

Translation

Input:図書館で政治の本を読む。

Output:I read a book in politicsin the library

本 が売れ残って いる

政治 の a book

in politicswas left

on the shelf

図書館 で新聞 を

読む

I

read

a newspaper

in the library

・・・・・           ・・・・・

図書館 で

本 を読む

政治 の

read

book ACC

politics in

library in

a book

in politics

in the library

I

read

TranslationTranslation Examples

Selection of Translation Examples

• Score for an example

1. Size of an example

2. Similarity of neighboring nodes

3. Translation probability

• Beam search from the root of the input

[Sato 91]

Input:

図書館 で

本 を読む

政治 の

read

book ACC

politics in

library in

2sizew

読む

a newspaper

I

read

a newspaper

in the library

I

study

in the library

I

read

a newspaper

in the library

7.0 simw3

2 transw

0.7

Translation example:

新聞 を図書館 で

Input:図書館で政治の本を読む。

本 が売れ残って いる

政治 の a book

in politicswas left

on the shelf

図書館 で新聞 を

読む

I

read

a newspaper

in the library

・・・・・           ・・・・・

図書館 で

本 を読む

政治 の

read

book ACC

politics in

library in

a book

in politics

in the library

I

read

Combination of TMsTranslation Examples

      ┌ 記録    ┌ 領域 で の    ├ 変形  ┌ 形状 と ,  │ ┌ 記録  ├ 特性 の┌  関係 を調べた 。

┌ the relationship ││   ┌ deformation ││┌ shape and │││ │  ┌ recording │││ └ in the region ││├ recording │└ between characteristics was examined

InputDependency Tree

Input :記録領域での変形形状と,記録特性の関係を調べた。

OutputDependency Tree┌  状況 を

調べた 。┌ the situationwas examined

    ┌ 相互  ┌ 作用 と  │┌ 記録  ├ 特性 の┌  関係 を調べた 。

┌ the relationship ││┌ interaction and ││├ recording │└ between characteristics was investigated

      ┌ 大変    ┌ 形  ┌ 領域 で の  ├ 断面┌ 形状 を模擬 した

  ┌  cross-sectional ┌ shape ││   ┌   large ││┌ deformation │└ in the region was └ simulated

┌  記録領域 の

┌ recording of the areas

┌  変形パターン を

┌ deformation the pattern

Translation Examples

Output :The relationship between deformation shape in the recording region and recording characteristics was examined .

Evaluation Resultsand

Discussion

BLEU Adequacy Fluency Average

27.20 NTT 3.81 tsbmt 4.02 Japio 3.88 tsbmt

27.14 moses 3.71 Japio 3.94 tsbmt 3.86 Japio

27.14 MIT 3.15 MIT 3.66 MIT 3.40 MIT

25.48 NAIST-NTT 2.96 NTT 3.65 NTT 3.30 NTT

24.79 NICT-ATR 2.85 Kyoto-U 3.55 moses 3.18 moses

24.49 KLE 2.81 moses 3.44 tori 3.10 Kyoto-U

23.10 tsbmt 2.66 NAIST-NTT 3.43 NAIST-NTT 3.04 NAIST-NTT

22.29 tori 2.59 KLE 3.35 Kyoto-U 3.01 tori

21.57 Kyoto-U 2.58 tori 3.28 HIT2 2.94 KLE

19.93 mibel 2.47 NICT-ATR 3.28 KLE 2.86 HIT2

19.48 HIT2 2.44 HIT2 3.09 mibel 2.78 NICT-ATR

19.46 Japio 2.38 mibel 3.08 NICT-ATR 2.74 mibel

15.90 TH 1.87 TH 2.42 FDU-MCandWI 2.13 TH

9.55 FDU-MCandWI 1.75 FDU-MCandWI 2.39 TH 2.08 FDU-MCandWI

1.41 NTNU 1.08 NTNU 1.04 NTNU 1.06 NTNU

Intrinsic J-E Evaluation Result

BLEU Adequacy Fluency Average

30.58 moses 3.53 tsbmt 3.69 moses 3.60 tsbmt

29.15 NICT-ATR 2.90 moses 3.67 tsbmt 3.30 moses

28.07 NTT 2.74 NTT 3.54 NTT 3.14 NTT

22.65 Kyoto-U 2.59 NICT-ATR 3.20 NICT-ATR 2.89 NICT-ATR

17.46 tsbmt 2.42 Kyoto-U 2.54 Kyoto-U 2.48 Kyoto-U

Intrinsic E-JEvaluation Result

• Not caring whether a child node is a pre-child or post-child– Resulting target structure goes wrong

• After resolving this defect, BLEU score in EJ translation rose to 24.02 from 22.65

Critical Defect in EJ Translation

BLEU Adequacy Fluency Average

30.58 moses 3.53 tsbmt 3.69 moses 3.60 tsbmt

29.15 NICT-ATR 2.90 moses 3.67 tsbmt 3.30 moses

28.07 NTT 2.74 NTT 3.54 NTT 3.14 NTT

22.65 Kyoto-U 2.59 NICT-ATR 3.20 NICT-ATR 2.89 NICT-ATR

17.46 tsbmt 2.42 Kyoto-U 2.54 Kyoto-U 2.48 Kyoto-U

24.02 ? ? ?

• Kyoto-U Fully Syntactic EBMT system:1. Alignment: Consistency

2. Alignment: Extension

3. Translation: Discontinuous example

4. Translation: Easy combination

• By using syntactic information, we could achieve reasonably high quality translation

• For patent translation, we may need some pre-processings to handle special expressions which cause parsing errors

Conclusion