Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree Machine Translation
Toshiaki Nakazawa, Japan Science and Technology Agency (JST)
John Richardson, Sadao Kurohashi, Kyoto University
4/11/2016 @ EMNLP2016
Where to insert?
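(Figure: the floating word "yesterday" must be inserted into "I found Pikachu by chance"; each candidate insertion position is assigned a probability, with values 0.7, 0.25, 0.02, 0.01, 0.01, and 0.01 shown.)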
Where to insert?
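(Figure: the floating phrase "in the park" must be inserted into "I found Pikachu by chance yesterday"; the probabilities shown for the candidate insertion positions are 0.2, 0.1, 0.6, 0.01, 0.01, 0.01, and 0.1. Photo caption: @Texas State Capitol.)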
Dependency Tree-to-Tree Translation
(Figure: the input dependency tree of 私は 昨日 公園で 偶然 ピカチュウを 見つけた is translated with rules such as 私は [X7] 偶然 を見つけた → I found [X7] by chance, ピカチュウ → Pikachu, 公園 → the park, and 昨日 → yesterday, producing the output tree for "I found Pikachu by chance".)
Dependency Tree-to-Tree Translation
(Figure: the same rule is augmented with flexible non-terminals [X] at several candidate positions in the target tree, so that the floating subtrees 昨日 (yesterday) and 公園で (in the park), which the rule does not cover, can be attached; the candidate outputs combine I, found, Pikachu, by chance, yesterday, and in the park.)
Flexible Non-terminals [Richardson+, 2016]
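The search-space growth can be made concrete with a small sketch that lists the candidate insertion positions for a floating subtree under one target-side head. It assumes a simplified view in which the head and its direct children form one ordered sequence and a flexible non-terminal may occupy any gap in it; the positions actually generated by [Richardson+, 2016] may differ, and `candidate_positions` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def candidate_positions(left: List[str], head: str,
                        right: List[str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return the (previous word, next word) pair around every gap.

    Hypothetical simplification: the head and its direct children form one
    ordered sequence, and a floating subtree may be inserted into any gap.
    None marks the start or end of the sequence.
    """
    seq = left + [head] + right
    positions = []
    for gap in range(len(seq) + 1):
        prev_word = seq[gap - 1] if gap > 0 else None
        next_word = seq[gap] if gap < len(seq) else None
        positions.append((prev_word, next_word))
    return positions


# Target head "found" with left child "I" and right children "Pikachu", "by".
for i, (prev_word, next_word) in enumerate(candidate_positions(["I"], "found", ["Pikachu", "by"])):
    print(i, prev_word, next_word)
```

Under this simplification every floating subtree multiplies the number of rule variants by the number of gaps, which is why attaching both 昨日 and 公園で quickly enlarges the search space.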
Translation Quality and Decoding Speed w/ and w/o Flexible Non-terminals
• Using ASPEC (Asian Scientific Paper Excerpt Corpus) JE and JC
• Time is relative decoding time (w/o Flex = 1.00)

| | Ja->En BLEU | Ja->En Time | En->Ja BLEU | En->Ja Time | Ja->Zh BLEU | Ja->Zh Time | Zh->Ja BLEU | Zh->Ja Time |
|---|---|---|---|---|---|---|---|---|
| w/o Flex | 20.28 | 1.00 | 28.77 | 1.00 | 24.85 | 1.00 | 30.51 | 1.00 |
| w/ Flex | 21.61 | 6.28 | 30.57 | 3.30 | 28.79 | 5.16 | 34.32 | 5.28 |
Appropriate Insertion Position Selection
• Roughly half of all translation rules were augmented with flexible non-terminals [Richardson+, 2016]
• Flexible non-terminals make the search space much larger -> slower decoding and more search errors
• Reduce the number of possible insertion positions in translation rules with a neural network model
INSERTION POSITION SELECTION MODEL
Insertion Position Selection Model
• For each insertion position:
– predict
• the score of the insertion position
– given
• input: the floating word (I) and its parent word (Ps), with the distance (Ds) between them
• target: the previous (Sp) and next (Sn) sibling words of the insertion position, and the parent (Pt) with the distance (Dt)
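Information for Selection Model
(Figure: input 私は 昨日 公園で 偶然 ピカチュウを 見つけた with the rule 私は [X7] 偶然 を見つけた → I found [X7] by chance. The floating word I = 昨日 [yesterday] has the source parent Ps = 見つけた [found] at distance Ds = 4; for the candidate flexible non-terminal [X] shown, the figure marks the target parent Pt, the siblings Sp and Sn, and Dt = -2.)
Non-terminals (such as [X7]) are reverted to the original words in the parallel corpus.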
Information for Selection Model
(Figure: the same example with a different candidate position: Ds = 4, Dt = -3, and one sibling slot filled with the special token [POST-BOTTOM].)
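The per-position inputs (I, Ps, Ds, Sp, Sn, Pt, Dt) can be collected into one record per candidate insertion position. This is a hypothetical sketch: the class and function names are invented, the target tree level is flattened to a head plus its ordered children, and distances are counted as "parent index minus word/gap index". That sign convention is not defined on the slides; it is assumed because, with the assumed tree (found heading I, Pikachu, and by), it reproduces Ds = 4, Dt = -2, and Dt = -3. [POST-BOTTOM] is taken to mean "no next sibling", and the left-boundary token is a placeholder name.

```python
from dataclasses import dataclass
from typing import List

POST_BOTTOM = "[POST-BOTTOM]"      # token from the slides; assumed here to mean "no next sibling"
LEFT_BOUNDARY = "<LEFT-BOUNDARY>"  # hypothetical placeholder for "no previous sibling"


@dataclass
class PositionFeatures:
    word: str  # I: the floating word to be inserted
    Ps: str    # parent of I in the source tree
    Ds: int    # source distance, parent index minus word index (assumed convention)
    Sp: str    # previous neighbor of the gap (may be the head itself in this flat view)
    Sn: str    # next neighbor of the gap (may be the head itself in this flat view)
    Pt: str    # target-side parent of the insertion position
    Dt: int    # target distance, parent index minus gap index (assumed convention)


def extract_features(word: str, src_seq: List[str], src_head: int,
                     tgt_seq: List[str], tgt_head: int) -> List[PositionFeatures]:
    """Build one feature record per gap of tgt_seq (a head plus its ordered children)."""
    ds = src_head - src_seq.index(word)
    feats = []
    for gap in range(len(tgt_seq) + 1):
        sp = tgt_seq[gap - 1] if gap > 0 else LEFT_BOUNDARY
        sn = tgt_seq[gap] if gap < len(tgt_seq) else POST_BOTTOM
        feats.append(PositionFeatures(word, src_seq[src_head], ds, sp, sn,
                                      tgt_seq[tgt_head], tgt_head - gap))
    return feats


# Example modeled on the slides (tree shapes assumed): 昨日 floats under 見つけた.
src = ["私は", "昨日", "公園で", "偶然", "ピカチュウを", "見つけた"]
tgt = ["I", "found", "Pikachu", "by"]
for f in extract_features("昨日", src, 5, tgt, 1):
    print(f)
```

The last two records printed correspond to the two positions on the slides: (Sp = Pikachu, Sn = by, Dt = -2) and (Sp = by, Sn = [POST-BOTTOM], Dt = -3).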
Neural Network Model
(Figure: for each insertion position k = 1, ..., N, the inputs I (word to be inserted), Ps (parent of I), Ds (distance from Ps), Sp_k (previous sibling), Sn_k (next sibling), Pt (parent of the insertion position), and Dt_k (distance from Pt) are embedded (embedding sizes labeled 220 and 100) and passed through a fully-connected feed-forward network that outputs a score for that position. The N scores are normalized with a softmax (e.g., 0.1, 0.6, ..., 0.1) and compared with the one-hot gold labels (e.g., 0, 1, ..., 0).)
loss = softmax cross-entropy
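The scoring and loss on this slide can be sketched in plain Python. Only the softmax cross-entropy over the N position scores follows directly from the slide; the scorer is a hypothetical, untrained stand-in (`ToyPositionScorer`) with one tanh hidden layer, tiny random weights, and tiny embedding sizes rather than the 220/100 labeled in the figure. Distances are treated as embedded tokens, which is an assumption based on the figure showing separate inputs for Ds and Dt.

```python
import math
import random
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def softmax_cross_entropy(scores: Sequence[float], gold: int) -> float:
    """Loss for one floating word: -log p(gold insertion position)."""
    return -math.log(softmax(scores)[gold])


class ToyPositionScorer:
    """Hypothetical, untrained stand-in for the slide's network.

    Each feature (words and distances alike) is looked up in an embedding
    table, the vectors are concatenated, passed through one tanh hidden layer,
    and reduced to a single score. Sizes are tiny for illustration.
    """

    def __init__(self, n_features: int, emb_dim: int = 4, hidden: int = 8, seed: int = 0):
        self.rng = random.Random(seed)
        self.emb_dim = emb_dim
        self.table: Dict[str, List[float]] = {}
        in_dim = n_features * emb_dim
        self.W1 = [[self.rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
        self.w2 = [self.rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    def embed(self, token: str) -> List[float]:
        # Unseen tokens get a random vector; a real model would learn these.
        if token not in self.table:
            self.table[token] = [self.rng.uniform(-0.5, 0.5) for _ in range(self.emb_dim)]
        return self.table[token]

    def score(self, features: Sequence[str]) -> float:
        x = [v for f in features for v in self.embed(f)]
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W1]
        return sum(w * hi for w, hi in zip(self.w2, h))


# Features ordered as (I, Ps, Ds, Sp, Sn, Pt, Dt); distances are embedded as tokens.
scorer = ToyPositionScorer(n_features=7)
positions = [
    ("昨日", "見つけた", "4", "Pikachu", "by", "found", "-2"),
    ("昨日", "見つけた", "4", "by", "[POST-BOTTOM]", "found", "-3"),
]
scores = [scorer.score(p) for p in positions]
probs = softmax(scores)
loss = softmax_cross_entropy(scores, gold=1)  # gold index chosen only for illustration
print(probs, loss)
```

Because one network scores every position with shared weights, the same model handles rules with any number N of candidate positions; the softmax only normalizes over the positions that a given rule actually offers.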
Training Data Creation
• Training data for the NN model can be created automatically from the word-aligned parallel corpus
– consider each aligned word as the floating word and remove it from the target tree
(Figure: a word is removed from the target tree of "I found Pikachu by chance"; each of the four candidate positions [X] gets label 1 if it is the word's original position and 0 otherwise.)
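The data-creation step can be sketched on one level of a target tree. Assumptions: the level is flattened to a head word plus its ordered children, the removed word's own subtree (if any) would travel with it, and `make_instance` is a hypothetical helper name rather than the authors' code.

```python
from typing import List, Tuple


def make_instance(seq: List[str], head: int, removed: int) -> Tuple[str, List[str], List[int]]:
    """Turn one aligned target word into a labeled training instance.

    seq is a flattened tree level: a head word plus its ordered children.
    The word at `removed` is taken out (its own subtree would travel with it),
    and every gap in the remaining sequence is labeled: 1 for the gap the
    word came from, 0 for all others.
    """
    if removed == head:
        raise ValueError("the head of this level cannot be the floating word")
    word = seq[removed]
    rest = seq[:removed] + seq[removed + 1:]
    labels = [1 if gap == removed else 0 for gap in range(len(rest) + 1)]
    return word, rest, labels


word, rest, labels = make_instance(["I", "found", "Pikachu", "by"], head=1, removed=2)
print(word, rest, labels)  # Pikachu ['I', 'found', 'by'] [0, 0, 1, 0]
```

Since every aligned word yields one instance with exactly one positive label, no manual annotation is needed, which is what makes training sets of millions of instances feasible.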
EXPERIMENTS
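Insertion Position Selection Experiment
• Parallel corpus: ASPEC-JE/JC (2M/680K sentences)
• Comparison: L2-regularized logistic regression (using Multi-core LIBLINEAR)
• Data size:

| | ASPEC-JE (Ja->En, En->Ja) | ASPEC-JC (Ja->Zh, Zh->Ja) |
|---|---|---|
| Training | 15.7M | 5.7M |
| Development | 160K | 58K |
| Test | 160K | 58K |

Ave. # IP (average number of insertion positions): 3.39 (Ja->En), 3.15 (En->Ja), 3.72 (Ja->Zh), 3.41 (Zh->Ja)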
Experimental Results

| | Ja->En | En->Ja | Ja->Zh | Zh->Ja |
|---|---|---|---|---|
| Ave. # IP | 3.39 | 3.15 | 3.72 | 3.41 |
| Mean loss | 0.089 | 0.058 | 0.105 | 0.056 |
| Top 1 Accuracy (%) | 97.08 | 97.72 | 96.51 | 97.99 |
| Top 2 Accuracy (%) | 98.94 | 99.52 | 98.97 | 99.56 |
| Logit Accuracy (%) | 55.00 | 89.03 | 68.04 | 83.16 |

(Logit: the L2-regularized logistic regression baseline.)
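Top-k accuracy here can be read as the percentage of floating words whose gold insertion position is among the k highest-scoring positions. The sketch below computes that metric on toy data (not the ASPEC results); `top_k_accuracy` is a hypothetical name, and ties are assumed to be broken by left-to-right position order via Python's stable sort.

```python
from typing import Sequence


def top_k_accuracy(all_scores: Sequence[Sequence[float]], golds: Sequence[int], k: int) -> float:
    """Percentage of instances whose gold position is among the k best-scoring positions."""
    hits = 0
    for scores, gold in zip(all_scores, golds):
        # Stable sort: ties keep left-to-right position order.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(golds)


# Toy scores for three floating words (not the ASPEC data).
toy_scores = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.2, 0.3, 0.5]]
golds = [1, 2, 0]
print(top_k_accuracy(toy_scores, golds, 1), top_k_accuracy(toy_scores, golds, 2))
```

With an average of only about 3.2 to 3.7 insertion positions per instance, top-2 accuracy near 99% means the gold position is almost always among the two best candidates.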
Translation Experiment
• Parallel corpus: ASPEC-JE/JC (2M/680K sentences)
• Decoder: KyotoEBMT [Richardson+, 2014]
• 5 settings
– Phrase-based and hierarchical phrase-based SMT (PBSMT, HPBSMT)
– w/o Flex: not using flexible non-terminals
– w/ Flex: baseline with flexible non-terminals
– Prop: using insertion position selection (only the top-1 position)
• Evaluation: BLEU and relative decoding time
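The Prop setting keeps only the top-1 insertion position. A minimal sketch of that pruning step, assuming the model's scores for one floating subtree's candidate positions are already available: keep the k best (k = 1 corresponds to Prop). `keep_top_k` is a hypothetical name; how KyotoEBMT integrates the selection internally is not described on the slides.

```python
import heapq
from typing import List, Sequence


def keep_top_k(scores: Sequence[float], k: int = 1) -> List[int]:
    """Indices of the k highest-scoring insertion positions, in left-to-right order."""
    best = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return sorted(best)


# The probabilities listed on the "in the park" slide, in the order they appear.
example = [0.2, 0.1, 0.6, 0.01, 0.01, 0.01, 0.1]
print(keep_top_k(example))       # [2]
print(keep_top_k(example, k=2))  # [0, 2]
```

Keeping one position instead of all N collapses each flexible rule to roughly a single variant per floating subtree, which is the mechanism behind the faster decoding of Prop.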
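Translation Experimental Results

| | Ja->En BLEU | Ja->En Time | En->Ja BLEU | En->Ja Time | Ja->Zh BLEU | Ja->Zh Time | Zh->Ja BLEU | Zh->Ja Time |
|---|---|---|---|---|---|---|---|---|
| PBSMT | 18.45 | - | 27.48 | - | 27.96 | - | 34.65 | - |
| HPBSMT | 18.72 | - | 30.19 | - | 27.71 | - | 35.43 | - |
| w/o Flex | 20.28 | 1.00 | 28.77 | 1.00 | 24.85 | 1.00 | 30.51 | 1.00 |
| w/ Flex | 21.61 | 6.28 | 30.57 | 3.30 | 28.79 | 5.16 | 34.32 | 5.28 |
| Prop | 22.07 | 2.25 | 30.50 | 1.27 | 29.83 | 2.21 | 34.71 | 1.89 |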
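The effect of Prop relative to w/ Flex can be read off the table with a little arithmetic: the BLEU difference and the ratio of relative decoding times. The snippet below only recomputes those numbers from the w/ Flex and Prop rows; the variable names are illustrative.

```python
# BLEU and relative decoding time copied from the results table (w/ Flex vs. Prop).
w_flex = {"Ja->En": (21.61, 6.28), "En->Ja": (30.57, 3.30),
          "Ja->Zh": (28.79, 5.16), "Zh->Ja": (34.32, 5.28)}
prop = {"Ja->En": (22.07, 2.25), "En->Ja": (30.50, 1.27),
        "Ja->Zh": (29.83, 2.21), "Zh->Ja": (34.71, 1.89)}

# (BLEU change, speed-up factor) of Prop relative to w/ Flex, per direction.
summary = {
    pair: (round(prop[pair][0] - w_flex[pair][0], 2), round(w_flex[pair][1] / prop[pair][1], 2))
    for pair in w_flex
}
for pair, (delta_bleu, speedup) in summary.items():
    print(f"{pair}: BLEU {delta_bleu:+.2f}, {speedup:.2f}x faster")
```

For these numbers, Prop decodes about 2.3 to 2.8 times faster than w/ Flex, with BLEU changes between -0.07 (En->Ja) and +1.04 (Ja->Zh) points.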
Conclusion
• Proposed an insertion position selection model to reduce the number of insertion positions for flexible non-terminals in the translation rules
• BLEU improved in three of the four directions (En->Ja was on par), and decoding was faster in all four
Future Work
• Use grandchildren's information
– Recursive NN [Liu et al., 2015] or Convolutional NN [Mou et al., 2015]
• Shift to NMT!!
– Actually, we've already shifted and participated in the WAT2016 shared tasks
• However, NMT is still far from perfect
J->E Adequacy in WAT2016
(Figure: stacked bars showing the percentage of translations at each human adequacy score (1-5) for three teams: Kyoto-U (NMT), NAIST/CMU (NMT), and NAIST (2015 best, F2T). Average adequacy values shown: 3.76, 3.71, 3.83; BLEU scores shown: 26.22, 26.39, 25.41, in the order they appear on the slide.)
Thank You!
AD: I'm co-organizing the 3rd Workshop on Asian Translation (WAT2016) in conjunction with COLING 2016
Invited talk by Google about GNMT!
Please come to the workshop!
http://lotus.kuee.kyoto-u.ac.jp/WAT/