
On Using Syntactic Preordering Models to Delimit Morphosyntactic Search Space

Joachim Daiber
Institute for Logic, Language and Computation, University of Amsterdam

EXPERT Summer School, Malaga 2015


Introduction

Project title: Exploiting hierarchical alignments for linguistically-informed SMT models to meet the hybrid approaches that aim at compositional translation

▶ ESR 10
▶ University of Amsterdam
▶ Supervisor: Prof. Khalil Sima'an




Motivation

▶ Current MT models work well if languages are structurally similar
▶ Difficulties with morphologically rich languages:
  − freer word order
  − more productive morphological inflections
  − agreement over long distances


[Figure: word-aligned example showing that German case marking, not word order, determines argument roles:

  der Mann schlug Peter  ↔  the man punched Peter
  den Mann schlug Peter  ↔  Peter punched the man  (reordered: Peter schlug den Mann)]



Part I: Word Order



Preordering source trees

[Figure: source dependency tree of "Peter escaped from the police" (edge labels Root, Sb, AuxP, Adv, AuxA), decorated with the German target words Peter, entkam, der (case=dat), Polizei (case=dat)]

▶ Source dependency trees are well suited to preordering:
  − Lerner and Petrov (2013) present two classifier-based dependency tree preordering models
  − Jehl et al. (2014) and de Gispert et al. (2015) preorder dependency trees via branch-and-bound search



Preordering source trees

▶ Lerner and Petrov (2013) preorder trees starting at the root
▶ Order all children jointly (model 1) or left and right children separately (model 2); see the sketch after the figure

[Figure: dependency tree of "Peter escaped from the police" with edge labels Root, Sb, AuxP, Adv, AuxA]
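A minimal sketch of this top-down child-ordering idea in Python. The Node type and the toy score_order function are illustrative assumptions, not the authors' implementation; a real model scores permutations with a trained classifier over head, label, and POS features:

    from itertools import permutations

    class Node:
        def __init__(self, word, label, children=()):
            self.word, self.label = word, label
            self.children = list(children)

    def score_order(head, children):
        # Toy stand-in for a trained classifier score of one child
        # permutation; here it simply prefers subjects ("Sb") early.
        return sum(-i for i, c in enumerate(children) if c.label == "Sb")

    def preorder(node):
        # Model 1 style: pick the best permutation of all children jointly,
        # then recurse top-down into each subtree.
        if node.children:
            node.children = list(max(permutations(node.children),
                                     key=lambda p: score_order(node, p)))
            for child in node.children:
                preorder(child)
        return node

    root = Node("entkam", "Root", [Node("Polizei", "Adv"), Node("Peter", "Sb")])
    print([c.word for c in preorder(root).children])  # ['Peter', 'Polizei']

Enumerating all permutations is exponential in the number of children, which is one reason single-best classifiers or branch-and-bound search are used in practice.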




Generating a preordering space

▶ Both Lerner and Petrov (2013) and Jehl et al. (2014) make only single-best predictions
▶ We want:
  − ALL REASONABLE predictions instead of the SINGLE BEST
  − a more flexible model



Multiple predictions and a more flexible model

▶ Multiple predictions
  − Mistakes in order decisions propagate
  − Extract n-best decisions from the model to pass on to later models
▶ Making the model more flexible
  − Bad: order decisions are local to tree families
  − Non-local features would help (e.g. an LM)
    → integration via cube pruning (see the sketch below)
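A compact sketch of both ideas, with the scoring functions passed in as parameters since they are model-specific; the simple reranking here is a simplified stand-in for the full cube-pruning integration:

    import heapq
    from itertools import permutations

    def kbest_child_orders(children, local_score, k=10):
        # n-best instead of single-best: keep the k highest-scoring
        # permutations of a node's children under the local model.
        return heapq.nlargest(k, permutations(children), key=local_score)

    def rerank_with_lm(candidates, local_score, lm_score, weight=0.5):
        # Rescore candidate orders with a non-local feature such as an LM.
        return max(candidates,
                   key=lambda c: local_score(c) + weight * lm_score(c))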



Making the model more flexible

▶ Use a standard log-linear model (Och and Ney, 2002):

  $s' = \arg\max_{s'} \sum_i \lambda_i \log \phi_i(s')$

▶ Where to get the weights?
  − PRO: tuning as ranking (Hopkins and May, 2011)
  − Scoring functions:
    1. Kendall's τ coefficient (see the sketch below)
    2. Simulate a word-level MT system, score by BLEU
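Kendall's τ over a predicted permutation can be computed from pairwise position agreements against the reference order; a small self-contained sketch (not the exact evaluation script used in the talk):

    def kendalls_tau(pred, ref):
        # Kendall's tau between two orderings of the same unique tokens:
        # (concordant pairs - discordant pairs) / total pairs, in [-1, 1].
        pos = {w: i for i, w in enumerate(ref)}
        n = len(pred)
        concordant = sum(1 for i in range(n) for j in range(i + 1, n)
                         if pos[pred[i]] < pos[pred[j]])
        pairs = n * (n - 1) // 2
        return (2 * concordant - pairs) / pairs

    # Identical orders give 1.0, a fully reversed order gives -1.0:
    assert kendalls_tau([1, 2, 3, 4], [1, 2, 3, 4]) == 1.0
    assert kendalls_tau([4, 3, 2, 1], [1, 2, 3, 4]) == -1.0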



Do non-local features help?

Model                  Kendall's τ  BLEU (s′ → s′)
First-best −LM         92.16        68.1
First-best +LM (cube)  92.27        68.7



Quality of the preordering space

▶ Experiments with the top 10 preordering outputs of this model

                 Distortion  BLEU   MTR    TER
Baseline         7           15.2   35.4   66.6
Oracle (k = 10)              17.26  37.97  62.64



Part II: Morphology



Morphology

▶ Word order is only one part of the problem for MRLs
▶ Many linguistic properties are not expressed via word order
▶ Three questions:
  − Does knowing morphological target properties help?
  − Can we predict these on source trees?
  − Which properties should we predict?



Does knowing morphological target properties help?

▶ Perform morphological tagging of the target side of the translation
▶ Project the morphological attributes via the alignments (see the sketch below)
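A sketch of the projection step; the alignment and tag formats here are assumptions for illustration (source/target index pairs and per-token attribute dicts):

    def project_morphology(alignment, target_tags):
        # Copy the morphological attributes of each target token onto the
        # source token(s) it is aligned to.
        source_attrs = {}
        for s, t in alignment:
            source_attrs.setdefault(s, {}).update(target_tags.get(t, {}))
        return source_attrs

    # "Peter escaped from the police" / "Peter entkam der Polizei"
    alignment = [(0, 0), (1, 1), (3, 2), (4, 3)]            # (source, target)
    target_tags = {2: {"case": "dat"}, 3: {"case": "dat"}}  # der, Polizei
    print(project_morphology(alignment, target_tags))
    # -> {0: {}, 1: {}, 3: {'case': 'dat'}, 4: {'case': 'dat'}}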

Decoration  Morph. attributes    Tags  BLEU
None        -                    -     15.12
Gold        All attributes       846   15.96
Gold        Manual selection     77    15.86
Gold        Automatic selection  225   15.73



Predicting target morphology on source trees

▶ Prediction based on dependency chains instead of linear chains (see the sketch after the figure)
▶ Can take into account the full syntactic context

[Figure: dependency tree of "Peter escaped from the police" (edge labels Root, Sb, AuxP, Adv, AuxA), decorated with the predicted German words Peter, entkam, der (case=dat), Polizei (case=dat)]
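A minimal sketch of reading off such a dependency chain; the list-of-dicts tree encoding is an assumption, with heads and labels taken from the figure:

    def chain_to_root(tree, i):
        # Collect the head chain of token i: the syntactic context used for
        # prediction instead of the linear left/right neighbours.
        chain = []
        while i is not None:
            chain.append((tree[i]["word"], tree[i]["label"]))
            i = tree[i]["head"]  # None at the root
        return chain

    # "Peter escaped from the police", heads and labels as in the figure
    tree = [
        {"word": "Peter",   "label": "Sb",   "head": 1},
        {"word": "escaped", "label": "Root", "head": None},
        {"word": "from",    "label": "AuxP", "head": 1},
        {"word": "the",     "label": "AuxA", "head": 4},
        {"word": "police",  "label": "Adv",  "head": 2},
    ]
    print(chain_to_root(tree, 4))
    # -> [('police', 'Adv'), ('from', 'AuxP'), ('escaped', 'Root')]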



Learning what to predict

Idea: Only include an attribute if it leads to better lexical selection

Learning Procedure (sketch):

1. Decorate the source with all attributes
2. Calculate the likelihood of a heldout set with a word-based system (IBM model 1)
3. As long as the likelihood increases (see the sketch below):
   − Find the worst attribute by merging its tags and recalculating the likelihood
   − Remove the attribute, re-align
   − Repeat
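A minimal sketch of this greedy backward selection, assuming a likelihood callback that re-aligns with the active attribute set and scores the heldout data (the callback and its cost are abstracted away):

    def select_attributes(all_attrs, likelihood):
        # Greedy backward selection: repeatedly drop the attribute whose
        # removal (tag merging + re-alignment) most improves heldout
        # likelihood, and stop when no removal helps any more.
        active = set(all_attrs)
        best = likelihood(active)
        while active:
            worst, score = max(((a, likelihood(active - {a})) for a in active),
                               key=lambda pair: pair[1])
            if score <= best:
                break
            active.discard(worst)
            best = score
        return active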



Learning what to predict (English–German)

Part of speech  Manual selection                     Automatic selection
noun            gender†, number, case                gender, number, case
adj             gender†, number‡, case‡, declension  gender, number, case, synpos, degree
verb            number‡*, person‡*, tense*, mode*    -

Additionally only in automatic: part:negativeness, part:subpos, punc:type, num:type.



Learning what to predict

               Manual  Automatic  All
Training 50k   36m     45m        77m
Training 100k  58m     82m        2h51m
Training 200k  1h54m   3h5m       6h44m

Best F1        72.67   74.67      62.18



Conclusion

Our work so far:

Question 1: Can we make syntactic preordering models more flexible and generate a space of possible preorderings?

Question 2: Can we predict target morphology on the source?

Current and future work:

Question 3: Can we combine both ideas to exploit interactions?


Thank You!

Any questions?


References

de Gispert, A., Iglesias, G., and Byrne, W. (2015). Fast and accurate preordering for SMT using neural networks. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015).

Hopkins, M. and May, J. (2011). Tuning as ranking. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1352–1362, Edinburgh, Scotland, UK. Association for Computational Linguistics.

Jehl, L., de Gispert, A., Hopkins, M., and Byrne, B. (2014). Source-side preordering for translation using logistic regression and depth-first branch-and-bound search. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 239–248, Gothenburg, Sweden. Association for Computational Linguistics.

Lerner, U. and Petrov, S. (2013). Source-side classifier preordering for machine translation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 513–523, Seattle, Washington, USA. Association for Computational Linguistics.

Och, F. J. and Ney, H. (2002). Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 295–302, Stroudsburg, PA, USA. Association for Computational Linguistics.
