![Page 1: ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015](https://reader031.vdocuments.pub/reader031/viewer/2022030311/58ee87e61a28ab527b8b4579/html5/thumbnails/1.jpg)
Statistical Automatic Post Editing
Santanu Pal
* The work was carried out at Translated
Outline
§ Introduction
§ Motivations
§ System Description
 § Preprocessing
 § Improved Word Alignment
 § Hierarchical PB-SMT
 § Advantage of using Hierarchical PB-SMT
§ Experiments
§ Evaluations
§ Conclusions
Introduction
§ The translations provided by current MT systems often fail to be of perfect quality
§ To achieve output of sufficient quality, translations often need to be corrected, or post-edited, by human translators
§ "Post-Editing" (PE) is defined as the correction by humans of the translation produced by an MT system (Veale and Way, 1997)
§ It is often described as the process of improving a translation provided by an MT system with a minimum of manual labor (TAUS report, 2010)
Introduction (contd.)
§ The major goals of using an APE system:
 § post-editing MT output instead of translating from scratch
 § saving time
 § reducing cost
 § reducing the effort of the human post-editor
§ In some cases, recent studies have even shown that
 § the quality of MT plus PE can exceed the quality of human translation (Fiederer and O'Brien, 2009; Koehn, 2009; DePalma and Kelly, 2009)
 § as can productivity (Zampieri and Vela, 2014)
Introduction (contd.)
§ Many studies have examined the impact of various factors and methods on the volume of PE effort
 § but rarely in a commercial work environment
§ The overall purpose of the present study is to answer two fundamental questions:
 § What would be the optimal design of a PE system, which is ultimately determined by the quality of MT output, in a commercial work environment?
 § How can human involvement be optimized to reduce post-editing effort in a commercial work environment?
Motivation
§ The advantage of an APE system is that it can take the output of any black-box MT engine as input and produce automatically post-edited output without having to retrain or re-implement the first-stage MT engine
§ PB-SMT (Koehn et al., 2003) can be applied as an APE system (Simard et al., 2007):
 § the SMT system is trained on RBMT output and reference human translations
 § this PB-SMT based APE system is able to correct the systematic errors produced by the RBMT system
 § this approach achieved large improvements in performance
Motivation (contd.)
§ Errors in the translations provided by MT include:
 § wrong lexical choice
 § incorrect word ordering
 § word insertion
 § word deletion
§ The proposed APE system, based on HPB-SMT and improved hybrid alignments, is able to handle the above errors
§ The method is also able to correct word-ordering errors to some extent
System Design
§ The APE system is a monolingual SMT system trained with MT output as the source language and reference human translations as the target language
§ The proposed APE system is designed as follows:
 § preprocess the data
  § parallel text [MT output and PE output]
  § monolingual data
 § improved (hybrid) word alignment
 § hierarchical PB-SMT
Preprocessing
§ Parallel text and monolingual data:
 § some sentences are noisy and mixed with other languages
 § some sentences contain URLs
§ The preprocessor cleans this noise using a language identification tool
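The filtering step described above can be sketched as follows; the function names and the `naive_lang_id` stand-in are illustrative assumptions, not the actual preprocessor (which relies on a real language identification tool):

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")

def clean_parallel(pairs, lang_id, expected_src="it", expected_tgt="it"):
    """Keep only (MT output, PE output) pairs that contain no URLs and
    that the language identifier assigns to the expected language."""
    kept = []
    for mt, pe in pairs:
        if URL_RE.search(mt) or URL_RE.search(pe):
            continue                      # drop sentences containing URLs
        if lang_id(mt) != expected_src or lang_id(pe) != expected_tgt:
            continue                      # drop sentences mixed with other languages
        kept.append((mt, pe))
    return kept

def naive_lang_id(text):
    """Toy stand-in for a real language identification tool."""
    return "it" if any(w in text.lower() for w in ("il", "di", "che", "la")) else "other"
```

For example, a pair containing `www.example.com` or an English-only sentence would be dropped, while a clean Italian MT-PE pair is kept.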
Improved Word Alignment
§ Statistical word aligners:
 § GIZA++
  § implements maximum likelihood estimators for the IBM Models 1-5, an HMM alignment model, and Model 6
 § SymGiza++
  § computes symmetric word alignment models, with the ability to take advantage of multi-processor systems
  § alignment quality improves by more than 17% compared to GIZA++
 § Berkeley Aligner
  § a cross expectation-maximization word aligner
  § jointly trains HMM models, reducing AER by 29%
§ Edit-distance based aligners:
 § TER alignment
 § METEOR alignment
Improved Word Alignment (contd.)
§ Edit-distance based aligners:
 § TER alignment
  § TER is an evaluation metric that measures the ratio of the number of edit operations required to turn a hypothesis H into the corresponding reference R to the total number of words in R
  § it can also be used as a monolingual word aligner
 § METEOR alignment
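As an illustration, the TER ratio can be sketched as below; this is a minimal version without the block-shift operation that full TER also counts as a single edit:

```python
def edit_distance(hyp_words, ref_words):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    d = [[0] * (len(ref_words) + 1) for _ in range(len(hyp_words) + 1)]
    for i in range(len(hyp_words) + 1):
        d[i][0] = i                      # delete every hypothesis word
    for j in range(len(ref_words) + 1):
        d[0][j] = j                      # insert every reference word
    for i in range(1, len(hyp_words) + 1):
        for j in range(1, len(ref_words) + 1):
            sub = 0 if hyp_words[i - 1] == ref_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[-1][-1]

def ter_like(hyp, ref):
    """Edit operations needed to turn H into R, divided by |R|."""
    return edit_distance(hyp.split(), ref.split()) / len(ref.split())
```

The dynamic-programming table also yields the cheapest edit path, which is what makes TER usable as a monolingual word aligner.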
Improved Word Alignment (contd.)
§ METEOR alignment:
 § the alignment between the words of H and R is built incrementally by a sequence of word-mapping modules:
  § Exact: maps words if they are exactly the same
  § Porter stem: maps words if they are the same after stemming
  § WN synonymy: maps words if they are considered synonyms in WordNet
 § if multiple alignments exist, METEOR selects the alignment with the fewest crossing alignment links
 § the final alignment between H and R is produced as the union of the alignments of all stages (Exact, Porter stem, and WN synonymy)
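A minimal sketch of this staged matching follows; the WN-synonymy stage and the fewest-crossings tie-break are omitted, and `stem` is a crude stand-in for a real Porter stemmer:

```python
def stem(word):
    """Crude suffix stripper standing in for a Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def meteor_align(hyp_words, ref_words):
    """Greedy one-to-one alignment built stage by stage; each stage only
    considers words left unaligned by the earlier stages, and the final
    result is the union of all stage alignments."""
    links, used_h, used_r = [], set(), set()
    for key in (lambda w: w, stem):            # stage 1: Exact, stage 2: stem
        for i, hw in enumerate(hyp_words):
            if i in used_h:
                continue
            for j, rw in enumerate(ref_words):
                if j not in used_r and key(hw) == key(rw):
                    links.append((i, j))
                    used_h.add(i)
                    used_r.add(j)
                    break
    return sorted(links)
```

For example, "cats" and "cat" are left unmatched by the exact stage but get linked by the stem stage.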
Improved Word Alignment (contd.)
§ Hybridization
 § Union: in the union method we consider all alignments correct; all the alignment tables are unioned together and duplicate entries are removed
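With each alignment table represented as a set of (source index, target index) links, the union method reduces to a set union (a sketch, not the actual implementation):

```python
def union_alignments(*tables):
    """Union of alignment tables; representing each table as a set of
    (src, tgt) index pairs removes duplicate entries automatically."""
    merged = set()
    for table in tables:
        merged |= set(table)
    return merged
```

For example, `union_alignments([(0, 0), (1, 1)], [(1, 1), (2, 2)])` yields the three distinct links with the duplicate `(1, 1)` collapsed.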
Improved Word Alignment (contd.)
§ ADD additional alignments:
 § consider one of the alignments generated by GIZA++ GDFA (A1), Berkeley Aligner (A2), or SymGiza++ (A3) as the standard alignment (SA)
§ Algorithm:
 § Step 1: choose a standard alignment (SA) from A1, A2 and A3
 § Step 2: correct the alignment of SA by consulting the alignment tables of A4 and A5
 § Step 3: find additional alignments from A2, A3, A4 and A5 using the intersection method (A2 ∩ A3 ∩ A4 ∩ A5), taking A1 as SA
 § Step 4: add the additional entries to SA
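Steps 3 and 4 can be sketched as set operations over link tables; Step 2 (correcting SA against A4 and A5) is omitted here, and the variable names and example links are illustrative:

```python
def add_additional(sa, others):
    """sa: the chosen standard alignment (e.g. A1); others: the remaining
    aligners' tables (e.g. A2, A3, A4, A5). Only links that every other
    aligner agrees on (their intersection) are added to SA."""
    extra = set.intersection(*(set(table) for table in others))
    return set(sa) | extra
```

With `sa = {(0, 0), (1, 2)}` and four other tables that all contain the link `(3, 3)`, that link is the only addition, so the result is `{(0, 0), (1, 2), (3, 3)}`.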
Hierarchical PB-SMT
§ Hierarchical PB-SMT is based on Synchronous Context-Free Grammar (SCFG) (Aho and Ullman, 1969); an SCFG has rewrite rules with aligned pairs on the right-hand side (Chiang, 2005):
 § X → < γ, α, ∼ >
 § X is a nonterminal, γ and α are strings of both terminals and nonterminals, and
 § ∼ is a one-to-one correspondence between occurrences of nonterminals in γ and α
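To illustrate the correspondence ∼, a rule can be applied to two already-translated sub-spans; the rule and the words below are hypothetical, chosen only to show how a hierarchical rule reorders its nonterminals:

```python
def apply_rule(gamma, alpha, subs):
    """gamma/alpha: token lists for the two sides of the rule; tokens
    present in `subs` are nonterminals mapped to (source, target)
    sub-phrase pairs, and the shared names encode the correspondence ~."""
    src = " ".join(subs[t][0] if t in subs else t for t in gamma)
    tgt = " ".join(subs[t][1] if t in subs else t for t in alpha)
    return src, tgt

# hypothetical rule X -> < X1 of X2 , X2 X1 >: the target side swaps the spans
src, tgt = apply_rule(["X1", "of", "X2"], ["X2", "X1"],
                      {"X1": ("minister", "ministro"), "X2": ("finance", "finanze")})
```

Because the sub-spans can themselves be built by other rules, reordering applies recursively at every level of the derivation.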
Hierarchical PB-SMT (contd.)
§ There are two additional rules, called the "glue rules" or "glue grammar":
 § S → < S X, S X >
 § S → < X, X >
§ These rules are used when
 § no rule matches, or
 § the span exceeds a certain length
§ They simply connect the translations of two adjacent blocks together monotonically
Benefits
§ The hybrid word alignment provides a better-quality alignment table
§ During phrase extraction, the system can automatically handle and estimate
 § word insertion errors (by considering one-to-many alignment links)
 § word deletion errors (by considering many-to-one alignment links)
 § lexical errors (by estimating high lexical weighting during model estimation)
 § word ordering (the hierarchical model facilitates word ordering because it uses formally hierarchical phrases)
Experiments
§ Dataset
 § the MateCat data contains 312K segments
 § after cleaning: 213,795 parallel MT-PE segments
  § training data: 211,795
  § development set: 1,000
  § test set: 1,000
§ The monolingual data consists of the PE data and cleaned Europarl data
Experiments (contd.)
§ Experimental setup
 § 5-gram language model [KenLM]
 § maximum phrase length 7
 § hierarchical PB-SMT [Moses]
  § maximum chart span 100
  § minimum chart span 20
 § Good-Turing discounting of the phrase translation probabilities
 § phrase table filtered for faster decoding
Experiments (contd.)
§ System tuning with the development data
 § MERT
  § objective: 60% TER and 40% BLEU
  § maximum 25 iterations
 § MIRA
  § batch MIRA
§ Tuned parameters for the monolingual APE system MT-IT → APE-IT:
 § language model = 0.0569997
 § word penalty = 0.118199
 § phrase penalty = 0.127955
 § translation model 0 = 0.148562, -0.0700695, 0.275438, 0.0861944
 § translation model 1 = 0.116582
Evaluations
§ The evaluation was carried out in two directions:
 § automatic evaluation, and
 § manual evaluation with 4 expert translators
§ The automatic evaluation shows significant improvements on three automatic evaluation metrics:
 § BLEU,
 § TER, and
 § METEOR
§ Our evaluation using human judgments shows that APE consistently improves overall translation adequacy; it improved 7% of the post-edited sentences
Evaluations (contd.)
§ Automatic sentence-level evaluation over 145 sentences

| Metric | APE system better than Google Translation | Google Translation better than APE system | % improvement over 1000 sentences | % loss over 1000 sentences |
| --- | --- | --- | --- | --- |
| Sentence BLEU | 91 | 54 | 9.1% | 5.4% |
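The per-sentence comparison behind this table can be sketched as follows; `unigram_f1` is a toy stand-in for sentence-level BLEU, and all names are illustrative:

```python
from collections import Counter

def unigram_f1(hyp, ref):
    """Toy sentence-level score standing in for sentence BLEU."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())     # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def count_wins(ape_out, mt_out, refs, score=unigram_f1):
    """Count sentences where one system scores strictly higher than the
    other against the reference; the remaining sentences are ties."""
    ape_better = sum(score(a, r) > score(m, r) for a, m, r in zip(ape_out, mt_out, refs))
    mt_better = sum(score(m, r) > score(a, r) for a, m, r in zip(ape_out, mt_out, refs))
    return ape_better, mt_better
```

Dividing each win count by the total number of sentences (1000 here) gives the improvement and loss percentages in the table.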
Evaluations (contd.)
§ Automatic evaluation over 1000 sentences

| Metric | APE system | Google Translation | % relative improvement |
| --- | --- | --- | --- |
| BLEU | 63.87 | 61.26 | 4.2% |
| TER | 28.67 | 30.94 | 7.9% |
| METEOR | 73.63 | 72.73 | 1.2% |
Human Evaluations
§ Manual evaluation with 4 expert translators over 145 sentences (EN = English, DE = German, FR = French, ES = Spanish, CA = Catalan, IT = Italian)

| | Qualifications of translators | Expertise | Experience | APE system | Google Translation | Uncertain |
| --- | --- | --- | --- | --- | --- | --- |
| Translator 1 | Degree in Translation | EN, FR → IT | 1 year | 91 | 22 | 32 |
| Translator 2 | Degree in Linguistic and Cultural Studies | EN, FR, ES, CA → IT | 2 years | 57 | 17 | 71 |
| Translator 3 | Degree in European Languages and Cultures | EN, FR, ES, DE → IT | 1 year | 72 | 37 | 36 |
| Translator 4 | Degree in Business & Administration | EN → IT | 1 year | 65 | 23 | 58 |
| Average | | | | 71 | 25 | 49 |
| % improvement over 1000 sentences | | | | 7.1% | 2.5% | 4.9% |
Human Evaluations (contd.)
[Bar chart: number of the 145 sentences (scale 0-100) for which each of Translators 1-4 preferred the APE system, Google Translation, or was uncertain]
Human Evaluations (contd.)
[Chart: overall evaluation of all 1000 sentences, split into: equal output from MT and APE; APE chosen by humans; MT chosen by humans; uncertain]
Human Evaluation (contd.)
[Bar chart: percentage (scale 0-8%) of the 145 evaluated sentences out of 1000 (the rest are ties between MT and APE) where APE was chosen by humans, MT was chosen by humans, or the judgment was uncertain]
Human Evaluation (contd.)
§ Over the 145 sentences evaluated by the 4 translators, the overall counts (crediting a system if at least one translator voted for it) are:
 § APE: 105
 § Uncertain: 94
 § GT: 62
Conclusions
§ The proposed APE system was successful in improving over the baseline MT system's performance
§ Although some APE translations were judged worse than the original MT output by the human evaluators, they were very few in number
§ Manual inspection revealed that these lower-quality APE translations are very similar to the original MT translations
§ Such degradations could be avoided by adding more features (e.g., syntactic or semantic), which could also improve the overall performance of the post-editing system
§ The presented system can easily be plugged into any state-of-the-art MT system, and its runtime complexity is similar to that of other statistical MT systems
Future Work
§ In future work we will try bootstrapping strategies for further tuning of the model and add more sophisticated features beyond the lexical level
§ We will improve our hybrid word alignment algorithm by incorporating additional word aligners such as fast_align, Anymalign, etc.
§ We also want to extend the system by incorporating source-language knowledge, as well as improving word ordering using a Kendall reordering method
§ To consolidate the user evaluation, we will measure inter-annotator agreement
§ We will also evaluate our system in a real-life commercial setting to analyse the time and productivity gains provided by automatic post-editing
Thank you!