02 qe lucia specia

62
8/12/2019 02 Qe Lucia Specia http://slidepdf.com/reader/full/02-qe-lucia-specia 1/62 Translation Quality  QTLaunchPad Project  Quality Estimation  State of the art in QE  Open issues  Conclusions Translation Quality Assessment: Evaluation and Estimation Lucia Specia University of Sheffield [email protected] 9 September 2013 Translation Quality Assessment: Evaluation and Estimation  1 / 33

Upload: alsharawi

Post on 03-Jun-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 1/62

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Translation Quality Assessment:

Evaluation and Estimation

Lucia Specia

University of [email protected]

9 September 2013

Translation Quality Assessment: Evaluation and Estimation   1 / 33

Page 2: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 2/62

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Overview

“MT Evaluation is better understood than MT” (Carbonelland Wilks, 1991)

Translation Quality Assessment: Evaluation and Estimation   2 / 33

Page 3: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 3/62

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Outline

1   Translation Quality

2   QTLaunchPad Project

3   Quality Estimation

4   State of the art in QE

5   Open issues

6   Conclusions

Translation Quality Assessment: Evaluation and Estimation   3 / 33

Page 4: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 4/62

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Outline

1   Translation Quality

2   QTLaunchPad Project

3   Quality Estimation

4   State of the art in QE

5   Open issues

6   Conclusions

Translation Quality Assessment: Evaluation and Estimation   4 / 33

Page 5: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 5/62

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Overview

What does  quality  mean?

Fluent?Adequate?Easy to post-edit?

Translation Quality Assessment: Evaluation and Estimation   5 / 33

Page 6: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 6/62

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Overview

What does  quality  mean?

Fluent?Adequate?Easy to post-edit?

Quality for  whom?

End-userMT-system (tuning)Post-editorOther applications (e.g. CLIR)

Translation Quality Assessment: Evaluation and Estimation   5 / 33

T l i Q li QTL hP d P j Q li E i i S f h i QE O i C l i

Page 7: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 7/62

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Overview

What does  quality  mean?

Fluent?Adequate?Easy to post-edit?

Quality for  whom?

End-userMT-system (tuning)Post-editorOther applications (e.g. CLIR)

Quality for  what?

Internal communicationsDissemination (publishing)Gisting (Google Translate)Draft translations (light vs heavy post-editing)MT system improvement (diagnosis)

Translation Quality Assessment: Evaluation and Estimation   5 / 33

T l ti Q lit QTL hP d P j t Q lit E ti ti St t f th t i QE O i C l i

Page 8: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 8/62

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Overview

ref   Do  not  buy this product, it’s their craziest invention!

sys  Do buy this product, it’s their craziest invention!

Translation Quality Assessment: Evaluation and Estimation   6 / 33

Translation Quality QTLaunchPad Project Quality Estimation State of the art in QE Open issues Conclusions

Page 9: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 9/62

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Overview

ref   Do  not  buy this product, it’s their craziest invention!

sys  Do buy this product, it’s their craziest invention!

Severe  if end-user does not speak source language

Trivial to post-edit by translators

Translation Quality Assessment: Evaluation and Estimation   6 / 33

Translation Quality QTLaunchPad Project Quality Estimation State of the art in QE Open issues Conclusions

Page 10: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 10/62

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Overview

ref   Do  not  buy this product, it’s their craziest invention!

sys  Do buy this product, it’s their craziest invention!

Severe  if end-user does not speak source language

Trivial to post-edit by translators

ref   The battery   lasts  6 hours and it can be  fully recharged

in  30 minutes.

sys   Six-hours battery,  30 minutes  to   full charge last.

Translation Quality Assessment: Evaluation and Estimation   6 / 33

Translation Quality QTLaunchPad Project Quality Estimation State of the art in QE Open issues Conclusions

Page 11: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 11/62

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Overview

ref   Do  not  buy this product, it’s their craziest invention!

sys  Do buy this product, it’s their craziest invention!

Severe  if end-user does not speak source language

Trivial to post-edit by translators

ref   The battery   lasts  6 hours and it can be  fully recharged

in  30 minutes.

sys   Six-hours battery,  30 minutes  to   full charge last.

Ok  for gisting - meaning preserved

Very costly  for post-editing if style is to be preserved

Translation Quality Assessment: Evaluation and Estimation   6 / 33

Translation Quality QTLaunchPad Project Quality Estimation State of the art in QE Open issues Conclusions

Page 12: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 12/62

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Overview

How do we  measure  quality?

Human metrics: error counts (which?), ranking,acceptability, 1-N fluency/adequacy

Automatic metrics based on human  references: (BLEU,METEOR, TER, etc.

Semi-automatic metrics based on   post-editions: HTER,PE time, eye-tracking, etc.

Translation Quality Assessment: Evaluation and Estimation   7 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Page 13: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 13/62

Q y Q j Q y Q p

Overview

How do we  measure  quality?

Human metrics: error counts (which?), ranking,acceptability, 1-N fluency/adequacy

Automatic metrics based on human  references: (BLEU,METEOR, TER, etc.

Semi-automatic metrics based on   post-editions: HTER,PE time, eye-tracking, etc.

Automatic metrics without references:   qualityestimation

Translation Quality Assessment: Evaluation and Estimation   7 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Page 14: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 14/62

y j y p

Outline

1   Translation Quality

2   QTLaunchPad Project

3   Quality Estimation

4   State of the art in QE

5   Open issues

6   Conclusions

Translation Quality Assessment: Evaluation and Estimation   8 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Page 15: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 15/62

QTLaunchPad project

http://www.qt21.eu/launchpad/

Multidimensional Quality Metrics (MQM) based on aspecification

Machine and human translation quality

Manual and (semi-)automatic assessmentTakes quality of  source text  into account

MT system improvement, gisting, dissemination, etc.

Translation Quality Assessment: Evaluation and Estimation   9 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Page 16: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 16/62

Multidimensional Quality Metrics (MQM)

Issues selected based on a given  specification  (dimensions):Language/locale

Subject field/domain

Terminology

Text TypeAudience

Purpose

Register

Style

Content correspondence

Output modality, ...

Translation Quality Assessment: Evaluation and Estimation   10 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Page 17: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 17/62

Multidimensional Quality Metrics (MQM)

Issue types  (core):

Translation Quality Assessment: Evaluation and Estimation   11 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Page 18: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 18/62

Multidimensional Quality Metrics (MQM)

Issue types:   http://www.qt21.eu/launchpad/content/high-level-structure-0

Combining issue types:TQ  = 100 − AccP  − (FluP T  − FluP S ) − (VerP T  − VerP S )

Translation Quality Assessment: Evaluation and Estimation   12 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Page 19: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 19/62

Outline

1   Translation Quality

2   QTLaunchPad Project

3   Quality Estimation

4   State of the art in QE

5   Open issues

6   Conclusions

Translation Quality Assessment: Evaluation and Estimation   13 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Page 20: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 20/62

Overview

Quality estimation  (QE): metrics that provide an

estimate  on the  quality  of unseen translations

Translation Quality Assessment: Evaluation and Estimation   14 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Page 21: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 21/62

Overview

Quality estimation  (QE): metrics that provide an

estimate  on the  quality  of unseen translations

Quality defined by the  data

Translation Quality Assessment: Evaluation and Estimation   14 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Page 22: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 22/62

Overview

Quality estimation  (QE): metrics that provide an

estimate  on the  quality  of unseen translations

Quality defined by the  data

Quality =   Can we publish it as is?

Translation Quality Assessment: Evaluation and Estimation   14 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Page 23: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 23/62

Overview

Quality estimation  (QE): metrics that provide an

estimate  on the  quality  of unseen translations

Quality defined by the  data

Quality =   Can we publish it as is?

Quality =  Can a reader get the gist?

Translation Quality Assessment: Evaluation and Estimation   14 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Page 24: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 24/62

Overview

Quality estimation  (QE): metrics that provide an

estimate  on the  quality  of unseen translations

Quality defined by the  data

Quality =   Can we publish it as is?

Quality =  Can a reader get the gist?

Quality =   Is it worth post-editing it?

Translation Quality Assessment: Evaluation and Estimation   14 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

O

Page 25: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 25/62

Overview

Quality estimation  (QE): metrics that provide an

estimate  on the  quality  of unseen translations

Quality defined by the  data

Quality =   Can we publish it as is?

Quality =  Can a reader get the gist?

Quality =   Is it worth post-editing it?

Quality =  How much effort to fix it?

Translation Quality Assessment: Evaluation and Estimation   14 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

O i

Page 26: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 26/62

Overview

Quality estimation  (QE): metrics that provide an

estimate  on the  quality  of unseen translations

Quality defined by the  data

Quality =   Can we publish it as is?

Quality =  Can a reader get the gist?

Quality =   Is it worth post-editing it?

Quality =  How much effort to fix it?

Quality =  What’s this translation’s MQM score?

Translation Quality Assessment: Evaluation and Estimation   14 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

F k

Page 27: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 27/62

Framework

Translation Quality Assessment: Evaluation and Estimation   15 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

F k

Page 28: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 28/62

Framework

Translation Quality Assessment: Evaluation and Estimation   15 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

F k

Page 29: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 29/62

Framework

Main components to build a QE system:

1 Definition of quality:   what to predict

2 (Human) labelled  data  (for quality)3 Features

4 Machine learning  algorithm

Translation Quality Assessment: Evaluation and Estimation   16 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

D fi iti f lit

Page 30: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 30/62

Definition of quality

Predict 1-N  absolute  scores for adequacy/fluency

Predict 1-N  absolute  scores for post-editing effort

Predict average post-editing  time  per word

Predict  relative  rankings

Predict  relative  rankings for same source

Predict  percentage of edits  needed for sentence

Predict word-level  edits  and its types

Predict  BLEU, etc. scores for document

Translation Quality Assessment: Evaluation and Estimation   17 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

D t s ts

Page 31: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 31/62

Datasets

SHEF  (several):   http://staffwww.dcs.shef.ac.uk/people/L.Specia/resources.html

LIG  (10K, fr-en):  http://www-clips.imag.fr/geod/User/marion.potet/index.php?page=download

LMSI  (14K, fr-en, en-fr, 2 post-editors):http://web.limsi.fr/Individu/wisniews/

recherche/index.html

Translation Quality Assessment: Evaluation and Estimation   18 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Features

Page 32: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 32/62

Features

Translation Quality Assessment: Evaluation and Estimation   19 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

QuEst

Page 33: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 33/62

QuEst

Goal: framework to explore features for QE

Feature extractors  for 150+ features of all types: Java

Machine learning: Gaussian Processes & scikit-learntoolkit (Python), with wrappers for a number of algorithms, grid search, feature selection

Open source:http://www.quest.dcs.shef.ac.uk/

Translation Quality Assessment: Evaluation and Estimation   20 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Outline

Page 34: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 34/62

Outline

1   Translation Quality

2   QTLaunchPad Project

3   Quality Estimation

4   State of the art in QE

5   Open issues

6   Conclusions

Translation Quality Assessment: Evaluation and Estimation   21 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Shared Task

Page 35: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 35/62

Shared Task

WMT12-13  – with Radu Soricut & Christian Buck

Translation Quality Assessment: Evaluation and Estimation   22 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Shared Task

Page 36: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 36/62

Shared Task

WMT12-13  – with Radu Soricut & Christian Buck

Sentence-  and  word-level  estimation of  PE effort

Translation Quality Assessment: Evaluation and Estimation   22 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Shared Task

Page 37: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 37/62

Shared Task

WMT12-13  – with Radu Soricut & Christian Buck

Sentence-  and  word-level  estimation of  PE effortDatasets and   language pairs:

Quality Year Languages

1-5 subjective scores WMT12   en-es

Ranking all sentences best-worst WMT12/13   en-esHTER scores WMT13   en-es

Post-editing time WMT13   en-es

Word-level edits: change/keep WMT13   en-es

Word-level edits: keep/delete/replace WMT13   en-es

Ranking 5 MTs per source WMT13   en-es; de-en

Translation Quality Assessment: Evaluation and Estimation   22 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Shared Task

Page 38: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 38/62

Shared Task

WMT12-13  – with Radu Soricut & Christian Buck

Sentence-  and  word-level  estimation of  PE effortDatasets and   language pairs:

Quality Year Languages

1-5 subjective scores WMT12   en-es

Ranking all sentences best-worst WMT12/13   en-esHTER scores WMT13   en-es

Post-editing time WMT13   en-es

Word-level edits: change/keep WMT13   en-es

Word-level edits: keep/delete/replace WMT13   en-es

Ranking 5 MTs per source WMT13   en-es; de-en

Evaluation metric:

MAE =

i =1 |H (s i ) − V (s i )|

N Translation Quality Assessment: Evaluation and Estimation   22 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Baseline system

Page 39: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 39/62

Baseline system

Features:

number of tokens in the source and target sentences

average source token length

average number of occurrences of words in the target

number of punctuation marks in source and target sentences

LM probability of source and target sentences

average number of translations per source word

% of source 1-grams, 2-grams and 3-grams in frequencyquartiles 1 and 4

% of seen source unigrams

Translation Quality Assessment: Evaluation and Estimation   23 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Baseline system

Page 40: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 40/62

Baseline system

Features:

number of tokens in the source and target sentences

average source token length

average number of occurrences of words in the target

number of punctuation marks in source and target sentences

LM probability of source and target sentences

average number of translations per source word

% of source 1-grams, 2-grams and 3-grams in frequencyquartiles 1 and 4

% of seen source unigrams

SVM regression  with RBF kernel with the parameters   γ ,    and  C 

optimised using a grid-search and 5-fold cross validation on thetraining set

Translation Quality Assessment: Evaluation and Estimation   23 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Results - scoring sub-task (WMT12)

Page 41: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 41/62

Results scoring sub task (WMT12)

System ID MAE RMSE•  SDLLW M5PbestDeltaAvg 0.61 0.75

UU best 0.64 0.79SDLLW SVM 0.64 0.78

UU bltk 0.64 0.79Loria SVMlinear 0.68 0.82

UEdin 0.68 0.82TCD M5P-resources-only* 0.68 0.82

Baseline bb17 SVR   0.69 0.82Loria SVMrbf 0.69 0.83SJTU 0.69 0.83

WLV-SHEF FS 0.69 0.85PRHLT-UPV 0.70 0.85

WLV-SHEF BL 0.72 0.86DCU-SYMC unconstrained 0.75 0.97

DFKI grcfs-mars 0.82 0.98DFKI cfs-plsreg 0.82 0.99

UPC 1 0.84 1.01DCU-SYMC constrained 0.86 1.12

UPC 2 0.87 1.04TCD M5P-all 2.09 2.32

Translation Quality Assessment: Evaluation and Estimation   24 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Results - scoring sub-task (WMT13)

Page 42: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 42/62

Results scoring sub task (WMT13)

System ID MAE RMSE•  SHEF FS 12.42 15.74

SHEF FS-AL 13.02 17.03CNGL SVRPLS 13.26 16.82

LIMSI 13.32 17.22DCU-SYMC combine 13.45 16.64DCU-SYMC alltypes 13.51 17.14

CMU noB 13.84 17.46CNGL SVR 13.85 17.28

FBK-UEdin extra 14.38 17.68FBK-UEdin rand-svr 14.50 17.73

LORIA inctrain 14.79 18.34Baseline bb17 SVR   14.81 18.22

TCD-CNGL open 14.81 19.00

LORIA inctraincont 14.83 18.17TCD-CNGL restricted 15.20 19.59

CMU full 15.25 18.97UMAC 16.97 21.94

Translation Quality Assessment: Evaluation and Estimation   25 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Outline

Page 43: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 43/62

1   Translation Quality

2   QTLaunchPad Project

3   Quality Estimation

4   State of the art in QE

5   Open issues

6   Conclusions

Translation Quality Assessment: Evaluation and Estimation   26 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Agreement between annotators

Page 44: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 44/62

g

Absolute value judgements: difficult to achieveconsistency even in highly controlled settings

30% of initial dataset discardedRemaining annotations had to be scaled

Translation Quality Assessment: Evaluation and Estimation   27 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Agreement between annotators

Page 45: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 45/62

g

Absolute value judgements: difficult to achieveconsistency even in highly controlled settings

30% of initial dataset discardedRemaining annotations had to be scaled

More objective absolute scores

Post-editing time, HTER, editsAlso subject to huge variance (WPTP 2013, Wisniewskiet. al, MT-Summit 2013)Multi-task learning to address this variance (Cohn andSpecia, ACL 2013)

Translation Quality Assessment: Evaluation and Estimation   27 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Agreement between annotators

Page 46: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 46/62

g

Absolute value judgements: difficult to achieveconsistency even in highly controlled settings

30% of initial dataset discardedRemaining annotations had to be scaled

More objective absolute scores

Post-editing time, HTER, editsAlso subject to huge variance (WPTP 2013, Wisniewskiet. al, MT-Summit 2013)Multi-task learning to address this variance (Cohn andSpecia, ACL 2013)

Relative scores

Different task altogetherWMT13: better results than reference-based metrics

Translation Quality Assessment: Evaluation and Estimation   27 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Annotation costs

Page 47: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 47/62

Active learning  to select subset of instances to be annotated(Beck et al., ACL 2013)

Translation Quality Assessment: Evaluation and Estimation   28 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Curse of dimensionality

Page 48: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 48/62

Feature selection  to identify relevant info for dataset (Shah

et al., MT Summit 2013)   ! "   #   $   %   & 

   ! "   (   $   )   * 

   ! "   (   (   !   & 

   ! "   )   +   &   + 

   ! "   (   #   &   ( 

   ! "   )   +   +   , 

   ! "   +   )   ,   & 

   ! "   )   (   #   % 

   ! "   )   )   ( 

   ! "   #   *   &   * 

   ! "   (   *   &   , 

   ! "   (   %   ,   % 

   ! "   )   %   #   ) 

   ! "   (   *   (   &     !

 "   )   (   +   * 

   ! "   +   )   *   $ 

   ! "   )   +   ,   , 

   ! "   )   (   !   & 

   ! "   #   %   , 

   !

 "   (   (   , 

   ! "   (   &

   $   + 

   ! "   )   &   %   + 

   !

 "   (   (   ,   +     !

 "   )   &   &   + 

   ! "   +   (   !   & 

   ! "   )   +   !   & 

   ! "   )   (   !   & 

   ! "   #   +   %   ( 

   ! "   (   (   *   & 

   ! "   (   &   #   , 

   ! "   )   &   !   , 

   ! "   (   #   !   ,     !

 "   )   +   !   , 

   ! "   +   (   !   , 

   ! "   )   %   (   , 

   ! "   )   %   (   , 

   ! "   #   &   +   & 

   ! "   (

   +   % 

   ! "   (   &   &

 

   ! "   )   !   %   ) 

   ! "   (   (   & 

   ! "   )   !   # 

   ! "   +   +   * 

   ! "   )   %   & 

   ! "   )   &   ,   ( 

!"+

!"+)

!"(

!"()

!")

!"))

!"#

!"#)

!"*

   -   .   /  &

   % 

   0  1   .   /  2  &  &    3 

  4  5  2  4  6   7 

   0  1   .   /

  2  &  &    3    8  9  2

  4  5   7 

   0  1   .

   /  2   !   ,  2

  6  & 

   0  1   .

   /  2   !   ,  2

  6   % 

   0  1   .

   /  2   !   ,  2

  6   + 

   0  1   .

   /  2   !   ,  2

  6  ( 

  :  1   ;   0

  &  &  2  6  &

 

  :

  1   ;   0  &

  &  2  6   %

 

<; 1= <;>?@ 1=>?@ =A

Translation Quality Assessment: Evaluation and Estimation   29 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Curse of dimensionality

Page 49: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 49/62

Feature selection  to identify relevant info for dataset (Shah

et al., MT Summit 2013)   ! "   #   $   %   & 

   ! "   (   $   )   * 

   ! "   (   (   !   & 

   ! "   )   +   &   + 

   ! "   (   #   &   ( 

   ! "   )   +   +   , 

   ! "   +   )   ,   & 

   ! "   )   (   #   % 

   ! "   )   )   ( 

   ! "   #   *   &   * 

   ! "   (   *   &   , 

   ! "   (   %   ,   % 

   ! "   )   %   #   ) 

   ! "   (   *   (   &     !

 "   )   (   +   * 

   ! "   +   )   *   $ 

   ! "   )   +   ,   , 

   ! "   )   (   !   & 

   ! "   #   %   , 

   !

 "   (   (   , 

   ! "   (   &

   $   + 

   ! "   )   &   %   + 

   !

 "   (   (   ,   +     !

 "   )   &   &   + 

   ! "   +   (   !   & 

   ! "   )   +   !   & 

   ! "   )   (   !   & 

   ! "   #   +   %   ( 

   ! "   (   (   *   & 

   ! "   (   &   #   , 

   ! "   )   &   !   , 

   ! "   (   #   !   ,     !

 "   )   +   !   , 

   ! "   +   (   !   , 

   ! "   )   %   (   , 

   ! "   )   %   (   , 

   ! "   #   &   +   & 

   ! "   (

   +   % 

   ! "   (   &   &

 

   ! "   )   !   %   ) 

   ! "   (   (   & 

   ! "   )   !   # 

   ! "   +   +   * 

   ! "   )   %   & 

   ! "   )   &   ,   ( 

!"+

!"+)

!"(

!"()

!")

!"))

!"#

!"#)

!"*

   -   .   /  &

   % 

   0  1   .   /  2  &  &    3 

  4  5  2  4  6   7 

   0  1   .   /

  2  &  &    3    8  9  2

  4  5   7 

   0  1   .

   /  2   !   ,  2

  6  & 

   0  1   .

   /  2   !   ,  2

  6   % 

   0  1   .

   /  2   !   ,  2

  6   + 

   0  1   .

   /  2   !   ,  2

  6  ( 

  :  1   ;   0

  &  &  2  6  &

 

  :  1   ;   0

  &  &  2  6   %

 

<; 1= <;>?@ 1=>?@ =A

Common feature set identified, but  nuanced subsets  forspecific datasets

Translation Quality Assessment: Evaluation and Estimation   29 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

How to use estimated PE effort scores?

Page 50: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 50/62

Do users prefer  detailed estimates  (sub-sentence level) or anoverall estimate  for the complete sentence or  not seeing

bad sentences at all?

Too much information vs hard-to-interpret scores

Translation Quality Assessment: Evaluation and Estimation   30 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

How to use estimated PE effort scores?

Page 51: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 51/62

Do users prefer detailed estimates

 (sub-sentence level) or anoverall estimate  for the complete sentence or  not seeing

bad sentences at all?

Too much information vs hard-to-interpret scores

IBM’s Goodness 

 metric

Translation Quality Assessment: Evaluation and Estimation   30 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

How to use estimated PE effort scores?

Page 52: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 52/62

Do users prefer detailed estimates

 (sub-sentence level) or anoverall estimate  for the complete sentence or  not seeing

bad sentences at all?

Too much information vs hard-to-interpret scores

IBM’s Goodness 

 metric

MATECAT project investigating it

Translation Quality Assessment: Evaluation and Estimation   30 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Feature engineering

Page 53: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 53/62

Two families of features missing in current work:

Can we benefit from contextual,  document-wide

information?

Can we predict  human translation quality?

Translation Quality Assessment: Evaluation and Estimation   31 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Feature engineering

Page 54: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 54/62

Two families of features missing in current work:

Can we benefit from contextual,  document-wide

information?

Can we predict  human translation quality?

Don’t panic, you can help!

Two (sub-) QuEst projects at MTM-2013 :-)

Translation Quality Assessment: Evaluation and Estimation   31 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Outline

Page 55: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 55/62

1   Translation Quality

2   QTLaunchPad Project

3   Quality Estimation

4   State of the art in QE

5   Open issues

6   Conclusions

Translation Quality Assessment: Evaluation and Estimation   32 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Conclusions

Page 56: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 56/62

(Machine) Translation evaluation is still an open problem

Translation Quality Assessment: Evaluation and Estimation   33 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Conclusions

Page 57: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 57/62

(Machine) Translation evaluation is still an open problemDifferent purposes/users, different needs, different notionsof  quality

Translation Quality Assessment: Evaluation and Estimation   33 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Conclusions

Page 58: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 58/62

(Machine) Translation evaluation is still an open problemDifferent purposes/users, different needs, different notionsof  quality

Quality estimation: learning of these different notions

Translation Quality Assessment: Evaluation and Estimation   33 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Conclusions

Page 59: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 59/62

(Machine) Translation evaluation is still an open problemDifferent purposes/users, different needs, different notionsof  quality

Quality estimation: learning of these different notions

Estimates have been used in  real applicationsRanking translations: filter out bad quality translationsSelecting translations from multiple MT systems

Translation Quality Assessment: Evaluation and Estimation   33 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Conclusions

Page 60: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 60/62

(Machine) Translation evaluation is still an open problemDifferent purposes/users, different needs, different notionsof  quality

Quality estimation: learning of these different notions

Estimates have been used in  real applicationsRanking translations: filter out bad quality translationsSelecting translations from multiple MT systems

Commercial  interest

Translation Quality Assessment: Evaluation and Estimation   33 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Conclusions

Page 61: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 61/62

(Machine) Translation evaluation is still an open problemDifferent purposes/users, different needs, different notionsof  quality

Quality estimation: learning of these different notions

Estimates have been used in  real applicationsRanking translations: filter out bad quality translationsSelecting translations from multiple MT systems

Commercial  interest

SDL LW: TrustScoreMultilizer: MT-Qualifier

Translation Quality Assessment: Evaluation and Estimation   33 / 33

Translation Quality   QTLaunchPad Project   Quality Estimation   State of the art in QE   Open issues   Conclusions

Conclusions

Page 62: 02 Qe Lucia Specia

8/12/2019 02 Qe Lucia Specia

http://slidepdf.com/reader/full/02-qe-lucia-specia 62/62

(Machine) Translation evaluation is still an open problemDifferent purposes/users, different needs, different notionsof  quality

Quality estimation: learning of these different notions

Estimates have been used in  real applicationsRanking translations: filter out bad quality translationsSelecting translations from multiple MT systems

Commercial  interest

SDL LW: TrustScoreMultilizer: MT-Qualifier

Interesting  open issues: join the QuEst projects!

Translation Quality Assessment: Evaluation and Estimation   33 / 33