Measuring MT Quality
What do we aim to cover today?
What is KantanMT.com?
State of the Nation
Current MT Quality Measurements
Comparative Quality Measurement
Future Directions
Predictive Quality Measurements
Conclusions
Q&A
45 minutes
What is KantanMT.com?
Cloud-based SMT/Hybrid
Highly scalable
Inexpensive to operate
Quick to access, learn and deploy
Our Vision
To put Machine Translation customization, improvement and deployment into your hands
Your Benefits
Faster Project Turnarounds
Increased Productivity
Lower Costs
Increased Production Capacity
Active KantanMT Engines: 6,341
Training Words Uploaded: 46,051,110,634
Member Words Translated: 538,291,925
Fully Operational: 16 months
Measuring MT Quality
The Quality & MT Relationship
Let’s agree a model for defining quality!
Taking into consideration quality of MT outputs and level of quality defined by your clients.
Quality Target (defined by client)
No Quality (baseline)
Measuring MT Quality
Attributes of Quality
Language Attributes: Fluency, Adequacy
Task-oriented Attributes: Productivity, Acceptability
Attributes of Quality – Model
Adequacy: meaning of generated texts, as expressed in source/target
Fluency: comprehensibility & readability
Factors include grammar errors, word selection and syntax
Productivity: post-editing speed
Acceptability: fit-for-purpose measurement
Usable translations within the context of the end user/client
[Diagram: Attributes of Quality – Model. Language Attributes (Fluency, Adequacy) are labelled "Translation Style"; Task-oriented Attributes (Productivity, Acceptability) are labelled "Business Model"; FuzzyMatch is shown on the model.]
Measuring MT Quality
Types of MT Quality Measurement
Comparative Measurements
Uses a reference translation to calculate:
Word recall & precision
Text Similarities
Word Order correlations
Linguistic similarities
Approach
Comparing MT output to a known reference translation
Measuring MT Quality
Comparative Measurements
F-Measure: Recall & Precision Metric
Flaw: no penalty for reordering
Reference Translation
MT Output
Precision = correct / MT-Len = 66%
Recall = correct / Ref-Len = 80%
F-Measure = (Precision × Recall) / ((Precision + Recall) / 2) = 73%
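The calculation can be sketched in a few lines of Python. This is a minimal illustration, not the KantanMT implementation: the function name `f_measure` and the two example sentences are assumptions (the slide's own sentences did not survive extraction), chosen so that 4 of 6 MT words match a 5-word reference, which gives the precision, recall and F-Measure figures quoted on the slide. Words are matched as a bag of words, so a reordered MT output scores exactly the same, which is the flaw noted on the slide.

```python
from collections import Counter


def f_measure(reference: str, mt_output: str) -> tuple[float, float, float]:
    """Return (precision, recall, f_measure) from word overlap alone."""
    ref_tokens = reference.lower().split()
    mt_tokens = mt_output.lower().split()

    # "correct" = words shared by both texts, counting repeats at most
    # as often as they occur in the other text.
    overlap = Counter(ref_tokens) & Counter(mt_tokens)
    correct = sum(overlap.values())

    precision = correct / len(mt_tokens) if mt_tokens else 0.0
    recall = correct / len(ref_tokens) if ref_tokens else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # Harmonic mean, written as on the slide.
    f = (precision * recall) / ((precision + recall) / 2)
    return precision, recall, f


# Hypothetical sentences (assumption, not from the original slide).
REFERENCE = "he signed the contract yesterday"
MT_OUTPUT = "he has signed the contract today"

if __name__ == "__main__":
    p, r, f = f_measure(REFERENCE, MT_OUTPUT)
    print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Scoring a shuffled version of the same MT output, for example `"today contract the signed has he"`, returns identical values, because word order never enters the calculation.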
Measuring MT Quality
Comparative Measurements
TER (Translation Error Rate)
Minimum number of edits to transform output to match reference
Levenshtein distance measure
General indicator of Post-Editing Effort
Reference Translation
MT Output
TER = (Substitutions + insertions + deletions) / Reference-length
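A word-level Levenshtein distance divided by the reference length reproduces the formula as given on this slide. The sketch below is an assumption-laden illustration: the names `word_edit_distance` and `simple_ter` are hypothetical, and the example sentences are invented. It counts only substitutions, insertions and deletions; the full TER metric also allows block shifts, which are not modelled here.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Minimum substitutions + insertions + deletions turning hyp into ref."""
    # prev[j] holds the distance between the hyp words seen so far
    # and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # delete h from the MT output
                curr[j - 1] + 1,     # insert r into the MT output
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


def simple_ter(reference: str, mt_output: str) -> float:
    """Edits per reference word (lower is better; 0.0 means identical)."""
    ref = reference.lower().split()
    hyp = mt_output.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences: one deletion ("has") plus one substitution
    # ("today" -> "yesterday") against a 5-word reference.
    print(simple_ter("he signed the contract yesterday",
                     "he has signed the contract today"))
```

Because the score is normalised by reference length, it can exceed 1.0 when the MT output is much longer than the reference.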
Measuring MT Quality
Comparative Measurements
BLEU Score
Put simply – measures how many words overlap, giving higher scores to sequential words
High correlation between BLEU and human judgement of translation quality
Reference Translation
MT Output
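The idea of rewarding sequential words can be illustrated with a deliberately simplified, sentence-level BLEU. This is a sketch under stated assumptions, not the scoring used by any particular toolkit: the function names are hypothetical, only unigrams and bigrams are used by default (standard BLEU goes up to 4-grams and is normally computed over a whole corpus), there is a single reference, and no smoothing is applied, so any n-gram order with zero matches gives a score of 0.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every run of n consecutive words."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference: str, mt_output: str, max_n: int = 2) -> float:
    ref = reference.lower().split()
    hyp = mt_output.lower().split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clipped matches: an n-gram is credited at most as often
        # as it appears in the reference.
        matched = sum((hyp_counts & ref_counts).values())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(matched / total))

    # Brevity penalty discourages very short outputs.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    ref = "he signed the contract yesterday"  # hypothetical example
    print(simple_bleu(ref, "he signed the contract today"))
    print(simple_bleu(ref, "contract the signed he today"))
```

Both example outputs contain the same four matching words, but only the first keeps them in sequence, so it scores higher; the shuffled output shares no bigram with the reference and drops to zero in this unsmoothed version.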
Measuring MT Quality
F-Measure Score
Recall & Precision calculation
Closely linked to the relevancy of word systems
Comparative Measurements
Kantan BuildAnalytics™
Measuring MT Quality
BLEU Score
Improvement upon F-Measure
Takes word-order into consideration
Linked to a sense of translation ‘fluency’
Comparative Measurements
Kantan BuildAnalytics™
Measuring MT Quality
Comparative Measurements
TER Score
A method to help predict the post-editing effort
TER is quick to use and correlates highly with actual post-editing effort
Kantan BuildAnalytics™
Measuring MT Quality
Comparative Measurements
[Diagram: Attributes of Quality – Model (Fluency, Adequacy, Productivity, Acceptability; Language Attributes / Translation Style and Task-oriented Attributes / Business Model) with the comparative metrics F-Measure, TER, NIST, GTM, BLEU and METEOR placed on it.]
Measuring MT Quality
Conclusions: Comparative Measurements
Useful for
Engine Development
Baseline measurements
Determination of ‘possible’ engine quality and relevancy
Reference set of comparative translations required
Does not work on unseen translations
Of limited use in determining PE effort, resources and costs
Kantan BuildAnalytics™
Measuring MT Quality
Types of MT Quality Measurement
Predictive Measurements
No reference texts required
Used to predict project
Scope
Cost
Resources
Billables / Chargeables – Profit
Like FuzzyMatch for MT!
Measuring MT Quality
Predictive Measurements
Quality Estimation Score
Predicts quality of translations from MT engine
Correlates closely to post-editing effort
Creates potential for tiered pricing model
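How a per-segment quality estimation score could feed a tiered pricing model can be sketched as follows. Everything specific here is an assumption made for illustration: the 0 to 100 score scale, the band boundaries, the tier names and the function names are hypothetical and are not KantanAnalytics values. The bands are loosely modelled on the way TM fuzzy-match bands are commonly used.

```python
from collections import Counter

# Hypothetical bands (minimum score, tier name), highest first.
# Illustrative only; real bands would be agreed per project.
TIERS = [
    (85, "light post-edit"),
    (70, "medium post-edit"),
    (0, "full post-edit"),
]


def tier_for(score: float) -> str:
    """Map a quality estimation score (assumed 0-100) to a pricing tier."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for floor, name in TIERS:
        if score >= floor:
            return name
    raise AssertionError("unreachable: the last tier has floor 0")


def words_per_tier(segments: list[tuple[int, float]]) -> dict[str, int]:
    """Sum word counts per tier for (word_count, score) segments."""
    totals: Counter = Counter()
    for word_count, score in segments:
        totals[tier_for(score)] += word_count
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical project: (word count, predicted score) per segment.
    project = [(10, 92), (20, 75), (5, 40), (8, 85)]
    print(words_per_tier(project))
```

The word totals per tier are the kind of figure that could then be used to scope, cost and schedule a post-editing project before any reference translation exists.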
Measuring MT Quality
Predictive Measurements
KantanAnalytics™ - a predictive quality estimation technology
The Power of 2! Combined TM & MT measurements
Predictive, not comparative
Benefits
Tiered Pricing Model
Prioritise PE activity
Schedule Resources
Cost
Seamlessly integrated into all CAT tools
Measuring MT Quality
Predictive Measurements
[Diagram: Attributes of Quality – Model (Fluency, Adequacy, Productivity, Acceptability; Language Attributes / Translation Style and Task-oriented Attributes / Business Model) with F-Measure, TER, NIST, GTM, BLEU and METEOR, now joined by KantanAnalytics™ - MT Quality Estimation, aka FuzzyMatch.]
Measuring MT Quality
Predictive Measurements
MT Developers: NIST, GTM, BLEU, F-Measure, TER, METEOR
Production: MT Quality Estimation
Measuring MT Quality
Conclusions
Automated scores are only useful for MT developers
No practical use to consumers of MT services
Predictive Quality Estimation is a must-have for MT vendors
It creates the potential for tiered pricing, predictive quality and reliable MT outputs
The most progressive MT systems provide both measurement types!
MT systems that don’t are dinosaurs!