MT domain customization – conditions and benefits. Chris Wendt (Microsoft)


Uploaded by: taus-enabling-better-translation

Posted on 22-Jan-2018


Translator


Slide 3


Slide 5

•Learn word and phrase alignments from “parallel” data
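Learning word alignments from parallel data is classically done with expectation-maximization over a word-alignment model. As a minimal sketch, the code below implements IBM Model 1 on a three-sentence toy corpus; this is a swapped-in textbook technique, and the deck does not say which alignment model Microsoft Translator used. Phrase alignments are usually extracted from word alignments like these in a later step, which the sketch does not show.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t(f | e) with EM (IBM Model 1).

    pairs: list of (source_tokens, target_tokens). A NULL token is prepended to
    every target sentence so a source word may also align to nothing.
    """
    source_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(source_vocab)
    t = defaultdict(lambda: uniform)  # t[(f, e)], uniform before the first iteration
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        for fs, es in pairs:
            es = ["NULL"] + es
            for f in fs:
                # E-step: share f's unit count among all e in proportion to t(f | e)
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise expected counts into probabilities t(f | e)
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# Toy German-English parallel corpus (source tokens, target tokens).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = ibm_model1(corpus)
print(round(t[("haus", "house")], 2), round(t[("das", "house")], 2))
```

Although "das" and "haus" both co-occur with "house", EM shifts probability mass toward "haus" because "das" is better explained by "the", which it co-occurs with twice.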


Slide 6


Slide 7

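• Given a source sentence f, find the target sentence $e^* = \arg\max_e P(e \mid f)$

• Bayes' rule: $P(e \mid f) = P(f \mid e)\,P(e) / P(f)$

$\arg\max_e P(e \mid f) = \arg\max_e P(f \mid e)\,P(e)$, since $P(f)$ is the same for every candidate $e$

• $P(f \mid e)$: channel (translation) model

• $P(e)$: language model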
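The decision rule above can be illustrated with a toy example. The candidate sentences and probabilities below are invented for illustration; a real decoder searches an enormous space of hypotheses rather than scoring a fixed list.

```python
import math

# Hypothetical toy scores for one source sentence f; in a real system these
# come from a trained translation model P(f | e) and language model P(e).
translation_model = {  # P(f | e) for each candidate e
    "the house is small": 0.20,
    "the home is little": 0.25,
    "small the house is": 0.20,
}
language_model = {  # P(e)
    "the house is small": 0.010,
    "the home is little": 0.002,
    "small the house is": 0.0001,
}

def noisy_channel_decode(candidates, tm, lm):
    """Return argmax_e P(f | e) * P(e), computed in log space to avoid underflow."""
    return max(candidates, key=lambda e: math.log(tm[e]) + math.log(lm[e]))

best = noisy_channel_decode(list(translation_model), translation_model, language_model)
print(best)  # the house is small
```

The translation model alone would prefer "the home is little"; the language model's fluency score tips the decision, which is exactly the division of labor between $P(f \mid e)$ and $P(e)$.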


Slide 8

Start with:

• Parallel sentences

• Monolingual data

• Decoding algorithm

Build these components:

• Translation model – P(F | E)

• Language model – P(E)

• Decoder
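The language model component is estimated from monolingual target-language data. A minimal sketch, assuming a bigram model with add-one (Laplace) smoothing and whitespace tokenization; production systems use higher-order n-grams with stronger smoothing such as Kneser-Ney.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count histories and bigrams over sentences padded with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        histories.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return histories, bigrams, len(vocab)

def log_prob(sentence, model):
    """log P(e) under the add-one smoothed bigram model."""
    histories, bigrams, v = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (histories[a] + v))
        for a, b in zip(tokens, tokens[1:])
    )

lm = train_bigram_lm(["the house is small", "the book is small", "a book is new"])
print(log_prob("the house is small", lm) > log_prob("small is house the", lm))  # True
```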


Slide 9

Microsoft's vast language knowledge:

• Translation model

• Target language model

• Other models

Your and your community's language knowledge:

• Translation model

• Target language model

Translator service and API

Your applications

Your test and tuning documents: lambda weight vector
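The lambda weight vector most likely refers to the weights of a log-linear model, the usual way statistical MT systems combine several model scores, here generic and custom translation and language models. The sketch below uses made-up feature values and weights; in practice the weights would be tuned on the tuning documents, for example with minimum error rate training (MERT), to maximize BLEU. The feature names are hypothetical.

```python
import math

def loglinear_score(features, weights):
    """score(e, f) = sum_i lambda_i * h_i(e, f); here each h_i is a log-probability."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values for two candidate translations of one sentence.
candidates = {
    "candidate A": {"generic_tm": math.log(0.30), "custom_tm": math.log(0.05),
                    "generic_lm": math.log(0.010), "custom_lm": math.log(0.002)},
    "candidate B": {"generic_tm": math.log(0.20), "custom_tm": math.log(0.40),
                    "generic_lm": math.log(0.008), "custom_lm": math.log(0.020)},
}

# Two hypothetical lambda weight vectors.
generic_only = {"generic_tm": 1.0, "custom_tm": 0.0, "generic_lm": 1.0, "custom_lm": 0.0}
with_custom = {"generic_tm": 0.5, "custom_tm": 1.0, "generic_lm": 0.5, "custom_lm": 1.0}

def best_candidate(weights):
    """Pick the candidate with the highest weighted score."""
    return max(candidates, key=lambda c: loglinear_score(candidates[c], weights))

print(best_candidate(generic_only))  # candidate A
print(best_candidate(with_custom))   # candidate B
```

Changing only the weights flips the output: once the custom (in-domain) models get weight, the candidate they prefer wins, which is why representative tuning documents matter.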


Slide 10

Your site or application

Translator Service

Supply corrections

Consume translations

Collaborative Translations Store

Microsoft Translator Hub

Custom models

Generic models

Your own, previously translated documents

Supply documents

Build custom models

Import corrections for training
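The diagram describes a loop: the application consumes translations, users supply corrections that are kept in the Collaborative Translations Store, and the Hub imports those corrections as training data for custom models. The in-memory class below is a hypothetical stand-in for that loop; its class and method names are invented for illustration and are not the Translator API.

```python
from typing import Callable, Dict, List, Tuple

class CorrectionStore:
    """Hypothetical in-memory stand-in for a collaborative translations store."""

    def __init__(self, machine_translate: Callable[[str], str]):
        self.machine_translate = machine_translate
        self.corrections: Dict[str, str] = {}

    def translate(self, source: str) -> str:
        # Prefer a stored human correction; otherwise fall back to raw MT.
        if source in self.corrections:
            return self.corrections[source]
        return self.machine_translate(source)

    def add_correction(self, source: str, corrected: str) -> None:
        # Record the human edit so it is reused immediately and can be trained on later.
        self.corrections[source] = corrected

    def export_training_pairs(self) -> List[Tuple[str, str]]:
        # Corrections become parallel segments for the next custom model build.
        return sorted(self.corrections.items())

store = CorrectionStore(lambda s: f"<MT: {s}>")
print(store.translate("Hallo Welt"))       # <MT: Hallo Welt>
store.add_correction("Hallo Welt", "Hello world")
print(store.translate("Hallo Welt"))       # Hello world
print(store.export_training_pairs())       # [('Hallo Welt', 'Hello world')]
```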


Slide 11


Translate()

AddTranslation()

GetTranslations()

GetUserTranslations()

Speak()

Detect()

BreakSentences()

Thorough customization

Retrain every 2 months or every 20,000 segments

Continuous improvement
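The retraining guideline can be expressed as a simple trigger: rebuild the custom model when two months have passed since the last build or when 20,000 new segments have accumulated, whichever comes first. A minimal sketch; the function name and the 61-day approximation of "2 months" are assumptions.

```python
from datetime import date, timedelta

RETRAIN_INTERVAL = timedelta(days=61)  # assumption: "2 months" approximated as 61 days
RETRAIN_SEGMENTS = 20_000              # new segments collected since the last build

def should_retrain(last_build: date, today: date, new_segments: int) -> bool:
    """Return True when either retraining condition is met."""
    return today - last_build >= RETRAIN_INTERVAL or new_segments >= RETRAIN_SEGMENTS

print(should_retrain(date(2018, 1, 1), date(2018, 2, 1), 5_000))   # False
print(should_retrain(date(2018, 1, 1), date(2018, 2, 1), 25_000))  # True
print(should_retrain(date(2018, 1, 1), date(2018, 3, 15), 100))    # True
```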


Slide 12

What goes in / What it does / Rules to follow

• Test and tuning documents. What it does: calculate the BLEU score, just for you. Rules: Be strict. Compose them to be optimally representative of what you are going to translate in the future.

• Dictionaries. What it does: force the given translation with a probability of 1. Rules: Be restrictive. Safe to use only for compound nouns and named entities; it is better not to use them and let the system learn.

• Parallel training documents. What it does: build the translation model (aka phrase table); teach the system how to translate. Rules: Be liberal. Any in-domain human translation is better than MT. Add and remove documents as you go and try to improve the score.

• Monolingual target-language documents. What it does: build the target language model; improve grammar and fluency. Rules: Be liberal. Use any in-domain target-language material you can get.
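Since test documents are used to calculate a BLEU score, here is a minimal corpus-level BLEU sketch following the standard definition: clipped n-gram precisions for n = 1 to 4, combined by geometric mean, times a brevity penalty. It assumes one reference per segment and whitespace tokenization with no smoothing; real evaluations should use an established tool such as sacreBLEU, whose tokenization and smoothing choices change the exact numbers.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU in [0, 100] with one reference per hypothesis."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)

refs = ["the cat is on the mat"]
print(corpus_bleu(["the cat is on the mat"], refs))            # 100.0
print(round(corpus_bleu(["the cat is on a mat"], refs), 1))    # 53.7
```

One wrong word out of six costs almost half the score here because it breaks several higher-order n-grams; on realistic test sets, score differences are much smaller.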


Slide 13

• Humans can easily detect differences of 0.5 to 1.0 BLEU points

• Faster post-editing

• Higher document comprehension

• Small domain: higher improvement within the domain

• Large domain: better suited to input variability; makes better use of the training documents

• It is better to build a larger domain (lower BLEU delta)


Slide 14


Slide 15

Quality, speed, price: you can only have two

P3


Slide 16

Post-Editing

•Goal: human translation quality

•Increase the human translator’s productivity

•In practice: 0% to 25% productivity increase, varying by content, style and language

Raw publishing

•Goals: good enough for the purpose; speed; cost

•Publish the output of the MT system directly to the end user

•Best with a bilingual UI

•Good results with technical audiences

•Cost-effective way to handle inbound material

•Triage

•Analysis and classification

P3 – Post-Publish Post-Editing

•Know what you are human translating, and why

•Make use of the community: domain experts, enthusiasts, employees, professional translators

•Best of both worlds: fast, better than raw, always current


Slide 17

Assimilation, dissemination, post-edit

•Use customized machine translation

•Never miss a chance to collect a human edit

•Make the source visible on demand; show the source

•Show domain-relevant dictionaries

•Apply translation memory (TM) at 100% match; apply TM at 80% match

•Reveal alternatives

•Publish raw first, collect human feedback

•Use modern, collaborative TM systems (e.g. MemSource)
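Applying TM at 100% versus 80% refers to match thresholds: an exact match can be reused as is, while a fuzzy match above a lower threshold is offered to a post-editor. The sketch below uses the character-level similarity ratio from Python's `difflib.SequenceMatcher` as a stand-in for a TM tool's match score; commercial tools compute their own scores, so the percentages are not directly comparable. The TM contents and the function name are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> approved human translation.
TM = {
    "Click Save to store your changes.": "Klicken Sie auf Speichern, um Ihre Änderungen zu speichern.",
    "The file could not be opened.": "Die Datei konnte nicht geöffnet werden.",
}

def tm_lookup(segment, threshold):
    """Return (match %, TM source, TM translation) for the best match at or above threshold, else None."""
    best = max(TM, key=lambda src: SequenceMatcher(None, segment, src).ratio())
    score = 100 * SequenceMatcher(None, segment, best).ratio()
    return (round(score), best, TM[best]) if score >= threshold else None

segment = "The file could not be saved."
print(tm_lookup(segment, 100))  # None: no exact match, nothing is applied automatically
print(tm_lookup(segment, 80))   # fuzzy match, offered to the post-editor
```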


Slide 18


Slide 19

• Deep neural networks (>30% improvement in ASR)

• Recurrent neural networks (1 to 6 BLEU points)

• Filtering, domain adaptation


Slide 20


Slide 21


Slide 24