
Post on 02-Sep-2019


Translator Workbench

Computer Aided Translation (CAT) or Machine Aided Human Translation (MAHT) helps the translator and reduces redundant work. CAT tools include online dictionaries, language spell checkers, and translation memory (TM) technology, and are usually sold in both a freelance edition and a team/enterprise edition. TM is useful for translating manuals and other repetitive work: a new job often repeats or resembles an earlier one, and the content is similar too, so the translator should not have to start from scratch.

A Translation Memory program uses AI technology such as neural networks and fuzzy logic to assist translation, drawing on data from past translations. It uses fuzzy matching to find the stored sentence closest to the source sentence, can run a concordance search to find pairs of source and translated sentences, and offers terminology management for keeping a glossary of required terms. It supports data in various formats, including MS Word, RTF, HTML, SGML, and XML, can convert between them, and separates the text from the formatting so that only the text is translated.

A Translation Memory program supports alignment between documents that have already been translated in order to build a TM database. It helps with translating repetitive or similar work, with looking up example translations of a term the translator is unsure about, and with checking that translations are used consistently. Example programs:

• SDL Trados freelance: $545 ($129/yr)
• Transit NXT Freelance: 225 euro/yr
• Déjà Vu X3 Pro: 420 euro
• WordFast Classic/Pro: 400 euro (free for 500 translation units)
• WordFisher: free, but not compatible with Word 2007 and above

Machine Translation

• Milestones in the history of MT (http://www.hutchinsweb.me.uk/SUSU-2007-1-ppt.pdf)
• Outline of the evolution of machine translation (http://www.hutchinsweb.me.uk/Aslib-2008-ppt.pdf)
• Machine translation: problems and issues (http://www.hutchinsweb.me.uk/SUSU-2007-2-ppt.pdf)

Abbreviations used

• MAHT: Machine Aided Human Translation
• CAT: Computer Aided Translation
• TWB: Translator's Work Bench
• TW: Translator Workstation
• HAMT: Human Aided Machine Translation
• FAHQT: Fully Automatic High Quality Translation
• SMT: Statistical Machine Translation
• EBMT: Example-Based Machine Translation

Google Translate

Does MT have a future?

• Does anyone want low-quality translations like this?
• Is MT really in use?
• Is there a way for MT to translate much better?
• What does the business world of translation look like?
• What is localization (Thai: เทศานุวัตน์)?
• What will the future of translation be?
• Monolingual translator?

Copyright © 2009, Asia Online Pte Ltd

Evolution of Quality Machine Translation

[Figure: chart of MT quality (vertical axis: Quality) over time (horizontal axis: 1980, 2002 to 2008, 2010). Approaches labelled on the chart: Rules Based MT; Rules + Statistics; Phrase-based SMT; Syntax-based SMT; Clean Data SMT; Human Driven, Hybrid Rules, Process, Syntax and Clean Data SMT.]


Global Translation Market

[Figure: market forecast chart in US $ millions, with an annotation marking the "Market Inflection Point". Series values recovered from the chart follow.]

Current Machine Translation Forecast (US $ millions)
Year    2007  2008  2009  2010  2011  2012
Value     45    65    95   143   217   336

Current World Wide Translation Market Forecast (US $ millions): 12,000; 14,250; 16,700; 19,620; 21,650; 24,000

Adjusted Machine Translation Quality Improved Forecast (US $ millions): 524; 882; 1,484

• Market is maturing rapidly and ready to grow as quality increases, creating a market inflection point.
• Existing forecasts do not take into account new content that simply cannot be translated because of current technology, manpower, time and cost limitations.
• As costs lower and technology improves new content will quickly start to be translated and expand the market overall.
• Google has made machine translation acceptable and topical. Google is still “gist” quality, while Asia Online focuses on creating near human quality.


Expanding the Reach of Translation

[Figure: pyramid of content types with example word counts. Other labels on the figure: Human, Machine, Partly Multilingual, Existing Focus, New Markets.]

Content type               Example                     Example words
Corporate                  Corporate Brochures         2,000
Products                   Product Brochures           10,000
User Interface             Software Products           50,000
User Documentation         Manuals / Online Help       200,000
Enterprise Information     HR / Training / Reports     500,000
Communications             Email / IM                  10,000,000
Support / Knowledge Base   Call Center / Help Desk     20,000,000+
User Generated Content     Blogs / Reviews             50,000,000+

Problem:

• Only 0.5% of what needs to be translated today is being translated due to cost and time constraints.
• Humans cost too much and take too long. The average human translates at 2,000-3,000 words per day.

Solution:

• Machine translation offers an alternative to human translation that is faster and lower cost and can deliver huge volumes of content quickly: millions of words per day.
• In many cases, “good enough” quality is sufficient. Translation quality has improved significantly.
• Where “gold-standard” translations are required, machine translation + human post editing can deliver the same content quality significantly faster.
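To make the scale of the problem concrete, the short sketch below works out how many working days a single human translator would need for some of the example volumes quoted above, at the stated rate of 2,000-3,000 words per day. The function name and the selection of volumes are illustrative choices, not part of the original slides.

```python
import math


def translator_days(total_words, words_per_day):
    """Whole working days one translator needs to translate total_words."""
    return math.ceil(total_words / words_per_day)


# Example word counts taken from the slide (lower bounds where marked "+").
VOLUMES = {
    "Corporate Brochures": 2_000,
    "Manuals / Online Help": 200_000,
    "Blogs / Reviews": 50_000_000,
}

if __name__ == "__main__":
    for name, words in VOLUMES.items():
        slow = translator_days(words, 2_000)   # at 2,000 words per day
        fast = translator_days(words, 3_000)   # at 3,000 words per day
        print(f"{name}: {fast} to {slow} translator-days")
```

At 2,000 words per day, 50,000,000 words correspond to 25,000 translator-days, which is the gap the slide argues machine translation can close.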


Word Segmentation

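Thai is written without spaces between words, so the same string can be segmented, and therefore translated, in more than one way:

• เขาเป็นคนใช้ของเธอ
  เขา|เป็น|คนใช้|ของ|เธอ: He is your servant
  เขา|เป็น|คน|ใช้|ของ|เธอ: He is the one who used your things

• ตากลม
  ตา|กลม: Round eyes
  ตาก|ลม: Take the air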
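The ambiguity in the ตากลม example can be reproduced with a few lines of code. The sketch below is a minimal illustration, assuming a hypothetical four-word dictionary: it enumerates every way to split an unspaced string into dictionary words. Real Thai segmenters use large lexicons and statistical models to choose among such candidates; this only shows why a choice is needed.

```python
def all_segmentations(text, dictionary):
    """Return every split of text into words that all occur in dictionary."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in dictionary:
            # Keep this word and segment the remainder recursively.
            for rest in all_segmentations(text[end:], dictionary):
                results.append([word] + rest)
    return results


# Hypothetical toy dictionary, just large enough for the slide's example.
TOY_DICTIONARY = {"ตา", "กลม", "ตาก", "ลม"}

if __name__ == "__main__":
    for words in all_segmentations("ตากลม", TOY_DICTIONARY):
        print("|".join(words))
```

Both segmentations are valid word sequences, so a translation system has to rely on context or statistics to pick between "round eyes" and "take the air".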


Sentence Alignment

• Next step: match sentences that have been translated by humans between English and Thai.
• Semi-automated. Requires human verification using a crowd-sourcing approach on the web.


How the SMT Decoder Works

• Original Spanish input: "Que hambre".
• Statistical analysis: statistically calculate the possible word and phrase matches for the translation. Uses word alignment induced phrases from the bilingual corpus.
• Possible output text in broken English. There could be thousands of combinations:
  1. What hunger have I
  2. Hungry I am so
  3. I am so hungry
  4. Have I that hunger
  5. …
• Statistical analysis: statistically determine the best word combination using the monolingual corpus. Best word combination: "I am so hungry".
• Statistical analysis: validate the syntax to ensure the best possible grammar. Where appropriate, reinsert additional information such as the subject.
• Output translation: "I am so hungry".
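The step that picks the best word combination using the monolingual corpus can be illustrated with a toy language model. The sketch below is not the decoder described on the slide: it assumes a hypothetical four-sentence English corpus and a bigram model with add-one smoothing, and simply scores the candidate outputs listed above. A real SMT decoder combines such a language model score with translation model scores and searches a far larger space.

```python
import math
from collections import Counter

# Hypothetical, tiny monolingual corpus; real systems train on millions of sentences.
CORPUS = [
    "i am so hungry",
    "i am so tired",
    "i am hungry",
    "so hungry i could eat",
]


def train_bigrams(sentences):
    """Count bigrams and their left-hand contexts, with sentence boundary markers."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, word)] += 1
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )


def best_candidate(candidates, model):
    """Return the candidate that the language model finds most fluent."""
    return max(candidates, key=lambda c: log_prob(c, model))


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    candidates = [
        "What hunger have I",
        "Hungry I am so",
        "I am so hungry",
        "Have I that hunger",
    ]
    print(best_candidate(candidates, model))
```

Candidates whose word pairs were never seen in the corpus receive only the smoothing probability for each pair, which is why the fluent word order wins.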


The Quality Evolution Cycle

• Initial System put into production
• Trained Internal Experts begin initial clean up and correction process
• Expert Users also allowed to make changes
• All users allowed to suggest changes which go through vetting process
• Changes are collected and added to initial corpus to drive continuous retraining


Quality Evolution via Error Analysis and Correction Cycle

[Figure: chart of translation quality (vertical axis: Translation Quality) at the stages labelled Initial System; Spelling and Terminology; Human Feedback; Targeted Corrections of Bad Learning. Key: Correct, Mistranslation, Syntax/Grammar, Terminology, Spelling, Punctuation.]

Human Feedback can raise the raw output to previously unseen quality levels

Who would want a poor translation?

• The newest Harry Potter, the final book, is out today.
• Wait for the edition by a well-known translator ... six months later?
• Or read the version translated collaboratively by a group of internet users six days later?

The present and future of translators

• If MT can translate at a reasonable level of quality, do we still need translators, or only people who edit/rewrite MT output (monolingual translators)?
• Is MT suitable for every job?
• Different kinds of translators: literary translation, business translation.
• The translation business needs tools that speed up translation.

2007-06-18 Translation Memory System (TMS) 24

Translation Memory Systems

Presentation by Melina Takanen & Julianna Ekert

CAT, Prof. Thorsten Trippel, University of Bielefeld, Summer Term 2007, 18th June 2007

Table of contents

• Introduction
• Definition
• Characteristics of TMS
• The translation workflow
• Reasons for using TMS

Definition

• Classification of translation types

[Figure: classification of translation types; one surviving label reads "Translation Memory Systems".]

Definition of Machine Translation Systems (MTS)

• Machine translation combines a number of fields of study such as lexicography, linguistics, computational linguistics, computer science and language engineering.
• It is based on the hypothesis that natural languages can be fully described, controlled and mathematically coded.

Definition of Translation Memory Systems (TMS)

• A TMS is a multilingual text archive that allows aligned multilingual text segments to be stored and retrieved against various search conditions.

Definition of TMS cont.

• Unlike machine translation systems, which generate translations automatically, translation memory systems leave the decision-making to professional translators, who accept or reject a term or an equivalent phrase suggested by the system during the translation process.
• Translators can also build their own ‘memory’.

Characteristics of TMS

• Perfect matching
• Fuzzy matching
• Filter
• Segmentation
• Alignment

Perfect matching

• Occurs when a source-language (SL) segment is completely identical, including spelling, punctuation and inflections, to the old segment found in the database, that is, in the translation ‘memory’.

Perfect matching example

[Figure: a new SL segment shown as equal (=) to a segment stored in the translation memory.]

Fuzzy matching

• Unlike a perfect match, a fuzzy match occurs when an old and a new SL segment are similar but not exactly identical. Even a very small difference, such as punctuation, leads to a fuzzy match.

Fuzzy matching example
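The difference between a perfect and a fuzzy match can be sketched in a few lines. The example below is an illustration only: it assumes a hypothetical two-entry English-German memory and uses the character-based similarity ratio from Python's standard difflib module, whereas commercial TM tools use their own proprietary scoring. The function names and the 0.75 threshold are assumptions made for the sketch.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: SL segment -> TL segment.
MEMORY = {
    "Press the red button.": "Drücken Sie den roten Knopf.",
    "Open the file menu.": "Öffnen Sie das Menü Datei.",
}


def similarity(a, b):
    """Similarity between two segments, from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


def lookup(segment, memory, threshold=0.75):
    """Return (kind, score, translation) for the best stored match, or None."""
    best = None
    for source, target in memory.items():
        score = similarity(segment, source)
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is None or best[0] < threshold:
        return None  # nothing similar enough: translate from scratch
    kind = "perfect" if best[1] == segment else "fuzzy"
    return kind, round(best[0], 2), best[2]


if __name__ == "__main__":
    print(lookup("Press the red button.", MEMORY))   # identical segment
    print(lookup("Press the red button!", MEMORY))   # differs only in punctuation
    print(lookup("Completely unrelated text", MEMORY))
```

A segment that differs from a stored one only in its final punctuation mark is reported as a fuzzy match with a score just below 1.0, so the translator sees the stored translation as a suggestion and decides whether to reuse it.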

Filter

• A feature that converts an SL text from one format into another, giving the translator the flexibility to work with texts of different formats:
  • Text without graphics
  • Text without HTML code signs

Filter example

[Figure: the same text shown with HTML code signs and with no HTML code signs.]
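As a rough illustration of what such a filter does for HTML, the sketch below pulls the translatable text out of a fragment of markup using the standard library's html.parser module. The class and function names are hypothetical, and the sketch discards the tags entirely; an actual TM filter would keep placeholders for them so that the formatting can be restored after translation.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text content of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    """Return the text of markup with tags removed and whitespace normalised."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_html("<p>Press the <b>red</b> button.</p>"))
```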

Segmentation

• The process of breaking a text up into units consisting of a word or a string of words that is linguistically acceptable.
• Especially useful for:
  • Headings
  • Lists
  • Bullet points

Segmentation example

[Figure: a passage of text marked as one segment.]
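A very simple rule-based segmenter gives a feel for the process. The sketch below assumes two rules that are not taken from the slides: every line (such as a heading or a list item) is its own block, and inside a block a segment ends after ".", "!" or "?" followed by whitespace. Real TM tools use configurable rule sets and abbreviation lists, so that "e.g." does not end a segment.

```python
import re

# A segment ends after ., ! or ? when followed by whitespace (assumed rule).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment(text):
    """Split text into segments: one per line, then one per sentence."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        segments.extend(part for part in SENTENCE_END.split(line) if part)
    return segments


if __name__ == "__main__":
    sample = "Safety notes\nUnplug the device. Wait ten seconds!\n- Remove the cover"
    for number, seg in enumerate(segment(sample), start=1):
        print(number, seg)
```

The heading and the list item each become one segment even though they have no final punctuation, which is the case the slide singles out.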

Alignment

• The process of binding an SL segment to its corresponding TL segment (SL = Source Language, TL = Target Language).
• The purpose is to create a new translation memory base or to add to an existing one.
• The corresponding pairs of SL & TL segments are called ‘translation units’.

Alignment example

[Figure: an SL segment shown as equal (=) to its TL segment.]
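In the simplest case, where the source and target documents contain the same number of segments in the same order, alignment amounts to pairing them by position. The sketch below assumes exactly that case and uses hypothetical names; real alignment tools also handle segments that were merged, split or reordered in translation and let the user correct the pairs by hand.

```python
def align(source_segments, target_segments, memory=None):
    """Pair SL and TL segments by position and add the pairs to a memory.

    Returns a new list of (SL, TL) translation units; the memory passed in
    is not modified.
    """
    if len(source_segments) != len(target_segments):
        raise ValueError("segment counts differ; manual alignment is needed")
    units = [] if memory is None else list(memory)
    for sl, tl in zip(source_segments, target_segments):
        unit = (sl, tl)
        if unit not in units:  # do not store the same translation unit twice
            units.append(unit)
    return units


if __name__ == "__main__":
    tm = align(["Hello.", "Goodbye."], ["Hallo.", "Auf Wiedersehen."])
    tm = align(["Hello.", "Thanks."], ["Hallo.", "Danke."], tm)
    for sl, tl in tm:
        print(sl, "=", tl)
```

The first call creates a new translation memory and the second adds to an existing one, matching the two purposes named on the slide.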

The translation workflow

[Figure: diagram of the translation workflow, shown across ten slides.]

Reasons for using TMS

• Avoids having to re-translate anything that has already been translated.
• Allows workgroups to share translations that were previously done.
• Allows translators to build up a precious database of translations.
• It is possible to create as many translation memories as needed, for example for different subjects, clients, and/or language pairs.

Translation Memory eXchange (TMX)

• TMX is the vendor-neutral open XML standard for the exchange of Translation Memory (TM) data created by Computer Aided Translation (CAT) and localization tools.

• TMX is developed and maintained by OSCAR (Open Standards for Container/Content Allowing Re-use), a LISA Special Interest Group


<?xml version="1.0" ?>
<tmx version="1.4">
  <header creationtool="PerlConvertor" creationtoolversion="1.0"
          segtype="sentence" o-tmf="text" adminlang="EN-US"
          srclang="EN-US" datatype="plain text"
          creationdate="20100110T095611Z" creationid="Wirote">
  </header>
  <body>
    <tu>
      <tuv xml:lang="EN-US">
        <seg>A boost of vitamin shine for your lips.</seg>
      </tuv>
      <tuv xml:lang="TH-01">
        <seg>เติมวิตามินให้เรียวปากใสปิ๊ง</seg>
      </tuv>
    </tu>
  </body>
</tmx>
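Because TMX is plain XML, a translation memory exported in this format can be read with any XML library. The sketch below is one possible way to do it with Python's standard xml.etree.ElementTree module; the function name is hypothetical, and the sample is a shortened copy of the TMX file shown above with the XML declaration and optional header attributes left out.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """
<tmx version="1.4">
  <header creationtool="PerlConvertor" creationtoolversion="1.0"
          segtype="sentence" o-tmf="text" adminlang="EN-US"
          srclang="EN-US" datatype="plain text"/>
  <body>
    <tu>
      <tuv xml:lang="EN-US"><seg>A boost of vitamin shine for your lips.</seg></tuv>
      <tuv xml:lang="TH-01"><seg>เติมวิตามินให้เรียวปากใสปิ๊ง</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_translation_units(tmx_text):
    """Return one dict per <tu>, mapping each language code to its segment text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() also collects text inside any inline markup in <seg>.
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        units.append(unit)
    return units


if __name__ == "__main__":
    for unit in read_translation_units(SAMPLE_TMX):
        for lang, text in unit.items():
            print(lang, ":", text)
```

Each translation unit becomes a dictionary keyed by language code, which is a convenient starting point for loading the memory into another tool.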

TM Software

• Transit NXT http://www.star-ts.com/transit-nxt-translation-memory.shtml

• SDL Trados http://www.trados.com/en/

• Déjà Vu http://www.atril.com/

• Wordfast http://www.wordfast.net

• Wordfast anywhere http://www.freetm.com/

• OmegaT http://www.omegat.org/en/omegat.html