translation memory program machine...
Translator Workbench
Computer Aided Translation (CAT), or Machine Aided Human Translation (MAHT), assists the translator and reduces duplicated work. CAT includes online dictionaries, language spell checkers and translation memory (TM) technology, and it is usually sold both as a freelance edition and as a team/enterprise edition. TM is useful for translating manuals and other repetitive work: new jobs often repeat or resemble earlier ones and have similar content, so there is no need to translate everything from scratch.
Translation Memory programs use AI technology, such as neural networks and fuzzy logic, to assist translation based on past translation data. They use fuzzy matching to find the stored sentences closest to the source, can run concordance searches to find pairs of source and translated sentences, and provide terminology management for storing a glossary of required terms. They support data in various formats (MS Word, RTF, HTML, SGML, XML), can convert data between these formats, and separate the text from its formatting so that only the text is translated.
Translation Memory programs support alignment of previously translated documents to build a TM database. They help translate repetitive or similar work, let the translator look up example translations of a doubtful word, and check that terms are translated consistently. Example programs:
• SDL Trados Freelance: $545 ($129/yr)
• Transit NXT Freelance: 225 euro/yr
• Déjà Vu X3 Pro: 420 euro
• Wordfast Classic/Pro: 400 euro (free up to 500 translation units)
• WordFisher: free, but not compatible with Word 2007 and above
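Checking that terms are translated consistently can be sketched as a glossary lookup over source/target segment pairs: whenever a source segment contains a glossary term, its target segment should contain the approved translation. This is a minimal illustration under stated assumptions; the function name `inconsistent_terms`, the one-entry glossary and the second segment pair are hypothetical, and plain case-insensitive substring matching ignores inflection and tokenization, which real tools handle.

```python
from typing import Dict, Iterator, List, Tuple


def inconsistent_terms(
    pairs: List[Tuple[str, str]], glossary: Dict[str, str]
) -> Iterator[Tuple[int, str, str]]:
    """Yield (segment index, source term, approved translation) for every
    segment whose source contains a glossary term but whose target lacks
    the approved translation. Matching is case-insensitive substring search
    (a simplification: no stemming or word boundaries)."""
    for i, (source, target) in enumerate(pairs):
        for term, approved in glossary.items():
            if term.lower() in source.lower() and approved.lower() not in target.lower():
                yield i, term, approved


# Usage with a hypothetical one-entry English -> Thai glossary
glossary = {"vitamin": "วิตามิน"}
pairs = [
    ("A boost of vitamin shine for your lips.", "เติมวิตามินให้เรียวปากใสปิ๊ง"),
    ("Vitamin E for dry skin.", "บำรุงผิวแห้ง"),  # approved term is missing here
]
print(list(inconsistent_terms(pairs, glossary)))
```

Only the second pair is reported, because its Thai target does not use the approved term.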
Machine Translation
• Milestones in the history of MT (http://www.hutchinsweb.me.uk/SUSU-2007-1-ppt.pdf)
• Outline of the evolution of machine translation (http://www.hutchinsweb.me.uk/Aslib-2008-ppt.pdf)
• Machine translation: problems and issues (http://www.hutchinsweb.me.uk/SUSU-2007-2-ppt.pdf)
Abbreviations used:
• MAHT: Machine Aided Human Translation
• CAT: Computer Aided Translation
• TWB: Translator's Workbench
• TW: Translator Workstation
• HAMT: Human Aided Machine Translation
• FAHQT: Fully Automatic High Quality Translation
• SMT: Statistical Machine Translation
• EBMT: Example-Based Machine Translation
Google Translate
Does MT have a future?
• Who wants poor-quality translation like this?
• Is MT actually in real use?
• Is there a way for MT to translate better?
• What is the translation business like?
• What is localization (Thai: เทศานุวัตน์)?
• What will the future of translation be?
• Monolingual translators?
Copyright © 2009, Asia Online Pte Ltd
Evolution of Quality Machine Translation
[Figure: translation quality over time (1980 to 2010), rising through Rules Based MT, Rules + Statistics, Phrase-based SMT, Syntax-based SMT and Clean Data SMT to "Human Driven, Hybrid Rules, Process, Syntax and Clean Data SMT".]
Global Translation Market
[Figure: market forecasts in US $ millions, 2007 to 2012, values in chart order.
Current Machine Translation Forecast: 45 (2007), 65 (2008), 95 (2009), 143 (2010), 217 (2011), 336 (2012).
Current World Wide Translation Market Forecast: 12,000, 14,250, 16,700, 19,620, 21,650, 24,000.
Adjusted Machine Translation Quality Improved Forecast: rising to 524, 882 and 1,484 in the final years, diverging from the current MT forecast at a "Market Inflection Point".]
• The market is maturing rapidly and is ready to grow as quality increases, creating a market inflection point.
• Existing forecasts do not take into account new content that simply cannot be translated because of current technology, manpower, time and cost limitations.
• As costs fall and technology improves, new content will quickly start to be translated, expanding the market overall.
• Google has made machine translation acceptable and topical. Google is still "gist" quality, while Asia Online focuses on creating near-human quality.
Expanding the Reach of Translation
[Figure: content types ordered by typical volume (example words), from the partly multilingual existing focus of human translation to new markets for machine translation:
• Corporate: corporate brochures, 2,000
• Products: product brochures, 10,000
• User Interface: software products, 50,000
• User Documentation: manuals / online help, 200,000
• Enterprise Information: HR / training / reports, 500,000
• Communications: email / IM, 10,000,000
• Support / Knowledge Base: call center / help desk, 20,000,000+
• User Generated Content: blogs / reviews, 50,000,000+]
Problem:
• Only 0.5% of what needs to be translated today is being translated, due to cost and time constraints.
• Human translation costs too much and takes too long: the average human translator produces 2,000 to 3,000 words per day.
Solution:
• Machine translation offers a faster, lower-cost alternative to human translation and can deliver huge volumes of content quickly: millions of words per day.
• In many cases, "good enough" quality is sufficient, and translation quality has improved significantly.
• Where "gold-standard" translations are required, machine translation plus human post-editing can deliver the same content quality significantly faster.
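Using the slide's own figures, a quick calculation shows the scale of the problem: how many working days one translator producing 2,000 to 3,000 words per day would need for each example volume. The helper `translator_days` is hypothetical; the word counts are the example values from the "Expanding the Reach of Translation" slide.

```python
import math

HUMAN_WORDS_PER_DAY = (2_000, 3_000)  # range quoted on the slide

# Example word counts per content type, as listed on the slide
EXAMPLE_WORDS = {
    "Corporate brochures": 2_000,
    "Product brochures": 10_000,
    "Software products": 50_000,
    "Manuals / online help": 200_000,
    "HR / training / reports": 500_000,
    "Email / IM": 10_000_000,
    "Call center / help desk": 20_000_000,
    "Blogs / reviews": 50_000_000,
}


def translator_days(words: int, words_per_day: int) -> int:
    """Whole working days one translator needs for `words`, rounded up."""
    return math.ceil(words / words_per_day)


for content, words in EXAMPLE_WORDS.items():
    # Fastest case uses 3,000 words/day, slowest uses 2,000 words/day.
    fast, slow = (translator_days(words, rate) for rate in reversed(HUMAN_WORDS_PER_DAY))
    print(f"{content:<25} {words:>12,} words  {fast:>7,} to {slow:>7,} translator-days")
```

For the 50,000,000-word tier this gives 16,667 to 25,000 translator-days, which is the gap the slide argues machine translation can close.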
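Word Segmentation
Thai is written without spaces between words, so the same string can be segmented in more than one way, with different meanings:
• เขา|เป็น|คนใช้|ของ|เธอ: He is your servant
• เขา|เป็น|คน|ใช้|ของ|เธอ: He is the one who used your things
• ตา|กลม: Round eyes
• ตาก|ลม: Take the air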
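Dictionary-based segmentation makes this ambiguity concrete: given a word list, the sketch below enumerates every way to split an unspaced string into dictionary words. The eleven-word lexicon and the function name are illustrative assumptions; a practical Thai segmenter uses a large dictionary plus a way to choose among the alternatives (for example maximal matching or a statistical model), which this sketch deliberately leaves out.

```python
from functools import lru_cache
from typing import List, Set, Tuple

# Hypothetical mini-lexicon covering the slide's two examples.
EXAMPLE_LEXICON = {"เขา", "เป็น", "คน", "ใช้", "คนใช้", "ของ", "เธอ", "ตา", "กลม", "ตาก", "ลม"}


def segmentations(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Return every split of `text` into a sequence of words from `lexicon`."""

    @lru_cache(maxsize=None)
    def split_from(start: int) -> Tuple[Tuple[str, ...], ...]:
        if start == len(text):
            return ((),)  # exactly one way to segment an empty remainder
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                # Prefix this word onto every segmentation of the remainder.
                results.extend((word,) + rest for rest in split_from(end))
        return tuple(results)

    return [list(seg) for seg in split_from(0)]


for s in ("ตากลม", "เขาเป็นคนใช้ของเธอ"):
    for seg in segmentations(s, EXAMPLE_LEXICON):
        print("|".join(seg))
```

Each string yields exactly the two readings shown on the slide; memoizing `split_from` keeps the search from recomputing the same suffix.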
Sentence Alignment
• Next step: match English and Thai sentences that have already been translated by humans.
• Semi-automated: requires human verification, using a crowd-sourcing approach on the web.
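The slide does not say which alignment method was used. One classic approach is length-based alignment in the spirit of Gale and Church (1993): translated sentences tend to have proportional lengths, so a dynamic program can pair sentences by length. The sketch below is a simplified version with made-up penalties and a linear mismatch cost instead of Gale and Church's probabilistic one; `align`, `BEADS` and the `ratio` parameter are hypothetical, and its output would still need the human verification step described above.

```python
from typing import List, Optional, Tuple

# Allowed "beads" (source sentences, target sentences) with illustrative penalties:
# 1-1 matches are preferred, insertions and deletions are discouraged most.
BEADS = {(1, 1): 0.0, (2, 1): 2.0, (1, 2): 2.0, (1, 0): 4.0, (0, 1): 4.0}

Bead = Tuple[List[str], List[str]]


def align(src: List[str], tgt: List[str], ratio: float = 1.0) -> List[Bead]:
    """Length-based sentence alignment by dynamic programming.

    ratio is the expected target/source length ratio in characters; it must be
    tuned per language pair (English to Thai differs from English to French).
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order visits every predecessor of a cell before the cell itself.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                ls = ratio * sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                # Relative length mismatch in [0, 1]; 0 when lengths agree exactly.
                mismatch = abs(ls - lt) / max(ls + lt, 1.0)
                total = cost[i][j] + penalty + 10.0 * mismatch
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the chosen beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    beads.reverse()
    return beads


for s, t in align(["I am hungry.", "I want to eat."], ["I am hungry and I want to eat."]):
    print(s, "<->", t)
```

In the usage line, two short source sentences are merged into one 2-1 bead because their combined length matches the single target sentence.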
How the SMT Decoder Works
• Original Spanish input: Qué hambre
• Statistical analysis: statistically calculate the possible word and phrase matches for the translation, using phrases induced from word alignments in a bilingual corpus. There could be thousands of combinations, giving possible output text in broken English:
  1. What hunger have I
  2. Hungry I am so
  3. I am so hungry
  4. Have I that hunger
  5. …
• Statistical analysis: statistically determine the best word combination using the monolingual corpus: I am so hungry
• Validate the syntax to ensure the best possible grammar. Where appropriate, reinsert additional information such as the subject.
• Output translation: I am so hungry
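Only the "determine the best word combination using the monolingual corpus" step is sketched here, as a toy bigram language model with add-one smoothing that ranks the slide's candidate outputs. The four-sentence corpus and the function name are invented for illustration; a real SMT decoder combines language-model scores with translation-model scores and searches over phrase combinations rather than ranking a fixed list.

```python
import math
from collections import Counter
from typing import Callable, List, Set


def train_bigram_lm(corpus: List[str]) -> Callable[[str], float]:
    """Train an add-one (Laplace) smoothed bigram model; return a log P(sentence) scorer."""
    history: Counter = Counter()  # how often each token appears as a left context
    bigrams: Counter = Counter()  # how often each (left, right) token pair appears
    vocab: Set[str] = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens[1:])  # words plus the end-of-sentence marker
        history.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    v = len(vocab) + 1  # one extra slot for words never seen in the corpus

    def log_prob(sentence: str) -> float:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (history[a] + v))
            for a, b in zip(tokens, tokens[1:])
        )

    return log_prob


# Invented monolingual (English) corpus standing in for a real one.
corpus = ["I am so hungry", "I am so tired", "we are so hungry", "I am hungry now"]
# Candidate outputs from the slide.
candidates = ["What hunger have I", "Hungry I am so", "I am so hungry", "Have I that hunger"]

lm = train_bigram_lm(corpus)
best = max(candidates, key=lm)
print(best)
```

Word order is what the language model rewards here: "Hungry I am so" uses the same words as the winner but contains bigrams the corpus never shows.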
The Quality Evolution Cycle
• Initial system put into production
• Trained internal experts begin the initial clean-up and correction process
• Expert users are also allowed to make changes
• All users are allowed to suggest changes, which go through a vetting process
• Changes are collected and added to the initial corpus to drive continuous retraining
Quality Evolution via Error Analysis and Correction Cycle
[Figure: translation quality rising from the Initial System through Spelling and Terminology, Human Feedback, and Targeted Corrections of Bad Learning. Key: correct mistranslation, syntax/grammar, terminology, spelling, punctuation.]
Human feedback can raise the raw output to previously unseen quality levels.
Who would want a poor translation? The final installment of the latest Harry Potter comes out today. Would you wait for the translation by a famous translator … another 6 months later? Or read the version translated collaboratively by a group of internet users 6 days later?
The Present and Future of Translators
If MT can translate to a certain level of quality, do we still need translators, or only people who edit/rewrite MT output (monolingual translators)? Is MT suitable for every kind of work? There are different kinds of translators: literary translators and business translators. The translation business needs tools that make translation faster.
2007-06-18 Translation Memory System (TMS) 24
Translation Memory Systems
Presentation by Melina Takanen & Julianna Ekert
CAT, Prof. Thorsten Trippel, University of Bielefeld, Summer Term 2007
18th June 2007
Table of contents
• Introduction
• Definition
• Characteristics of TMS
• The translation workflow
• Reasons for using TMS
Definition
• Classification of translation types
[Figure: classification of translation types, including Translation Memory Systems.]
Definition of Machine Translation Systems (MTS)
• Machine translation combines a number of fields of study, such as lexicography, linguistics, computational linguistics, computer science and language engineering.
• It is based on the hypothesis that natural languages can be fully described, controlled and mathematically coded.
Definition of Translation Memory Systems (TMS)
• A TMS is a multilingual text archive that allows aligned multilingual text segments to be stored and retrieved against various search conditions.
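Storage and retrieval of aligned segments can be sketched as a list of translation units, each mapping a language code to its segment, plus a concordance search over one language. The class and method names are hypothetical and the search is a plain case-insensitive substring match; real TMS products use indexed databases and richer search conditions. The second unit is an invented example.

```python
from typing import Dict, List

TranslationUnit = Dict[str, str]  # language code -> segment text


class TranslationMemory:
    """Toy in-memory archive of aligned segments (names are hypothetical)."""

    def __init__(self) -> None:
        self.units: List[TranslationUnit] = []

    def add(self, unit: TranslationUnit) -> None:
        """Store one translation unit, e.g. {"EN-US": ..., "TH-01": ...}."""
        self.units.append(dict(unit))

    def concordance(self, lang: str, query: str) -> List[TranslationUnit]:
        """Return every unit whose `lang` segment contains `query` (case-insensitive)."""
        q = query.lower()
        return [u for u in self.units if q in u.get(lang, "").lower()]


tm = TranslationMemory()
tm.add({"EN-US": "A boost of vitamin shine for your lips.", "TH-01": "เติมวิตามินให้เรียวปากใสปิ๊ง"})
tm.add({"EN-US": "Apply to your lips daily.", "TH-01": "ทาที่ริมฝีปากทุกวัน"})
for unit in tm.concordance("EN-US", "lips"):
    print(unit["EN-US"], "=>", unit["TH-01"])
```

Searching the English side for "lips" shows the translator both ways the word has been rendered in Thai.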
Definition of TMS (cont.)
• Unlike machine translation systems, which generate translations automatically, translation memory systems leave professional translators in charge of deciding whether to accept or reject a term or equivalent phrase that the system suggests during the translation process.
Definition of TMS (cont.)
• Translators can also build their own 'memory'.
Characteristics of TMS
• Perfect matching
• Fuzzy matching
• Filter
• Segmentation
• Alignment
Perfect matching
• A perfect match occurs when a source-language (SL) segment is completely identical, including spelling, punctuation and inflections, to an old segment found in the database, that is, in the translation 'memory'.
Perfect matching example
Fuzzy matching
• Unlike a perfect match, a fuzzy match occurs when an old and a new SL segment are similar but not identical. Even a very small difference, such as in punctuation, leads to a fuzzy match.
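The difference between perfect and fuzzy matches can be sketched with a match score: 100 for an identical segment, and otherwise one minus the character edit distance divided by the longer segment's length. This scoring formula, the 70% threshold, the function names and the one-entry memory are assumptions for illustration; commercial TM tools use their own, often word-based, similarity measures.

```python
from typing import Dict, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, free if equal
        prev = cur
    return prev[-1]


def match_score(new: str, old: str) -> float:
    """Similarity in percent: 100.0 only when the segments are identical."""
    longest = max(len(new), len(old))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(new, old) / longest)


def best_match(memory: Dict[str, str], segment: str,
               threshold: float = 70.0) -> Optional[Tuple[float, str, str]]:
    """Return (score, stored source, stored translation) for the closest stored
    segment, or None if nothing reaches the fuzzy-match threshold."""
    if segment in memory:  # perfect match: identical including punctuation
        return 100.0, segment, memory[segment]
    scored = [(match_score(segment, src), src, tgt) for src, tgt in memory.items()]
    best = max(scored, default=None)
    return best if best is not None and best[0] >= threshold else None


memory = {"The file was saved.": "บันทึกไฟล์แล้ว"}
print(best_match(memory, "The file was saved."))  # perfect match
print(best_match(memory, "The file was saved!"))  # fuzzy match: punctuation differs
```

Changing only the final punctuation mark drops the score just below 95%, matching the point that even punctuation turns a perfect match into a fuzzy one.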
Fuzzy matching example
Filter
• A filter is a feature that converts an SL text from one format into another, giving the translator the flexibility to work with texts in different formats, for example:
  • text without graphics
  • text without HTML code signs
Filter example
[Figure: the same text shown with HTML code signs and with no HTML code signs.]
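A filter's first job, separating translatable text from markup, can be sketched with Python's standard `html.parser` module. This only extracts text; a real CAT filter also keeps the tags (usually as protected placeholders) so the translation can be merged back into the original format. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect text content and drop the tags, so only words reach the translator."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():  # skip whitespace-only runs between tags
            self.chunks.append(data.strip())


def extract_text(html: str) -> List[str]:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


print(extract_text("<h1>Welcome</h1><p>Click <b>Save</b> to continue.</p>"))
```

Note that the inline `<b>` tag splits one sentence into three chunks ("Click", "Save", "to continue."). This is one reason real filters represent inline tags as placeholders inside a segment rather than as segment breaks.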
Segmentation
• Segmentation is the process of breaking a text up into units, each consisting of a word or a linguistically acceptable string of words.
• Especially useful for:
  • headings
  • lists
  • bullet points
Segmentation example
[Figure: a passage of text marked as one segment.]
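A minimal rule-based segmenter can be sketched as follows: each line (heading, list item, bullet point) becomes its own segment, and lines are further split after sentence-final punctuation. The regular-expression rule and the function name are assumptions that only suit English-like text. The rule breaks on abbreviations such as "Dr. Smith" and does not apply to Thai, which does not end sentences with a period. CAT tools typically let users edit their segmentation rules, and the SRX standard exists for exchanging them.

```python
import re
from typing import List

# Assumed rule: split after ., ! or ? when followed by whitespace and an uppercase letter.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def segment(text: str) -> List[str]:
    """Split text into segments: one per non-empty line, then one per sentence."""
    segments: List[str] = []
    for line in text.splitlines():
        line = line.strip().lstrip("•-* ").strip()  # drop bullet markers
        if not line:
            continue
        segments.extend(s for s in SENTENCE_END.split(line) if s)
    return segments


print(segment("Reasons for using TMS\n• Saves time. Improves consistency.\n\n• Shares work"))
```

The heading and each bullet point become separate segments even without final punctuation, which is why line-based rules matter for headings, lists and bullet points.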
Alignment
• Alignment is the process of binding a source-language (SL) segment to its corresponding target-language (TL) segment.
• The purpose is to create a new translation memory or to add to an existing one.
• The corresponding pairs of SL and TL segments are called 'translation units'.
Alignment example
The translation workflow
Reasons for using TMS
• avoids having to re-translate anything that has already been translated,
• allows workgroups to share translations that were previously done,
• allows translators to build up a valuable database of translations.
Reasons for using TMS
• Of course, it is possible to create as many translation memories as needed, for example for different subjects, clients and/or language pairs.
Translation Memory eXchange (TMX)
• TMX is the vendor-neutral open XML standard for the exchange of Translation Memory (TM) data created by Computer Aided Translation (CAT) and localization tools.
• TMX is developed and maintained by OSCAR (Open Standards for Container/Content Allowing Re-use), a LISA Special Interest Group.
<?xml version="1.0" ?>
<tmx version="1.4">
  <header creationtool="PerlConvertor" creationtoolversion="1.0" segtype="sentence" o-tmf="text" adminlang="EN-US" srclang="EN-US" datatype="plaintext" creationdate="20100110T095611Z" creationid="Wirote">
  </header>
  <body>
    <tu>
      <tuv xml:lang="EN-US">
        <seg>A boost of vitamin shine for your lips.</seg>
      </tuv>
      <tuv xml:lang="TH-01">
        <seg>เติมวิตามินให้เรียวปากใสปิ๊ง</seg>
      </tuv>
    </tu>
  </body>
</tmx>
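Because TMX is plain XML, a TMX 1.4 file can be read with Python's standard `xml.etree.ElementTree`. The sketch below collects one {language: segment} dictionary per translation unit. The function name is hypothetical, and it assumes plain-text segments: inline TMX markup such as `<bpt>`, `<ept>` or `<ph>` inside `<seg>` is not handled.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(tmx_text: str) -> List[Dict[str, str]]:
    """Return one {language code: segment text} dict per <tu> element.

    Assumes plain-text segments (no inline <bpt>/<ept>/<ph> markup).
    """
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            unit[tuv.get(XML_LANG)] = (seg.text or "") if seg is not None else ""
        units.append(unit)
    return units


SAMPLE = """<?xml version="1.0" ?>
<tmx version="1.4">
  <header creationtool="PerlConvertor" creationtoolversion="1.0" segtype="sentence"
          o-tmf="text" adminlang="EN-US" srclang="EN-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="EN-US"><seg>A boost of vitamin shine for your lips.</seg></tuv>
      <tuv xml:lang="TH-01"><seg>เติมวิตามินให้เรียวปากใสปิ๊ง</seg></tuv>
    </tu>
  </body>
</tmx>"""

print(read_tmx(SAMPLE))
```

All attribute values must use straight ASCII quotes; curly quotes such as ” make the file malformed XML, and the parser rejects it.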
TM Software
• Transit NXT http://www.star-ts.com/transit-nxt-translation-memory.shtml
• SDL Trados http://www.trados.com/en/
• Déjà Vu http://www.atril.com/
• Wordfast http://www.wordfast.net
• Wordfast Anywhere http://www.freetm.com/
• OmegaT http://www.omegat.org/en/omegat.html