Project Proposal: Translation Example Search Engine

Project Proposal, CSC 630, Fall 2013, University of Arizona, Sumin Byeon

Posted on 21 May 2015

DESCRIPTION

I propose to use a local document fingerprinting algorithm, Winnowing, to find near matches of natural language translation samples.

TRANSCRIPT

Page 1: Project Proposal: Translation Example Search Engine

Project Proposal

CSC 630, Fall 2013, University of Arizona

Sumin Byeon

Example-Based Machine Translation

• A set of translation examples: (S₁→T₁), (S₂→T₂), (S₃→T₃), ...

• Given a query text S, find the closest match S’ such that (S’→T’) is in the example set

• T’ is accepted as the translation of S
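The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the proposal's method: it uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in similarity measure (the proposal plans to use Winnowing fingerprints instead), and the example pairs and the function name `translate_by_example` are hypothetical.

```python
from difflib import SequenceMatcher


def translate_by_example(query, examples):
    """Return the pair (S', T') whose source side S' is closest to the query S.

    Similarity is difflib's SequenceMatcher ratio, a stand-in for a
    fingerprint-based near-match search.
    """
    return max(examples, key=lambda pair: SequenceMatcher(None, query, pair[0]).ratio())


# Hypothetical example set of (S_i, T_i) pairs.
examples = [
    ("good morning", "buenos días"),
    ("good night", "buenas noches"),
    ("thank you", "gracias"),
]

source, target = translate_by_example("good mornin", examples)
print(source, "->", target)  # good morning -> buenos días
```

A linear scan like this compares the query with every stored example, which is exactly the cost an index over hashed fingerprints is meant to avoid once the example set grows large.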

Hypothesis

[Diagram: query S; example pairs S₁→T₁, S₂→T₂, …, Sₙ→Tₙ; labels h(S), h(S_σ), φ(S), Tᵢ]

Open questions: Which hash function? What is the optimal k-gram length k? What window size?
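One way to explore these questions experimentally is to start from a straightforward implementation of Winnowing and vary its parameters. The sketch below is a minimal version that assumes CRC32 (`zlib.crc32`) over lowercased character k-grams as the hash. That choice, the defaults `k=5` and `w=4`, and the helper names `kgram_hashes`, `winnow`, and `fingerprint` are hypothetical placeholders, not settled design decisions. Within each window of `w` consecutive k-gram hashes it keeps the rightmost minimum, and it records a position only once even when several consecutive windows select it.

```python
import zlib


def kgram_hashes(text, k):
    """Hash every k-character substring of the (lowercased) text.

    CRC32 is an arbitrary stand-in; which hash to use is still an open question.
    """
    data = text.lower()
    return [zlib.crc32(data[i:i + k].encode("utf-8")) for i in range(len(data) - k + 1)]


def winnow(hashes, w):
    """Select (hash, position) fingerprints using windows of w consecutive hashes."""
    if not hashes:
        return []
    w = min(w, len(hashes))  # treat an input shorter than one window as a single window
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        min_hash = min(window)
        # Rightmost minimum in the window, so ties do not create extra fingerprints.
        pos = start + (w - 1) - window[::-1].index(min_hash)
        # Selected positions never move left, so comparing with the last one suffices.
        if pos != last_pos:
            fingerprints.append((min_hash, pos))
            last_pos = pos
    return fingerprints


def fingerprint(text, k=5, w=4):
    """Winnowing fingerprints of text for k-gram length k and window size w."""
    return winnow(kgram_hashes(text, k), w)


print(winnow([5, 3, 3, 7, 1], 3))  # [(3, 2), (1, 4)]
```

Because every window of `w` hashes contributes a fingerprint, any shared substring of at least `w + k - 1` characters between a query and a stored example yields at least one shared fingerprint. That guarantee is what ties the choice of `k` and `w` to the shortest match the engine can reliably find.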

Relationship with Content Addressability

• Content recognizability

• By hashing (Winnowing)

• Content recoverability

• By locating or reconstructing

• Unlike other projects such as NDN (Named Data Networking) or Receipt, mine is relatively straightforward

• Simple key-value storage

• Key: hash

• Value: (reference to original text, offset)
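The key-value layout above might look like the following in-memory sketch. The class name `FingerprintIndex`, the use of a plain document identifier as the "reference to original text", and the vote-counting `best_matches` ranking are illustrative assumptions. The input fingerprints are `(hash, offset)` pairs of the kind a Winnowing implementation produces, and the values in the usage example are made up.

```python
from collections import Counter, defaultdict


class FingerprintIndex:
    """Key-value store: fingerprint hash -> list of (document reference, offset)."""

    def __init__(self):
        self._table = defaultdict(list)

    def add(self, doc_id, fingerprints):
        """Store every (hash, offset) fingerprint of one example text."""
        for h, offset in fingerprints:
            self._table[h].append((doc_id, offset))

    def lookup(self, h):
        """All (doc_id, offset) locations that share the fingerprint hash h."""
        return list(self._table.get(h, []))

    def best_matches(self, query_fingerprints):
        """Rank documents by how many distinct query fingerprints they contain."""
        votes = Counter()
        for h, _ in query_fingerprints:
            for doc_id in {d for d, _ in self._table.get(h, [])}:
                votes[doc_id] += 1
        return votes.most_common()


# Hypothetical fingerprints for two stored source sentences.
index = FingerprintIndex()
index.add("S1", [(11, 0), (42, 5), (7, 9)])
index.add("S2", [(42, 2), (99, 4)])
print(index.best_matches([(42, 0), (7, 3)]))  # [('S1', 2), ('S2', 1)]
```

Keeping the offset alongside the reference lets the engine locate the matched span inside the original example, which is the "content recoverability by locating" point on this slide.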

Text Matching

• Full-text search may be an effective solution, but...

• Loses information regarding the ordering of the query words

• Limited support for phrase search

• Certain linguistic features are ignored (e.g., stop words such as “a” and “the”)

• The matched partial text must be long enough, which is a trade-off:

• Longer text - lower probability of finding matches

• Shorter text - higher probability of ambiguity (e.g., homonyms, false cognates)
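Both the ordering problem and the length trade-off show up on a toy pair of sentences. The sketch below uses word k-grams rather than the character k-grams that Winnowing normally hashes, purely for readability. The helper names `bag_of_words` and `word_kgrams` and the two-word stop list are hypothetical.

```python
def bag_of_words(text, stop_words=frozenset({"a", "the"})):
    """Sorted words with stop words removed, as a simple full-text index might keep them."""
    return sorted(w for w in text.lower().split() if w not in stop_words)


def word_kgrams(text, k):
    """Set of all runs of k consecutive words, which preserves local word order."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


a = "The dog bites a man"
b = "A man bites the dog"

# The bag-of-words view cannot tell the two sentences apart.
print(bag_of_words(a) == bag_of_words(b))  # True

# Longer k-grams keep ordering but the sentences share fewer of them.
for k in (1, 2, 3):
    print(k, len(word_kgrams(a, k) & word_kgrams(b, k)))
# prints: 1 5 / 2 2 / 3 0
```

With k = 1 every word matches and the opposite meanings are indistinguishable. With k = 3 the ordering is fully respected, but there is no overlap at all. This is the same tension between ambiguity and match probability described in the last two bullets.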

Grand Plan

• Winnowing algorithm implementation

• Index a large number of samples (10,000+)

• Translation sample search engine with a simple RESTful interface

• Integrate it with Better Translator
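A RESTful search endpoint could be as small as the following standard-library sketch. The `/search?q=<text>` route, the JSON response shape, the in-memory `EXAMPLES` table, and the substring match used in place of a fingerprint lookup are all hypothetical. Request parsing is kept in a plain `search` function so it can be exercised without starting a server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical example store; a real service would query the fingerprint index.
EXAMPLES = {"good morning": "buenos días", "thank you": "gracias"}


def search(path):
    """Map a GET path such as /search?q=... to (HTTP status, JSON-serializable body)."""
    url = urlparse(path)
    if url.path != "/search":
        return 404, {"error": "not found"}
    query = parse_qs(url.query).get("q", [""])[0]
    if not query:
        return 400, {"error": "missing q parameter"}
    # Placeholder matching: case-insensitive substring test instead of Winnowing.
    results = [{"source": s, "target": t} for s, t in EXAMPLES.items() if query.lower() in s]
    return 200, {"query": query, "results": results}


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = search(self.path)
        payload = json.dumps(body, ensure_ascii=False).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8000):
    """Start a blocking HTTP server for the search endpoint."""
    HTTPServer((host, port), SearchHandler).serve_forever()
```

Calling `serve()` starts a blocking server, and a request such as `GET /search?q=thank` would then return the matching pair as JSON. Separating `search` from the handler also makes it easy for another program, such as a translator front end, to reuse the same lookup logic.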

Better Translator

• A language translator that exploits an indirect translation trick

• e.g., (Korean)→(Japanese)→(English)

• A perfect platform to test the hypothesis

• 여러분이 몰랐던 구글 번역기 (“the Google Translate you didn't know”)

• Google Translate: You did not know Google Translate

• Better Translator: Google Translate you did not know
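The indirect translation trick amounts to composing two translators through a pivot language. In the sketch below, the two legs are toy phrase tables standing in for a real translation service. The names `pivot` and `dictionary_translator` and the table entries are hypothetical illustrations, not Better Translator's implementation.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def pivot(first: Translator, second: Translator) -> Translator:
    """Compose two translators: source language -> pivot language -> target language."""
    return lambda text: second(first(text))


def dictionary_translator(table: Dict[str, str]) -> Translator:
    """Toy phrase-table translator; unknown input passes through unchanged."""
    return lambda text: table.get(text, text)


# Hypothetical legs: Korean -> Japanese and Japanese -> English.
ko_to_ja = dictionary_translator({"감사합니다": "ありがとうございます", "안녕하세요": "こんにちは"})
ja_to_en = dictionary_translator({"ありがとうございます": "thank you", "こんにちは": "hello"})

ko_to_en = pivot(ko_to_ja, ja_to_en)
print(ko_to_en("감사합니다"))  # thank you
```

Because each leg is just a function, an example-based lookup could be slotted in as either leg, or consulted before falling back to the pivot chain, when testing the hypothesis.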