Discriminative Learning of Extraction Sets for Machine Translation
John DeNero and Dan Klein, UC Berkeley
Identifying Phrasal Translations
In the past two years , a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
(gloss: past two year in , one lots US citizen)
Phrase alignment models: Choose a segmentation and a one-to-one phrase alignment
[Figure: competing phrase alignments for 过去: "Past" vs. "Go over"]
Underlying assumption: There is a correct phrasal segmentation
Unique Segmentations?
In the past two years , a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
(gloss: past two year in , one lots US citizen)
Problem 1: Overlapping phrases can be useful (and complementary)
Problem 2: Phrases and their sub-phrases can both be useful
Hypothesis: This is why models of phrase alignment don’t work well
Identifying Phrasal Translations
This talk: Modeling sets of overlapping, multi-scale phrase pairs
In the past two years , a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
(gloss: past two year in , one lots US citizen)
Input: sentence pairs
Output: extracted phrases
… But the Standard Pipeline has Overlap!
M O T I V A T I O N
[Figure: Sentence Pair → Word Alignment → Extracted Phrases for "In the past two years" / 过去 两 年 中; the extracted phrases overlap]
Related Work
Translation models: Sinuhe system (Kääriäinen, 2009)
Fixed alignments; learned phrase pair weights
Combining aligners: Yonggang Deng & Bowen Zhou (2009)
Fixed directional alignments; learned symmetrization
Extraction models: Moore and Quirk (2007)
Fixed alignments; learned phrase pair weights
Our Task: Predict Extraction Sets
Sentence Pair → Extracted Phrases
Conditional model of extraction sets given sentence pairs
[Figure: two alignment grids for "In the past two years" / 过去两年中: the extracted phrases alone, and the extracted phrases together with "word alignments"]
Alignments Imply Extraction Sets
M O D E L
[Figure: alignment grid for "In the past two years" / 过去 两 年 中]
Word-level alignment links → word-to-span alignments → extraction set of bispans
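The first step of this mapping can be sketched in a few lines of Python. The sketch below is not the authors' code; word_to_spans, links, n_src, and n_tgt are illustrative names. It projects word-level links to word-to-span alignments by taking, for each word, the minimum and maximum index of the words it links to.

```python
def word_to_spans(links, n_src, n_tgt):
    """Project word-level links to word-to-span alignments.

    links: iterable of (i, j) pairs, where source word i links to target word j.
    Returns two lists: for each source word, the (min, max) target span it
    aligns to, and vice versa; unaligned (null) words get None.
    """
    src_span = [None] * n_src
    tgt_span = [None] * n_tgt
    for i, j in links:
        lo, hi = src_span[i] if src_span[i] else (j, j)
        src_span[i] = (min(lo, j), max(hi, j))
        lo, hi = tgt_span[j] if tgt_span[j] else (i, i)
        tgt_span[j] = (min(lo, i), max(hi, i))
    return src_span, tgt_span
```

The extraction set of bispans then follows deterministically from these spans; a sketch of that step appears with the training slides below.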
Nulls and Possibles
[Figure: 据 报道 (gloss: according to / news report), translated as "it is reported", shown with null and possible alignment links]
Nulls: words aligned to nothing
Possibles: links that are allowed but not required
Incorporating Possible Alignments
[Figure: alignment grid for "In the past two years" / 过去 两 年 中, with sure and possible links]
Sure and possible word links → word-to-span alignments → extraction set of bispans
Linear Model for Extraction Sets
[Figure: alignment grid for "In the past two years" / 过去 两 年 中]
Features on sure links
Features on all bispans
Features on Bispans and Sure Links
F E A T U R E S
Example: 过 地球 (gloss: go over / Earth), translated as "over the Earth"
Some features on sure links
HMM posteriors
Presence in dictionary
Numbers & punctuation
Features on bispans
HMM phrase table features: e.g., phrase relative frequencies
Lexical indicator features for phrases with common words
Monolingual phrase features: e.g., “the _____”
Shape features: e.g., Chinese character counts
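As a rough illustration of how these pieces fit together, here is a hedged Python sketch of the linear scorer: the score of a candidate extraction set is the dot product of a weight vector with features summed over its sure links and its bispans. All names below (score_extraction_set, hmm_posterior, bilingual_dict, and the individual feature names) are placeholders standing in for the feature classes listed above, not the actual implementation.

```python
def score_extraction_set(weights, sure_links, bispans, link_feats, bispan_feats):
    """w . phi: sum weighted features over sure links and over bispans."""
    score = 0.0
    for link in sure_links:
        for name, value in link_feats(link).items():
            score += weights.get(name, 0.0) * value
    for bispan in bispans:
        for name, value in bispan_feats(bispan).items():
            score += weights.get(name, 0.0) * value
    return score

# Example (placeholder) link features for one sentence pair:
def make_link_feats(src, tgt, hmm_posterior, bilingual_dict):
    def link_feats(link):
        i, j = link
        return {
            "hmm_posterior": hmm_posterior[i][j],                 # posterior under the HMM aligner
            "in_dict": float((src[i], tgt[j]) in bilingual_dict), # dictionary presence
            "both_numbers": float(src[i].isdigit() and tgt[j].isdigit()),
        }
    return link_feats
```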
Getting Gold Extraction Sets
T R A I N I N G
Hand aligned: sure and possible word links
Deterministic: word-to-span alignments (find the min and max alignment index for each word)
Deterministic: extraction set of bispans (a bispan is included iff every word within the bispan aligns within the bispan)
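Building on the word_to_spans sketch from the Model section, the gold extraction set can be derived as below. This is only a sketch of the stated rule: it uses only the sure links, and the length cap and the requirement of at least one aligned word inside each bispan are assumptions rather than details from the slides.

```python
def gold_extraction_set(links, n_src, n_tgt, max_len=4):
    """All bispans ((i1, i2), (j1, j2)), with inclusive spans, such that
    every aligned word inside the bispan aligns only within the bispan."""
    src_span, tgt_span = word_to_spans(links, n_src, n_tgt)

    def inside(span, lo, hi):
        # Unaligned words never block extraction.
        return span is None or (lo <= span[0] and span[1] <= hi)

    bispans = []
    for i1 in range(n_src):
        for i2 in range(i1, min(n_src, i1 + max_len)):
            for j1 in range(n_tgt):
                for j2 in range(j1, min(n_tgt, j1 + max_len)):
                    if any(src_span[i] is not None for i in range(i1, i2 + 1)) and \
                       all(inside(src_span[i], j1, j2) for i in range(i1, i2 + 1)) and \
                       all(inside(tgt_span[j], i1, i2) for j in range(j1, j2 + 1)):
                        bispans.append(((i1, i2), (j1, j2)))
    return bispans
```

The rule also licenses larger bispans that absorb unaligned boundary words; max_len merely keeps the enumeration small.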
Discriminative Training with MIRA
Loss function: F-score of bispan errors (precision & recall)
Training Criterion: Minimal change to w such that the gold is preferred to the guess by a loss-scaled margin
Gold (annotated) vs. guess (arg max of w · ɸ)
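A hedged sketch of this training step, written as a 1-best MIRA update where the loss is one minus an F-measure over bispans (whether the loss uses F1 or the recall-weighted F5 reported later is not specified here). The sparse feature vectors phi_gold and phi_guess and the step-size cap C are assumptions for illustration.

```python
def f_measure(gold_bispans, guess_bispans, beta=1.0):
    """F_beta over extracted bispans; beta > 1 weights recall more heavily."""
    gold, guess = set(gold_bispans), set(guess_bispans)
    correct = len(gold & guess)
    if correct == 0:
        return 0.0
    p, r = correct / len(guess), correct / len(gold)
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

def mira_update(weights, phi_gold, phi_guess, loss, C=1.0):
    """Smallest change to the weights such that the gold extraction set
    outscores the model's guess by a margin scaled by the loss."""
    keys = set(phi_gold) | set(phi_guess)
    diff = {k: phi_gold.get(k, 0.0) - phi_guess.get(k, 0.0) for k in keys}
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return weights
    violation = loss - sum(weights.get(k, 0.0) * v for k, v in diff.items())
    tau = min(C, max(0.0, violation) / norm_sq)
    new_weights = dict(weights)
    for k, v in diff.items():
        new_weights[k] = new_weights.get(k, 0.0) + tau * v
    return new_weights
```

In use, loss would be 1 - f_measure(gold_bispans, guess_bispans, beta) for the chosen beta.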
Inference: An ITG Parser
I N F E R E N C E
ITG captures some bispans
Coarse-to-Fine Approximation
Coarse Pass: Features that are local to terminal productions
Fine Pass: Agenda search using coarse pass as a heuristic
We use an agenda-based parser. It’s fast!
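The slides do not spell out the parser's internals, so the following is only a generic best-first agenda loop of the kind described: items are prioritized by their inside score plus an outside estimate taken from the cheaper coarse pass. The item representation, expand, coarse_outside, and is_goal are all assumed interfaces, not the Berkeley implementation.

```python
import heapq
import itertools

def agenda_parse(start_items, expand, coarse_outside, is_goal):
    """Best-first agenda search: pop the item whose inside score plus
    coarse outside estimate is largest (log scores, higher is better)."""
    counter = itertools.count()   # tie-breaker so the heap never compares items
    agenda, finished = [], {}
    for item, inside in start_items:
        heapq.heappush(agenda, (-(inside + coarse_outside(item)), next(counter), item, inside))
    while agenda:
        _, _, item, inside = heapq.heappop(agenda)
        if item in finished:
            continue              # already popped with an equal or better score
        finished[item] = inside
        if is_goal(item):
            return item, inside
        for new_item, new_inside in expand(item, inside, finished):
            if new_item not in finished:
                heapq.heappush(agenda, (-(new_inside + coarse_outside(new_item)),
                                        next(counter), new_item, new_inside))
    return None, float("-inf")
```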
Experimental Setup
R E S U L T S
Chinese-to-English newswire
Parallel corpus: 11.3 million words; sentences of length ≤ 40
MT systems: Tuned and tested on NIST ‘04 and ‘05
Supervised data: 150 training & 191 test sentences (NIST ‘02)
Unsupervised Model: Jointly trained HMM (Berkeley Aligner)
Baselines and Limited Systems
HMM: state-of-the-art unsupervised baseline; jointly trained, with competitive posterior decoding; the source of many features for the supervised models
ITG: supervised ITG aligner with block terminals; state-of-the-art supervised baseline (re-implementation of Haghighi et al., 2009)
Coarse: supervised block ITG + possible alignments; the coarse pass of the full extraction set model
Word Alignment Performance
         Precision   Recall   1 - AER
HMM        84.0       76.9      80.4
ITG        83.4       83.8      83.6
Coarse     82.2       84.2      83.1
Full       84.7       84.0      84.4
Extracted Bispan Performance
         Precision   Recall    F1      F5
HMM        69.5       59.5     64.1    59.9
ITG        75.8       62.3     68.4    62.8
Coarse     70.0       72.9     71.4    72.8
Full       69.0       74.2     71.6    74.0
(F5 is the F-measure with β = 5, weighting recall five times as heavily as precision.)
Translation Performance (BLEU)
         Moses   Joshua
HMM       33.2    34.5
ITG       33.6    34.7
Coarse    34.2    35.7
Full      34.4    35.9
Supervised conditions also included HMM alignments
Conclusions
Extraction set model directly learns what phrases to extract
The system performs well as an aligner or a rule extractor
Are segmentations always bad?
Idea: get overlap and multi-scale into the learning!
Thank you!
nlp.cs.berkeley.edu