Bagging-based System Combination for Domain Adaptation
INSTITUTE OF COMPUTING TECHNOLOGY
Bagging-based System Combination for Domain Adaptation
Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences
An Example
An initial MT system is tuned on a development set (A: 90%, B: 10%), yielding a tuned MT system that fits domain A. The translation styles of A and B are quite different.
The test set, however, is A: 10%, B: 90%. The tuned translation style fits A, but we mainly want to translate B.
Traditional Methods
Train a domain recognizer on monolingual data with domain annotation.
Use the domain recognizer to split the bilingual training data into domain-A and domain-B portions, and train an MT system for each domain.
Use the recognizer again to split the test set into domain-A and domain-B portions.
Translate each portion with the matching MT system, yielding the translation result for domain A and the translation result for domain B.
The merits
Simple and effective
Fits human intuition
The drawbacks
Classification error (CE), especially for unsupervised methods
Supervised methods can keep CE low, but requiring annotated data limits their usage
Our motivation
Move away from doing adaptation directly
Statistical methods (such as Bagging) can help
The general framework of Bagging
Preliminary
General framework of Bagging
From the training set D, bootstrap new training sets D1, D2, D3, …, and train a classifier on each: C1, C2, C3, …
A test sample is given to C1, C2, C3, …; their individual results are combined by voting into the final result.
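The general framework above can be sketched in a few lines of Python. This is a minimal toy illustration, not the paper's implementation: the base learner `train_threshold` (a one-dimensional threshold rule) and all data are invented for the example; only the bootstrap-then-vote structure follows the slides.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    # Sample |data| items with replacement: one bagged training set D_t.
    return [rng.choice(data) for _ in data]

def train_threshold(data):
    # Toy base learner C_t: predict 1 if x is above the midpoint of the
    # two class means. Stands in for any classifier.
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    mean_pos = sum(pos) / max(1, len(pos))
    mean_neg = sum(neg) / max(1, len(neg))
    t = (mean_pos + mean_neg) / 2
    return lambda x: 1 if x >= t else 0

def bagging(data, n_models, rng):
    # Bootstrap D -> D1..DN and train one classifier per bagged set.
    return [train_threshold(bootstrap(data, rng)) for _ in range(n_models)]

def vote(classifiers, x):
    # Majority vote over the component predictions.
    return Counter(c(x) for c in classifiers).most_common(1)[0][0]

rng = random.Random(0)
data = [(x, 0) for x in (1.0, 1.2, 1.5, 2.0)] + [(x, 1) for x in (4.0, 4.5, 5.0, 5.5)]
models = bagging(data, n_models=15, rng=rng)
print(vote(models, 1.1), vote(models, 5.2))
```

Each bagged classifier sees a slightly different resample of the data; the vote smooths out their individual quirks, which is the effect the method below exploits for tuning MT systems.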
Our method
Training
Suppose there is a development set; for simplicity, it contains only 5 sentences, 3 belonging to domain A and 2 to domain B:
A,A,A,B,B
We bootstrap N new development sets by sampling with replacement, e.g.:
A,B,B,B,B
A,A,B,B,B
A,A,B,B,B
A,A,A,B,B
A,A,A,A,B
……
For each bootstrapped set, a subsystem is tuned (MT system-1, MT system-2, …, MT system-N).
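The bootstrap step can be sketched as follows. A short illustrative snippet: the function name `bootstrap_dev_sets` and the seed are my own choices, and real development sets would hold sentences rather than domain labels.

```python
import random

def bootstrap_dev_sets(dev_set, n_sets, seed=0):
    # Resample the development set with replacement, once per subsystem.
    rng = random.Random(seed)
    return [[rng.choice(dev_set) for _ in dev_set] for _ in range(n_sets)]

dev = ["A", "A", "A", "B", "B"]  # 3 domain-A sentences, 2 domain-B
sets = bootstrap_dev_sets(dev, n_sets=5)
for s in sets:
    print("".join(s))
```

Because sampling is with replacement, each bootstrapped set keeps the original size (5) but shifts the A/B mix, so each tuned subsystem fits a slightly different domain balance.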
Decoding
For simplicity, suppose only 2 subsystems have been tuned:
Subsystem-1, W1 = <-0.8, 0.2>
Subsystem-2, W2 = <-0.6, 0.4>
Now a sentence "A B" needs a translation. After translation, each subsystem generates its N-best candidates:
Subsystem-1: a b; <0.2, 0.2>   a c; <0.2, 0.3>
Subsystem-2: a b; <0.2, 0.2>   a b; <0.1, 0.3>   a d; <0.3, 0.4>
Fuse these N-best lists and eliminate duplicates; candidates are identical only if their target strings and feature values are entirely equal:
a b; <0.2, 0.2>
a b; <0.1, 0.3>
a c; <0.2, 0.3>
a d; <0.3, 0.4>
Calculate the voting score for each candidate c:
final_score(c) = Σ_{t=1}^{S} feat(c) · W_t
where S is the number of subsystems:
a b; <0.2, 0.2>; -0.16
a b; <0.1, 0.3>; +0.04
a c; <0.2, 0.3>; -0.10
a d; <0.3, 0.4>; -0.18
The one with the highest score wins: here "a b" with score +0.04.
Since the subsystems are different copies of the same model and share the same training data, calibration is unnecessary.
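The voting score on the slides can be reproduced with the toy numbers above. A minimal sketch: the `#1`/`#2` suffixes are hypothetical labels I add to tell apart the two fused "a b" candidates, which differ only in feature values.

```python
def final_score(feat, weights):
    # final_score(c) = sum over subsystems t of feat(c) . W_t
    return sum(sum(f * w for f, w in zip(feat, W)) for W in weights)

weights = [(-0.8, 0.2), (-0.6, 0.4)]  # W1, W2 of the two subsystems
candidates = {
    "a b#1": (0.2, 0.2),
    "a b#2": (0.1, 0.3),
    "a c":   (0.2, 0.3),
    "a d":   (0.3, 0.4),
}
scores = {c: round(final_score(f, weights), 2) for c, f in candidates.items()}
best = max(scores, key=scores.get)
print(best)  # a b#2, i.e. "a b" with features <0.1, 0.3>
```

Running this reproduces the slide's scores (-0.16, +0.04, -0.1, -0.18), and the second "a b" candidate wins.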
Experiments
Basic Setups
Data: NTCIR-9 Chinese-English patent corpus; 1k sentence pairs as the development set, another 1k pairs as the test set, and the remainder for training
System: hierarchical phrase-based model
Alignment: GIZA++, grow-diag-final
Effectiveness: Show and Prove
Tune 30 subsystems using Bagging
Tune 30 subsystems with random initial weights
Evaluate the fused results of the first N (N = 5, 10, 15, 20, 30) subsystems of both settings and compare
Results: 1-best (BLEU)
Number of subsystems: 1, 5, 10, 15, 20, 30
Bagging: 31.08, 31.51, 31.64, 31.73, 31.80, 31.90
Random: 31.08, 31.11, 31.13, 31.17, 31.23, 31.20
At 30 subsystems, Bagging improves over the single-system baseline by +0.82 and over random tuning by +0.70.
Results: Oracle (BLEU)
Number of subsystems: 1, 5, 10, 15, 20, 30
Bagging: 36.74, 40.35, 42.27, 42.52, 42.74, 42.96
Random: 36.74, 38.35, 38.67, 38.82, 39.04, 39.25
At 30 subsystems, Bagging improves over the single-system baseline by +6.22 and over random tuning by +3.71.
Compare with Traditional Methods
Evaluate a supervised method; to tackle data sparsity, it only operates on the development set and the test set
Evaluate an unsupervised method, similar to Yamada (2007); to avoid data sparsity, only the LM is domain-specific
Results: 1-best (BLEU)
baseline: 31.08, bagging: 31.90, supervised: 31.63, unsupervised: 31.24
Conclusions
We propose a bagging-based method to address the multi-domain translation problem.
Experiments show that Bagging is effective for the domain adaptation problem; our method clearly surpasses the baseline, and is even better than some traditional methods.
Thank you for listening. Any questions?