Bagging-based System Combination for Domain Adaptation
INSTITUTE OF COMPUTING TECHNOLOGY
Bagging-based System Combination for Domain Adaptation
Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences
An Example
An initial MT system is tuned on a development set (A: 90%, B: 10%), yielding a tuned MT system that fits domain A. The translation styles of A and B are quite different.
The test set, however, is A: 10%, B: 90%. The tuned translation style fits A, but we mainly want to translate B.
Traditional Methods
Train a domain recognizer on monolingual data with domain annotation.
Use the domain recognizer to split the bilingual training data into domain-A and domain-B portions, and train an MT system for each domain.
Use the recognizer again to split the test set into domain-A and domain-B portions.
Translate each portion with the matching MT system, yielding the translation result for domain A and the translation result for domain B.
The merits
Simple and effective
Fits human intuition
The drawbacks
Classification error (CE), especially for unsupervised methods
Supervised methods can keep CE low, but requiring annotated data limits their usage
Our motivation
Move away from doing adaptation directly
Statistical methods (such as Bagging) can help
The general framework of Bagging
Preliminary
General framework of Bagging
From the training set D, bootstrap new training sets D1, D2, D3, …, and train a classifier on each: C1, C2, C3, …
A test sample is given to C1, C2, C3, …; their individual results are combined by voting into the final result.
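The general framework above can be sketched in a few lines of Python. This is a minimal toy illustration, not the paper's implementation: the base learner `train_threshold` (a one-dimensional threshold rule) and all data are invented for the example; only the bootstrap-then-vote structure follows the slides.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    # Sample |data| items with replacement: one bagged training set D_t.
    return [rng.choice(data) for _ in data]

def train_threshold(data):
    # Toy base learner C_t: predict 1 if x is above the midpoint of the
    # two class means. Stands in for any classifier.
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    mean_pos = sum(pos) / max(1, len(pos))
    mean_neg = sum(neg) / max(1, len(neg))
    t = (mean_pos + mean_neg) / 2
    return lambda x: 1 if x >= t else 0

def bagging(data, n_models, rng):
    # Bootstrap D -> D1..DN and train one classifier per bagged set.
    return [train_threshold(bootstrap(data, rng)) for _ in range(n_models)]

def vote(classifiers, x):
    # Majority vote over the component predictions.
    return Counter(c(x) for c in classifiers).most_common(1)[0][0]

rng = random.Random(0)
data = [(x, 0) for x in (1.0, 1.2, 1.5, 2.0)] + [(x, 1) for x in (4.0, 4.5, 5.0, 5.5)]
models = bagging(data, n_models=15, rng=rng)
print(vote(models, 1.1), vote(models, 5.2))
```

Each bagged classifier sees a slightly different resample of the data; the vote smooths out their individual quirks, which is the effect the method below exploits for tuning MT systems.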
Our method
Training
Suppose there is a development set; for simplicity, it contains only 5 sentences, 3 belonging to domain A and 2 to domain B:
A,A,A,B,B
We bootstrap N new development sets by sampling with replacement, e.g.:
A,B,B,B,B
A,A,B,B,B
A,A,B,B,B
A,A,A,B,B
A,A,A,A,B
……
For each bootstrapped set, a subsystem is tuned (MT system-1, MT system-2, …, MT system-N).
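The bootstrap step can be sketched as follows. A short illustrative snippet: the function name `bootstrap_dev_sets` and the seed are my own choices, and real development sets would hold sentences rather than domain labels.

```python
import random

def bootstrap_dev_sets(dev_set, n_sets, seed=0):
    # Resample the development set with replacement, once per subsystem.
    rng = random.Random(seed)
    return [[rng.choice(dev_set) for _ in dev_set] for _ in range(n_sets)]

dev = ["A", "A", "A", "B", "B"]  # 3 domain-A sentences, 2 domain-B
sets = bootstrap_dev_sets(dev, n_sets=5)
for s in sets:
    print("".join(s))
```

Because sampling is with replacement, each bootstrapped set keeps the original size (5) but shifts the A/B mix, so each tuned subsystem fits a slightly different domain balance.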
Decoding
For simplicity, suppose only 2 subsystems have been tuned:
Subsystem-1, W1 = <-0.8, 0.2>
Subsystem-2, W2 = <-0.6, 0.4>
Now a sentence "A B" needs a translation. After translation, each subsystem generates its N-best candidates:
Subsystem-1: a b; <0.2, 0.2>   a c; <0.2, 0.3>
Subsystem-2: a b; <0.2, 0.2>   a b; <0.1, 0.3>   a d; <0.3, 0.4>
Fuse these N-best lists and eliminate duplicates; candidates are identical only if their target strings and feature values are entirely equal:
a b; <0.2, 0.2>
a b; <0.1, 0.3>
a c; <0.2, 0.3>
a d; <0.3, 0.4>
Calculate the voting score for each candidate c:
final_score(c) = Σ_{t=1}^{S} feat(c) · W_t
where S is the number of subsystems:
a b; <0.2, 0.2>; -0.16
a b; <0.1, 0.3>; +0.04
a c; <0.2, 0.3>; -0.10
a d; <0.3, 0.4>; -0.18
The one with the highest score wins: here "a b" with score +0.04.
Since the subsystems are different copies of the same model and share the same training data, calibration is unnecessary.
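The voting score on the slides can be reproduced with the toy numbers above. A minimal sketch: the `#1`/`#2` suffixes are hypothetical labels I add to tell apart the two fused "a b" candidates, which differ only in feature values.

```python
def final_score(feat, weights):
    # final_score(c) = sum over subsystems t of feat(c) . W_t
    return sum(sum(f * w for f, w in zip(feat, W)) for W in weights)

weights = [(-0.8, 0.2), (-0.6, 0.4)]  # W1, W2 of the two subsystems
candidates = {
    "a b#1": (0.2, 0.2),
    "a b#2": (0.1, 0.3),
    "a c":   (0.2, 0.3),
    "a d":   (0.3, 0.4),
}
scores = {c: round(final_score(f, weights), 2) for c, f in candidates.items()}
best = max(scores, key=scores.get)
print(best)  # a b#2, i.e. "a b" with features <0.1, 0.3>
```

Running this reproduces the slide's scores (-0.16, +0.04, -0.1, -0.18), and the second "a b" candidate wins.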
Experiments
Basic Setups
Data: NTCIR-9 Chinese-English patent corpus; 1k sentence pairs as the development set, another 1k pairs as the test set, and the remainder for training
System: hierarchical phrase-based model
Alignment: GIZA++, grow-diag-final
Effectiveness: Show and Prove
Tune 30 subsystems using Bagging
Tune 30 subsystems with random initial weights
Evaluate the fused results of the first N (N = 5, 10, 15, 20, 30) subsystems of both settings and compare
Results: 1-best (BLEU)
Number of subsystems: 1, 5, 10, 15, 20, 30
Bagging: 31.08, 31.51, 31.64, 31.73, 31.80, 31.90
Random: 31.08, 31.11, 31.13, 31.17, 31.23, 31.20
At 30 subsystems, Bagging improves over the single-system baseline by +0.82 and over random tuning by +0.70.
Results: Oracle (BLEU)
Number of subsystems: 1, 5, 10, 15, 20, 30
Bagging: 36.74, 40.35, 42.27, 42.52, 42.74, 42.96
Random: 36.74, 38.35, 38.67, 38.82, 39.04, 39.25
At 30 subsystems, Bagging improves over the single-system baseline by +6.22 and over random tuning by +3.71.
Compare with Traditional Methods
Evaluate a supervised method; to tackle data sparsity, it only operates on the development set and the test set
Evaluate an unsupervised method, similar to Yamada (2007); to avoid data sparsity, only the LM is domain-specific
Results: 1-best (BLEU)
baseline: 31.08, bagging: 31.90, supervised: 31.63, unsupervised: 31.24
Conclusions
We propose a bagging-based method to address the multi-domain translation problem.
Experiments show that Bagging is effective for the domain adaptation problem; our method clearly surpasses the baseline, and is even better than some traditional methods.
Thank you for listening. Any questions?