ACL (+WS) 2007 / EMNLP-CoNLL 2007 Survey

Takashi Ninomiya, Nakagawa Laboratory, The University of Tokyo

Machine Learning Study Group, December 6, 2007

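ACL 2007 / EMNLP-CoNLL 2007
  June 23 to June 30, 2007, in Prague
  A beautiful townscape and a castle.
  Statistically, however, 48 of the 800 registered participants reportedly fall victim to pickpockets, so it is apparently a dangerous place as well.

Memories of Prague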

Domain Adaptation
  J. Jiang & C. X. Zhai (2007) Instance Weighting for Domain Adaptation in NLP, in Proc. of ACL 2007
  J. E. Miller, M. Torii, K. Vijay-Shanker (2007) Building Domain-Specific Taggers without Annotated (Domain) Data, in Proc. of EMNLP-CoNLL 2007
  J. Blitzer, M. Dredze, F. Pereira (2007) Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification, in Proc. of ACL 2007
  J. Blitzer, R. McDonald, F. Pereira (2006) Domain Adaptation with Structural Correspondence Learning, in Proc. of EMNLP 2006
  Rie Kubota Ando, Tong Zhang (2005) A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data, in JMLR, 6:1817-1853

Domain Adaptation: motivation (1/2)
  NLP tools that achieve high performance in one specific domain degrade significantly in other domains.
  NLP tools: POS tagger, named entity tagger, parser, sentiment analyzer
  Specific domain: newspaper text
  Different domains: speech, blog, e-mail, bio-medical papers

Domain Adaptation: motivation (2/2)
  Many high-performance NLP tools rely on supervised learning.
    A specific domain has a relatively large amount of annotated data.
    A slightly different domain has only a small amount of annotated data, or none at all.
  Yet unsupervised learning does not perform as well as supervised learning.
  So:
    Adapt a classifier trained on the large annotated data of one domain to a different domain.
    Make full use of the small amount of annotated data.
    Make use of the large amount of raw (unannotated) data.

Domain Adaptation: terminology
  Domain
    Source domain: the domain with a large amount of annotated data, in which analysis already performs well.
    Target domain: the domain under study, in which we want to achieve high performance but have little or no annotated data.
  Assumptions
    A large amount of annotated data in the source domain.
    Little or no annotated data in the target domain.
    A large amount of raw (unannotated) data in the target domain.

Approach 1 (Story #1)
  Training data
    Source domain: large annotated data
    Target domain: small annotated data
  Figure: annotated data (newspaper) yields θ in the source domain; annotated data (blog, bio-medical papers) yields θ' in the target domain.

Approach 2 (Story #2)
  Training data
    Source domain: large annotated data
    Target domain: very large raw data
  Figure: annotated data (newspaper) yields θ in the source domain; raw data (blog, bio-medical papers) yields θ' in the target domain.

Approach 3 (Story #3)
  Training data
    Source domain: large annotated data
    Target domain: very large raw data and small annotated data
  Figure: annotated data (newspaper) yields θ in the source domain; raw data and annotated data (blog, bio-medical papers) yield θ' in the target domain.

Simple methods that come to mind first (Naive Methods)
  SrcOnly: use only the annotated data in the source domain.
  TargetOnly: use only the annotated data in the target domain.
  All: use the annotated data of the source and target domains together.
  Weighted: weight the source and target annotated data according to their amounts.

Simple methods that come to mind first (Naive Methods, continued)
  Pred: use the output of the classifier trained on the source domain as one of the features of the target-domain classifier.
  LinInt: linear interpolation of the output of the classifier trained on the source domain and the output of the classifier trained on the target domain.
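As an illustration of LinInt, the following is a minimal sketch, assuming each classifier returns a label distribution as a dictionary. The function name `linear_interpolation`, the mixing weight `lam`, and the toy distributions are hypothetical and not taken from the slides.

```python
def linear_interpolation(p_source, p_target, lam):
    """Mix two label distributions: lam * target + (1 - lam) * source."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    labels = set(p_source) | set(p_target)
    return {
        y: lam * p_target.get(y, 0.0) + (1.0 - lam) * p_source.get(y, 0.0)
        for y in labels
    }


# Hypothetical outputs of the two classifiers for a single token.
p_src = {"NOUN": 0.7, "VERB": 0.3}
p_tgt = {"NOUN": 0.2, "VERB": 0.8}

mixed = linear_interpolation(p_src, p_tgt, lam=0.5)
best = max(mixed, key=mixed.get)  # label with the highest interpolated score
```

In practice `lam` would presumably be tuned on held-out target-domain data; the slides do not say how.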

Instance Weighting for Domain Adaptation in NLP (Jiang & Zhai 2007)
  A model that uses all three types of data.
  Data
    Annotated data: $\{(x_i, y_i)\}_{i=1}^{N}$
      $x_i$ is the input (feature vector); $y_i$ is the output.
    Raw data: $\{x_j\}_{j=1}^{M}$
  Basic idea
    Correct the occurrence count (= occurrence probability) of each instance in the annotated data using the other data.

Basic Idea
  Change the weight of instances in the training data.

  Training data:

  x1  x2  x3  y
  1   1   0   1
  1   0   0   0
  0   1   1   1
  1   1   0   1
  1   1   1   0
  0   0   1   1
  0   0   1   1
  1   1   1   0
  1   1   0   0
  1   1   1   0
  0   1   0   0
  0   0   0   1
  ... ... ... ...

  The same data as counts and empirical probabilities:

  x1  x2  x3  y  freq(x,y)  p(x,y)
  0   0   0   0  983428     983428/N
  0   0   0   1  58123      58123/N
  0   0   1   0  178237     178237/N
  0   0   1   1  1323       1323/N
  0   1   0   0  748        748/N
  0   1   0   1  23         23/N
  0   1   1   0  373        373/N
  0   1   1   1  2384       2384/N
  1   0   0   0  82         82/N
  1   0   0   1  343781     343781/N
  1   0   1   0  45854      45854/N
  1   0   1   1  83472      83472/N
  1   1   0   0  6474       6474/N
  1   1   0   1  27         27/N
  1   1   1   0  8239       8239/N
  1   1   1   1  634        634/N

  Slide annotation: "change this part" (presumably the freq(x,y) and p(x,y) columns).
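To make the counting step concrete, here is a minimal sketch that turns instances into p(x, y) by relative frequency, with optional per-instance weights. The function `empirical_joint` and the weighting rule in the example are hypothetical, chosen only to show how changing the weights changes the estimated distribution.

```python
from collections import Counter


def empirical_joint(instances, weights=None):
    """Return p(x, y) as the weighted relative frequency of each (x, y) pair."""
    if weights is None:
        weights = [1.0] * len(instances)  # ordinary counting: every instance counts once
    mass = Counter()
    for (x, y), w in zip(instances, weights):
        mass[(x, y)] += w
    total = sum(mass.values())
    return {pair: m / total for pair, m in mass.items()}


# The twelve rows listed in the training-data table, as ((x1, x2, x3), y).
rows = [
    ((1, 1, 0), 1), ((1, 0, 0), 0), ((0, 1, 1), 1), ((1, 1, 0), 1),
    ((1, 1, 1), 0), ((0, 0, 1), 1), ((0, 0, 1), 1), ((1, 1, 1), 0),
    ((1, 1, 0), 0), ((1, 1, 1), 0), ((0, 1, 0), 0), ((0, 0, 0), 1),
]

p_plain = empirical_joint(rows)

# Hypothetical re-weighting: halve the weight of every instance with x1 == 1.
weights = [0.5 if x[0] == 1 else 1.0 for x, _ in rows]
p_reweighted = empirical_joint(rows, weights)
```

With the halved weights, pairs with x1 = 0 gain probability mass relative to plain counting, which is the kind of correction instance weighting aims for.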

Instance Weighting: objective function
  Ordinary supervised learning: empirical estimation with the training data.
  The method generalizes this empirical estimate.
  Expanding p(x,y) = p(y | x) p(x) gives two kinds of adaptation:
    Labeling adaptation: adapt p(y | x)
    Instance adaptation: adapt p(x)
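The formulas on this slide are not present in the text. The following is a sketch of what the objective presumably looks like, assuming a conditional model p(y | x; θ) trained by maximum likelihood; the symbols $\hat{\theta}$, $\tilde{p}$, $p_s$, and $p_t$ are notation assumed here (source- and target-domain distributions), not copied from the slide.

```latex
% Ordinary supervised learning: maximize the empirical log-likelihood
\hat{\theta} = \arg\max_{\theta} \frac{1}{N} \sum_{i=1}^{N} \log p(y_i \mid x_i; \theta)
             = \arg\max_{\theta} \sum_{x, y} \tilde{p}(x, y) \log p(y \mid x; \theta),
\qquad \tilde{p}(x, y) = \frac{\mathrm{freq}(x, y)}{N}

% Generalized form: replace \tilde{p} by the target-domain distribution
\hat{\theta}_t = \arg\max_{\theta} \sum_{x, y} p_t(x, y) \log p(y \mid x; \theta),
\qquad p_t(x, y) = p_t(y \mid x)\, p_t(x)

% Estimated from source-domain instances via per-instance weights
\hat{\theta}_t \approx \arg\max_{\theta} \sum_{i=1}^{N}
  \frac{p_t(x_i)}{p_s(x_i)} \cdot \frac{p_t(y_i \mid x_i)}{p_s(y_i \mid x_i)}
  \log p(y_i \mid x_i; \theta)
```

The first weight factor corresponds to instance adaptation and the second to labeling adaptation.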

Instance Weighting (1) Labeling adaptation: adapting p(y | x)
  p_s(y | x): probability in the source domain
  p_t(y | x): probability in the target domain
  For each instance (x_i, y_i) in the source domain, estimate how similar p_s(y_i | x_i) and p_t(y_i | x_i) are; if they are similar, use the instance as training data.
  Concretely, a source-domain instance (x_i, y_i) is used as training data if y_i = argmax_y p_t(y | x_i).
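The selection rule can be sketched as a filter over source-domain instances. This is a minimal illustration, assuming p_t(y | x) is available as a function returning a label distribution; `select_source_instances` and `toy_target_posterior` are hypothetical names, and the toy posterior stands in for a model trained on the small annotated target-domain set.

```python
def select_source_instances(source_data, target_posterior):
    """Keep (x, y) from the source domain only if y is the target model's top label."""
    kept = []
    for x, y in source_data:
        dist = target_posterior(x)  # dict: label -> p_t(label | x)
        if max(dist, key=dist.get) == y:
            kept.append((x, y))
    return kept


# Hypothetical stand-in for p_t(y | x).
def toy_target_posterior(x):
    if x[0] == 1:
        return {"PERSON": 0.8, "O": 0.2}
    return {"PERSON": 0.1, "O": 0.9}


source = [((1, 0, 1), "PERSON"), ((1, 1, 0), "O"), ((0, 0, 1), "O")]
filtered = select_source_instances(source, toy_target_posterior)
```

Here the second instance is dropped because its source label disagrees with the target model's most probable label.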

Instance Weighting (2) Instance adaptation: adapting p(x)
  Adjust the count C with $C \cdot \frac{p_t(x)}{p_s(x)}$.
  But no experiment was run, because the ratio is difficult to estimate.
  Figure: the source-domain instance (1, 0, 1, 1, 0, 0, 1) ⇒ PERSON contributes p(PERSON | (1,0,1,1,0,0,1)); the source-domain p((1,0,1,1,0,0,1)) is replaced with the target-domain p((1,0,1,1,0,0,1)).
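The count adjustment can be sketched as follows, assuming the densities p_s(x) and p_t(x) are given as lookup tables. The function `reweight_counts` and all numbers are hypothetical; as the slide notes, estimating these densities is the hard part and is not addressed here.

```python
def reweight_counts(counts, p_target, p_source):
    """Scale each source-domain count C(x, y) by the ratio p_t(x) / p_s(x)."""
    adjusted = {}
    for (x, y), c in counts.items():
        adjusted[(x, y)] = c * p_target[x] / p_source[x]
    return adjusted


x = (1, 0, 1, 1, 0, 0, 1)
counts = {(x, "PERSON"): 40, (x, "O"): 10}

# Hypothetical densities of x in the source and target domains.
p_s = {x: 0.02}
p_t = {x: 0.01}

adjusted = reweight_counts(counts, p_t, p_s)

# The conditional p(PERSON | x) is untouched; only the mass assigned to x changes.
cond_before = counts[(x, "PERSON")] / sum(counts.values())
cond_after = adjusted[(x, "PERSON")] / sum(adjusted.values())
```

Because every label of the same x is scaled by the same ratio, this step changes p(x) while leaving p(y | x) as it was.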

Instance Weighting (3) Boosting
  θ^(n-1): the parameters at the (n-1)-th training iteration
  Generate target-domain annotated data (x_i, y_i) by analyzing the target-domain raw data x_i with θ^(n-1): y_i = argmax_y' p(y' | x_i)
  Use only the top-k instances as training data.
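One round of this procedure can be sketched as below. The slide calls it boosting; as described, it resembles self-training on the raw target-domain data. The function `bootstrap_round`, the toy model, and ranking by the probability of the predicted label are assumptions made for illustration; retraining θ on the selected instances is omitted.

```python
def bootstrap_round(raw_target, posterior, k):
    """Label raw target-domain inputs with the current model and keep the
    k most confident predictions as pseudo-annotated training data."""
    scored = []
    for x in raw_target:
        dist = posterior(x)              # dict: label -> p(label | x; theta^(n-1))
        y = max(dist, key=dist.get)      # y_i = argmax_y' p(y' | x_i)
        scored.append((dist[y], x, y))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(x, y) for _, x, y in scored[:k]]


# Hypothetical model: the single feature value is read as p(POS | x).
def toy_posterior(x):
    p = x[0]
    return {"POS": p, "NEG": 1.0 - p}


raw = [(0.95,), (0.55,), (0.1,), (0.4,)]
pseudo = bootstrap_round(raw, toy_posterior, k=2)
```

The two selected instances would then be added to the training data before estimating θ^(n).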

Instance Weighting: Results
  Results with labeling adaptation only
  Target-domain annotated data added

Instance Weighting: Results
  Results with boosting

Methods that do not use target-domain annotated data
  J. E. Miller, M. Torii, K. Vijay-Shanker (2007) Building Domain-Specific Taggers without Annotated (Domain) Data, in Proc. of EMNLP-CoNLL 2007
  An HMM tagger trained with the EM algorithm.
  Initial transition probabilities come from the source-domain annotated corpus (Penn WSJ).
  Initial emission probabilities are learned from the target-domain raw corpus and the source-domain annotated corpus: a word starts from the emission probabilities of the word most similar to it.
  Example words from the figure: "phosphorylate" (glossed on the slide as phosphorylation), "phosphorylately", "phosphorylates", "phosphorylation", "create"

Methods that do not use target-domain annotated data
  J. Blitzer, M. Dredze, F. Pereira (2007) Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification, in Proc. of ACL 2007
  J. Blitzer, R. McDonald, F. Pereira (2006) Domain Adaptation with Structural Correspondence Learning, in Proc. of EMNLP 2006
  Rie Kubota Ando, Tong Zhang (2005) A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data, in JMLR, 6:1817-1853

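SVD-ASO
  Main problem: training data (x_i, y_i) → test (x, ?)
  Auxiliary problems: create several problems that differ from the main problem.
  Unsupervised approach
    Set up tasks similar to the main problem,
    but only tasks whose labels can be constructed from the x_i alone, without the training labels y_i.
    Example: for POS tagging, predicting the next word.
    Example: for text genre classification, splitting a text in two and predicting, from one half, the most frequent word in the other half.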
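The next-word example can be sketched as a function that builds auxiliary training data from unlabeled sentences only. This is a minimal illustration, assuming one binary auxiliary problem per chosen word; `next_word_auxiliary_data` and the toy sentences are hypothetical, and real inputs would be feature vectors, not bare words.

```python
def next_word_auxiliary_data(sentences, target_word):
    """Build a binary auxiliary problem from unlabeled text:
    input = current word, label = whether the next word is target_word."""
    examples = []
    for tokens in sentences:
        for current, following in zip(tokens, tokens[1:]):
            examples.append((current, following == target_word))
    return examples


unlabeled = [
    ["the", "protein", "binds", "the", "receptor"],
    ["binds", "the", "site"],
]
aux = next_word_auxiliary_data(unlabeled, target_word="the")
positives = sum(1 for _, label in aux if label)
```

Repeating this for many frequent words gives many auxiliary problems, none of which needs the annotations y_i of the main problem.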

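SVD-ASO
  Main problem: training data (x_i, y_i) → test (x, ?)
  Auxiliary problems: create several problems that differ from the main problem.
  Semi-supervised approach
    Create two independent feature maps, Φ1 and Φ2.
    Build the classifier for the main problem using Φ1.
    The auxiliary problems use Φ2 to predict the output of the main-problem classifier.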

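SVD-ASO
  For all problems l = 1, ..., m, find θ, w_l, and v_l by minimizing a loss function defined over all the problems.
  θ is a matrix shared by all problems; it is obtained by SVD.
  v_l and w_l are weight vectors specific to each problem.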
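The loss function itself is not present in the text. A sketch of the joint objective, assuming the formulation of Ando & Zhang (2005) with linear predictors $f_l(x) = w_l^{\top} x + v_l^{\top} \theta x$; the symbols $n_l$ (number of training examples for problem $l$), $\lambda_l$ (regularization weight), and $L$ (loss) are assumed notation:

```latex
\left[\{\hat{w}_l, \hat{v}_l\}, \hat{\theta}\right]
  = \arg\min_{\{w_l, v_l\},\, \theta}
    \sum_{l=1}^{m} \left(
      \frac{1}{n_l} \sum_{i=1}^{n_l}
        L\!\left( (w_l + \theta^{\top} v_l)^{\top} x_i^{l},\; y_i^{l} \right)
      + \lambda_l \lVert w_l \rVert^{2}
    \right)
  \quad \text{s.t. } \theta \theta^{\top} = I
```

Under this reading, θ projects the input onto a low-dimensional structure shared by all problems, while $w_l$ and $v_l$ remain specific to problem $l$.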

SVD-ASO: Algorithm

Applying SVD-ASO to Domain Adaptation
  Regard the auxiliary problems as a different domain that has no annotations.
  POS tagger
    J. Blitzer, R. McDonald, F. Pereira (2006) Domain Adaptation with Structural Correspondence Learning, in Proc. of EMNLP 2006
  Sentiment analysis
    J. Blitzer, M. Dredze, F. Pereira (2007) Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification, in Proc. of ACL 2007

Applying SVD-ASO to POS Tagger Domain Adaptation: Algorithm
