Collaborative Ranking: A Case Study on Entity Ranking (EMNLP 2011 reading group)

EMNLP 2011 Reading Group: Collaborative Ranking: A Case Study on Entity Ranking (D11-1071)

2011-12-23 Yoshihiko Suhara @sleepy_yoshi

Today's paper

• Collaborative Ranking: A Case Study on Entity Ranking

– by Zheng Chen, Heng Ji

One-slide summary

• TAC-KBP2010 Entity Linking task

– Return the entity that answers a query

– The answer is selected by ranking the generated candidates

• Proposes Collaborative Ranking

– (1) query-level collaboration

• Micro collaborative ranking

– (2) ranker-level collaboration

• Macro collaborative ranking

Background

History of Named Entity Recognition

[McNamee+ 10]

Knowledge Base Population (KBP) Track

• TAC > KBP > Entity Linking

[Ji+ 10]

KB entry

[Chen+ 10]

Query example

[Figure: Entity Linking System. Query "Michael Jordan" with context "England Youth International goalkeeper"; KB candidates: Michael Jordan (mycologist), Michael Jordan (footballer)]

Candidate generation → Answer

Focus of this talk

INPUT: a query and its set of candidate entities

OUTPUT: the top-ranked entity, or NIL

Queries and entity candidates

• Query $q = (q.id, q.string, q.text)$

• KB entry candidates for query $q$

$o^{(q)} = \{o_1^{(q)}, \dots, o_{n^q}^{(q)}\}$

• Information in a KB entry $o_i^{(q)}$

– KB title

– KB infobox

• attribute-value pairs (e.g., per:alternate_names, per:date_of_birth, ...)

– KB text
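
The query and KB entry structures above can be written down as plain records. This is a minimal sketch based on my reading of the field names on the slide; the class names, the field types, and the query id "Q1" are assumptions for illustration, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Query:
    id: str      # q.id
    string: str  # q.string: the name string to be linked
    text: str    # q.text: the document in which the name appears


@dataclass
class KBEntry:
    title: str                                             # KB title
    infobox: Dict[str, str] = field(default_factory=dict)  # attribute-value pairs
    text: str = ""                                         # KB text


# Hypothetical query id; the string and context come from the query example.
q = Query(id="Q1", string="Michael Jordan",
          text="England Youth International goalkeeper")

# Candidate set o^(q); infobox and text are left empty rather than invented.
candidates: List[KBEntry] = [
    KBEntry(title="Michael Jordan (mycologist)"),
    KBEntry(title="Michael Jordan (footballer)"),
]
```

A ranker then only needs a function that maps a (Query, KBEntry) pair to a score.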

Introduction

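Ranking and its applications

• Many NLP problems can be formulated as ranking problems

– Parsing

• Ranking parse trees

– Machine translation

• Ranking translation candidates

– Anaphora resolution

– and so on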

Limitations of existing methods

• No learning method works effectively on all data

⇒ So let's build a collaborative model! (= collaborative ranking)

cf. collaborative filtering

• Not related

An easy-to-understand diagram?!

No idea!

Key points of Collaborative Ranking

• (1) Improve accuracy by artificially increasing the number of queries

– query-level collaboration

• (2) Improve accuracy by effectively combining multiple rankers

– ranker-level collaboration

• (3) A combination of (1) and (2)

Collaborative Ranking

• Three proposed methods

– (1) Micro Collaborative Ranking (MiCR)

– (2) Macro Collaborative Ranking (MaCR)

– (3) Micro-Macro Collaborative Ranking (MiMaCR)

(1) Micro Collaborative Ranking (MiCR)

Micro Collaborative Ranking

• (1) Select $k$ collaborators for query $q$

– Selection criteria are described later

• (2) Rank with the collaborators taken into account

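Selecting collaborators

• Solved as a clustering problem

– Given query $q$, retrieve up to 300 documents containing q.string from the corpus

– Apply a clustering algorithm

– Select collaborators from the cluster that contains q.text

Hierarchical clustering (agglomerative) and spectral clustering (graph) are used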
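
The selection steps above can be sketched with a small single-link agglomerative clustering in pure Python. Everything specific here is an assumption for illustration: the bag-of-words cosine similarity, the merge threshold of 0.3, the function names, and the toy documents. The slides do not state which similarity or stopping criterion the paper uses, and `docs` is assumed to be already filtered to documents containing q.string.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector of a text (whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerative(vectors, threshold):
    """Single-link clustering: merge the closest pair of clusters
    until no pair has similarity >= threshold."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = max(cosine(vectors[i], vectors[j])
                          for i in clusters[a] for j in clusters[b])
                if sim > best:
                    best, pair = sim, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] += clusters.pop(b)  # a < b, so index a stays valid
    return clusters


def select_collaborators(query_text, docs, k, threshold=0.3, max_docs=300):
    """Return up to k documents from the cluster that contains the query text."""
    texts = [query_text] + docs[:max_docs]  # index 0 is q.text
    vectors = [bow(t) for t in texts]
    for cluster in agglomerative(vectors, threshold):
        if 0 in cluster:
            members = [i for i in cluster if i != 0]
            members.sort(key=lambda i: cosine(vectors[0], vectors[i]),
                         reverse=True)
            return [texts[i] for i in members[:k]]
    return []


# Toy documents (invented for the example).
docs = ["jordan goalkeeper england youth football",
        "jordan fungi mycologist botany",
        "jordan goalkeeper football club"]
collaborators = select_collaborators(
    "jordan england youth international goalkeeper", docs, k=2)
```

In this toy run the two football documents end up in the query's cluster and the mycology document does not, so only the former become collaborators.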

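$x_j^{(q)} = \phi(q, o_j^{(q)})$, $x_j^{(cq_1)} = \phi(cq_1, o_j^{(cq_1)})$

[Figure: feature vectors $x_j^{(q)}$ for the query and its collaborator $cq_1$; presenter's annotation: "probably"]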

Recap: queries and entity candidates

• Query $q = (q.id, q.string, q.text)$

• KB entry candidates for query $q$

$o^{(q)} = \{o_1^{(q)}, \dots, o_{n^q}^{(q)}\}$

• Information in a KB entry $o_i^{(q)}$

– KB title

– KB infobox

• attribute-value pairs (e.g., per:alternate_names, per:date_of_birth, ...)

– KB text

How $g_1(\cdot)$ is computed in MiCR

-> average

This one is good
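
As a rough illustration of the micro step, the sketch below scores one candidate against the query and each of its collaborators and then combines the scores with $g_1$. The feature function `phi` (token overlap), the identity ranker `f`, and the toy strings are assumptions made for the example. It also assumes the same candidate is scored against every collaborator, which is my reading of the slide rather than something it states.

```python
from statistics import mean


def micr_score(f, phi, query, collaborators, candidate, g1=mean):
    """Score a candidate with ranker f on the query and on each collaborator,
    then combine the k + 1 scores with g1 (average, max, or min)."""
    scores = [f(phi(query, candidate))]
    scores += [f(phi(cq, candidate)) for cq in collaborators]
    return g1(scores)


def phi(text, candidate_text):
    """Toy feature: number of shared tokens (an assumption, not the paper's)."""
    return len(set(text.split()) & set(candidate_text.split()))


def f(x):
    """Toy ranker: the score is the feature value itself."""
    return x


query = "england youth goalkeeper"
collaborators = ["goalkeeper football club", "england goalkeeper"]
candidate = "english football goalkeeper"

avg_score = micr_score(f, phi, query, collaborators, candidate, g1=mean)
max_score = micr_score(f, phi, query, collaborators, candidate, g1=max)
min_score = micr_score(f, phi, query, collaborators, candidate, g1=min)
```

Swapping `g1` is the only change needed to compare the average, max, and min variants.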

(2) Macro Collaborative Ranking (MaCR)

Macro Collaborative Ranking

• Prepare multiple rankers $F^* = \{f_1, \dots, f_m\}$ and compute the score with a function that combines them

How $g_2(\cdot)$ is computed in MaCR

This one is good
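
Two simple choices for combining $m$ rankers are majority voting over each ranker's top candidate and averaging the scores. The sketch below shows both. The function names and the score tables are invented for the example, and averaging assumes the rankers' scores are on a comparable scale.

```python
from collections import Counter


def macr_vote(rankers, query, candidates):
    """Each ranker votes for its top-scoring candidate; the majority wins.
    Ties go to the candidate that received a vote first."""
    votes = Counter(max(candidates, key=lambda o: f(query, o)) for f in rankers)
    return votes.most_common(1)[0][0]


def macr_average(rankers, query, candidates):
    """Pick the candidate with the highest mean score across rankers."""
    return max(candidates,
               key=lambda o: sum(f(query, o) for f in rankers) / len(rankers))


# Toy rankers backed by fixed score tables (invented numbers).
tables = [{"A": 0.90, "B": 0.10},
          {"A": 0.40, "B": 0.60},
          {"A": 0.45, "B": 0.55}]
rankers = [lambda q, o, t=t: t[o] for t in tables]

voted = macr_vote(rankers, "q", ["A", "B"])
averaged = macr_average(rankers, "q", ["A", "B"])
```

The two rules can disagree: here two of three rankers prefer B, while the single confident ranker pulls the average toward A.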

(3) Micro-Macro Collaborative Ranking (MiMaCR)

Micro-Macro Collaborative Ranking

• MiCR + MaCR

voting over $m$ rankers

average over $k+1$ scores (the query and its $k$ collaborators)
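
Putting the two levels together, one possible reading is: each of the $m$ rankers averages its score over the $k+1$ queries, and the rankers then vote. The sketch below follows that reading. The function name, the table-backed toy rankers, and all numbers are assumptions for illustration.

```python
from collections import Counter
from statistics import mean


def mimacr(rankers, query, collaborators, candidates):
    """Micro step: per ranker, average scores over the query and collaborators.
    Macro step: each ranker votes for its best candidate; the majority wins."""
    group = [query] + list(collaborators)  # k + 1 queries
    votes = Counter()
    for f in rankers:
        micro = {o: mean(f(q, o) for q in group) for o in candidates}
        votes[max(micro, key=micro.get)] += 1
    return votes.most_common(1)[0][0]


def table_ranker(table):
    """Toy ranker that looks scores up in table[query][candidate]."""
    return lambda q, o: table[q][o]


# Invented scores: ranker 1 prefers B on the query alone, A once averaged.
rankers = [
    table_ranker({"q": {"A": 0.2, "B": 0.8},
                  "c1": {"A": 0.9, "B": 0.1},
                  "c2": {"A": 0.7, "B": 0.3}}),
    table_ranker({"q": {"A": 0.6, "B": 0.4},
                  "c1": {"A": 0.5, "B": 0.5},
                  "c2": {"A": 0.7, "B": 0.3}}),
    table_ranker({"q": {"A": 0.1, "B": 0.9},
                  "c1": {"A": 0.2, "B": 0.8},
                  "c2": {"A": 0.3, "B": 0.7}}),
]

answer = mimacr(rankers, "q", ["c1", "c2"], ["A", "B"])
without_collaborators = mimacr(rankers, "q", [], ["A", "B"])
```

With collaborators the vote goes to A; with the query alone, two rankers prefer B.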

Experiments

Dataset

• TAC-KBP2009 dataset – 75% training data, 25% development data

• TAC-KBP2010 dataset – test data

• reference KB

– Oct. 2008 dump of English Wikipedia – 818,741 entries

• Source text corpus – mostly Newswire and Web Text – 1,777,888 documents in 5 genres

Baseline Rankers (1/2)

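• Unsupervised

– Naive ($f_1$)

• Returns NIL for every query

– Entity ($f_2$)

• Weighted similarity between the named entities extracted from q.text and KB text

– TFIDF ($f_3$)

• Cosine similarity between q.text and KB text, with TF-IDF weighting

– Profile ($f_4$)

• Profile similarity between q.text and KB text [Chen+ 10]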
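
To make the TFIDF ranker concrete, here is a minimal sketch of TF-IDF weighted cosine similarity between q.text and each KB text. The whitespace tokenization, the smoothed IDF formula, computing IDF over the query plus the candidate texts, and the toy texts are all assumptions; the slides do not give the paper's exact weighting.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors with a smoothed IDF (one common variant)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def tfidf_rank(q_text, kb_texts):
    """Return one similarity score per KB text, in the given order."""
    vecs = tfidf_vectors([q_text] + kb_texts)
    return [cosine(vecs[0], v) for v in vecs[1:]]


# Toy texts (invented for the example).
scores = tfidf_rank("england youth international goalkeeper",
                    ["goalkeeper football england", "mycologist fungi"])
```

The candidate sharing tokens with the query context gets a positive score, and the unrelated one gets zero.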

Baseline Rankers (2/2)

• Supervised

– Maxent ($f_5$)

• Maximum entropy model (pointwise ranker)

– SVM ($f_6$)

• SVM (pointwise ranker)

– RankSVM ($f_7$)

• Ranking SVM (pairwise ranker)

– ListNet ($f_8$)

• ListNet (listwise ranker)

• Features

– 1. surface features [Dredze+ 10][Zheng+ 10]

– 2. document features [Dredze+ 10][Zheng+ 10]

– 3. profiling features [Chen+ 11]

Evaluation

• Evaluated with the micro-average

Comparison of baseline rankers

• Supervised rankers are generally better
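
A micro-average pools all queries and computes a single ratio, so every query counts equally. A minimal sketch, assuming the metric is accuracy over queries and using invented query ids and answers:

```python
def micro_accuracy(gold, predicted):
    """Fraction of all queries whose predicted answer (entity id or NIL)
    matches the gold answer."""
    correct = sum(1 for qid, ans in gold.items() if predicted.get(qid) == ans)
    return correct / len(gold)


# Toy gold answers and system output (invented).
gold = {"q1": "E1", "q2": "NIL", "q3": "E2", "q4": "E3"}
predicted = {"q1": "E1", "q2": "NIL", "q3": "E9", "q4": "E3"}

score = micro_accuracy(gold, predicted)
```

A correct NIL counts the same as a correctly linked entity.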

Evaluation of MiCR

• Experimental setup

– Ranker: TFIDF ($f_3$)

– $g_1$: three variants (average, max, min)

– Collaborator search: two variants (graph, agglomerative)

[Figure panels: average, max, min]

Evaluation of MaCR (1/2)

• Experimental setup

– $g_2$: voting and average

Note: rankers are added in order of their performance on the development set
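
The procedure of adding rankers one at a time, best first, can be sketched as follows. This is only an illustration under stated assumptions: each ranker is represented by its predicted answers, the ensemble uses majority voting, the order comes from development-set accuracy, and the ensembles are scored on a separate test set. All names and data are invented.

```python
from collections import Counter


def accuracy(pred, gold):
    return sum(pred[q] == gold[q] for q in gold) / len(gold)


def majority(preds, qid):
    """Majority answer for one query; ties go to the earliest ranker's answer."""
    return Counter(p[qid] for p in preds).most_common(1)[0][0]


def incremental_voting(dev_preds, dev_gold, test_preds, test_gold):
    """Order rankers by dev accuracy, then evaluate the voting ensemble of the
    top 1, 2, ..., m rankers on the test set."""
    order = sorted(dev_preds, key=lambda n: accuracy(dev_preds[n], dev_gold),
                   reverse=True)
    curve = []
    for n in range(1, len(order) + 1):
        used = order[:n]
        ens = {q: majority([test_preds[m] for m in used], q) for q in test_gold}
        curve.append((used, accuracy(ens, test_gold)))
    return curve


# Toy predictions (invented).
dev_gold = {"d1": "X", "d2": "Y"}
dev_preds = {"a": {"d1": "X", "d2": "Y"},
             "b": {"d1": "X", "d2": "W"},
             "c": {"d1": "W", "d2": "W"}}
test_gold = {"t1": "X", "t2": "Y", "t3": "Z"}
test_preds = {"a": {"t1": "X", "t2": "Y", "t3": "W"},
              "b": {"t1": "X", "t2": "W", "t3": "Z"},
              "c": {"t1": "W", "t2": "Y", "t3": "Z"}}

curve = incremental_voting(dev_preds, dev_gold, test_preds, test_gold)
```

In this toy data the weakest ranker on the development set still helps the ensemble, because its errors fall on different queries.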

Evaluation of MaCR (2/2)

• MaCR applied to the top-10 KBP2009 entity linking systems

Evaluation of MiMaCR (1/2)

• Experimental setup

– micro-ranking ($g_1(\cdot)$)

• graph clustering

• 5 rankers (TFIDF, Entity, Maxent, SVM, ListNet)

– average for TFIDF, Entity

– supervised versions for Maxent, SVM, ListNet

– macro-ranking ($g_2(\cdot)$)

• voting

Evaluation of MiMaCR (2/2)

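Summary and impressions

• Proposes collaborative ranking to rank entity candidates accurately in the entity linking task

– query-level collaboration [new!]

• However, it depends on how collaborators are selected and how $g_1(\cdot)$ is computed

– ranker-level collaboration

• The approach seems to rely heavily on task-specific tuning

– Would it be as effective on other tasks?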

References

• [McNamee+ 10] P. McNamee, J. C. Mayfield, C. D. Piatko, “Processing Named Entities in Text”, Johns Hopkins APL Technical Digest, Vol.30(1), pp.31-40, 2011.

• [Chen+ 10] Z. Chen, S. Tamang, A. Lee, X. Li, W.-P. Lin, M. Snover, J. Artiles, M. Passantino and H. Ji, “CUNYBLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description”, In Proc. TAC2010, 2010.

• [Ji+ 10] H. Ji, R. Grishman, H. T. Dang and K. Griffitt, “An Overview of the TAC2010 Knowledge Base Population Track”, In Proc. TAC2010, 2010.

• [Dredze+ 10] M. Dredze, P. McNamee, D. Rao, A. Gerber and T. Finin, “Entity Disambiguation for Knowledge Base Population”, In Proc. COLING2010, 2010.

• [Zheng+ 10] Z. Zheng, F. Li, M. Huang and X. Zhu, “Learning to Link Entities with Knowledge Base”, In Proc. HLT-NAACL2010, 2010.

• [Chen+ 11] Z. Chen, S. Tamang, A. Lee and H. Ji, “A Toolkit for Knowledge Base Population”, In Proc. SIGIR2011, 2011.
