Literature review: Opinion Mining in Newspaper Articles by Entropy-based Word Connections

Literature review, 2014/04/04, Natural Language Processing Laboratory, Nagaoka University of Technology, Shohei Okada

Upload: shohei-okada

Post on 17-Jul-2015


Page 1: Literature review: Opinion Mining in Newspaper Articles by Entropy-based Word Connections

Literature review, 2014/04/04

Natural Language Processing Laboratory, Nagaoka University of Technology

Shohei Okada

Page 2: Paper under review

Thomas Scholz and Stefan Conrad. Opinion Mining in Newspaper Articles by Entropy-based Word Connections. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1828-1839. (2013)

Page 3: Overview

• Estimates the tonality of statements in newspaper articles
  – Decides whether a statement is positive, negative, or objective (neutral)
• Entropy-based word connections
  – Used when computing the features

Page 4: Background

• Analyzing the results of PR activities by companies and other organizations
  – Media Response Analysis (MRA)
• Benefits of automating opinion mining
• Newspaper articles also contain passages that are not subjective
• Statements that use similar words can still differ in tonality

Page 5: Task definition

$$t : s = (w_1, w_2, \dots, w_k) \mapsto y \in \{\text{positive}, \text{neutral}, \text{negative}\}$$

• $d$: newspaper article
• $s \subseteq d$: statement
• $y$: tonality

Page 6: Example statement (positive)

There are structural factors behind the African growth story: a growing and sizable population which is increasingly urbanised with disposable income; growing political stability; and a financial services industry that is still in its infancy.

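Page 7: Proposed method | Graph model

$v_i$ (node): a word (noun, adjective, verb, adverb, or negation)

Weight of the edge $e_{ij}$ between $v_i$ and $v_j$:

$$\varepsilon_{ij} = \left( y_{ij}^{\mathrm{pos}},\; y_{ij}^{\mathrm{neu}},\; y_{ij}^{\mathrm{neg}} \right)$$

• $y_{ij}^{\mathrm{pos}}$: number of co-occurrences of $v_i$ and $v_j$ in positive statements
• $y_{ij}^{\mathrm{neu}}$: the same count in neutral statements
• $y_{ij}^{\mathrm{neg}}$: the same count in negative statements

(Figure quoted from the original paper)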
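
The graph model above can be sketched as a dictionary of edge weights. This is only an illustration, not the authors' implementation: the function name `build_graph`, the toy data, and the choice to count a word pair once per statement are all assumptions, and the words are assumed to be already filtered to nouns, adjectives, verbs, adverbs, and negations.

```python
from collections import defaultdict
from itertools import combinations

# Order of the three counts stored on every edge: (pos, neu, neg).
LABELS = ("positive", "neutral", "negative")


def build_graph(statements):
    """Build edge weights from (words, label) pairs.

    Assumption: two words co-occur if they appear in the same statement,
    and each pair is counted at most once per statement.
    """
    graph = defaultdict(lambda: [0, 0, 0])
    for words, label in statements:
        idx = LABELS.index(label)
        # Sorting gives every undirected edge one canonical key.
        for a, b in combinations(sorted(set(words)), 2):
            graph[(a, b)][idx] += 1
    return dict(graph)


# Hypothetical toy data, for illustration only.
toy = [
    (["growth", "stability"], "positive"),
    (["growth", "stability", "debt"], "negative"),
    (["growth", "stability"], "positive"),
]
graph = build_graph(toy)
print(graph[("growth", "stability")])  # [2, 0, 1]
```

Keeping all three counts on one edge means the positive/negative and subjective/neutral probabilities used later can be derived from the same structure.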

Page 8: Proposed method | Feature generation

The $l$-th sentence,

There are structural factors behind the African growth story.

and its corresponding subgraph $G_{s_l}$ (solid lines). (Figure quoted from the original paper)

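Page 9: Proposed method | Feature generation

• Probability of positive/negative

$$P(\mathrm{pos} \mid v_i) = \frac{\sum_{e_{ij} \in G_{s_l}} y_{ij}^{\mathrm{pos}}}{\sum_{e_{ij} \in G_{s_l}} \left( y_{ij}^{\mathrm{pos}} + y_{ij}^{\mathrm{neg}} \right)}$$

$$P(\mathrm{neg} \mid v_i) = \frac{\sum_{e_{ij} \in G_{s_l}} y_{ij}^{\mathrm{neg}}}{\sum_{e_{ij} \in G_{s_l}} \left( y_{ij}^{\mathrm{pos}} + y_{ij}^{\mathrm{neg}} \right)}$$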

Page 10: Proposed method | Feature generation

Worked example for the node "factor": the denominator of $P(\mathrm{pos} \mid \text{factor})$ is $5+2+2+2$.

Page 11: Proposed method | Feature generation

$$P(\mathrm{pos} \mid \text{factor}) = \frac{5+2}{5+2+2+2} \simeq 0.64$$
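
The positive/negative probability can be checked with a few lines of Python. The transcript gives only the sums, not the individual edge weights, so the two `(pos, neu, neg)` triples below are hypothetical values chosen to reproduce the numerator 5+2 and the denominator 5+2+2+2; the function name is likewise an assumption.

```python
def polarity_probabilities(edges):
    """Return (P(pos|v), P(neg|v)) from (pos, neu, neg) edge weights.

    `edges` holds the weights of the edges of node v inside the sentence
    subgraph. Neutral counts are ignored here by definition.
    """
    pos = sum(e[0] for e in edges)
    neg = sum(e[2] for e in edges)
    total = pos + neg
    if total == 0:
        return None  # undefined when v has no polar co-occurrences
    return pos / total, neg / total


# Hypothetical edge weights for the node "factor" (not from the paper).
factor_edges = [(5, 1, 2), (2, 2, 2)]
p_pos, p_neg = polarity_probabilities(factor_edges)
print(round(p_pos, 2))  # 0.64
```

Because both probabilities share one denominator, they always sum to 1 whenever they are defined.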

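Page 12: Proposed method | Feature generation

• Probability of subjective/neutral (objective)

$$P(\mathrm{sub} \mid v_i) = \frac{\sum_{e_{ij} \in G_{s_l}} \left( y_{ij}^{\mathrm{pos}} + y_{ij}^{\mathrm{neg}} \right)}{\sum_{e_{ij} \in G_{s_l}} \left( y_{ij}^{\mathrm{pos}} + y_{ij}^{\mathrm{neu}} + y_{ij}^{\mathrm{neg}} \right)}$$

$$P(\mathrm{neu} \mid v_i) = \frac{\sum_{e_{ij} \in G_{s_l}} y_{ij}^{\mathrm{neu}}}{\sum_{e_{ij} \in G_{s_l}} \left( y_{ij}^{\mathrm{pos}} + y_{ij}^{\mathrm{neu}} + y_{ij}^{\mathrm{neg}} \right)}$$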

Page 13: Proposed method | Feature generation

Worked example for the node "factor": the denominator of $P(\mathrm{sub} \mid \text{factor})$ is $5+1+2+2+2+2$.

Page 14: Proposed method | Feature generation

$$P(\mathrm{sub} \mid \text{factor}) = \frac{5+2+2+2}{5+1+2+2+2+2} \simeq 0.79$$
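
The subjective/neutral probability follows the same pattern, this time with the neutral counts in the denominator. As a hedged sketch: the edge triples are hypothetical values chosen only so that the sums match 5+2+2+2 and 5+1+2+2+2+2, and the function name is an assumption.

```python
def subjectivity_probabilities(edges):
    """Return (P(sub|v), P(neu|v)) from (pos, neu, neg) edge weights."""
    pos = sum(p for p, _, _ in edges)
    neu = sum(n for _, n, _ in edges)
    neg = sum(g for _, _, g in edges)
    total = pos + neu + neg
    if total == 0:
        return None  # undefined when v has no co-occurrences at all
    return (pos + neg) / total, neu / total


# Hypothetical edge weights for the node "factor" (not from the paper).
factor_edges = [(5, 1, 2), (2, 2, 2)]
p_sub, p_neu = subjectivity_probabilities(factor_edges)
print(round(p_sub, 2))  # 0.79
```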

Page 15: Proposed method | Feature generation

• Applies the idea of entropy

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$
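
For reference, the entropy formula can be written directly in Python. This is a generic sketch of Shannon entropy in bits, not code from the paper; terms with zero probability are skipped, following the usual convention that they contribute nothing.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))  # 1.0
print(entropy([0.25] * 4))  # 2.0
```

A uniform distribution gives the maximum entropy and a distribution concentrated on a single outcome gives zero.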

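Page 16: Proposed method | Feature generation

$$f_{\mathrm{pos}}(v_i) = \begin{cases} 1 + P(\mathrm{pos} \mid v_i) \log_2 P(\mathrm{pos} \mid v_i) & \text{if } P(\mathrm{neg} \mid v_i) \le P(\mathrm{pos} \mid v_i) \\ -1 - P(\mathrm{neg} \mid v_i) \log_2 P(\mathrm{neg} \mid v_i) & \text{otherwise} \end{cases}$$

• $-1 \le f_{\mathrm{pos}}(v_i) \le 1$
• Presenter's question: shouldn't the second term be multiplied by 2?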
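
A small sketch makes the presenter's question concrete. It assumes the two probabilities sum to 1, which holds for the definitions given earlier; the function name and the `scale` parameter are hypothetical, with `scale=2.0` included only to explore the suggested doubling. Under that assumption the selected probability is at least 0.5, so with the formula as written the feature appears to fall in [0.5, 1] or [-1, -0.5) and never near 0.

```python
import math


def entropy_feature(p_a, p_b, scale=1.0):
    """Entropy-based feature for one node.

    p_a is P(pos|v) (or P(sub|v)); p_b is P(neg|v) (or P(neu|v)).
    Assumes p_a + p_b == 1. `scale` is a hypothetical knob: 1.0 is the
    formula as written on the slide, 2.0 doubles the second term.
    """
    if p_b <= p_a:
        return 1.0 + scale * p_a * math.log2(p_a)
    return -1.0 - scale * p_b * math.log2(p_b)


print(entropy_feature(0.5, 0.5))             # 0.5
print(entropy_feature(0.5, 0.5, scale=2.0))  # 0.0
print(entropy_feature(1.0, 0.0))             # 1.0
```

With the doubled term, a perfectly balanced node maps to 0 instead of 0.5, so the feature would cover the whole interval from -1 to 1. Whether that is what the authors intended is not stated in the slides.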

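Page 17: Proposed method | Feature generation

$$f_{\mathrm{sub}}(v_i) = \begin{cases} 1 + P(\mathrm{sub} \mid v_i) \log_2 P(\mathrm{sub} \mid v_i) & \text{if } P(\mathrm{neu} \mid v_i) \le P(\mathrm{sub} \mid v_i) \\ -1 - P(\mathrm{neu} \mid v_i) \log_2 P(\mathrm{neu} \mid v_i) & \text{otherwise} \end{cases}$$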

Page 18: Proposed method | Feature generation

• For each part-of-speech category, the average of the node (word) features is computed

$$T_{cat,z}(v_i) = \begin{cases} f_z(v_i) & \text{if } v_i \in cat \\ 0 & \text{if } v_i \notin cat \end{cases}$$

• $cat \in \{adj, adv, n, v\}$
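
The category-wise averaging could look like the sketch below. Several points are assumptions rather than facts from the slides: that $z$ ranges over "pos" and "sub", that the average is taken over all nodes of the statement (nodes outside the category contribute 0 through $T_{cat,z}$), and the function name and toy values. Four categories times two feature types would give eight values, which matches the number of final features mentioned in the slides, but the original feature table is not reproduced in this transcript.

```python
CATEGORIES = ("adj", "adv", "n", "v")


def category_features(nodes):
    """Average f_pos and f_sub per part-of-speech category.

    `nodes` is a list of (category, f_pos, f_sub) tuples for one statement.
    Assumption: the mean is over all nodes, with T = 0 outside the category.
    """
    k = len(nodes)
    features = {}
    for cat in CATEGORIES:
        for z, col in (("pos", 1), ("sub", 2)):
            total = sum(node[col] for node in nodes if node[0] == cat)
            features[(cat, z)] = total / k if k else 0.0
    return features


# Hypothetical feature values, for illustration only.
nodes = [
    ("n", 0.5, 0.75),
    ("adj", 1.0, 0.5),
    ("n", -0.75, 0.75),
    ("v", 0.5, -1.0),
]
features = category_features(nodes)
print(len(features))            # 8
print(features[("n", "pos")])   # -0.0625
```

The resulting fixed-length vector is the kind of input a classifier such as an SVM expects, regardless of how many words the statement contains.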

Page 19: Proposed method | Feature generation

• The eight final features (quoted from the original paper)
• Classification with an SVM

Page 20: Experiments | Data

• pressrelation dataset (PDS)
  – 1,521 statements
• Statements extracted from news about financial information institutions (Finance)
  – 8,500 statements
• Annotated by four annotators
• 30% of each dataset is used to learn the graph
• 20% of the remainder is used to train the SVM
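
The split described above can be sketched as follows. The slides do not say how statements were sampled or what the last portion was used for, so the random shuffle, the seed, the function name, and the treatment of the final portion as held-out evaluation data are all assumptions.

```python
import random


def split_dataset(statements, graph_frac=0.3, svm_frac=0.2, seed=0):
    """Split into (graph learning, SVM training, remaining) portions.

    Assumption: a random shuffle, with the last portion held out.
    """
    items = list(statements)
    random.Random(seed).shuffle(items)
    n_graph = int(len(items) * graph_frac)
    rest = items[n_graph:]
    n_svm = int(len(rest) * svm_frac)
    return items[:n_graph], rest[:n_svm], rest[n_svm:]


# Sizes for a dataset as large as PDS (1,521 statements).
graph_part, svm_part, held_out = split_dataset(range(1521))
print(len(graph_part), len(svm_part), len(held_out))  # 456 213 852
```

Under these assumptions the SVM would see only about 14% of the PDS statements.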

Page 21: Experiments | Results

• 64% accuracy on PDS
  – 15 points higher than the best of the compared methods
• 65% accuracy on Finance
  – 4 points higher than the best of the compared methods
• The proposed method stays comparatively stable even when the amount of SVM training data is reduced

Page 22: Summary

• A method that uses entropy-based weighting between words
• Does not require much training data
• Combined with methods for extracting statements and for determining the viewpoint, it makes opinion mining for MRA feasible