finding what matters in questions

NAACL-HLT 2013 1

Finding What Matters in Questions

Xiaoqiang Luo, Hema Raghavan, Vittorio Castelli, Sameer Maskey and Radu Florian

IBM T.J. Watson Research Center


Introduction

• E.g.: “How does one apply for a New York day care license?”
• Top-scoring result under a bag-of-words model:
  - “New licenses for day care centers in York county, PA”
• MMP model:
  - searches with the three phrases “New York,” “day care,” and “license”
• We call these important phrases mandatory matching phrases (MMPs).
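To make the contrast concrete, here is a toy Python sketch (not the authors' system) that scores the example snippet in two ways: plain word overlap, and a strict check that every MMP occurs as a contiguous phrase. The regex tokenizer and the all-MMPs-must-match rule are simplifying assumptions. In the system described here, MMPs feed features of a relevance model rather than acting as a hard filter.

```python
import re

def tokens(text):
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bow_overlap(question, snippet):
    """Number of distinct question words that also occur in the snippet."""
    return len(set(tokens(question)) & set(tokens(snippet)))

def contains_phrase(snippet_tokens, phrase):
    """True if the phrase's tokens occur contiguously in the snippet."""
    p = tokens(phrase)
    n = len(p)
    return any(snippet_tokens[i:i + n] == p for i in range(len(snippet_tokens) - n + 1))

def matches_all_mmps(snippet, mmps):
    """Strict illustration: every mandatory phrase must appear in the snippet."""
    toks = tokens(snippet)
    return all(contains_phrase(toks, m) for m in mmps)

question = "How does one apply for a New York day care license?"
snippet = "New licenses for day care centers in York county, PA"
mmps = ["New York", "day care", "license"]

print(bow_overlap(question, snippet))   # 5 shared words: new, york, for, day, care
print(matches_all_mmps(snippet, mmps))  # False: "New York" is not contiguous here
```

The bag-of-words view rewards the York-county snippet for sharing five words. The phrase view rejects it because “New York” never occurs as a unit.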


Question Corpus

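• A subset of the DARPA BOLT corpus containing forum postings in English.
• Four annotators selected the questions. The following 5 types of questions were excluded:
  - questions that require reasoning or calculation to answer
  - questions that are unclearly stated or ambiguous
  - questions that can be split into many sub-questions
  - multiple-choice questions
  - factoid questions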


Question Corpus

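• Two annotators labeled each selected question with MMP types (MMP-Must, MMP-Maybe) and their spans.
• E.g. (annotated example not preserved): MMP spans are contiguous and non-overlapping.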


Generate MMP Training Instances



• Output instances: <span, MMP type>
  - E.g.: hedge funds = <(5, 6), +1>
• MMP type:
  - MMP-Must: +1
  - MMP-Skip: -1
  - MMP-Maybe: -1
[Figure: parse tree of an example question (word positions 0-9, node depths 0-6). Tree nodes are annotated with generated instances such as <(4, 4), +1>, <(4, 6), +1>, <(5, 6), +1>, <(7, 9), +1> and <(9, 9), +1>]
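Here is a minimal sketch of turning annotations into binary instances. It assumes candidate spans come from parse-tree nodes and that a node is aligned to an MMP only when the spans match exactly. That alignment rule is a simplification, because the exact node-alignment rule cannot be recovered from these slides. `make_instances` and `LABELS` are hypothetical names.

```python
# Label mapping from the slide: only MMP-Must is a positive example.
LABELS = {"MMP-Must": +1, "MMP-Maybe": -1, "MMP-Skip": -1}

def make_instances(candidate_spans, annotated):
    """Return <span, label> pairs.

    candidate_spans: (start, end) word spans, e.g. one per parse-tree node (assumed input).
    annotated: dict mapping an annotated span to its MMP type.
    Unannotated spans are treated as MMP-Skip (an assumption of this sketch).
    """
    return [(span, LABELS[annotated.get(span, "MMP-Skip")]) for span in candidate_spans]

# Hypothetical annotation: "hedge funds" at positions (5, 6) is MMP-Must.
instances = make_instances([(4, 4), (5, 6), (7, 9)], {(5, 6): "MMP-Must", (7, 9): "MMP-Maybe"})
print(instances)  # [((4, 4), -1), ((5, 6), 1), ((7, 9), -1)]
```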


MMP Features

Lexical Features: CaseFeatures
• Is the first word of an MMP upper-case?
• Is it all capital letters?
• Does it contain digits?
• E.g.: for “(NP American)” in Figure 1, the upper-case feature fires.


MMP Features

Lexical Features: CommonQWord
• Does the MMP contain question words, including “What,” “When,” “Who,” etc.?
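The two lexical feature groups above take only a few lines of Python to sketch. This version assumes the MMP arrives as a list of word strings. The question-word set extends the slide's “What,” “When,” “Who” with other common wh-words, and that extension is an assumption.

```python
# CommonQWord vocabulary: "what", "when", "who" come from the slide; the rest are assumed.
QUESTION_WORDS = {"what", "when", "who", "where", "why", "which", "how"}

def case_features(mmp_words):
    """CaseFeatures for an MMP given as a list of words."""
    text = " ".join(mmp_words)
    letters = [c for c in text if c.isalpha()]
    return {
        "first_upper": mmp_words[0][:1].isupper(),                         # first word capitalized?
        "all_caps": bool(letters) and all(c.isupper() for c in letters),  # all capital letters?
        "has_digit": any(c.isdigit() for c in text),                      # contains digits?
    }

def common_qword(mmp_words):
    """CommonQWord: does the MMP contain a question word?"""
    return any(w.lower() in QUESTION_WORDS for w in mmp_words)

print(case_features(["American"]))     # {'first_upper': True, 'all_caps': False, 'has_digit': False}
print(common_qword(["What", "time"]))  # True
```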


MMP Features

Syntactic Features: PhraseLabel
• This feature returns the phrasal label of the MMP.
• E.g.: for “(NP American)” in Figure 1, the feature value is “NP”.


MMP Features

Syntactic Features: NPUnique
• This Boolean feature fires if a phrase is the only NP in a question.
• E.g.: for “(NP American),” the feature value would be false.


MMP Features

Syntactic Features: PosOfPTN
• (1) the position of the left-most word of the node
• (2) whether the left-most word is the beginning of the question
• (3) the depth of the anchoring node, defined as the length of the path to the root node


E.g. of PosOfPTN
• E.g.: for “(NP American)” in Figure 1:
  - it is the 5th word in the sentence
  - it is not the first word of the sentence
  - the depth of the node is 6
[Figure 1: parse tree of the example question (word positions 1-10, node depths 0-6)]


MMP Features

Syntactic Features: PhrLenToQLenRatio
• This feature computes the number of words in an MMP and its ratio to the sentence length.
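The syntactic features above (PhraseLabel, NPUnique, PosOfPTN, PhrLenToQLenRatio) can be sketched over a toy parse tree. The sketch makes three assumptions: trees are nested (label, children) tuples with POS nodes omitted, positions are 0-based, and the example question is made up rather than taken from Figure 1.

```python
# Toy parse trees: (label, children), where each child is a subtree or a word string.
# POS (preterminal) nodes are omitted for brevity; this is an assumption of the sketch.

def collect_nodes(tree):
    """Return ([(label, first, last, depth), ...], number_of_words); positions are 0-based."""
    nodes = []

    def visit(node, depth, start):
        label, children = node
        pos = start
        for child in children:
            if isinstance(child, str):
                pos += 1                            # a bare word takes one position
            else:
                pos = visit(child, depth + 1, pos)  # recurse into the subtree
        nodes.append((label, start, pos - 1, depth))
        return pos

    n_words = visit(tree, 0, 0)
    return nodes, n_words

def syntactic_features(node, nodes, n_words):
    label, first, last, depth = node
    n_np = sum(1 for other in nodes if other[0] == "NP")
    length = last - first + 1
    return {
        "PhraseLabel": label,
        "NPUnique": label == "NP" and n_np == 1,
        "PosOfPTN": (first, first == 0, depth),  # position, starts the question?, depth to root
        "PhrLenToQLenRatio": (length, length / n_words),
    }

# Made-up question "Why do American banks fail"
tree = ("SBARQ", [("WHADVP", ["Why"]),
                  ("SQ", ["do", ("NP", [("NP", ["American"]), "banks"]), ("VP", ["fail"])])])
nodes, n_words = collect_nodes(tree)
american = next(n for n in nodes if n[1:3] == (2, 2))
print(syntactic_features(american, nodes, n_words))
```

In this toy question the inner “(NP American)” is not unique, because the enclosing NP also counts, so NPUnique is false, as in the slide's example.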


MMP Features

Semantic Features (NETypes):
• The feature tests whether a phrase is or contains a named entity and, if so, takes the entity type as its value.
  - Entities come from an information extraction (IE) pipeline consisting of syntactic parsing, mention detection and coreference resolution (Florian et al., 2004; Luo et al., 2004; Luo and Zitouni, 2005).
• E.g.: for “(NP American)” in Figure 1, the feature value would be “GPE.”


MMP Features

Corpus-based Features (AvgCorpusIDF):
• This group of features computes the average of the IDFs of the words in the phrase. Stop words are included.
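Here is a sketch of the averaged-IDF feature. It assumes IDF(w) = log(N / df(w)) over a tokenized corpus, because the slide does not specify the IDF variant. The `stop_words` argument is an optional extension so that averages with and without stop words can both be computed. Words unseen in the corpus get `unseen_idf`, which is a hypothetical parameter.

```python
import math

def idf_table(documents):
    """IDF(w) = log(N / df(w)), where df(w) counts documents containing w."""
    df = {}
    for doc in documents:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    n = len(documents)
    return {w: math.log(n / c) for w, c in df.items()}

def avg_corpus_idf(phrase, idf, stop_words=frozenset(), unseen_idf=0.0):
    """Average IDF of the phrase's words, skipping any listed stop words."""
    words = [w for w in phrase if w not in stop_words]
    if not words:
        return 0.0
    return sum(idf.get(w, unseen_idf) for w in words) / len(words)

corpus = [["new", "york", "day", "care"], ["day", "care", "the"], ["the", "license"], ["the", "new"]]
idf = idf_table(corpus)
print(avg_corpus_idf(["the", "license"], idf))                      # (log(4/3) + log 4) / 2
print(avg_corpus_idf(["the", "license"], idf, stop_words={"the"}))  # log 4
```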


MMP Classification Results

Classifier:
• logistic regression binary classifier using WEKA
Data set (questions):
• training set: 174
• test set: 27


Performance of the MMP classifier


Example Questions by MMP Model


Data for Relevance Model

• From the BOLT-IR task (IR, 2012).
• Top snippets returned by the search engine are judged for relevance by our annotators.
Data set (questions):
• training set: 390 (28,915 snippets, 6,528 answers)
• test set: 59


Relevance Prediction

• The relevance model is a conditional distribution P(r | q, s; D), where:
  - r is a binary random variable indicating whether the candidate snippet s is relevant to the question q;
  - D is the document in which the snippet s is found.



Baseline system
• (1) Text Match Features
  - cosine scores between the query and the snippet
• (2) Answer Type Features
  - The top 3 predictions of a statistical classifier trained to predict answer categories are used as features.
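The first two baseline feature families might look like the following. The slide specifies neither the term weighting nor the answer-type inventory, so the raw term-frequency vectors used for the cosine score and the category names in the usage line are assumptions.

```python
import math
from collections import Counter

def cosine(query_tokens, snippet_tokens):
    """Cosine similarity between raw term-frequency vectors of query and snippet."""
    q, s = Counter(query_tokens), Counter(snippet_tokens)
    dot = sum(count * s[w] for w, count in q.items())  # missing words count as 0
    norm = math.sqrt(sum(c * c for c in q.values())) * math.sqrt(sum(c * c for c in s.values()))
    return dot / norm if norm else 0.0

def top_answer_types(type_probs, k=3):
    """The k most probable answer categories, to be used as features."""
    return sorted(type_probs, key=type_probs.get, reverse=True)[:k]

print(round(cosine(["day", "care", "license"], ["day", "care", "centers"]), 3))  # 0.667
print(top_answer_types({"PERSON": 0.5, "LOCATION": 0.2, "ORG": 0.25, "DATE": 0.05}))
```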



Baseline system
• (3) Mention Match Features
  - whether a named entity in the query occurs in the snippet



Baseline system
• (4) Event Match Features
  - several hand-crafted dictionaries containing terms exclusive to various types of events, such as “violence,” “legal,” and “election,” are used
  - if both the query and the snippet contain terms of the same event type, the feature takes the value 1
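Mention matching and event matching both reduce to set tests. In this sketch the event dictionaries are tiny hypothetical stand-ins for the hand-crafted lists, and query entities are assumed to be pre-extracted. Mention matching is also simplified to a case-insensitive substring test rather than entity-level matching.

```python
# Hypothetical stand-ins for the hand-crafted event dictionaries.
EVENT_DICTS = {
    "violence": {"attack", "bombing", "riot"},
    "legal": {"lawsuit", "court", "verdict"},
    "election": {"ballot", "vote", "candidate"},
}

def event_types(tokens):
    """Event types whose dictionary shares at least one term with the tokens."""
    words = set(tokens)
    return {etype for etype, terms in EVENT_DICTS.items() if words & terms}

def event_match_features(query_tokens, snippet_tokens):
    """One binary feature per event type: 1 if both the query and the snippet evoke it."""
    shared = event_types(query_tokens) & event_types(snippet_tokens)
    return {etype: int(etype in shared) for etype in EVENT_DICTS}

def mention_match(query_entities, snippet_text):
    """1 if any (pre-extracted) query entity string occurs in the snippet, else 0."""
    text = snippet_text.lower()
    return int(any(e.lower() in text for e in query_entities))

print(event_match_features(["who", "won", "the", "vote"], ["the", "candidate", "won"]))
# {'violence': 0, 'legal': 0, 'election': 1}
print(mention_match(["New York"], "Day care rules in New York City"))  # 1
```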



Baseline system
• (5) Snippet Statistics
  - features such as the snippet length and the position of the snippet in the post



Features Derived from MMP: HardMatch
• Let $I(m \in s)$ be a 0/1 function indicating whether the snippet s contains the MMP m.



Features Derived from MMP: SoftLMMatch
• The SoftLMMatch score is a language-model (LM) based score, similar to that used in (Bendersky and Croft, 2008), except that MMPs play the role of concepts.




[SoftLMMatch equation not preserved]
• where w_i is the i-th word in snippet s
• I(w_i = v) is an indicator function, taking value 1 if w_i is v and 0 otherwise
• |V| is the vocabulary size
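The definitions above (an indicator over snippet words plus the vocabulary size |V|) fit an add-one (Laplace) smoothed unigram model of the snippet, P(v | s) = (1 + Σ_i I(w_i = v)) / (|s| + |V|). The sketch below assumes that form. The way per-MMP scores are combined into SoftLMMatch cannot be recovered here, so `mmp_log_likelihood` is only one plausible, hypothetical per-MMP score.

```python
import math
from collections import Counter

def snippet_lm(snippet_tokens, vocab_size):
    """Add-one smoothed unigram LM (assumed form): P(v | s) = (1 + sum_i I(w_i = v)) / (|s| + |V|)."""
    counts = Counter(snippet_tokens)  # counts[v] = sum_i I(w_i = v); 0 for unseen v
    denom = len(snippet_tokens) + vocab_size
    return lambda v: (1 + counts[v]) / denom

def mmp_log_likelihood(mmp_tokens, snippet_tokens, vocab_size):
    """Hypothetical per-MMP soft score: log-probability of the MMP's words under the snippet LM."""
    p = snippet_lm(snippet_tokens, vocab_size)
    return sum(math.log(p(v)) for v in mmp_tokens)

s = ["day", "care", "day"]
p = snippet_lm(s, vocab_size=10)
print(p("day"), p("new"))  # 3/13 and 1/13
print(mmp_log_likelihood(["day", "care"], s, 10) > mmp_log_likelihood(["new", "york"], s, 10))  # True
```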



Features Derived from MMP: MMPInclScore
[MMP inclusion score equation not preserved]
• where w ∈ m are the words in m
• I(·) is the indicator function, taking value 1 when its argument is true and 0 otherwise
• a constant threshold is used (its symbol was not preserved)
• l(w, s) is the similarity of word w to the snippet s:
  - $l(w, s) = \max_{v \in s} JW(w, v)$
  - JW(w, v) is the Jaro-Winkler similarity score between words w and v



Features Derived from MMP: MMPInclScore
• The MMP-weighted inclusion score between the question q and snippet s is computed as: [equation not preserved]
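Here is a self-contained sketch of the inclusion feature. It uses standard Jaro-Winkler similarity (prefix scale 0.1, prefix capped at 4 characters), computes l(w, s) as the best match over snippet words, and gives each MMP a score that counts the words whose best match reaches the threshold. The inclusion equations themselves were not preserved, so several details are assumptions: normalizing by |m|, the threshold value 0.9, and weighting MMPs by their classifier scores in `weighted_inclusion`.

```python
def jaro(a, b):
    """Jaro similarity of two strings, in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (matches / len(a) + matches / len(b) + (matches - transpositions) / matches) / 3

def jaro_winkler(a, b, prefix_scale=0.1):
    """Jaro-Winkler; this sketch applies the common-prefix boost unconditionally."""
    j = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)

def l_score(w, snippet_tokens):
    """l(w, s) = max over v in s of JW(w, v); 0.0 for an empty snippet."""
    return max((jaro_winkler(w, v) for v in snippet_tokens), default=0.0)

def mmp_inclusion(mmp_tokens, snippet_tokens, threshold=0.9):
    """Fraction of the MMP's words whose best snippet match reaches the threshold (assumed form)."""
    hits = sum(l_score(w, snippet_tokens) >= threshold for w in mmp_tokens)
    return hits / len(mmp_tokens)

def weighted_inclusion(scored_mmps, snippet_tokens, threshold=0.9):
    """Average of per-MMP inclusion scores weighted by MMP scores (assumed combination)."""
    total = sum(weight for _, weight in scored_mmps)
    if total == 0:
        return 0.0
    return sum(weight * mmp_inclusion(m, snippet_tokens, threshold)
               for m, weight in scored_mmps) / total

print(round(jaro_winkler("MARTHA", "MARHTA"), 4))       # 0.9611
print(mmp_inclusion(["license"], ["new", "licenses"]))  # 1.0
print(weighted_inclusion([(["new", "york"], 3.0), (["day", "care"], 1.0)], ["day", "care", "york"]))  # 0.625
```

The soft word match lets “license” count as present in a snippet that only says “licenses,” which an exact HardMatch would miss.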



Features Derived from MMP: MMPRankDep
• This feature, RD(q, s), first tests whether there exists a matched bilexical dependency between q and s.




• Let m(i) be the i-th ranked MMP.
• Let ⟨w_h, w_d | q⟩ and ⟨u_h, u_d | s⟩ be bilexical dependencies from q and s, respectively, where w_h and u_h are the heads and w_d and u_d are the dependents.




• EQ(w, u) is true if w and u are exactly the same, or their morphs are the same, or they head the same entity, or their WordNet synsets overlap.
• RD(q, s) is true if and only if
  $\mathrm{EQ}(w_h, u_h) \wedge \mathrm{EQ}(w_d, u_d) \wedge w_h \in m(i) \wedge w_d \in m(j)$
  is true for some ⟨w_h, w_d | q⟩, for some ⟨u_h, u_d | s⟩, and for some i and j.
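Here is a sketch of the RD(q, s) test. Dependencies are (head, dependent) word pairs, and MMPs are sets of words ranked by score. EQ is simplified to an exact match or a shared lemma from a hypothetical lemma table, and the entity-head and WordNet-synset conditions are omitted. Returning the matching ranks (i, j) is an assumption about how the rank part of MMPRankDep is used. RD(q, s) itself holds exactly when the result is not None.

```python
def eq(w, u, lemma=None):
    """Simplified EQ(w, u): exact match, or same lemma via a hypothetical lemma table.
    The entity-head and WordNet-synset tests from the slide are omitted."""
    lemma = lemma or {}
    return w == u or lemma.get(w, w) == lemma.get(u, u)

def rank_dep(q_deps, s_deps, ranked_mmps, lemma=None):
    """Search for a question dependency (w_h, w_d) with w_h in m(i) and w_d in m(j)
    that matches some snippet dependency (u_h, u_d) under EQ.
    Returns the 0-based ranks (i, j) of the first match found, or None if RD(q, s) is false."""
    for i, m_i in enumerate(ranked_mmps):
        for j, m_j in enumerate(ranked_mmps):
            for w_h, w_d in q_deps:
                if w_h not in m_i or w_d not in m_j:
                    continue
                if any(eq(w_h, u_h, lemma) and eq(w_d, u_d, lemma) for u_h, u_d in s_deps):
                    return i, j
    return None

# Hypothetical parses of the day-care example.
q_deps = [("apply", "license"), ("license", "care")]
ranked = [{"license"}, {"day", "care"}, {"new", "york"}]
s_deps = [("licenses", "care")]
print(rank_dep(q_deps, s_deps, ranked, lemma={"licenses": "license"}))  # (0, 1)
print(rank_dep(q_deps, s_deps, ranked))                                 # None
```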



3 snippet classifier models
• noMMP model
  - a system without MMP features
• IDF-as-MMP model
  - a baseline with each word as an MMP and the word’s IDF as the MMP score
• MMP model



Performance of the 3 snippet classifiers


End-to-End System Results

• The question-answering system was used in the 2012 BOLT IR evaluation (IR, 2012).
• There are 499K (Arabic), 449K (Chinese) and 262K (English) threads in the three languages.
• The Arabic and Chinese posts were first translated into English before being processed.


End-to-End System Results

• Performance


BOLT Evaluation Results

• The BOLT evaluation consists of 146 questions, mostly event- or topic-related.


BOLT Evaluation Results


Conclusions

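• The authors present a QA system that uses mandatory matching phrases (MMPs).
• Extracting MMPs from questions reaches an F-measure of 88.6%.
• Combining the MMP model with the snippet relevance model effectively improves the relevance model’s performance.
• The QA system using MMPs was the best-performing system in the 2012 BOLT IR evaluation.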
