Page 1: Enhancing Biomedical Text Rankers by Term Proximity Information 劉瑞瓏 慈濟大學醫學資訊學系 2012/06/13

Enhancing Biomedical Text Rankers by

Term Proximity Information

Rey-Long Liu (劉瑞瓏), Department of Medical Informatics, Tzu Chi University (慈濟大學醫學資訊學系)

2012/06/13

Outline

• Background
– Text ranking
– Biomedical information needs

• An approach to enhancing text rankers in the biomedical domain

• Evaluation

• Conclusion

Research Background

Text Ranking

• Goal
– Given a query q and a set T of texts retrieved for q, rank the texts in T by their degree of relevance to q

• Motivation
– Reducing information overload: T is often very large, even when a smart search engine is used
– Text ranking is a key issue in information retrieval, and often a "secret" component of search engines

An Example Ranker

Biomedical Information Need

• Biomedical research requires relevant evidence from the huge and ever-growing biomedical literature

• Retrieving this evidence requires a system that
– Accepts a natural language query for a biomedical information need, and
– Ranks relevant texts higher for access or processing

An Example

• Query: urinary tract infection, criteria for treatment and admission (from OHSUMED)
– A disease as the target concept (i.e., urinary tract infection)
– Two concepts about the scenario of the information need (i.e., treatment and admission)
  • Neither is specific to, or related to, any particular disease

Contextual Completeness

• Biomedical queries need to be well-formed, and so call for a retrieval system that considers the contextual completeness of each query concept t in the text d
– Contextual completeness of t in d is the extent to which the query concepts other than t appear in nearby areas in d
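To make the definition concrete, here is a minimal Python sketch. It is not the PRE measure itself: it simply reports the fraction of the other query concepts that occur within a fixed token window of some occurrence of t. The function names, the whitespace tokenization, the treatment of each concept as a single token, and the `window` parameter are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def term_positions(tokens: Sequence[str], terms: Sequence[str]) -> Dict[str, List[int]]:
    """Map each query term to the token positions where it occurs in the text."""
    positions: Dict[str, List[int]] = {term: [] for term in terms}
    for index, token in enumerate(tokens):
        if token in positions:
            positions[token].append(index)
    return positions


def completeness(t: str, tokens: Sequence[str], query_terms: Sequence[str],
                 window: int = 5) -> float:
    """Fraction of the query terms other than t that appear within `window`
    tokens of at least one occurrence of t (hypothetical, simplified measure)."""
    positions = term_positions(tokens, query_terms)
    t_positions = positions.get(t, [])
    others = [m for m in positions if m != t]
    if not others or not t_positions:
        return 0.0
    near = 0
    for m in others:
        # m counts as "nearby" if any of its occurrences is close to any occurrence of t.
        if any(abs(k - j) <= window for k in t_positions for j in positions[m]):
            near += 1
    return near / len(others)


if __name__ == "__main__":
    text = "criteria for treatment and admission in urinary infection".split()
    query = ["infection", "treatment", "admission"]
    print(completeness("infection", text, query, window=3))
```

With a window of 3 tokens, only "admission" is close enough to "infection" in this toy text, so half of the other concepts are counted; widening the window to 5 also captures "treatment".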

An Example

• In children with an acute febrile illness, what is the efficacy of single medication therapy with acetaminophen or ibuprofen in reducing fever?

[From Lin & Demner-Fushman, 2006]

[Figure labels: PICO, Task, Answer, Strength]

An Approach to Improving Rankers for Biomedical Info Needs

Goals

• An approach PRE (Proximity-based Ranker Enhancer) that
– Measures contextual completeness of query concepts appearing in a nearby area in the text
– Serves as a supplement to improve existing rankers

Contrast with Related Work

• Biomedical text ranking
– Using synonyms and considering diversity of passages, without considering term proximity

• Text ranking
– Individual text scoring techniques (e.g., BM25) and learning-to-rank techniques (e.g., Ranking SVM), without considering term proximity

• Improving ranking by term proximity
– Term proximity is employed, but contextual completeness is not considered

System Overview

[Diagram. Training: Text Ranker Development from Training Data. Testing: Text Ranking for a Query (q) from the User and a Text (d); PRE performs TF (Term Frequency) Assessment and supplies the TF in d to the Underlying Ranker, which outputs Ranked Texts.]

TF Assessment

• Three types of term proximity
– Overall proximity (QTermTF)
– Individual proximity (IndiP)
– Collective proximity (CollP)

• A term t may get a large TF increment in d if
– Many query terms appear frequently in d,
– Query terms are individually near to t at some places, and
– Query terms collectively appear at a place near to t

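• $\mathrm{RTF}(t,d,q) = \mathrm{TF}(t,d) + \mathrm{TFincrement}(t,d,q)$
• $\mathrm{TFincrement}(t,d,q) = \mathrm{QTermTF}(d,q) \times \mathrm{IndiP}(t,d,q) \times \mathrm{CollP}(t,d,q)$
• $\mathrm{QTermTF}(d,q)$ = total TF of query terms in d
• $\mathrm{IndiP}(t,d,q) = \sum_{m \in M - \{t\}} \mathrm{SigmoidWeight}(\mathrm{Mindist}(t,m)) \,/\, \mathrm{MaxIndiP}$
• $\mathrm{Mindist}(x,y)$ = shortest distance between x and y in d
• $\mathrm{SigmoidWeight}(dt) = 1 / (1 + e^{-((|q|-1) - dt)})$
• $\mathrm{CollP}(t,d,q) = \max_{k \in K} \{ \sum_{m \in M - \{t\}} \mathrm{SigmoidWeight}(\mathrm{dist}(t,k,m)) \} \,/\, \mathrm{MaxCollP}$, where K is the set of positions at which t appears in d
• $\mathrm{dist}(t,k,m)$ = distance between t (at position k) and m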
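The formulas above can be sketched in Python as follows. This is an illustrative reading of the slide, not the authors' implementation, and several points are assumptions because the slide does not define them: M is taken to be the set of distinct query terms that occur in d; distances are absolute differences of token positions; dist(t,k,m) uses the occurrence of m nearest to position k; the three factors of TFincrement are multiplied; and MaxIndiP and MaxCollP are both set to the value the sum would take if every other term in M were at distance 1. The function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def sigmoid_weight(dt: float, query_len: int) -> float:
    """SigmoidWeight(dt) = 1 / (1 + e^{-((|q| - 1) - dt)}), computed stably."""
    x = (query_len - 1) - dt
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For large distances x is very negative; this form avoids overflow in exp.
    e = math.exp(x)
    return e / (1.0 + e)


def positions_of(tokens: Sequence[str], query: Sequence[str]) -> Dict[str, List[int]]:
    """Positions in d of each query term that actually occurs in d (assumed set M)."""
    wanted = set(query)
    pos: Dict[str, List[int]] = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok in wanted:
            pos[tok].append(i)
    return dict(pos)


def revised_tf(t: str, tokens: Sequence[str], query: Sequence[str]) -> float:
    """RTF(t,d,q) = TF(t,d) + QTermTF(d,q) * IndiP(t,d,q) * CollP(t,d,q)."""
    q = list(dict.fromkeys(query))            # distinct query terms
    pos = positions_of(tokens, q)
    tf = float(list(tokens).count(t))         # TF(t, d)
    others = [m for m in pos if m != t]       # M - {t}
    if t not in pos or not others:
        return tf                             # no proximity evidence, no increment
    n = len(q)
    qterm_tf = sum(len(p) for p in pos.values())
    # Assumed normalizer: every other term of M sits at distance 1 from t.
    max_p = len(others) * sigmoid_weight(1, n)
    indi_p = sum(
        sigmoid_weight(min(abs(k - j) for k in pos[t] for j in pos[m]), n)
        for m in others
    ) / max_p
    coll_p = max(
        sum(sigmoid_weight(min(abs(k - j) for j in pos[m]), n) for m in others)
        for k in pos[t]
    ) / max_p
    return tf + qterm_tf * indi_p * coll_p


if __name__ == "__main__":
    near = "infection treatment admission".split()
    far = ["infection"] + ["filler"] * 20 + ["treatment", "admission"]
    query = ["infection", "treatment", "admission"]
    print(revised_tf("infection", near, query), revised_tf("infection", far, query))
```

Under these assumptions the increment shrinks smoothly as the other query terms move away from t, and it vanishes when t is the only query term present in d, so the underlying ranker then sees the plain TF.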

Empirical Evaluation

Experimental Data

• OHSUMED
– A popular database of biomedical queries and references
– 106 queries
– 348,566 references
– 16,140 query-reference pairs
  • Definitely relevant
  • Possibly relevant
  • Not relevant

• TREC Genomics 2006
– 28 queries (topics) and 27,999 query-passage pairs
  • Definitely relevant, possibly relevant, and not relevant
– 13,993 query-reference pairs

• TREC Genomics 2007
– 36 queries and 35,996 query-passage pairs
  • Relevant and not relevant
– 22,913 query-reference pairs

Underlying Rankers

Baseline Ranker Enhancers

• Three state-of-the-art techniques that enhance text rankers by term proximity
– The t-function: t() [Tao & Zhai, 2007]
– The p-function: p() [Cummins & O'Riordan, 2009]
– The proximity language model: PLM [Zhao & Yun, 2009]

Evaluation Criteria

• Evaluating whether relevant references are ranked higher for users to access
– Mean average precision (MAP)
– Normalized discounted cumulative gain at X (NDCG@X)
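For readers unfamiliar with the two criteria, here is a small Python sketch of both. The slides do not give the exact formulas used in the evaluation, so the choices below are assumptions: average precision is computed over a ranked list that contains all judged texts for the query, and NDCG uses the common gain 2^grade - 1 with a log2 rank discount, with grades such as 2 (definitely relevant), 1 (possibly relevant), and 0 (not relevant). The function names are hypothetical.

```python
import math
from typing import Sequence


def average_precision(relevant_flags: Sequence[bool]) -> float:
    """Average of precision@rank over the ranks that hold a relevant text."""
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """MAP: mean of average precision over all queries."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


def dcg_at(grades: Sequence[int], x: int) -> float:
    """Discounted cumulative gain over the top x ranks (assumed gain 2^grade - 1)."""
    return sum(
        (2 ** grade - 1) / math.log2(rank + 1)
        for rank, grade in enumerate(list(grades)[:x], start=1)
    )


def ndcg_at(grades: Sequence[int], x: int) -> float:
    """NDCG@x: DCG of the ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at(sorted(grades, reverse=True), x)
    return dcg_at(grades, x) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Two queries; True marks a relevant text at that rank.
    print(mean_average_precision([[True, False, True], [False, True]]))
    # Graded judgments in ranked order for one query.
    print(ndcg_at([1, 2, 0, 0], 3))
```

MAP rewards placing every relevant text early and treats relevance as binary, while NDCG@X looks only at the top X ranks and distinguishes the graded relevance levels.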

Results

Conclusion

• Contextual completeness of query concepts in the texts is essential in ranking biomedical texts

• To measure contextual completeness, it is helpful to integrate three types of term proximity
– Overall proximity
– Individual proximity
– Collective proximity

• Existing rankers may be comprehensively enhanced

Thank You!