query understanding in web search - 黒橋・河原研...

64
Query Understanding in Web Search - by Large Scale Log Data Mining and Statistical Learning Hang Li Microsoft Research Asia COLING 2010 NLPIX Workshop August 28, 2010 Joint Work with Colleagues, Interns, Collaborators 1

Upload: phunghanh

Post on 22-Apr-2018

218 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Query Understanding in Web Search- by Large Scale Log Data Mining and

Statistical Learning

Hang Li

Microsoft Research Asia

COLING 2010 NLPIX Workshop

August 28, 2010

Joint Work with Colleagues, Interns, Collaborators

1

Page 2: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Web Search is Part of Our Life

2

Page 3: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Search System = `Black Boxes’

3

Page 4: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Natural Language ProcessingInformation Retrieval

Data Mining

Large Scale Distributed Computing

Advanced Web Search Technologies Are Used…

Statistical Learning

4

Page 5: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Web Search Relies on NLP and IR

• Query Understanding– Classification, structure prediction, topic modeling, similarity

learning

• Document Understanding– Classification, structure prediction, topic modeling, learning on

graph

• Query Document Matching– Language model, similarity learning

• Ranking– Learning to rank

• User Understanding– Classification, topic modeling

5

Page 6: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Query Understanding

• Input: query

• Output: query representation

– Refined query (e.g., spelling error correction)

– Similar queries

– Categories

– Topics

– Key phrases

– Named entities

6

Page 7: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

This Talk = Query Understanding

7

Page 8: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Talk Outline: Three Projects

• LOGAL: Search and Browse Log Mining Platform

• Semantic Matching: Improving Tail Query Relevance

• Context aware Search: Better Search Using Context Information

2010/8/30 8

Page 9: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

PROJECT: LOGAL (LOG OBJECT GALLERY)

9

Joint work with Daxin Jiang, Xiaohui Sun

Page 10: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

LOGAL

Search and Browse Log Mining Platform

10

Page 11: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Data Structure of Search/Browse Logs

• Various types of data

• Complex relationship among data objects

– Hierarchical relationship

– Sequential relationship

Users

Queries

Srch/Ads clicks

Follow-up clicks

Sessions

Search result pages

Page 12: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Rich Log Mining Applications

Log Mining Applications

Search Applications

Ads Applications

Query Understanding

Document Understanding

User Understanding

Query-Doc Matching

Query Suggestion

Query Expansion

Query Substitution

Query Classification

Document Annotation

Document Classification

Document Summarization

Personalized Search

Search UI Design

User Satisfaction Prediction

Document & Ad (re-)Ranking

Search Results Clustering

Search Results Diversification

Web Site Recommendation

Keyword Generation

Behavior Targeting

Ad Click-Through Prediction

Contextual Advertising

12

Page 13: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

The Problem

Search Log

Apps

Log Data

Query Understanding

A huge gap between the data and the applications

Toolbar Data Ads. Log Web Site Log

Document Understanding

User Understanding

Query-Doc Matching

13

Page 14: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

The Problem

Search Log

Apps

Log Data

Query Understanding

Toolbar Data Ads. Log Web Site Log

Document Understanding

User Understanding

Query-Doc Matching

Each researcher or developer

1. Has to access the raw log data directly

2. Has to build the application from scratch

Very difficult to build large-scale log mining applications

14

Page 15: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Log Data Mining Platform

Log Objects Gallery (LOGAL)

ToolbarData

Web SiteLog

SearchLog

Ads.Log

Raw Logs

Query Understanding

Document Understanding

Query DocMatching

User Understanding

App Level

Middle Level

Raw Data Level

Data Platform

15

Page 16: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Query Histogram

Example applications:• Query auto completion• Query suggestion• Query analysis: temporal changes of query frequency

Query Countfacebook 3,157 Kgoogle 1,796 Kyoutube 1,162 K

myspace 702 Kfacebook com 665 Kyahoo 658 Kyahoo mail 486 Kyahoo com 486 Kebay 486 Kfacebook login 445 K

Page 17: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Click-through Bipartite

• Example applications

– Document (re-)ranking

– Search results clustering

– Web page summarization

– Query suggestion

click-through bipartite

Page 18: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Click PatternQuery

Doc 1Doc 2…

………………Doc N

Doc 1 Doc 2

………………

Doc N

Doc 1 Doc 2

… …

……………Doc N

Pattern 1 (count)

Pattern n (count)

Pattern 2 (count)

• Example applications– Estimate relevance of document to query

– Predict users’ satisfaction

– Query classification (informational vs navigational)

Page 19: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Session Pattern

• Example applications– Doc (re-)ranking

– Query suggestion

– Site recommendation

– User satisfaction prediction

Srch click: search clickAds click: advertisement click

User activities in a session

Query Click:Srch clickAds click…

Browse

Page 20: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

PROJECT: SEMANTIC MATCHING

20

Joint work with Gu Xu, Jun Xu, Jingfang Xu

Page 21: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Semantic Matching

Improving Tail Query Relevance

21

Page 22: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Different Queries Can Represent Same Intent “Distance between Sun and Earth” - Luke DeLorme

• distance from earth to the sun• distance from sun to earth• distance from sun to the earth• distance from the earth to the sun• distance from the sun to earth• distance from the sun to the earth• distance of earth from sun• distance of earth from the sun• distance of earth to sun• distance of earth to the sun• distance of sun from earth• distance of sun from the earth• distance of sun to earth• distance of the earth from the sun• distance of the earth to the sun• distance of the sun from earth• distance of the sun from the earth• distance of the sun to earth• distance of the sun to the earth• distance sun• distance sun and earth• distance sun earth• distance sun from earth• distance sun to earth• distance to sun from earth• distance to the sun from earth• earth and sun distance

• "how far" earth sun

• "how far" sun

• "how far" sun earth

• average distance earth sun

• average distance from earth to sun

• average distance from the earth to the sun

• distance between earth & sun

• distance between earth and sun

• distance between earth and the sun

• distance between earth sun

• distance between sun and earth

• distance between the earth and sun

• distance between the earth and the sun

• distance between the sun and earth

• distance between the sun and the earth

• distance earth and sun

• distance earth from sun

• distance earth is from the sun

• distance earth sun

• distance earth to sun

• distance earth to the sun

• distance from earth to sun

• distance from earth to the sun

• distance from sun to earth

• distance from sun to the earth

• distance from the earth to the sun

• distance from the sun to earth

• how far away is the sun from earth• how far away is the sun from the

earth• how far earth from sun• how far earth is from the sun• how far earth sun• how far from earth is the sun• how far from earth to sun• how far from the earth to the sun• how far from the sun is earth• how far from the sun is the earth• how far is earth away from the sun• how far is earth from sun• how far is earth from the sun• how far is earth to the sun• how far is it from earth to the sun• how far is it from the earth to the sun• how far is sun from earth• how far is the earth away from the

sun• how far is the earth from sun• how far is the earth from the sun• how far is the earth to the sun• how far is the sun• how far is the sun away from earth• how far is the sun away from the

earth

Microsoft Confidential

Page 23: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Different Levels of Semantic Matching

Structure

Term

Word Sense

Topic

Leve

l of

Sem

anti

cs

Match exactly same terms

NY New York

disk disc

Match terms with same meanings

NY New York

motherboard mainboard

utube youtube

Match topics of query and documents

Microsoft Office … working for Microsoft … my office is in …

Topic: PC Software Topic: Personal Homepage

Match intent with answers (structures of query and document)Microsoft Office home find homepage of Microsoft Office

21 movie find movie named 21

buy laptop less than 1000 find online dealers to buylaptop with less than 1000 dollars

23

Page 24: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Semantic Matching Is Useful for

• General Search Relevance

• Vertical Search

• Entity Search

• Task Completion

24

Page 25: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

SYSTEM VIEW OF SEMANTIC MATCHING

25

Page 26: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Overall System

Query Representation

Query Index Document Index

Search Log Data Web Data

Offline Query Processing Offline Document Processing

Ranked Documents

Document Representations

Query

Query Knowledge

Online Query Processing Semantic Matching

Microsoft

Online

Offline

26

Page 27: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Online Query Processing

Named Entity Recognition in Query

Query Topic Identification

Similar Query Finding

Query RefinementSense

Topic

Structure

michael jordan berkele

michael jordan berkeley

michael jordan berkeley

michael I. jordan berkeley

michael jordan berkeley: academic

michael I. jordan berkeley: academic

[michael jordan: PersonName] [berkeley: Location]: academic

[michael I. jordan: PersonName] [berkeley: Location]: academic

27

Page 28: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Offline Document Processing

Named Entity Recognition in Doc.

Document Topic Identification

Key Concept Identification

TokenizationSense

Topic

Structure

Michael Jordan is Professor in the Department of Electrical Engineering

[Michael Jordan] is [Professor] in the [Department] of [Electrical Engineering]

[Michael Jordan/M. Jordan] is [Professor]in the [Department/Dept.] of [Electrical Engineering/EE]

[Michael Jordan/M. Jordan] is [Professor]in the [Department/Dept.] of [Electrical Engineering/EE]: academic

[Michael Jordan/M. Jordan: PersonName]is [Professor] in the [Department/Dept.] of [Electrical Engineering/EE]: academic

28

Page 29: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Online Semantic Matching

QueryRepresentation

DocumentRepresentation

[michael jordan: PersonName] [berkeley: Location]: academic

[michael I. jordan: PersonName] [berkeley: Location]: academic

[Michael Jordan/M. Jordan: PersonName]is [Professor] in the [Department/Dept.] of [Electrical Engineering/EE]: academic

SemanticMatching

Matching can be conducted at different levels

Ranking Results

29

Page 30: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

QUERY REFINEMENT USING CRF MODEL

30

Page 31: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Correcting Errors in Query

search “windows onecare”

window onecar

Query Refiner

SearchSystem

windows onecare

31

Page 32: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Structured Prediction Problem

windows onecare

window onecarObserved “noisy” word sequence

“Ideal” word sequence

original query word sequence

“ideal” query word sequence

32

Page 33: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Conditional Random Fields for Query Refinement

Introducing Refinement Operations

oi-1 oi oi+1

yi-1 yi yi+1

xi-1 xi xi+1

OperationsSpelling: insertion, deletion, substitution, transposition, …

Word Stemming: +s/-s, +es/-es, +ed/-ed, +ing/-ing, … 33

Page 34: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Query Refinement Using Conditional Random Fields

34

Page 35: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

NAMED ENTITY MINING FROM QUERY LOG USING TOPIC MODEL

35

Page 36: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Named Entity Recognition in Query

harry potter harry potter film harry potter author

harry potter – Movie (0.5)harry potter – Book (0.4)harry potter – Game (0.1)

harry potter filmharry potter – Movie (0.95)

harry potter authorharry potter – Book (0.95)

36

Page 37: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Our Approach

• Using Query Log Data (or Click-through Data)

• Using Topic Model

• Weakly Supervised Latent Dirichlet Allocation

• vs Pasca’s work (named entity mining from log data, deterministic approach)

37

Page 38: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Seed and Query Log

final fantasyMovie Game

gone with the windMovie Book

harry potterMovie Book Game

final fantasy 300final fantasy movie 120final fantasy wallpaper 50gone with the wind movie 120gone with the wind review 10gone with the wind photos 10harry potter 1000harry potter book 650 gone with the wind book 80gone with the wind summary 20 harry potter cheats 300harry potter pics 200harry potter summary 100final fantasy xbox 10final fantasy soundtrack 10gone with the wind 250harry potter movie 800……

38

Page 39: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Pseudo Documents of Named Entities

\# 1000\# movie 800\# book 650 \# cheats 300\# pics 200\# summary 100

\# 250\# movie 120\# book 80\# summary 20\# review 10\# photos 10

\# 300\# movie 120\# wallpaper 50\# xbox 10\# soundtrack 10

final fantasy Movie, Game

gone with the wind

harry potter Movie, Book, Game

Movie, Book

39

Page 40: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Latent Dirichlet Allocation Model

z: Movie, Book, Gamew: \#, \# movie, \# book, ….

: distribution of classes for named entity: distribution of contexts for class

40

Page 41: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Weakly Supervised Latent Dirichlet Allocation

),()|(log yCDp

0

0.2

0.4

0.6

0.8

harry potter

final fantacy

gone with wind

Movie

Book

Game

Music

constraints41

Page 42: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

PROJECT: CONTEXT AWARE SEARCH

42

Joint work with Daxin Jiang, Jian Pei, and others

Page 43: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Context aware Search:

Better Search Using Context Information

43

Page 44: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• Job related

Search in Office

44

Page 45: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Search at Home

• Household related

• Hobby and leisure

45

Page 46: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• Location

• Time

• Activity

Search in Mobile Context

46

Page 47: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• Community

Search in Social Context

47

Page 48: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Conventional Web Search

48

Page 49: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• Suppose that user raises query “jaguar”

• If we know the user raises query “”BMW” before “”jaguar”

• Then we know that the user is likely to look for the car

Search Intent and Context

49

Page 50: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• User usually conducts multiple related searches in a session

Context of Search

User query query

click click click Current search

Context

50

Page 51: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• Example of search sessions

Context Information is Useful

SID Search sessions

S1 Ford Toyota GMC Allstate

www.autohome.com

S2 Ford cars Toyota cars GMC cars Allstate

www.autohome.com

S3 Ford cars Toyota cars Allstate

www.allstate.com

S4 GMC GMC dealers

www.gmc.com

51

Page 52: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• 50% of users clicked car review site www.autohome.com after searching several car names.

Context Information is Useful

52

Page 53: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• How to model context?

• How to learn context model from data?

• How to apply context model in search?

Challenges in Context aware Search

53

Page 54: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• Three Models

– Sequential Model (Cao et al. KDD 2008)

– Hidden Markov Model (Cao et al. WWW 2009)

– Conditional Random Fields Model (Cao et al. SIGIR 2009)

• Large Scale Data Mining to Construct Models

• Using Learning Models to Make Prediction (Context aware Search)

Our Approach

54

Page 55: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Modeling Context by Sequential Model

q1

u1

c1

qi

ui

ci

qt

ut

ct… …

55

Page 56: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Modeling Context by CRF

q1

u1

c1

qi

ui

ci

qt

ut

ct… …

56

Page 57: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

Modeling Context by HMM

q1

u1

c1

qi

ui

ci

qt

ut

ct… …

57

Page 58: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

TRAINING HIDDEN MARKOV MODEL

58

Page 59: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• Challenge 1:– EM algorithm needs determined number of hidden states.

– However, in our problem, hidden states correspond to search intents, for which the number is unknown.

• Our Solution:– Conduct clustering on click-bipartite graph and view

clusters as hidden states.

Training Very Large HMM

59

Page 60: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• Challenge 2: – Search log data contains hundreds of millions of sessions.

– It is impractical to train HMM from such huge training data on single machine.

• Our Solution:– Deploy learning task on distributed system under map-

reduce model

Training Very Large HMM

60

Page 61: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• Challenge 3: – Each machine needs to hold the values of all parameters.

– Since search log data contains millions of unique queries and URLs, the space of parameters is extremely large.

• Our Solution:– Employ special initialization strategy based on the clusters

mined from click-through bipartite

Training Very Large HMM

61

Page 62: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

SUMMARY

62

Page 63: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

• Web search relies on NLP and IR

• Query understanding = identify user search intent

• Query understanding needs– Large scale mining platform

– Advanced NLP and IR technologies

– Advanced statistical learning technologies

• Our Projects– LOGAL: search and browse log mining platform

– Semantic Matching: improving tail query relevance

– Context aware Search: better search using context information

• NLP Challenges and Technologies in Information Explosion Era

Summary

63

Page 64: Query Understanding in Web Search - 黒橋・河原研 …nlp.ist.i.kyoto-u.ac.jp/NLPIX2010/slides/InvitedTalkAtNL...Query Understanding in Web Search-by Large Scale Log Data Mining

THANKS!

64