Elasticsearch in the Outbrain Recommender System: Looking at Content Recommendations through a Search...

Posted on 18-Feb-2017


Looking at Content Recommendation through a Search Lens

People want GREAT content

Content Recommendation Engine

[Diagram: Content Inventory, Rec Engine, Relevance]

Challenges

• Personalization

• A Jungle of Market Rules: geo targeting; publisher blacklisting of sites, URLs, and titles

• Scale: 35K requests/sec, 50 ms latency, millions of potential content recommendations

Search Engines: What Can They Do?

1. Score documents by relevance to a query

[Diagram: Query "Donald Trump", Search Engine, Relevance]

2. Filter documents by certain attributes

3. Work Efficiently and at Scale

• Open Source
• Distributed
• Scalable
• RESTful
• Real-time search
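The first two capabilities (score by relevance, filter by attributes) can be illustrated with a toy in-memory engine. This is a sketch for intuition only: the `Doc` type, the sample titles, and the term-overlap score are illustrative assumptions, and Elasticsearch itself ranks with BM25 by default rather than by counting shared terms.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    title: str
    terms: frozenset  # searchable terms, already lowercased
    attrs: dict = field(default_factory=dict)  # filterable attributes


def search(docs, query_terms, filters):
    """Toy engine: exact-match filtering, then ranking by shared terms.

    Counting shared terms is a stand-in for a real relevance model.
    """
    hits = []
    for doc in docs:
        # Filter: every requested attribute must match exactly; no scoring here.
        if any(doc.attrs.get(k) != v for k, v in filters.items()):
            continue
        # Score: number of query terms that appear in the document.
        score = len(query_terms & doc.terms)
        if score > 0:
            hits.append((score, doc.title))
    # Highest score first; ties broken alphabetically for stable output.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits


docs = [
    Doc("Trump rally recap", frozenset({"donald", "trump", "rally"}), {"lang": "en"}),
    Doc("Donald Duck cartoon marathon", frozenset({"donald", "duck"}), {"lang": "en"}),
    Doc("Trump en visite", frozenset({"donald", "trump"}), {"lang": "fr"}),
]
print(search(docs, {"donald", "trump"}, {"lang": "en"}))
# [(2, 'Trump rally recap'), (1, 'Donald Duck cartoon marathon')]
```

The filter runs before scoring and never changes a score, which mirrors how a search engine separates "which documents qualify" from "how good is each one".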

How Do We Reduce the Problem of Recommending Content to Users to a Search Problem?

Translate the user and context into a query of interests and market rules:

• User and context: John, on www.angelina.com
• Interests: Television and Celebrities
• Market rule: blacklist site:www.brad.com
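The translation from user and context to a query can be sketched as a small builder that emits an Elasticsearch `bool` query. This is a minimal sketch, assuming interests arrive as a field-to-values mapping and the publisher's blacklist as a set of sites; the function name `build_user_query` and its input shapes are hypothetical, while the `categories` and `site` fields follow the slides.

```python
import json


def build_user_query(interests, blacklisted_sites):
    """Translate user interests and publisher rules into one bool query.

    interests: field name -> values the user cares about (e.g. "categories").
    blacklisted_sites: sites the current publisher refuses to recommend.
    """
    # With no must/filter clause, Elasticsearch requires at least one should
    # clause to match; each additional matching clause raises the score.
    should = [{"terms": {fld: vals}} for fld, vals in interests.items() if vals]
    bool_query = {"should": should}
    # Market rules are hard constraints: excluded documents never appear.
    if blacklisted_sites:
        bool_query["must_not"] = [{"terms": {"site": sorted(blacklisted_sites)}}]
    return {"query": {"bool": bool_query}}


# John, on www.angelina.com, which blacklists www.brad.com
query = build_user_query(
    {"categories": ["television", "celebrities"]},
    {"www.brad.com"},
)
print(json.dumps(query, indent=2))
```

Interests go into scoring clauses and market rules into non-scoring exclusions, so a rule can never be outweighed by a strong interest.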

Translate articles into searchable documents in the same feature space as user interests and market rules:

• Article: "Breakup: What's Next?" Brad's acting career continues to flourish while he films a new …
• Is about: Celebrities
• site:www.brad.com

What Is a Document About?

Semantic features, extracted with NLP:

• Categories: Entertainment/Television
• Topics: Story, Murder, Television
• Entities: Dolores, Westworld, HBO

Constructing a User Profile

[Diagram: User Profile built up over Time]
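The slides show the profile building up over time but do not say how reading history is combined. One common approach, assumed here rather than taken from the talk, is to add each article feature the user read with an exponentially decaying weight. The event tuple format, `build_profile`, and the `half_life_days` parameter are all hypothetical.

```python
import math
from collections import defaultdict


def build_profile(events, now, half_life_days=7.0):
    """Sum reading events into interest weights with exponential time decay.

    events: iterable of (day, field, value) tuples, one per feature of each
    article the user read. A read that is half_life_days old counts half as
    much as a read on `now`.
    """
    rate = math.log(2) / half_life_days
    weights = defaultdict(float)
    for day, fld, value in events:
        weights[(fld, value)] += math.exp(-rate * (now - day))
    return dict(weights)


events = [
    (3, "categories", "television"),   # read a week ago
    (10, "categories", "television"),  # read today
    (10, "entities", "dolores"),       # read today
]
profile = build_profile(events, now=10)
print(profile)
```

Decay keeps old interests from dominating forever; the half-life is a tuning knob, not a value from the slides.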

Indexing Our Inventory to Elasticsearch

Every ES document has one or more fields, and fields can be of different types:

• Strings
• Numeric
• Boolean
• Array of [strings | numbers | …]

Every article becomes an ES document, and every article feature becomes a field:

{ "title" : "Westworld season 1 ends with explosive finale", "categories" : ["entertainment_television"], "topics" : ["story", "murder", "television"], "entities" : ["dolores", "westworld", "hbo"] }
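Loading many articles at once usually goes through the `_bulk` API, whose body is newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. A minimal sketch assuming Elasticsearch 7 or later (no mapping types); the index name `articles` and the document IDs are hypothetical.

```python
import json


def bulk_body(articles, index="articles"):
    """Serialize articles into the NDJSON body expected by the _bulk API.

    Each document contributes an action line followed by its source line,
    and the whole body must end with a newline.
    """
    lines = []
    for doc_id, article in articles.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(article))
    return "\n".join(lines) + "\n"


articles = {
    "1": {
        "title": "Westworld season 1 ends with explosive finale",
        "categories": ["entertainment_television"],
        "topics": ["story", "murder", "television"],
        "entities": ["dolores", "westworld", "hbo"],
    },
}
print(bulk_body(articles), end="")
```

The resulting string would be sent as `POST /_bulk` with the header `Content-Type: application/x-ndjson`.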

Querying Elasticsearch

The query part is scored; the filter part only includes or excludes documents:

{ "query": { "filtered": { "query": { "term": { "categories": "celebrities" } }, "filter": { "term": { "site": "www.cnn.com" } } } } }
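The `filtered` query above was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer clusters the same request is a `bool` query, with the scored part under `must` and the non-scoring part under `filter`. A small sketch of that rewrite, assuming the legacy query has both parts; `filtered_to_bool` is a hypothetical helper name.

```python
def filtered_to_bool(request):
    """Rewrite a legacy `filtered` query as the equivalent `bool` query.

    The scored part moves to `must`; the filter moves to `filter`, which
    restricts matches without affecting the score.
    """
    filtered = request["query"]["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = filtered["query"]
    if "filter" in filtered:
        bool_query["filter"] = filtered["filter"]
    return {"query": {"bool": bool_query}}


legacy = {
    "query": {
        "filtered": {
            "query": {"term": {"categories": "celebrities"}},
            "filter": {"term": {"site": "www.cnn.com"}},
        }
    }
}
print(filtered_to_bool(legacy))
```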

Create Elasticsearch Query with User Interests

{ "query": { "bool": { "should": [ { "terms": { "categories": ["television", "celebrities"] } }, { "terms": { "topics": ["business", "cinema", "murder"] } }, { "terms": { "entities": ["hbo", "dolores", "nyse"] } } ] } } }

Using Weights to Improve Relevance

Each interest's weight becomes the boost of its clause:

{ "query": { "bool": { "should": [ { "term": { "categories": { "value": "television", "boost": 2.3 } } }, { "term": { "categories": { "value": "investments", "boost": 1.6 } } }, { "term": { "entities": { "value": "dolores", "boost": 1.2 } } } ] } } }
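Weighted queries like this are generated rather than written by hand. A minimal sketch, assuming the profile is a mapping from (field, value) pairs to weights; `weighted_interest_query` and its `top_k` cap are hypothetical, and the cap only illustrates keeping query size bounded under a tight latency budget.

```python
def weighted_interest_query(weights, top_k=20):
    """Build a bool/should query in which each interest is boosted by its weight.

    weights: (field, value) -> positive weight, e.g. from a user profile.
    top_k: hypothetical cap on the number of clauses, to bound query cost.
    """
    strongest = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    should = [
        # A single-value `term` clause accepts `value` plus a per-clause `boost`.
        {"term": {fld: {"value": value, "boost": round(w, 2)}}}
        for (fld, value), w in strongest
    ]
    return {"query": {"bool": {"should": should}}}


profile = {
    ("categories", "television"): 2.3,
    ("categories", "investments"): 1.6,
    ("entities", "dolores"): 1.2,
}
print(weighted_interest_query(profile))
```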

What about Cold-Start Users?

A new user has no history, so there are no interests to fill in:

{ "query": { "bool": { "should": [ { "terms": { "categories": ["?"] } }, { "terms": { "topics": ["?"] } }, { "terms": { "entities": ["?"] } } ] } } }

Display the most popular content.

How? Index a popularity score:

{ "title" : "Westworld season 1 ends ..", "categories" : ["entertainment_television"], "popularity" : 0.6 }

{ "title" : "10 Best NY Restaurants", "categories" : ["lifestyle/food"], "popularity" : 0.3 }

Score by this field in the query. With "boost_mode": "replace", each document's score is simply its popularity value:

{ "query": { "function_score": { "query": { "match_all": {} }, "field_value_factor": { "field": "popularity" }, "boost_mode": "replace" } } }
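The interest-based query and the popularity fallback can be combined in one decision per request. A minimal sketch, assuming an empty profile means a cold-start user; `recommendation_query` is hypothetical, and the `missing` setting (a real `field_value_factor` option) is an added safeguard not shown on the slide.

```python
def recommendation_query(profile_weights):
    """Pick the query for a request: interests if known, popularity otherwise."""
    if not profile_weights:
        # Cold start: the score is just the indexed popularity value.
        return {
            "query": {
                "function_score": {
                    "query": {"match_all": {}},
                    "field_value_factor": {
                        "field": "popularity",
                        "missing": 0,  # treat documents without popularity as 0
                    },
                    "boost_mode": "replace",
                }
            }
        }
    # Known user: one boosted term clause per (field, value) interest.
    should = [
        {"term": {fld: {"value": value, "boost": w}}}
        for (fld, value), w in profile_weights.items()
    ]
    return {"query": {"bool": {"should": should}}}


print(list(recommendation_query({})["query"]))
print(list(recommendation_query({("categories", "television"): 2.3})["query"]))
```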

From Market Rules to Elasticsearch Filters

Query with Blacklisted Sites

The user is on www.angelina.com, which has blacklisted www.brad.com. The blacklist becomes a must_not clause:

{ "bool": { "must_not": [ { "terms": { "site": ["www.brad.com"] } } ] } }

{ "title" : "Breakup: What's Next?", "site" : "www.brad.com" }

Document is filtered out.

{ "title" : "Top news of the week", "site" : "www.cnn.com" }

Document passes the filter.
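The effect of the blacklist clause on the two example documents can be reproduced locally. A sketch for intuition, not how Elasticsearch evaluates queries internally; `apply_must_not_terms` is a hypothetical helper that treats single values and arrays alike and keeps documents that lack the field, matching `must_not` semantics.

```python
def apply_must_not_terms(docs, field, banned):
    """Local stand-in for {"bool": {"must_not": [{"terms": {field: banned}}]}}.

    A document is dropped if any of its values for `field` is banned;
    documents without the field are kept.
    """
    banned = set(banned)
    kept = []
    for doc in docs:
        value = doc.get(field)
        values = value if isinstance(value, list) else [value]
        if banned.isdisjoint(values):
            kept.append(doc)
    return kept


docs = [
    {"title": "Breakup: What's Next?", "site": "www.brad.com"},
    {"title": "Top news of the week", "site": "www.cnn.com"},
]
print([d["title"] for d in apply_must_not_terms(docs, "site", ["www.brad.com"])])
# ['Top news of the week']
```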

From Market Rules to Elasticsearch Filters: Geo Targeting

"Music World – everything on NY Music Scene": targeting "US" users only

Index Geo Field in the Document

{ "title" : "Music World – everything on NY Music Scene", "categories" : ["music"], "entities" : ["aerosmith", "ny"], "geo" : ["us"] }

Add a Geo Filter to the Query

{ "query": { "filtered": { "query": { "terms": { … } }, "filter": { "terms" : { "geo" : ["us"] } } } } }

Apply Filter on Documents

• With the filter { "terms" : { "geo" : ["us"] } }, the document passes the filter.
• With the filter { "terms" : { "geo" : ["fr"] } }, the document is filtered out.

What about Documents Without a Specific Targeting?

{ "title" : "Music Around the World", "categories" : ["music"], "entities" : ["coldplay", "muse"], "geo" : [""] }

With the filter { "terms" : { "geo" : ["us"] } }, this document is filtered out, even though it is meant for users everywhere.

Solution – Index & Query the Value "all"

{ "title" : "Music Around the World", "categories" : ["music"], "entities" : ["coldplay", "muse"], "geo" : ["all"] }

{ "query": { "filtered": { "query": { "terms": { … } }, "filter": { "terms" : { "geo" : ["us", "all"] } } } } }

Document passes the filter.
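The "all" convention has two halves, one at indexing time and one at query time, and both must agree for untargeted content to reach everyone. A minimal sketch of the two halves plus a local check; the helper names are hypothetical, and the country codes are assumed to be indexed in the same lowercase form used in the query, since a `terms` filter matches exact values.

```python
ALL = "all"  # sentinel value indexed for content with no geo targeting


def geo_values_to_index(target_countries):
    """Value for the document's `geo` field: its targets, or the sentinel."""
    return sorted(target_countries) if target_countries else [ALL]


def geo_filter(user_country):
    """Filter clause matching content targeted at the user's country or at everyone."""
    return {"terms": {"geo": [user_country, ALL]}}


def passes(doc_geo, clause):
    """Local stand-in for Elasticsearch evaluating the terms filter on one document."""
    return not set(doc_geo).isdisjoint(clause["terms"]["geo"])


music_world = geo_values_to_index({"us"})      # ["us"]
around_the_world = geo_values_to_index(set())  # ["all"]
print(passes(music_world, geo_filter("us")), passes(music_world, geo_filter("fr")))
# True False
print(passes(around_the_world, geo_filter("fr")))
# True
```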

Adding Domain-Specific Functionality to Elasticsearch

Indexing Marketer Cost per Click (CPC) Without Indexing

CPC values change rapidly.

Limitation: Elasticsearch cannot update a document in place; every update reindexes the whole document.

Requirement: to keep up with throughput, the index should be immutable.

Solution: store and update CPCs in a separate off-heap store.
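The idea is that price changes touch a side store, never the index. A minimal sketch of that access pattern, assuming a locked dictionary as a stand-in for the off-heap store and a hypothetical `campaign_id` field on each hit; multiplying relevance by CPC is an illustrative scoring choice, not the formula from the talk.

```python
import threading


class CpcStore:
    """Illustrative stand-in for the separate CPC storage.

    CPC updates land here instead of being written to the index. The slides
    keep this data off-heap; a locked dict only shows the access pattern.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._cpc = {}

    def update(self, campaign_id, cpc):
        with self._lock:
            self._cpc[campaign_id] = cpc

    def get(self, campaign_id, default=0.0):
        with self._lock:
            return self._cpc.get(campaign_id, default)


def rerank(hits, store):
    """Hypothetical scoring: search relevance times the campaign's current CPC."""
    scored = [(score * store.get(doc["campaign_id"]), doc["title"]) for score, doc in hits]
    return sorted(scored, reverse=True)


store = CpcStore()
store.update("c1", 0.5)
store.update("c2", 2.0)
hits = [(3.0, {"campaign_id": "c1", "title": "A"}), (1.0, {"campaign_id": "c2", "title": "B"})]
print(rerank(hits, store))  # [(2.0, 'B'), (1.5, 'A')]
store.update("c1", 1.0)     # price change: no document is reindexed
print(rerank(hits, store))  # [(3.0, 'A'), (2.0, 'B')]
```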

Writing a Custom Scoring Function

Combining high-granularity behavioral signals

Applying supervised learning models to compute scores

Use dynamic scripting (e.g., Groovy)

OR

Use native Java via the Elasticsearch plugin mechanism
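Groovy scripting was deprecated in Elasticsearch 5.0 and removed in 6.0; Painless is now the default script language. A sketch of the dynamic-scripting route, assuming Elasticsearch 6.x or later and a numeric `popularity` field on every document; the blend of text relevance and popularity and the `weight` parameter are made-up stand-ins for a learned model.

```python
import json


def script_score_query(base_query, weight):
    """Wrap a query in function_score with a Painless scoring script.

    Assumes a numeric `popularity` field present on every document.
    """
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "script_score": {
                    "script": {
                        # Blend text relevance with the indexed popularity value.
                        "source": "_score * params.weight + doc['popularity'].value",
                        "params": {"weight": weight},
                    }
                },
                # Use the script result as the final score.
                "boost_mode": "replace",
            }
        }
    }


query = script_score_query({"terms": {"categories": ["television"]}}, weight=1.5)
print(json.dumps(query, indent=2))
```

Passing `weight` through `params` rather than splicing it into the script text lets Elasticsearch compile the script once and reuse it across requests; logic that needs external data, such as the CPC store, is what the native plugin route is for.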

Thank You

sliberman@outbrain.com
