©MapR Technologies 2013 - Confidential
Introduction to Mahout
And How To Build a Recommender
Topic For Today
What is recommendation?
What makes it different?
What is multi-model recommendation?
How can I build it using common household items?
Oh … Also This
Detailed break-down of a live machine learning system running with Mahout on MapR
With code examples
I may have to summarize
just a bit
Part 1: 5 minutes of background
Part 2: 5 minutes: I want a pony
Part 1: 5 minutes of background
What Does Machine Learning Look Like?
O(κ k d + k^3 d) = O(k^2 d log n + k^3 d) for small k, high quality
O(κ d log k) or O(d log κ log k) for larger k, looser quality
But tonight we’re going to show you how to keep it simple yet powerful…
Recommendations as Machine Learning
Recommendation:
– Involves observation of interactions between people taking action (users) and items for input data to the recommender model
– Goal is to suggest additional appropriate or desirable interactions
– Applications include: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts
Part 2: How recommenders work
(I still want a pony)
Recommendations
Recap: Behavior of a crowd helps us understand what individuals will do
Recommendations
Alice got an apple and a puppy
Charles got a bicycle
Bob got an apple
What else would Bob like?
A puppy, of course!
You get the idea of how recommenders work… (By the way, like me, Bob also wants a pony)
Recommendations
What if everybody gets a pony?
What else would you recommend for Amelia?
If everybody gets a pony, it’s not a very good indicator of what else to predict...
Problems with Raw Co-occurrence
Very popular items co-occur with everything (or why it’s not very helpful to know that everybody wants a pony…)
– Examples: Welcome document; Elevator music
Very widespread occurrence is not interesting as a way to generate indicators
– Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)
What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to base recommendation
Get Useful Indicators from Behaviors
1. Use log files to build history matrix of users x items
– Remember: this history of interactions will be sparse compared to all potential combinations
2. Transform to a co-occurrence matrix of items x items
3. Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
– Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
– RowSimilarityJob in Apache Mahout uses LLR
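As a rough illustration of steps 1 and 2 above, the sketch below builds a sparse user history from a log and counts item co-occurrences in plain Python. The log contents and the helper names (`build_history`, `cooccurrence`) are assumptions made up for this example; it is not how Mahout implements the computation.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical log of (user, item) interactions, loosely following the
# Alice / Bob / Charles story; the exact items are an assumption.
LOG = [
    ("alice", "apple"), ("alice", "puppy"), ("alice", "pony"),
    ("bob", "apple"), ("bob", "pony"),
    ("charles", "bicycle"), ("charles", "pony"),
]

def build_history(log):
    """Step 1: sparse history 'matrix' stored as user -> set of items."""
    history = defaultdict(set)
    for user, item in log:
        history[user].add(item)
    return dict(history)

def cooccurrence(history):
    """Step 2: for each unordered item pair, count users who have both."""
    counts = Counter()
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts

history = build_history(LOG)
cooc = cooccurrence(history)
for pair, n in sorted(cooc.items()):
    print(pair, n)
```

In this toy data the pony co-occurs with every other item, which is exactly the kind of raw count that step 3 is meant to filter out.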
Log Files
user          thing
u1 (Alice)    t1
u3 (Bob)      t4
u2 (Charles)  t3
u1 (Alice)    t2
u3 (Bob)      t3
u2 (Charles)  t3
u1 (Alice)    t1
Log Files and Dimensions
Users: u1 = Alice, u2 = Charles, u3 = Bob
Things: t1, t2, t3, t4
History Matrix: Users by Items
Alice    ✔ ✔ ✔
Bob      ✔ ✔
Charles  ✔ ✔
Co-occurrence Matrix: Items by Items
[Figure: items x items matrix of co-occurrence counts]
How do you tell which co-occurrences are useful?
Use LLR test to turn co-occurrence into indicators…
Co-occurrence Binary Matrix
[Figure: 2 x 2 binary co-occurrence table with rows and columns labeled "1" and "not 1"]
Spot the Anomaly
         A       not A
B        13      1000
not B    1000    100,000

         A       not A
B        1       0
not B    0       2

         A       not A
B        1       0
not B    0       10,000

         A       not A
B        10      0
not B    0       100,000
Root LLR is roughly like standard deviations
In Apache Mahout, RowSimilarityJob uses LLR
Root LLR for the four tables, in order: 0.90, 1.95, 4.52, 14.3
What conclusion do you draw from each situation?
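To make the root LLR scores concrete, here is a minimal sketch of the log-likelihood ratio test for a 2 x 2 table, written with the usual entropy formulation. The function names are chosen for this example, and the square root is taken unsigned; this is a sketch of the statistic, not a copy of Mahout's code.

```python
from math import log, sqrt

def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a collection of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for the 2 x 2 table [[k11, k12], [k21, k22]]."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))

def root_llr(k11, k12, k21, k22):
    """Unsigned square root of the LLR score."""
    return sqrt(llr(k11, k12, k21, k22))

# The four tables from the slide: (A and B, not A and B, A and not B, neither).
TABLES = [
    (13, 1000, 1000, 100000),
    (1, 0, 0, 2),
    (1, 0, 0, 10000),
    (10, 0, 0, 100000),
]
for table in TABLES:
    print(table, round(root_llr(*table), 2))
```

The first table has the largest raw co-occurrence count (13) but the lowest score, because A and B are both common; the last table has a smaller count but a far higher score, because A and B never appear apart.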
Recap: Use LLR test to turn co-occurrence into indicators
Indicator Matrix: Anomalous Co-Occurrence
Result: The marked row will be added to the indicator field in the item document…
Indicator Matrix
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
That one row from indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
Note: data for the indicator field is added directly to meta-data for a document in Solr index. You don’t need to create a separate index for the indicators.
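The sketch below shows one way the indicator row could be merged into an item's existing meta-data before indexing. The field names follow the slide; the `to_document` helper and the dictionaries are assumptions for illustration, and the step that actually posts the document to Solr is deliberately left out.

```python
import json

# Hypothetical item meta-data, keyed by item id (values taken from the slide).
ITEMS = {
    "t4": {
        "title": "puppy",
        "desc": "The sweetest little puppy ever.",
        "keywords": ["puppy", "dog", "pet"],
    },
}

# Hypothetical indicator matrix rows: item id -> ids of items that indicate it.
INDICATORS = {"t4": ["t1"]}

def to_document(item_id, meta, indicators):
    """Attach the indicator row to the item's own meta-data as one more field."""
    doc = {"id": item_id, **meta}
    doc["indicators"] = list(indicators.get(item_id, []))
    return doc

doc = to_document("t4", ITEMS["t4"], INDICATORS)
print(json.dumps(doc, indent=2))
```

Because the indicators are just another field on the same document, ordinary meta-data search and recommendation queries can run against a single index.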
Internals of the Recommender Engine
Looking Inside LucidWorks
What to recommend if new user listened to 2122: Fats Domino & 303: Beatles?
Recommendation is “1710 : Chuck Berry”
Real-time recommendation query and results: Evaluation
Search-based Recommendations
Sample document
Original data and meta-data:
– Merchant Id
– Field for text description
– Phone
– Address
– Location
Derived from cooccurrence and cross-occurrence analysis:
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40
Sample query (recommendation query)
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local top40
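The matching of a recommendation query against indicator fields can be imitated in a few lines. This toy stand-in scores documents by plain overlap between the user's recent history and each indicator field; a real search engine would apply its own relevance weighting, and every document, field name, and id below is hypothetical.

```python
# Hypothetical merchant documents carrying two indicator fields.
DOCS = [
    {"id": "m1", "indicator_merchants": {"m7", "m9"}, "indicator_sic": {"5812"}},
    {"id": "m2", "indicator_merchants": {"m3"}, "indicator_sic": {"5411", "5812"}},
    {"id": "m3", "indicator_merchants": set(), "indicator_sic": set()},
]

def score(doc, query):
    """Count indicator values that also appear in the user's recent history."""
    return sum(len(doc[field] & recent) for field, recent in query.items())

def recommend(docs, query, top_n=2):
    """Rank documents by overlap score and drop those with no match at all."""
    ranked = sorted(docs, key=lambda d: score(d, query), reverse=True)
    return [d["id"] for d in ranked[:top_n] if score(d, query) > 0]

# The query is built from recent behavior: merchants visited and their SIC codes.
query = {"indicator_merchants": {"m9"}, "indicator_sic": {"5812"}}
print(recommend(DOCS, query))
```

The point of the design is that the query is assembled from the user's recent behavior, while the documents carry indicators computed off-line.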
For example
Users enter queries (A)
– (actor = user, item = query)
Users view videos (B)
– (actor = user, item = video)
A^T A gives query recommendation
– “did you mean to ask for”
B^T B gives video recommendation
– “you might like these videos”
The punch-line
B^T A recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
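A small worked example may help show what these matrix products contain. The sketch assumes A is a users x queries matrix and B a users x videos matrix, both binary and entirely made up; the helper functions are plain-Python stand-ins for a linear algebra library.

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Hypothetical behavior of three users (rows).
A = [[1, 0], [1, 1], [0, 1]]           # users x queries
B = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]  # users x videos

AtA = matmul(transpose(A), A)  # queries x queries: query recommendation
BtB = matmul(transpose(B), B)  # videos x videos: video recommendation
BtA = matmul(transpose(B), A)  # videos x queries: cross recommendation

# A user who has just entered query 0 gets a score for each video.
h = [1, 0]
scores = matvec(BtA, h)
print(BtA, scores)
```

These are raw counts; in practice they would be filtered with the LLR test before being used as indicators.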
Real-life example
Query: “Paco de Lucia”
Conventional meta-data search results:
– “hombres del paco” times 400
– not much else
Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff
Hypothetical Example
Want a navigational ontology?
Just put labels on a web page with traffic
– This gives A = users x label clicks
Remember viewing history
– This gives B = users x items
Cross recommend
– B’A = label to item mapping
After several users click, results are whatever users think they should be
Nice. But can we do better?
A Quick Simplification
Users who do h (a vector of things a user has done)
Also do r
User-centric recommendations (transpose translates back to things)
Item-centric recommendations (change the order of operations)
A translates things into users
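The formulas that these captions annotate did not survive in the text. A reconstruction consistent with the captions, assuming A is the users x things history matrix and h is a vector over things, would be:

```latex
\begin{align*}
\text{users who do } h &: \quad A h \\
\text{user-centric recommendations} &: \quad r = A^{T} (A h) \\
\text{item-centric recommendations} &: \quad r = (A^{T} A)\, h
\end{align*}
```

The two expressions for r are equal by associativity; the item-centric form is the useful one because A^T A can be computed off-line and does not depend on the individual user.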
Symmetry Gives Cross Recommendations
Conventional recommendations with off-line learning
Cross recommendations
[Figure: history matrix of users x things]
[Figure: history matrices of users x thing type 1 and users x thing type 2]
Part 3: What about that worked example?
http://bit.ly/18vbbaT
Analyze with Map-Reduce
[Diagram components: complete history; cooccurrence (Mahout); item meta-data; Solr indexing (Solr indexers); index shards]
Deploy with Conventional Search System
[Diagram components: user history; web tier; Solr search; Solr indexers; item meta-data; index shards]
Me, Us
Ted Dunning, Chief Application Architect, MapR
– Committer, PMC member, Mahout, Zookeeper, Drill
– Bought the beer at the first HUG
MapR
– Distributes more open source components for Hadoop
– Adds major technology for performance, HA, industry standard API’s
Info
– Hash tag - #mapr
– See also - @ApacheMahout @ApacheDrill @ted_dunning and @mapR