1. © 2014 MapR Technologies 1
© MapR Technologies, confidential
Hadoop Summit 2014
Which Algorithms Really Matter?
2. © 2014 MapR Technologies 2
Me, Us
• Ted Dunning, Chief Application Architect, MapR
Committer and PMC member: Mahout, ZooKeeper, Drill
Bought the beer at the first HUG
• MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, and industry-standard APIs
• Info
Hash tag - #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR
3. © 2014 MapR Technologies 4
Topic For Today
• What is important? What is not?
• Why?
• What is the difference from academic research?
• Some examples
4. © 2014 MapR Technologies 5
What is Important?
• Deployable
• Robust
• Transparent
• Skillset and mindset matched?
• Proportionate
5. © 2014 MapR Technologies 6
What is Important?
• Deployable
– Clever prototypes don’t count if they can’t be standardized
• Robust
• Transparent
• Skillset and mindset matched?
• Proportionate
6. © 2014 MapR Technologies 7
What is Important?
• Deployable
– Clever prototypes don’t count
• Robust
– Mishandling is common
• Transparent
– Will degradation be obvious?
• Skillset and mindset matched?
• Proportionate
7. © 2014 MapR Technologies 8
What is Important?
• Deployable
– Clever prototypes don’t count
• Robust
– Mishandling is common
• Transparent
– Will degradation be obvious?
• Skillset and mindset matched?
– How long will your fancy data scientist enjoy doing standard ops tasks?
• Proportionate
– Where is the highest value per minute of effort?
8. © 2014 MapR Technologies 9
Academic Goals vs Pragmatics
• Academic goals
– Reproducible
– Isolate theoretically important aspects
– Work on novel problems
• Pragmatics
– Highest net value
– Available data is constantly changing
– Diligence and consistency have larger impact than cleverness
– Many systems feed themselves; exploration and exploitation are both important
– Engineering constraints on budget and schedule
9. © 2014 MapR Technologies 10
Example 1:
Making Recommendations Better
10. © 2014 MapR Technologies 11
Recommendation Advances
• What are the most important algorithmic advances in
recommendations over the last 10 years?
• Cooccurrence analysis?
• Matrix completion via factorization?
• Latent factor log-linear models?
• Temporal dynamics?
11. © 2014 MapR Technologies 12
The Winner – None of the Above
• What are the most important algorithmic advances in
recommendations over the last 10 years?
1. Result dithering
2. Anti-flood
12. © 2014 MapR Technologies 13
The Real Issues
• Exploration
• Diversity
• Speed
• Not the last fraction of a percent
13. © 2014 MapR Technologies 14
Result Dithering
• Dithering is used to re-order recommendation results
– Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near perfect record of making actual
performance much better
14. © 2014 MapR Technologies 15
Result Dithering
• Dithering is used to re-order recommendation results
– Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near perfect record of making actual
performance much better
“Made more difference than any other change”
15. © 2014 MapR Technologies 16
Simple Dithering Algorithm
• Generate a synthetic score from log rank plus Gaussian noise: s = log r + N(0, ε)
• Pick the noise scale to provide the desired level of mixing: Δr ∝ r exp(ε)
• Typically ε ∈ [0.4, 0.8]
• Oh… use floor(t/T) as the random seed, so the ordering is stable within each time window T
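For concreteness, a minimal Python sketch of this dithering scheme (illustrative only; the function and parameter names are not from the talk):

```python
import math
import random
import time

def dither(results, epsilon=0.5, window_seconds=3600):
    """Re-rank results using the synthetic score s = log(rank) + N(0, epsilon).

    Seeding the RNG with floor(t / T) keeps the ordering stable within
    one time window T, then reshuffles it for the next window.
    """
    rng = random.Random(int(time.time() // window_seconds))
    scored = [(math.log(rank) + rng.gauss(0.0, epsilon), item)
              for rank, item in enumerate(results, start=1)]
    scored.sort()  # smaller synthetic score = better position
    return [item for _, item in scored]

# Rows like the examples on the next slides can be generated this way:
print(dither([1, 2, 3, 4, 5, 6, 7, 8], epsilon=0.5))
```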
16. © 2014 MapR Technologies 17
Example … ε = 0.5
1 2 6 5 3 4 13 16
1 2 3 8 5 7 6 34
1 4 3 2 6 7 11 10
1 2 4 3 15 7 13 19
1 6 2 3 4 16 9 5
1 2 3 5 24 7 17 13
1 2 3 4 6 12 5 14
2 1 3 5 7 6 4 17
4 1 2 7 3 9 8 5
2 1 5 3 4 7 13 6
3 1 5 4 2 7 8 6
2 1 3 4 7 12 17 16
17. © 2014 MapR Technologies 18
Example … ε = log 2 = 0.69
1 2 8 3 9 15 7 6
1 8 14 15 3 2 22 10
1 3 8 2 10 5 7 4
1 2 10 7 3 8 6 14
1 5 33 15 2 9 11 29
1 2 7 3 5 4 19 6
1 3 5 23 9 7 4 2
2 4 11 8 3 1 44 9
2 3 1 4 6 7 8 33
3 4 1 2 10 11 15 14
11 1 2 4 5 7 3 14
1 8 7 3 22 11 2 33
18. © 2014 MapR Technologies 19
Exploring The Second Page
19. © 2014 MapR Technologies 20
Lesson 1:
Exploration is good
20. © 2014 MapR Technologies 21
Example 2:
Bayesian Bandits
21. © 2014 MapR Technologies 22
Bayesian Bandits
• Based on Thompson sampling
• Very general sequential test
• Near optimal regret
• Trade-off exploration and exploitation
• Possibly best known solution for exploration/exploitation
• Incredibly simple
22. © 2014 MapR Technologies 23
Thompson Sampling
• Select each option (bandit arm) according to the probability that it is the best
• The probability that an option is the best can be computed from the posterior
• But I promised a simple answer
P(i is best) = ∫ I[ E[r_i | θ] = max_j E[r_j | θ] ] P(θ | D) dθ
23. © 2014 MapR Technologies 24
Thompson Sampling – Take 2
• Sample θ ~ P(θ | D)
• Pick i = argmax_j E[r_j | θ] to maximize expected reward
• Record the result from using i
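As a concrete sketch, here is the loop above with a Beta-Bernoulli model, probably the simplest instantiation (the plot on the next slide uses a Gamma-Normal model instead; this code is illustrative, not from the talk):

```python
import random

class ThompsonBandit:
    """Thompson sampling over n_arms options with Beta-Bernoulli posteriors."""

    def __init__(self, n_arms):
        # Beta(1, 1) is a uniform prior over each arm's success rate
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms

    def pick(self):
        # Sample theta ~ P(theta | D) for every arm, then take the argmax
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return samples.index(max(samples))

    def record(self, arm, reward):
        # reward is 0 or 1; this is the conjugate posterior update
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
```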
24. © 2014 MapR Technologies 25
Fast Convergence
[Plot: cumulative regret vs. number of trials n, from 0 to 1000, comparing ε-greedy (ε = 0.05) with a Bayesian Bandit using a Gamma-Normal model; the bandit's regret falls off much faster.]
25. © 2014 MapR Technologies 26
Thompson Sampling on Ads
An Empirical Evaluation of Thompson Sampling - Chapelle and Li, 2011
26. © 2014 MapR Technologies 27
Bayesian Bandits versus Result Dithering
• Many useful systems are difficult to frame in fully Bayesian form
• Thompson sampling cannot be applied without posterior
sampling
• Can still do useful exploration with dithering
• But better to use Thompson sampling if possible
27. © 2014 MapR Technologies 28
Lesson 2:
Exploration is pretty easy to
do and pays big benefits.
28. © 2014 MapR Technologies 29
Example 3:
On-line Clustering
29. © 2014 MapR Technologies 30
The Problem
• K-means clustering is useful for feature extraction or
compression
• At scale and at high dimension, the desirable number of clusters
increases
• Very large number of clusters may require more passes through
the data
• Super-linear scaling is generally infeasible
30. © 2014 MapR Technologies 31
The Solution
• Sketch-based algorithms produce a sketch of the data
• Streaming k-means uses adaptive dp-means to produce this
sketch in the form of many weighted centroids which
approximate the original distribution
• The size of the sketch grows very slowly with increasing data
size
• Many operations such as clustering are well behaved on
sketches
Shindler, Wong, Meyerson. Fast and Accurate k-means for Large Datasets. NIPS 2011.
Kulis, Jordan. Revisiting k-means: New Algorithms via Bayesian Nonparametrics. ICML 2012.
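A heavily simplified illustration of the one-pass sketching idea (this is not the Mahout streaming k-means code; the threshold policy and names are assumptions):

```python
import math

def streaming_sketch(points, threshold=1.0, max_centroids=1000):
    """One-pass weighted-centroid sketch in the spirit of adaptive dp-means.

    A point joins its nearest centroid when it is within the distance
    threshold and otherwise becomes a new weighted centroid; when the
    sketch grows too large, the threshold is raised so the sketch size
    grows only slowly with data size.
    """
    centroids = []  # list of (coordinates, weight) pairs
    for p in points:
        if centroids:
            d, i = min((math.dist(p, c), i)
                       for i, (c, _) in enumerate(centroids))
            if d < threshold:
                c, w = centroids[i]
                # weighted-mean update: pull the centroid toward the point
                centroids[i] = (tuple((cj * w + pj) / (w + 1)
                                      for cj, pj in zip(c, p)), w + 1)
                continue
        centroids.append((tuple(p), 1))
        if len(centroids) > max_centroids:
            threshold *= 1.5  # real implementations also collapse the sketch
    return centroids
```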
33. © 2014 MapR Technologies 34
The Cluster Proximity Features
• Every point can be described by the nearest cluster
– 4.3 bits per point in this case
– Significant error that can be decreased (to a point) by increasing
number of clusters
• Or by the proximity to the 2 nearest clusters (2 x 4.3 bits + 1 sign
bit + 2 proximities)
– Error is negligible
– Unwinds the data into a simple representation
• Or we can increase the number of clusters (an n-fold increase adds log n bits per point and decreases error by √n)
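A small illustrative helper for the two-nearest-clusters encoding described above (names hypothetical):

```python
import math

def proximity_features(point, centroids):
    """Encode a point by its two nearest centroids.

    Returns (i1, d1, i2, d2): the indices of and distances to the two
    nearest centroids -- the "2 x 4.3 bits + proximities" representation.
    """
    dists = sorted((math.dist(point, c), i)
                   for i, c in enumerate(centroids))
    (d1, i1), (d2, i2) = dists[0], dists[1]
    return i1, d1, i2, d2
```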
34. © 2014 MapR Technologies 35
Diagonalized Cluster Proximity
35. © 2014 MapR Technologies 36
Lots of Clusters Are Fine
36. © 2014 MapR Technologies 37
Typical k-means Failure
[Figure: two well-separated clusters, with both initial seeds falling in the same cluster]
Selecting two seeds in the same cluster cannot be fixed by Lloyd's algorithm; the result is that these two clusters get glued together.
37. © 2014 MapR Technologies 38
Streaming k-means Ideas
• By using a sketch with lots (k log N) of centroids, we avoid
pathological cases
• We still get a very good result if the sketch is created
– in one pass
– with approximate search
• In fact, adaptive dp-means works just fine
• In the end, the sketch can be used for clustering or …
38. © 2014 MapR Technologies 39
Lesson 3:
Sketches make big data small.
39. © 2014 MapR Technologies 40
Example 4:
Search Abuse
40. © 2014 MapR Technologies 41
Recommendation
Alice got an apple and a puppy
Charles got a bicycle
Bob got an apple
41. © 2014 MapR Technologies 42
Recommendation
Alice got an apple and a puppy
Charles got a bicycle
Bob got an apple. What else would Bob like?
42. © 2014 MapR Technologies 43
Recommendation
Alice got an apple and a puppy
Charles got a bicycle
Bob: A puppy!
43. © 2014 MapR Technologies 44
History Matrix: Users x Items
Alice:   ✔ ✔ ✔
Bob:     ✔ ✔
Charles: ✔ ✔
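As a toy illustration of the step from history to co-occurrence (the item sets below match only the narrated example, not the full checkmark matrix above):

```python
from collections import Counter
from itertools import combinations

# Toy user histories: the set of items each user interacted with
histories = {
    "Alice":   {"apple", "puppy"},
    "Bob":     {"apple"},
    "Charles": {"bicycle"},
}

# Items x items co-occurrence: count users who touched both items
cooccur = Counter()
for items in histories.values():
    for a, b in combinations(sorted(items), 2):
        cooccur[(a, b)] += 1

print(cooccur)  # Counter({('apple', 'puppy'): 1})
```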
44. © 2014 MapR Technologies 45
Co-Occurrence Matrix: Items x Items
[Matrix: items × items co-occurrence counts derived from the history matrix]
Use the LLR test to turn raw co-occurrence counts into indicators of interesting co-occurrence.
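The LLR (log-likelihood ratio, G²) test works on a 2×2 contingency table per item pair; here is a compact Python version following the formulation in Mahout's LogLikelihood:

```python
import math

def _h(*counts):
    # sum of k * log(k / N); the (negated, unnormalized) Shannon entropy
    total = sum(counts)
    return sum(k * math.log(k / total) for k in counts if k > 0)

def llr(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table.

    k11: users with both items, k12: item A only,
    k21: item B only,           k22: neither item.
    Large values flag anomalously frequent co-occurrence.
    """
    return 2 * (_h(k11, k12, k21, k22)
                - _h(k11 + k12, k21 + k22)
                - _h(k11 + k21, k12 + k22))

# apple & puppy: of 3 users, 1 has both, 1 has apple only, 1 has neither
print(llr(1, 1, 0, 1))
```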
45. © 2014 MapR Technologies 46
Indicator Matrix: Anomalous Co-Occurrence
✔
✔
46. © 2014 MapR Technologies 47
Co-occurrence Binary Matrix
[Matrix: co-occurrence thresholded to binary — 1 where co-occurrence is anomalously large, 0 where it is not]
47. © 2014 MapR Technologies 48
Indicator Matrix: Anomalous Co-Occurrence
✔
✔
Result: The marked row will be added to the indicator field in the
item document…
48. © 2014 MapR Technologies 49
Indicator Matrix
✔
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
That one row from indicator matrix becomes the indicator field in the
Solr document used to deploy the recommendation engine.
Note: data for the indicator field is added directly to the metadata for a document in the Solr index. You don't need to create a separate index for the indicators.
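A sketch of what the deployed recommendation query could look like against such an index (the endpoint, collection, and field names are hypothetical):

```python
import requests

SOLR_URL = "http://localhost:8983/solr/items/select"

def recommend(recent_item_ids, rows=10):
    """Recommend by searching the user's recent history against the
    indicators field, e.g. recent_item_ids = ["2122", "303"]."""
    params = {
        "q": "indicators:(%s)" % " ".join(recent_item_ids),
        "rows": rows,
        "wt": "json",
    }
    return requests.get(SOLR_URL, params=params).json()["response"]["docs"]
```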
49. © 2014 MapR Technologies 50
Internals of the Recommender Engine
50. © 2014 MapR Technologies 51
Internals of the Recommender Engine
51. © 2014 MapR Technologies 52
Looking Inside LucidWorks
What should we recommend if a new user listened to 2122: Fats Domino & 303: The Beatles?
Recommendation is "1710: Chuck Berry"
Real-time recommendation query and results: Evaluation
53. © 2014 MapR Technologies 54
Lesson 4:
Recursive search abuse pays
Search can implement recs
Which can implement search
56. © 2014 MapR Technologies 57
Me, Us
• Ted Dunning, Chief Application Architect, MapR
Committer and PMC member: Mahout, ZooKeeper, Drill
Bought the beer at the first HUG
• MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, and industry-standard APIs
• Info
Hash tag - #mapr
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR