5. Four Sided Marketplace
[Diagram] Four sides: Customers, Shoppers, Products (Advertisers), and Stores (Retailers); connected through Search, Advertising, Shopping, Delivery, Customer Service, Inventory, Picking, and Loyalty.
6. What this talk is about
• A new collaborative filtering algorithm
• A case study:
  • a live, end-to-end recommendation system
  • one person-month
  • hundreds of millions of transactions
7. Online grocery vs. Traditional e-commerce
[Chart comparing online grocery and traditional e-commerce across Week 1, Week 2, and Week 3]
14. "Frequently bought with" Recommendations
Help customers shop for complementary products and try alternatives.
[Examples] Some recommended pairs are probably consumed together; others are not necessarily consumed together.
17. Learning from feedback
Traditionally, collaborative filtering has used explicit feedback to predict ratings.
There may still be bias in whether a user chooses to rate.
Explicit Feedback vs. Implicit Feedback
18. Learning from Explicit Feedback
• Explicit feedback may be more reliable, but there is much less of it
• It is less reliable if users rate based on aspirations instead of true preferences
19. Implicit Feedback: trading off quality and quantity
[Chart] Strength of evidence vs. number of events.
20. Architecture
[Pipeline] Event Data → Generate User-Product Matrix → ALS (Spark/EMR) → User/Product Factors → Candidate Selection → Score and Select Top N (Spark/EMR) → Run-time ranking for diversity
21. A Matrix Factorization Formulation for Implicit Feedback
[Diagram] The user-product matrix R (M users x N products) holds raw purchase counts (e.g. 1, 9, 3, 20), with most entries empty. It is binarized into a preference matrix P (M x N): p_ui = 1 if user u has purchased product i, 0 otherwise.
"Collaborative Filtering for Implicit Feedback Datasets" - Hu et al.
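The binarization step can be sketched in NumPy; the counts below mirror the illustrative values on the slide:

```python
import numpy as np

# Toy user-product matrix R (M = 3 users x N = 3 products); entries are purchase counts.
R = np.array([
    [1, 0, 0],
    [9, 0, 0],
    [0, 3, 20],
])

# Binarize into the preference matrix: p_ui = 1 if user u ever bought product i.
P = (R > 0).astype(int)
```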
22. A Matrix Factorization Formulation for Implicit Feedback
[Diagram] The preference matrix P (M x N) is approximated by the product of a user-factor matrix X (M x k) and a product-factor matrix Y (k x N): P ≈ XY.
23. Matrix Factorization from Implicit Feedback: The Intuition

#Purchases   Preference p   Confidence c
0            0              Low
1            1              Low
>>1          1              High

• Confidence increases linearly with the purchase count r: c = 1 + alpha * r
• alpha controls the marginal rate of learning from user purchases
• Key questions:
  • How should the unobserved events be treated?
  • How should one trade off the observed and the unobserved?
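The confidence rule is a one-liner; a minimal sketch (alpha = 40 is the value suggested in Hu et al., and is a tuned hyper-parameter here):

```python
def confidence(r, alpha=40.0):
    """Confidence in a preference grows linearly with the purchase count r."""
    return 1.0 + alpha * r

# Unobserved events (r = 0) still carry a baseline confidence of 1, so the
# model treats zeros as weak evidence of "no preference" rather than ignoring them.
```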
24. Regularized Weighted Squared Loss
[Equation] A squared-error loss between the preference matrix and the product of the user-factor and product-factor matrices, with each term weighted by its confidence, plus a regularization term.
Solve using Alternating Least Squares.
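Written out, the loss minimized by ALS (from Hu et al.) is:

```latex
\min_{x_*,\, y_*} \; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
\;+\; \lambda \Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr),
\qquad c_{ui} = 1 + \alpha\, r_{ui}
```

Fixing either factor matrix turns the problem into a weighted least squares in the other, which is why alternating between the two converges quickly.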
26. Spark ALS Hyper-parameter Tuning
• rank k - diminishing returns after 150
• alpha - controls rate of learning from observed events
• iterations - ALS tends to converge within 5, seldom more than 10
• lambda - regularization parameter
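These knobs map directly onto Spark's ALS estimator. A configuration sketch, not a definitive setup: the column names and the events DataFrame are assumptions, and the numeric values are starting points from the slides rather than universal defaults.

```python
from pyspark.ml.recommendation import ALS

# Requires a running SparkSession and an events DataFrame with
# (user_id, product_id, purchase_count) columns -- names are assumptions.
als = ALS(
    rank=150,            # diminishing returns observed beyond ~150
    alpha=40.0,          # rate of learning from observed events (tuned)
    maxIter=10,          # ALS tends to converge within 5, seldom more than 10
    regParam=0.01,       # lambda, the regularization parameter (tuned)
    implicitPrefs=True,  # use the implicit-feedback formulation of Hu et al.
    userCol="user_id",
    itemCol="product_id",
    ratingCol="purchase_count",
)
# model = als.fit(events_df)
# model.userFactors and model.itemFactors hold the learned factor vectors.
```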
28. Scoring users and products
With millions of products and users, scoring every (user, product) pair is prohibitive.
Two goals in selecting products to score:
• Products that have an a priori high purchase rate (the popular head)
• Long-tail products that have not yet been discovered
Exclude previously purchased products.
29. Candidate Product Selection
We start with simple stratified sampling. For each user, score N products:
• Sample h products from the head
• Sample t products from the tail
N ~ 10,000; h ~ 3,000; t ~ 7,000
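A sketch of the per-user sampling, assuming head/tail product lists and the user's purchase history are available (function and variable names are illustrative):

```python
import random

def select_candidates(head, tail, purchased, n_head=3000, n_tail=7000):
    """Stratified candidate selection for one user.

    head, tail: product ids from the popular head and the long tail;
    purchased: the user's previously purchased ids, which are excluded.
    """
    head_pool = [p for p in head if p not in purchased]
    tail_pool = [p for p in tail if p not in purchased]
    candidates = random.sample(head_pool, min(n_head, len(head_pool)))
    candidates += random.sample(tail_pool, min(n_tail, len(tail_pool)))
    return candidates
```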
31. Offline evaluation
• Ideally we want to evaluate user response to recommendations
• But we will only know this from a live A/B test
• Recall-based metrics are an offline proxy (albeit not the best)
• Recall: "fraction of purchased products covered among the Top N recommendations"
• We only use this for hyper-parameter tuning
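The recall metric as described is straightforward to compute per user:

```python
def recall_at_n(recommended_top_n, purchased):
    """Fraction of a user's purchased products covered by the Top N list."""
    purchased = set(purchased)
    if not purchased:
        return 0.0
    return len(set(recommended_top_n) & purchased) / len(purchased)
```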
32. Tuning Spark for ALS
Understanding the Spark execution model and its implementation of ALS helps.
• Training is communication-heavy [1]; set partitions <= #CPU cores
• Scoring is memory-intensive
• Broad guidelines [2]:
  • Limit executor memory to 64GB
  • 5 cores per executor
  • Set the number of executors based on data size
[1] http://apache-spark-user-list.1001560.n3.nabble.com/Error-No-space-left-on-device-tp9887p9896.html
[2] http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
34. Online ranking for diversity
"Diversity within sessions, Novelty across sessions"
"Establish trust in a fresh and comprehensive catalog"
"Less is more"
[Diagram] A cached list of ~1000 products per user is reduced to a final list of <100 products that promotes diversity.
35. Diversity
Top K products, ranked by score.
Rank product categories by their median product score.
36. Weighted sampling for diversity
• Sample a category in proportion to its score
• Within the category, sample in proportion to product score
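One way to implement the two-stage sampling; the category weighting by median product score follows the ranking described above, and the input structure is an assumption:

```python
import random

def diverse_sample(products_by_category, k):
    """Two-stage weighted sampling over scored products.

    products_by_category: {category: [(product_id, score), ...]} with
    positive scores (an assumed input shape).
    """
    def median(xs):
        xs = sorted(xs)
        m = len(xs) // 2
        return xs[m] if len(xs) % 2 else (xs[m - 1] + xs[m]) / 2.0

    # Work on a copy so callers' lists are not mutated.
    pool = {c: list(ps) for c, ps in products_by_category.items() if ps}
    picked = []
    while pool and len(picked) < k:
        cats = list(pool)
        # Stage 1: pick a category in proportion to its median product score.
        cat = random.choices(cats, weights=[median([s for _, s in pool[c]]) for c in cats])[0]
        # Stage 2: within the category, pick in proportion to product score.
        prod = random.choices(pool[cat], weights=[s for _, s in pool[cat]])[0]
        picked.append(prod[0])
        pool[cat].remove(prod)    # sample without replacement
        if not pool[cat]:
            del pool[cat]
    return picked
```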
38. A/B Test Results
• Statistically significant increases in:
  • Items per order
  • GMV per order
• Total product sales spread over more categories
39. OK, we have a recommendation system
Where do we go from here?
40. What else can you do with user and product factors?
• Score a (user, product) pair on demand
• Get the Top N similar users
• Get the Top N similar products
• Use them as features in other models
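Scoring a (user, product) pair on demand is just an inner product of the learned factor vectors; a sketch with illustrative values:

```python
import numpy as np

# k = 3 latent factors per user and product (values are illustrative).
user_factors = {"u1": np.array([0.2, 0.7, 0.1])}
product_factors = {"p9": np.array([0.5, 0.4, 0.3])}

# Predicted preference is the inner product x_u . y_i from the factorization.
score = float(user_factors["u1"] @ product_factors["p9"])
```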
41. Products similar to "Haigs Spicy Hummus"
[Results] More spicy hummus products, plus spicy salsas.
Generated using Approximate Nearest Neighbors ("annoy" from Spotify)
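Production uses annoy for approximate search; an exact brute-force version over the product-factor matrix shows the idea (fine for illustration, far too slow for millions of products):

```python
import numpy as np

def top_n_similar(product_factors, idx, n=5):
    """Exact cosine-similarity neighbors of product `idx`.

    A stand-in for the approximate nearest-neighbor lookup done with annoy.
    """
    norms = np.linalg.norm(product_factors, axis=1, keepdims=True)
    unit = product_factors / np.clip(norms, 1e-12, None)
    sims = unit @ unit[idx]        # cosine similarity to every product
    order = np.argsort(-sims)      # most similar first
    return [int(i) for i in order if i != idx][:n]
```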
42. Ensembles
Use different types of evidence and/or product metadata to easily create ensembles:
• User x Products Purchased
• User x Products Viewed
• User x Brands Purchased
• …
combined via a model or a linear combination.
43. What next
• Improve candidate selection by leveraging user and product factors
• Make recommendations more contextual
• Address cold-start problems, particularly for users
• Explain recommendations ("Because you did X")
45. Fulfillment in Traditional E-commerce
• Manage inventory in warehouses optimized for quick fulfillment
• Users only specify "what" they want
• Disallow users from ordering out-of-stock products
• Set expectations
  • "3 day shipping" but may actually ship in 10 business days
46. Fulfillment for on-demand delivery from local retailers
• Shoppers navigate a complex environment where products:
  • may have run out
  • may be misplaced
  • may be damaged
• The user specifies "what", "when", and "where from"
• Shoppers improvise under uncertainty
47. Addressing new challenges in on-demand delivery
• Tight technology integrations help improve tracking of in-store availability
• Complemented by predictive models that estimate availability in real time
• Last-minute out-of-stocks can still happen
49. What makes a replacement acceptable?
[Attributes] Brand, Flavor, Size, Packaging, Price, Diet Info
• Several product attributes matter
• Context matters, and might benefit from personalization
• Must scale to millions of products
• Not always symmetric
  • It may be OK to replace X with gluten-free X, but not the other way around
50. Replacement Recommendations for Shoppers
• Shoppers are trained to pick replacements
• But shoppers can benefit from algorithmic suggestions:
  • many unfamiliar products in a vast catalog
  • validation for common products
• Finding replacements fast improves operational efficiency
51. Replacement Recommendations for Customers
• Customers can specify replacements while placing the order
• Customers can choose to communicate with the shopper in store to verify
52. How do we algorithmically generate replacements?