The Universal Recommender

The Big Idea
Universal Recommender

A LITTLE HISTORY:
MOTIVATION
• Cooccurrence: Mahout 2012
• Factorized ALS: Mahout, then Spark’s MLlib
• Experience with then-current recommender tech
  • Evaluation and experiments
  • Could only use “purchase” data; threw out the 100x larger volume of view data
  • No “realtime”
  • Too many edge cases: users that had no recommendations
  • Didn’t adapt to metadata/content of items
• Lots of discussions with Ted Dunning, Sean Owen, Sebastian Schelter, Pat Ferrel (me)
• Cooccurrence and cross-cooccurrence led to many innovations

ANATOMY OF A RECOMMENDATION
PERSONALIZED
$r$ = recommendations
$h_p$ = a user’s history of some action (purchase, for instance)
$P$ = the history of all users’ primary action; rows are users, columns are items
$(P^\top P)$ = compares column to column using a log-likelihood-based correlation test
$r = (P^\top P)\,h_p$
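To make the formula concrete, here is a minimal sketch of $r = (P^\top P)\,h_p$ in plain Python. It assumes a tiny, hypothetical binary purchase matrix and uses raw cooccurrence counts where the Universal Recommender would apply the LLR test; the helper names are illustrative, not from any library.

```python
# Sketch of r = (P^T P) h_p with plain Python lists (no LLR filtering).
# Assumption: P is a small, hypothetical binary users x items matrix for one
# primary action; raw cooccurrence counts stand in for LLR scores.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """(n x k) times (k x m) -> (n x m)."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Rows are users, columns are item0..item3; 1 means "purchased".
P = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

PtP = matmul(transpose(P), P)  # items x items cooccurrence counts
for i in range(len(PtP)):
    PtP[i][i] = 0  # every item trivially cooccurs with itself, so drop the diagonal

h_p = [1, 0, 0, 0]    # this user has purchased item0 only
r = matvec(PtP, h_p)  # one score per item

# Rank items the user has not purchased yet, strongest score first.
ranked = sorted((i for i, s in enumerate(r) if s > 0 and h_p[i] == 0),
                key=lambda i: -r[i])
print(r, ranked)  # [0, 2, 1, 0] [1, 2]
```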

COOCCURRENCE WITH LLR
• Let’s call $(P^\top P)$ an indicator matrix for some primary action, like purchase
• Rows = items, columns = items, element = similarity/correlation score
• Each score compares the row’s item to the column’s item using a “similarity” or “correlation” metric
• Log-Likelihood Ratio (LLR) finds important, correlating cooccurrences and filters out the rest: a major improvement in quality over simple cooccurrence or other similarity metrics
• Experiments on real-world data show LLR is significantly better than other similarity metrics*
* http://ssc.io/wp-content/uploads/2011/12/rec11-schelter.pdf
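A minimal Python sketch of the LLR test on a 2x2 contingency table of user counts. It follows the entropy-based formulation used in Mahout's LogLikelihood class as we understand it; the function names are our own and the example counts are made up.

```python
import math

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized entropy (N * H) of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: users who interacted with both item A and item B
    k12: users who interacted with B but not A
    k21: users who interacted with A but not B
    k22: users who interacted with neither
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    if row_entropy + col_entropy < mat_entropy:
        return 0.0  # only possible through floating-point round-off
    return 2.0 * (row_entropy + col_entropy - mat_entropy)

print(llr(10, 5, 5, 980))    # A and B strongly tied together: large score
print(llr(10, 90, 90, 810))  # A and B independent: score near zero
```

The second table is exactly what independence predicts (10 of the 100 A-users also did B, the same 10% rate as the whole population), so LLR treats that cooccurrence as noise and filters it out, while raw counts would score both tables the same.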

LLR AND SIMILARITY METRICS
[Figure: Precision as Mean Average Precision (MAP@1 through MAP@10, higher is better) for the Mahout cooccurrence recommender with e-commerce data, comparing the Cosine, Tanimoto, and Log-likelihood similarity metrics]

FROM COOCCURRENCE TO
RECOMMENDATION
• This actually means taking the user’s history $h_p$ and comparing it to the rows of the cooccurrence matrix $(P^\top P)$
• TF-IDF weighting of cooccurrence would be nice to mitigate the undue influence of popular items
• Find the items nearest to the user’s history
• Sort these by similarity strength and keep only the highest: you have recommendations
• Sound familiar? Find the k-nearest neighbors using cosine and TF-IDF?
• That’s exactly what a search engine does!
$r = (P^\top P)\,h_p$
$h_p$ for user1: [item2, item3]
$(P^\top P)$:
  item1: [item2, item3]
  item2: [item1, item3, item95]
  item3: […]
Find the item that most closely matches the user’s history: item1!
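The search-engine view can be sketched in a few lines: treat each indicator row as a document, use $h_p$ as the query, and rank rows by cosine similarity of TF-IDF vectors. This uses textbook TF-IDF with a smoothed IDF, not Lucene's exact scoring function, and all helper names are hypothetical; the data is the slide's own example.

```python
import math
from collections import Counter

# Indicator rows from the slide, treated as search-engine "documents".
# item3's row is elided ("[…]") in the deck, so it is left out here.
index = {
    "item1": ["item2", "item3"],
    "item2": ["item1", "item3", "item95"],
}

def build_idf(index):
    """Smoothed IDF, 1 + ln(N / df), so terms found in every row still count."""
    n_docs = len(index)
    df = Counter(term for terms in index.values() for term in set(terms))
    return {term: 1.0 + math.log(n_docs / d) for term, d in df.items()}

def tfidf(terms, idf):
    return {t: c * idf.get(t, 0.0) for t, c in Counter(terms).items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)

def knn_recommend(history, index, k=10, exclude_history=True):
    """Use the user's history h_p as the query; rank indicator rows by cosine."""
    idf = build_idf(index)
    query = tfidf(history, idf)
    scored = []
    for item, terms in index.items():
        if exclude_history and item in history:
            continue  # don't recommend what the user already has
        score = cosine(query, tfidf(terms, idf))
        if score > 0:
            scored.append((item, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

h_p = ["item2", "item3"]  # user1's history from the slide
print(knn_recommend(h_p, index))  # item1 ranks first, as on the slide
```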

USER HISTORY + COOCCURRENCES
+ SEARCH = RECOMMENDATIONS
• The final calculation uses $h_p$ as the query on the cooccurrence matrix $(P^\top P)$ and returns a ranked set of items
• The query is a “similarity” query, not a relational or key-based fetch
• Uses the search engine as a cosine-based k-nearest neighbor (KNN) engine with norms and TF-IDF weighting
• Highly optimized for serving these queries in realtime
• Several (Solr, Elasticsearch) offer high availability and massively scalable, clustered auto-sharding, like the best NoSQL DBs
$r = (P^\top P)\,h_p$

THE UNIVERSAL RECOMMENDER:
THE BREAKTHROUGH IDEA
• Virtually all existing collaborative filtering recommenders use only one indicator of preference
• The theory doesn’t stop there!
• Virtually anything we know about the user can be used to improve recommendations: purchase, view, category preference, location preference, device preference, gender…
$r = (P^\top P)\,h_p$
$r = (P^\top P)\,h_p + (P^\top V)\,h_v + (P^\top C)\,h_c + \dots$
where $V$ and $C$ hold all users’ views and category preferences, and $h_v$ and $h_c$ are this user’s view and category histories
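Each extra term in the sum is computed the same way as the first, only against a different action matrix. A minimal sketch, assuming hypothetical purchase and view logs and raw counts in place of LLR scores; cross_occurrence and term are illustrative helper names. The diagonal is dropped only for $P^\top P$, since an item that is viewed and later purchased is a real signal.

```python
from collections import Counter, defaultdict

def cross_occurrence(primary, secondary, drop_diagonal=False):
    """(A^T B) as nested Counters: primary item -> secondary item -> #users."""
    m = defaultdict(Counter)
    for user, a_items in primary.items():
        for a in a_items:
            for b in secondary.get(user, ()):
                if drop_diagonal and a == b:
                    continue
                m[a][b] += 1
    return m

def term(indicator, history):
    """One term of r: each indicator row dotted with a binary user history."""
    return Counter({item: sum(row[h] for h in history) for item, row in indicator.items()})

# Hypothetical interaction logs: user -> items, one dict per action.
purchases = {"u1": {"iphone", "ipad"}, "u2": {"iphone"}, "u3": {"galaxy"}}
views = {"u1": {"iphone", "ipad", "nexus"}, "u2": {"ipad", "nexus"}, "u3": {"galaxy", "nexus"}}

PtP = cross_occurrence(purchases, purchases, drop_diagonal=True)  # cooccurrence
PtV = cross_occurrence(purchases, views)                          # cross-occurrence

# A new user with no purchases yet, but two views in the current session.
h_p, h_v = set(), {"nexus", "ipad"}
r = term(PtP, h_p) + term(PtV, h_v)  # Counter addition keeps positive scores only
print(r.most_common())  # [('iphone', 4), ('ipad', 2), ('galaxy', 1)]
```

The new user has no purchases, so the $P^\top P$ term contributes nothing; the view history alone produces recommendations, which is the cold-start benefit cross-occurrence promises.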

CORRELATED CROSS-OCCURRENCE:
SO WHAT?
• Comparing the history of the primary action to other actions finds actions that lead to the one you want to recommend
• Given strong data about user preferences on a general population, we can also use:
  • items clicked
  • terms searched
  • categories viewed
  • items shared
  • people followed
  • items disliked (yes, dislikes may predict likes)
  • location
  • device preference
  • gender
  • age bracket
• Virtually anything we know about the population can be tested for correlation and used to predict a particular user’s preferences

CORRELATED CROSS-OCCURRENCE:
ADDING CONTENT MODELS
• Collaborative Topic Filtering
  • Use Latent Dirichlet Allocation (LDA) to model topics directly from the textual content
  • Base the calculation on Word2Vec-type word vectors instead of bag-of-words analysis to boost quality
  • Create cross-occurrence indicators from topics the user has preferred
  • Repeat periodically
• Entity Preferences:
  • Use a Named Entity Recognition (NER) system to find entities in textual content
  • Create cross-occurrence indicators for these entities
• Entities and topics are long-lived and richly describe user interests, so they are very good for use in the Universal Recommender

THE UNIVERSAL RECOMMENDER
ADDING CONTENT-BASED RECS
Indicators can also be based on content similarity
$(T T^\top)$ compares every 2 documents to each other and finds the most similar, based upon content alone
$r = (T T^\top)\,h_t + l \cdot L + \dots$
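A minimal sketch of a content indicator, assuming each item is described by a hypothetical set of tags: rows of $T T^\top$ count the tags two items share, and $h_t$ is the set of items in the user's history. The deck names Mahout-Samsara's SimilarityAnalysis.rowSimilarity for this step; plain shared-tag counts are used here instead of its scoring, and both helpers are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical item -> tags content data.
item_tags = {
    "i1": {"scifi", "space", "robots"},
    "i2": {"scifi", "space"},
    "i3": {"cooking"},
    "i4": {"robots", "cooking"},
}

def content_similarity(item_tags, top_k=2):
    """Rows of (T T^T): for each item, the other items sharing the most tags."""
    items_by_tag = defaultdict(set)
    for item, tags in item_tags.items():
        for tag in tags:
            items_by_tag[tag].add(item)
    sims = {}
    for item, tags in item_tags.items():
        shared = Counter()
        for tag in tags:
            for other in items_by_tag[tag]:
                if other != item:
                    shared[other] += 1
        sims[item] = dict(shared.most_common(top_k))  # keep only the strongest neighbors
    return sims

def recommend_by_content(sims, history):
    """r = (T T^T) h_t: sum each row's similarity to the items in the user's history."""
    r = Counter()
    for item, row in sims.items():
        if item not in history:
            r[item] = sum(row.get(h, 0) for h in history)
    return [(item, s) for item, s in r.most_common() if s > 0]

print(recommend_by_content(content_similarity(item_tags), {"i2"}))  # [('i1', 2)]
```

Because it uses only item content, this term can score a brand-new item that has no interaction data yet.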

INDICATOR TYPES
• Cooccurrence
  • Find the best indicator of a user preference for the item type to be recommended; examples are “buy”, “read”, “video_watch”, “share”, “follow”, “like”
• Cross-occurrence
  • Item metadata as a “user” preference; for example, treat item category as a user category preference
  • Calculated from user actions on any data that may give information about the user: category preferences, search terms, gender, location
  • Create with Mahout-Samsara SimilarityAnalysis.cooccurrence
• Content or metadata
  • Content text, tags, categories, description text, anything describing an item
  • Create with Mahout-Samsara SimilarityAnalysis.rowSimilarity
• Intrinsic
  • Popularity rank, geo-location, anything describing an item
  • Some may be derived from usage data, like popularity rank or hotness
  • A known or specially calculated property of the item
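Intrinsic indicators such as popularity and hotness can be derived from timestamped usage events. The sketch below is an assumption, not the Universal Recommender's definition: popularity is an event count in a recent window, and the hypothetical hotness score is this period's count minus the previous period's.

```python
from collections import Counter
from datetime import datetime, timedelta

def popularity(events, now, window):
    """Count events per item in the half-open window (now - window, now]."""
    start = now - window
    return Counter(item for item, ts in events if start < ts <= now)

def hotness(events, now, period):
    """Hypothetical 'hot' score: this period's count minus the previous period's."""
    recent = popularity(events, now, period)
    previous = popularity(events, now - period, period)
    return {item: recent[item] - previous[item] for item in set(recent) | set(previous)}

now = datetime(2016, 1, 10)
events = ([("a", now - timedelta(days=1))] * 3
          + [("b", now - timedelta(days=1))]
          + [("b", now - timedelta(days=8))] * 5)

pop = popularity(events, now, timedelta(days=14))
hot = hotness(events, now, timedelta(days=7))
print(pop.most_common())                       # [('b', 6), ('a', 3)]
print(sorted(hot, key=hot.get, reverse=True))  # ['a', 'b']
```

Item "b" has more events overall, but "a" is rising while "b" is fading, so the two intrinsic indicators rank them in opposite order.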

AKA THE WHOLE ENCHILADA
“Universal” means one query on all indicators at once
Unified query:
purchase-correlator: users-history-of-purchases
view-correlator: users-history-of-views
category-correlator: users-history-of-categories-viewed
tags-correlator: users-history-of-purchases
geo-location-correlator: users-location
…
$\dots + (T T^\top)\,h_t + l \cdot L + \dots$

AKA THE WHOLE ENCHILADA
“Universal” means one query on all correlators at once
Once indicators are indexed as search fields, this entire equation is a single query
Fast!
$\dots + (T T^\top)\,h_t + l \cdot L + \dots$
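Because each indicator is indexed as its own search field, the unified query is a single bool/should query with one terms clause per field. A minimal sketch that builds the request body as a Python dict; the field names, boost values, and the unified_query helper are hypothetical, while the terms-with-boost shape mirrors the example Elasticsearch query later in this deck.

```python
import json

def unified_query(history_by_field, boosts=None, size=20):
    """One bool/should query across every indicator field at once."""
    boosts = boosts or {}
    should = []
    for field, items in history_by_field.items():
        if not items:
            continue  # nothing known about the user for this indicator
        terms = {field: list(items)}
        if field in boosts:
            terms["boost"] = boosts[field]  # weight this indicator more or less
        should.append({"terms": terms})
    return {"size": size, "query": {"bool": {"should": should}}}

query = unified_query(
    {"purchase": ["0", "32"], "view": ["0", "67", "4"], "category": []},
    boosts={"purchase": 2},
)
print(json.dumps(query, indent=2))
```

Fields with no user history are simply left out, so the same builder serves both long-time users and brand-new ones.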

BETTER USER COVERAGE
• Any number of user actions: the entire user clickstream
• Metadata: from user profile or items
• Context: on-site, time, location
• Content: unstructured text or semi-structured categorical data
• Mixes any number of “indicators” to increase quality or tune to a specific context
• Solution to the “cold-start” problem: items with too short a lifespan, or new users with no history
  • Can recommend to new users using realtime history
  • Can use new interaction data from any user in realtime
• 95% implemented in Universal Recommender v0.3.0, the most current release
[Diagram: user coverage of All Users compared with ALS or 1-action Recommenders]

POLISH THE APPLE
• Dithering to auto-optimize via explore-exploit: randomize some returned recs; if they are acted upon, they become part of the new training data and are more likely to be recommended in the future
• Visibility control:
  • Don’t show dups; blacklist items already shown
  • Filter items the user has already seen
• Zero-downtime deployment: deploy the prediction server once, then hot-swap the new index when ready
• Generate some intrinsic indicators, like hot and popular, which help solve the “cold-start” problem
• Asymmetric train vs. query: query with the most recent user data, train on all historical data
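The deck does not give a dithering formula, so the sketch below assumes one common rank-based scheme: add Gaussian noise with standard deviation log(epsilon) to log(rank) and re-sort. The dither function and its epsilon default are illustrative choices, not Universal Recommender settings.

```python
import math
import random

def dither(ranked_items, epsilon=1.5, rng=None):
    """Re-rank by log(rank) plus Gaussian noise with standard deviation log(epsilon).

    epsilon = 1 means no noise; larger values push lower-ranked items up more often.
    """
    rng = rng or random.Random()
    sigma = math.log(epsilon)
    noisy = [(math.log(rank) + rng.gauss(0.0, sigma), item)
             for rank, item in enumerate(ranked_items, start=1)]
    noisy.sort(key=lambda pair: pair[0])
    return [item for _, item in noisy]

recs = ["i1", "i2", "i3", "i4", "i5", "i6"]
print(dither(recs, epsilon=2.0, rng=random.Random(42)))
```

Because log(rank) rises quickly at the top and flattens lower down, the first few positions stay fairly stable while deeper results are shuffled more, which is where exploration costs the least.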

Architecture Based on
PredictionIO

UNIVERSAL RECOMMENDER
LAMBDA ARCHITECTURE
[Diagram: DATA IN: the application sends events & item metadata through the PredictionIO SDK or REST to the PredictionIO EventServer, backed by HBase. MODEL CREATION (background): Spark-Mahout’s Correlation Engine runs on Spark and performs the MODEL UPDATE into Elasticsearch. RECOMMENDATION SERVING (realtime): the application sends queries to the PredictionIO REST Serving Component of the Universal Recommender Engine, which uses the user history and itemProperties to query Elasticsearch and returns recommendations.]
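Data enters through the PredictionIO EventServer as JSON events. The sketch below only builds event payloads; to our understanding they would be POSTed to the EventServer's /events.json endpoint with an app access key (port 7070 by default), but check the PredictionIO docs for your version. The event names, IDs, and helper functions are hypothetical.

```python
import json
from datetime import datetime, timezone

def user_action_event(event, user_id, item_id, event_time):
    """A user -> item action in the PredictionIO event JSON format."""
    return {
        "event": event,  # e.g. "purchase" or "view"
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "eventTime": event_time.isoformat(),
    }

def item_properties_event(item_id, properties):
    """A $set event that attaches metadata (item properties) to an item."""
    return {"event": "$set", "entityType": "item", "entityId": item_id,
            "properties": properties}

t = datetime(2016, 1, 10, 12, 0, tzinfo=timezone.utc)
batch = [
    user_action_event("purchase", "u1", "iphone", t),
    user_action_event("view", "u1", "nexus", t),
    item_properties_event("iphone", {"category": ["phones", "apple"]}),
]
print(json.dumps(batch, indent=2))
```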

TECH STACK
• HBase 1.X
  • Postgres, MySQL, or another JDBC database is also possible
• Spark 1.6.X
  • Fast, massively scalable, seems like the “winner”
• HDFS 2.6: Hadoop Distributed File System
  • Reliable, massively scalable, the de facto standard
• Spray
  • Supplies REST endpoints, multi-threaded via Akka actors
• Elasticsearch 1.7.X or 2.X
  • Reliable, massively scalable, fast
• Scala & Java 8
  • Fits functional and OOP programming styles for productivity
• Stable, Scalable, High Availability, Well Supported

The Elasticsearch JSON query looks like this:
{
  "size": 20,
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "rate": ["0", "67", "4"]
          }
        },
        {
          "terms": {
            "buy": ["0", "32"],
            "boost": 2
          }
        },
        { // categorical boosts
          "terms": {
            "category": ["cat1"],
            "boost": 1.05
          }
        }
      ],
      "must": [
        { // categorical filters
          "terms": {
            "category": ["cat1"],
            "boost": 0
          }
        },
        {
          "constant_score": { // date in query must fall between the expire and available dates of an item
            "filter": {
              "range": {
                "availabledate": {
                  "lte": "2015-08-30T12:24:41-07:00"
                }
              }
            },
            "boost": 0
          }
        },
        {
          "constant_score": { // date range filter in query must be between these item property values
            "filter": {
              "range": {
                "expiredate": {
                  "gte": "2015-08-15T11:28:45.114-07:00",
                  "lt": "2015-08-20T11:28:45.114-07:00"
                }
              }
            },
            "boost": 0
          }
        },
        {
          "constant_score": { // this orders popular items for backfill
            "filter": {
              "match_all": {}
            },
            "boost": 0.000001 // must have at least a small number to be boostable
          }
        }
      ],
      "must_not": [ // blacklisted items
        {
          "ids": {
            "values": ["item-id1", "item-id2", ...]
          }
        }
      ]
    }
  }
}
An example Elasticsearch query on a multi-field index created from the output of the CCO engine. The index includes about 90% of the data in the “whole enchilada” equation. This executes in 50ms on a non-cached cluster and ~26ms on an unoptimized cluster.
