Recommender Systems 102
Beyond the (usual) user-item matrix: implementation & results
DataScience SG Meetup Jan 2020
About me
§ Lead Data Scientist @ health-tech startup
- Early detection of preventable diseases
- Healthcare resource allocation
§ Previously: VP, Data Science @ Lazada
- E-commerce ML systems
- Facilitated integration with Alibaba
§ More at https://eugeneyan.com
RecSys
Overview
Figure 1. Obligatory (cliché) recsys representation
Definition: Use behavior data to predict what other users will like, based on user/item similarity
Topics*
§ Data Acquisition, Preparation, Split, etc.
§ Conventional Baseline
§ Applying Graph and NLP approaches
* Implementation and results discussed throughout
Laying the Groundwork
Data acquisition, preparation, train-val-split, etc.
Data Acquisition
http://jmcauley.ucsd.edu/data/amazon/links.html
{
"asin": "0000031852",
"title": "Girls Ballet Tutu Zebra Hot Pink",
"price": 3.17,
"imUrl": "http://ecx.images-amazon.com/images/I/51fAmVkTbyL._SY300_.jpg",
"relatedā€:
{ "also_bought":[ "B00JHONN1S",
"B002BZX8Z6",
"B00D2K1M3O",
...
"B007R2RM8W"
],
"also_viewed":[ "B002BZX8Z6",
"B00JHONN1S",
"B008F0SU0Y",
...
"B00BFXLZ8M"
],
"bought_together":[ "B002BZX8Z6"
]
},
"salesRank":
{ "Toys & Games":211836
},
"brand": "Coxlures",
"categories":[
[ "Sports & Outdoors",
"Other Sports",
"Dance"
]
]
}
Parsing json
§ Requires parsing JSON into tabular form
§ Fairly large, with the largest file having 142.8 million rows and 20 GB on disk
§ Not able to load fully into RAM on a regular laptop (16 GB RAM)
import csv
import logging

logger = logging.getLogger(__name__)

def parse_json_to_csv(read_path: str, write_path: str) -> None:
    with open(write_path, 'w', newline='') as f:
        csv_writer = csv.writer(f)
        for i, d in enumerate(parse(read_path)):  # parse() yields one dict per json line
            if i == 0:
                csv_writer.writerow(d.keys())  # write header from the first record
            csv_writer.writerow([str(v).lower() for v in d.values()])
            if (i + 1) % 10000 == 0:
                logger.info('Rows processed: {:,}'.format(i + 1))
    logger.info('Csv saved to {}'.format(write_path))
Getting product-pairs
§ Evaluate string and convert to dictionary
§ Get product-pairs for each relationship
§ Explode each product-pair into a row (see the sketch after Table 1)
product1 | product2 | relationship
--------------------------------------
B001T9NUFS | B003AVEU6G | also_viewed
0000031895 | B002R0FA24 | also_viewed
B007ZN5Y56 | B005C4Y4F6 | also_viewed
0000031909 | B00538F5OK | also_bought
B00CYBULSO | B00B608000 | also_bought
B004FOEEHC | B00D9C32NI | bought_together
Table 1. Product-pairs and relationships (sample)
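A sketch of the three steps above (hypothetical helper; assumes the CSV from earlier was loaded into a DataFrame with asin and related columns, where related is a dict serialized as a string):

import ast
import pandas as pd

def explode_product_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """Explode each product's related-products dict into one row per product-pair."""
    rows = []
    for product1, related_str in zip(df['asin'], df['related']):
        related = ast.literal_eval(related_str)  # evaluate string -> dict
        for relationship, products in related.items():  # also_bought, also_viewed, etc.
            for product2 in products:
                rows.append((product1, product2, relationship))
    return pd.DataFrame(rows, columns=['product1', 'product2', 'relationship'])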
Scoring product-pairs
§ Simple way: Assign 1.0 if product-pair has any/multiple relationships, 0.0 otherwise
§ My approach: Score relationships differently*
- Bought together: 1.2, Also bought: 1.0, Also viewed: 0.5
product1 | product2 | weight
--------------------------------
B001T9NUFS | B003AVEU6G | 0.5
0000031895 | B002R0FA24 | 0.5
B007ZN5Y56 | B005C4Y4F6 | 0.5
0000031909 | B00538F5OK | 1.0
B00CYBULSO | B00B608000 | 1.0
B004FOEEHC | B00D9C32NI | 1.2
Table 2. Product-pairs and weights (sample)
* Assume relationships are symmetrical
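A minimal sketch of the scoring step, assuming the exploded pairs DataFrame from above (keeping the strongest relationship per pair is one reasonable way to handle pairs with multiple relationships):

relationship_weights = {'bought_together': 1.2, 'also_bought': 1.0, 'also_viewed': 0.5}

pairs['weight'] = pairs['relationship'].map(relationship_weights)
# Keep the strongest relationship per product-pair (one reasonable choice)
pairs = pairs.groupby(['product1', 'product2'], as_index=False)['weight'].max()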
                | Electronics | Books
Unique products | 418,749     | 1,948,370
Product-pairs   | 4,005,262   | 26,595,848
Sparsity        | 0.9999      | 0.9999

Sparsity = 1 − Count(nonzero elements) / Count(total elements)
Table 3. Unique products and sparsity for electronics and books
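A back-of-envelope check of the sparsity figure, treating each product-pair as one nonzero element in a products × products matrix:

n_products, n_pairs = 418_749, 4_005_262  # electronics
sparsity = 1 - n_pairs / n_products ** 2
print(round(sparsity, 6))  # 0.999977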
Train-Validation Split
Or how to create negative samples (at scale)
Splitting the data
§ Random split: 2/3 train, 1/3 validation
§ Easy, right?
§ Not so fast! Our dataset only has positive product-pairs, so how do we validate?
Creating negative samples
§ Direct approach: Random sampling
- To create 1 million negative product-pairs, call random 2 million times – very slow!
§ Hack: Add products to an array, shuffle, slice to sample; re-shuffle when exhausted – fast!
products
----------
B001T9NUFS
0000031895
B007ZN5Y56
0000031909
B00CYBULSO
B004FOEEHC

Negative product-pairs 1, 2, 3, ... are consecutive two-product slices of the shuffled array.
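A sketch of the shuffle-and-slice hack (hypothetical class; assumes products is a list of product IDs):

import random

class NegativeSampler:
    """Draw product-pairs by shuffling once and slicing, instead of
    calling random.sample on every draw."""
    def __init__(self, products):
        self.products = list(products)
        random.shuffle(self.products)
        self.idx = 0

    def sample_pair(self):
        if self.idx + 2 > len(self.products):
            random.shuffle(self.products)  # re-shuffle when exhausted
            self.idx = 0
        pair = (self.products[self.idx], self.products[self.idx + 1])
        self.idx += 2
        return pair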
Matrix Factorization
Let's start with a baseline
Batch MF
§ Common approach 1: Load matrix in memory; apply a Python package (e.g., scipy.svd, surprise, etc.)
§ Common approach 2: Run on a cluster with SparkML Alternating Least Squares
§ Very resource intensive!
- Is there a smarter way, given the sparse data?
Iterative MF
§ Only load (or read from disk) product-pairs, instead of the entire matrix of mostly zeros
§ Matrix factorization by iterating through each product-pair
Iterative MF (numeric labels)
for (product1, product2), label in train_set:
    # Get embedding for each product
    product1_emb = embedding(product1)
    product2_emb = embedding(product2)
    # Predict product-pair score (interaction term and sum)
    prediction = sum(product1_emb * product2_emb, dim=1)
    # Minimize loss
    optimizer.zero_grad()
    loss = MeanSquaredErrorLoss(prediction, label)
    loss.backward()
    optimizer.step()
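The loop above is pseudocode; here is a minimal runnable PyTorch version of the same idea (model and variable names are my assumptions, not the exact implementation):

import torch
import torch.nn as nn

class MatrixFactorization(nn.Module):
    def __init__(self, n_products: int, emb_dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(n_products, emb_dim)

    def forward(self, product1, product2):
        # Dot product of the two product embeddings (interaction term and sum)
        return (self.embedding(product1) * self.embedding(product2)).sum(dim=1)

model = MatrixFactorization(n_products=418_749)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for (product1, product2), label in train_loader:  # hypothetical DataLoader of id tensors
    optimizer.zero_grad()
    loss = loss_fn(model(product1, product2), label)
    loss.backward()
    optimizer.step()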
Iterative MF (binary labels)
for (product1, product2), label in train_set:
    # Get embedding for each product
    product1_emb = embedding(product1)
    product2_emb = embedding(product2)
    # Predict product-pair score (sigmoid of interaction term and sum)
    prediction = sig(sum(product1_emb * product2_emb, dim=1))
    # Minimize loss
    optimizer.zero_grad()
    loss = BinaryCrossEntropyLoss(prediction, label)
    loss.backward()
    optimizer.step()
Regularize!
for (product1, product2), label in train_set:
    # Get embedding for each product
    product1_emb = embedding(product1)
    product2_emb = embedding(product2)
    # Predict product-pair score (sigmoid of interaction term and sum)
    prediction = sig(sum(product1_emb * product2_emb, dim=1))
    # L2 penalty on embedding weights (reg_lambda is the regularization strength;
    # `lambda` is a reserved word in Python)
    l2_reg = reg_lambda * sum(embedding.weight ** 2)
    # Minimize loss
    optimizer.zero_grad()
    loss = BinaryCrossEntropyLoss(prediction, label)
    loss += l2_reg
    loss.backward()
    optimizer.step()
Training
Schedule
Figure 2. Cosine Annealing training schedule
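A sketch of a cosine-annealing schedule with restarts in PyTorch (the per-epoch restart period is my assumption; the figure suggests the learning rate is reset each epoch):

from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Anneal the learning rate towards ~0 over one epoch, then reset it
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=len(train_loader))

for epoch in range(5):
    for batch in train_loader:
        ...  # forward, backward, optimizer.step()
        scheduler.step()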
Results (MF)
Binary labels
AUC-ROC = 0.8083
Time for 5 epochs = 45 min
Continuous labels
AUC-ROC = 0.9225
Time for 5 epochs = 45 min
Figure 3a and 3b. Precision-recall curves for Matrix Factorization (note the "cliff of death" where precision plunges)
Learning curve (MF)
Figure 4. AUC-ROC across epochs for matrix factorization. Each time the learning rate is reset, the model seems to "forget", causing AUC-ROC to revert to ~0.5. Also, a single epoch seems sufficient.
Matrix Factorization + bias
Incremental improvement on the baseline
Adding bias
§ What if a product is generally popular or unpopular?
§ Learn a bias factor (i.e., a single number for each product)
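A sketch of the bias extension to the earlier MatrixFactorization module (hypothetical; one learned scalar per product, added to the dot product):

import torch.nn as nn

class MatrixFactorizationWithBias(nn.Module):
    def __init__(self, n_products: int, emb_dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(n_products, emb_dim)
        self.bias = nn.Embedding(n_products, 1)  # single learned number per product

    def forward(self, product1, product2):
        dot = (self.embedding(product1) * self.embedding(product2)).sum(dim=1)
        return dot + self.bias(product1).squeeze(1) + self.bias(product2).squeeze(1)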
Results (MF-bias)
Binary labels
AUC-ROC = 0.7951
Time for 5 epochs = 45 min
Continuous labels
AUC-ROC = 0.8319
Time for 5 epochs = 45 min
Figure 5a and 5b. Precision-recall curves for Matrix Factorization with bias (more "production friendly")
Off the Beaten Path
Natural language processing ("NLP") and Graphs in RecSys
Word2Vec
§ In 2013, two seminal papers by Tomas Mikolov on Word2Vec ("w2v")
§ Demonstrated w2v could learn semantic and syntactic word vector representations
§ TL;DR: Converts words into numbers (array)
DeepWalk
§ Unsupervised learning of representations of nodes (i.e., vertices) in a social network
§ Generate sequences from random walks on the (social) graph
§ Learn vector representations of nodes (e.g., profiles, content)
How do NLP and Graphs matter?
§ Create graph from product-pairs + weights
§ Generate sequences from graph (via random walk)
§ Learn product embeddings (via word2vec)
§ Recommend based on embedding similarity (e.g., cosine similarity, dot product; see the sketch below)
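A sketch of the recommendation step, assuming the learned embeddings are stacked into an (n_products, emb_dim) numpy array:

import numpy as np

def recommend(product_idx: int, embeddings: np.ndarray, k: int = 10):
    """Return indices of the top-k most similar products by cosine similarity."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = normed @ normed[product_idx]  # cosine similarity to every product
    ranked = np.argsort(-scores)
    return [i for i in ranked if i != product_idx][:k]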
More groundwork
Generating graphs and sequences
Creating a product graph
§ We have product-pairs and weights
- These are our graph edges
§ Create a weighted graph with networkx
- Each graph edge is given a numerical weight, instead of all edges having the same weight
product1 | product2 | weight
--------------------------------
B001T9NUFS | B003AVEU6G | 0.5
0000031895 | B002R0FA24 | 0.5
B007ZN5Y56 | B005C4Y4F6 | 0.5
0000031909 | B00538F5OK | 1.0
B00CYBULSO | B00B608000 | 1.1
B004FOEEHC | B00D9C32NI | 1.2
Table 2. Product-pairs and weights
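A minimal sketch with networkx, assuming pairs is the product-pair/weight DataFrame above:

import networkx as nx

graph = nx.Graph()
graph.add_weighted_edges_from(
    pairs[['product1', 'product2', 'weight']].itertuples(index=False)
)
graph['B001T9NUFS']['B003AVEU6G']['weight']  # 0.5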
Random Walks
§ Direct approach: Traverse the networkx graph
- For 10 sequences of length 10 from a starting node, we need to traverse 100 times
- 2 mil nodes for the books graph = 200 mil queries
- Very slow and memory intensive
§ Hack: Work directly on transition probabilities
Random Walks (Nodes and edges)
[Figure: toy graph of five products; edges Product1–Product2 (weight 1), Product1–Product3 (1), Product1–Product4 (3), Product2–Product5 (1), Product3–Product4 (2)]
Random Walks (Weighted-adjacency matrix)

         | Product1 | Product2 | Product3 | Product4 | Product5
Product1 |          | 1        | 1        | 3        |
Product2 | 1        |          |          |          | 1
Product3 | 1        |          |          | 2        |
Product4 | 3        |          | 2        |          |
Product5 |          | 1        |          |          |
Random Walks (Transition matrix)

         | Product1 | Product2 | Product3 | Product4 | Product5
Product1 |          | .2       | .2       | .6       |
Product2 | .5       |          |          |          | .5
Product3 | .33      |          |          | .67      |
Product4 | .6       |          | .4       |          |
Product5 |          | 1.0      |          |          |

Transition-probability(Product3): .33 to Product1, .67 to Product4
B001T9NUFS B003AVEU6G B005C4Y4F6 B007ZN5Y56 ... B007ZN5Y56
0000031895 B00538F5OK B004FOEEHC B001T9NUFS ... 0000031895
B005C4Y4F6 0000031909 B00CYBULSO B003AVEU6G ... B00D9C32NI
B00CYBULSO B001T9NUFS B002R0FA24 B00CYBULSO ... B007ZN5Y56
B004FOEEHC B00CYBULSO B001T9NUFS B002R0FA24 ... B00B608000
...
0000031909 B00B608000 B00D9C32NI B00CYBULSO ... B007ZN5Y56

Each row is one walk: columns = length of sequence (10); rows = no. of nodes (420k) × samples per node (10)
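A sketch of generating walks from per-node transition probabilities (hypothetical data structure: transition maps each node to a tuple of its neighbours and their probabilities):

import numpy as np

def random_walks(transition, nodes, walk_len=10, walks_per_node=10, seed=42):
    rng = np.random.default_rng(seed)
    walks = []
    for start in nodes:
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(walk_len - 1):
                neighbours, probs = transition[walk[-1]]
                walk.append(rng.choice(neighbours, p=probs))  # weighted next step
            walks.append(walk)
    return walks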
Pre-canned Node2Vec
Readily available open-source implementations
Node2Vec
§ Seemed to work out of the box
- Just need to provide edges
- Uses networkx and gensim under the hood
§ But very memory intensive and slow
- Could not run to completion even on 64 GB RAM
https://github.com/aditya-grover/node2vec
Gensim Word2Vec
Using a trusted package as baseline
Gensim w2v
§ Very easy to use
- Takes in a list of sequences
- Can be multithreaded
- CPU-only
§ Fastest to complete 5 epochs
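A minimal usage sketch (parameter names follow gensim 4.x, where size= became vector_size=; the values are my assumptions):

from gensim.models import Word2Vec

model = Word2Vec(
    sentences=walks,  # random-walk sequences from the graph
    vector_size=128,
    window=5,
    sg=1,             # skip-gram
    negative=5,
    workers=4,        # multithreaded, CPU-only
    epochs=5,
)
model.wv.most_similar('B001T9NUFS', topn=10)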
Results (gensim w2v)
All products
AUC-ROC = 0.9082
Time for 5 epochs = 2.58 min
Seen products only
AUC-ROC = 0.9735
Time for 5 epochs = 2.58 min
Figure 6a and 6b. Precision-recall curves for gensim.word2vec (the dip in the all-products curve comes from unseen products without embeddings)
Building w2v from Scratch
To plot learning curves and extend it
Data Loader
§ Input sequences instead of product-pairs
§ Implements two features from the w2v papers
- Subsampling of frequent words
- Negative sampling
Data Loader (sub-sampling)
§ Drop out words of higher frequency
- Frequency of 0.0026 = 0.0 dropout
- Frequency of 0.00746 = 0.5 dropout
- Frequency of 1.0 = 0.977 dropout
§ Accelerated learning and improved vectors of rare words

P(dropout | word) = 1 − (√(Freq(word) / 0.001) + 1) × (0.001 / Freq(word))
Data Loader (negative sampling)
§ Original skip-gram ends with SoftMax
- If vocab = 10k words and embedding dim = 128, that's 1.28 million weights to update – expensive!
- In RecSys, the "vocab" is in the millions
§ Negative sampling
- Only modify weights for the negative pair samples
- With 6 pairs (1 pos, 5 neg) and 1 mil products, only update 0.0006% of weights – very efficient!
PyTorch Word2Vec
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGram(nn.Module):
    def __init__(self, emb_size, emb_dim):
        super().__init__()
        self.center_embeddings = nn.Embedding(emb_size, emb_dim, sparse=True)
        self.context_embeddings = nn.Embedding(emb_size, emb_dim, sparse=True)

    def forward(self, center, context, neg_context):
        emb_center = self.center_embeddings(center)
        emb_context = self.context_embeddings(context)
        emb_neg_context = self.context_embeddings(neg_context)
        # Get score for positive pairs (interaction term and sum)
        score = torch.sum(emb_center * emb_context, dim=1)
        score = -F.logsigmoid(score)
        # Get score for negative pairs (batch interaction term and sum)
        neg_score = torch.bmm(emb_neg_context, emb_center.unsqueeze(2)).squeeze()
        neg_score = -torch.sum(F.logsigmoid(-neg_score), dim=1)
        # Return combined loss
        return torch.mean(score + neg_score)
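Because the embeddings are created with sparse=True, a sparse-aware optimizer is needed; a minimal usage sketch (batch shapes and hyperparameters are my assumptions):

model = SkipGram(emb_size=418_749, emb_dim=128)
optimizer = torch.optim.SparseAdam(model.parameters(), lr=0.025)

for center, context, neg_context in dataloader:  # hypothetical DataLoader
    # center, context: (batch,) id tensors; neg_context: (batch, n_neg)
    optimizer.zero_grad()
    loss = model(center, context, neg_context)
    loss.backward()
    optimizer.step()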
Results (w2v)
All products
AUC-ROC = 0.9554
Time for 5 epochs = 23.63 min
Seen products only
AUC-ROC = 0.9855
Time for 5 epochs = 23.63 min
Figure 7a and 7b. Precision-recall curves for PyTorch Word2Vec
Learning curve (w2v)
Figure 8. AUC-ROC across epochs for word2vec; a single epoch seems sufficient
Overall results so far
§ Improvement on gensim.word2vec and the Alibaba paper

               | All products | Seen products only
PyTorch MF     | 0.7951       | -
Gensim w2v     | 0.9082       | 0.9735
PyTorch w2v    | 0.9554       | 0.9855
Alibaba Paper* | 0.9327       | -

* Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (https://arxiv.org/abs/1803.02349)
Table 4. AUC-ROC across various implementations
Adding side info to w2v
To help solve the cold start problem
Extending w2v
§ For each product, we have information like category, brand, price group, etc.
- Why not add this when learning embeddings?
§ Alibaba paper reported AUC-ROC improvement from 0.9327 to 0.9575

B001T9NUFS -> B003AVEU6G -> B007ZN5Y56 ... -> B007ZN5Y56
Television     Sound bar      Lamp            Standing Fan
Sony           Sony           Phillips        Dyson
500–600        200–300        50–75           300–400
Weighting side info
§ Two versions were implemented
§ 1: Equal-weighted average of embeddings
§ 2: Learn a weight for each embedding and apply a weighted average (sketched below)
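A sketch of version 2 (hypothetical module; softmax normalization of the learned weights is my assumption):

import torch
import torch.nn as nn

class WeightedSideInfoEmbedding(nn.Module):
    """Weighted average over embedding types: product, category, brand, price group."""
    def __init__(self, vocab_sizes, emb_dim):
        super().__init__()
        self.embeddings = nn.ModuleList(nn.Embedding(v, emb_dim) for v in vocab_sizes)
        self.type_weights = nn.Parameter(torch.ones(len(vocab_sizes)))  # one weight per type

    def forward(self, ids):
        # ids: (batch, n_types) - one id per embedding type
        embs = torch.stack(
            [emb(ids[:, i]) for i, emb in enumerate(self.embeddings)], dim=1
        )  # (batch, n_types, emb_dim)
        w = torch.softmax(self.type_weights, dim=0).view(1, -1, 1)
        return (embs * w).sum(dim=1)  # (batch, emb_dim)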
Learning
curve
(w2v with
side info)
Figure 9. AUC-ROC across epochs for word2vec with side information
Why doesn't it work?!
§ Perhaps due to sparsity of metadata
- Of 418,749 electronics, metadata was available for 162,023 (39%); of these, brand was 51% empty
§ But I assumed the weights of the (useless) embeddings would be learnt ¯\_(ツ)_/¯
§ An example of more data ≠ better
Why w2v > MF?
Is it skip-gram? Or sequences?
Mixing it up to pull it apart
§ Why does w2v perform so much better?
§ For the fun of it, let's use the MF-bias model with the sequence data (used in w2v)
Results & learning curve
All products
AUC-ROC = 0.9320
Time for 5 epochs = 70.39 min
Figure 10a and 10b. Precision-recall curve and learning curve for PyTorch MF-bias with sequences
Further Extensions
What Airbnb, Facebook, and Uber are doing
Embed everything
§ Building user embeddings in the same vector space as products (Airbnb)
- Train user embeddings based on interactions with products (e.g., click, ignore, purchase)
§ Embed all discrete features and just learn similarities (Facebook)
§ Graph Neural Networks for embeddings; node neighbors as representation (Uber Eats)
Key Takeaways
Last two tables, I promise
Overall results (electronics)

                           | All products | Seen products only | Runtime (min)
PyTorch MF                 | 0.7951       | -                  | 45
Gensim w2v                 | 0.9082       | 0.9735             | 2.58
PyTorch w2v                | 0.9554       | 0.9855             | 23.63
PyTorch w2v with side info | NA           | NA                 | NA
PyTorch MF with sequences  | 0.9320       | -                  | 70.39
Alibaba Paper*             | 0.9327       | -                  | -

* Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (https://arxiv.org/abs/1803.02349)
Table 5. AUC-ROC across various implementations (electronics)
Overall results (books)

                           | All products | Seen products only | Runtime (min)
PyTorch MF                 | 0.4996       | -                  | 1353.12
Gensim w2v                 | 0.9701       | 0.9892             | 16.24
PyTorch w2v                | 0.9775       | -                  | 122.66
PyTorch w2v with side info | NA           | NA                 | NA
PyTorch MF with sequences  | 0.7196       | -                  | 1393.08

Table 6. AUC-ROC across various implementations (books)
§ Don't just look at numeric metrics – plot some curves!
- Especially if you need some arbitrary threshold (i.e., classification)
§ Matrix Factorization is an okay-ish baseline
§ Word2vec is a great baseline
§ Training on sequences is epic
Thank you!
eugene@eugeneyan.com
References
McAuley, J., Targett, C., Shi, Q., & Van Den Hengel, A. (2015, August). Image-based
recommendations on styles and substitutes. In Proceedings of the 38th International ACM
SIGIR Conference on Research and Development in Information Retrieval (pp. 43-52). ACM.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed
representations of words and phrases and their compositionality. In Advances in neural
information processing systems (pp. 3111-3119).
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word
representations in vector space. arXiv preprint arXiv:1301.3781.
Perozzi, B., Al-Rfou, R., & Skiena, S. (2014, August). Deepwalk: Online learning of social
representations. In Proceedings of the 20th ACM SIGKDD international conference on
Knowledge discovery and data mining (pp. 701-710). ACM.
Grover, A., & Leskovec, J. (2016, August). node2vec: Scalable feature learning for networks.
In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery
and data mining (pp. 855-864). ACM.
Wang, J., Huang, P., Zhao, H., Zhang, Z., Zhao, B., & Lee, D. L. (2018, July). Billion-scale
commodity embedding for e-commerce recommendation in Alibaba. In Proceedings of the
24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp.
839-848). ACM.
Grbovic, M., & Cheng, H. (2018, July). Real-time personalization using embeddings for search
ranking at Airbnb. In Proceedings of the 24th ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining (pp. 311-320). ACM.
Wu, L. Y., Fisch, A., Chopra, S., Adams, K., Bordes, A., & Weston, J. (2018, April). StarSpace:
Embed all the things! In Thirty-Second AAAI Conference on Artificial Intelligence.
Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations,
https://eng.uber.com/uber-eats-graph-learning/, retrieved 10 Jan 2020
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
Ā 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
Ā 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Ā 
Delhi Call Girls Punjabi Bagh 9711199171 ā˜Žāœ”šŸ‘Œāœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ā˜Žāœ”šŸ‘Œāœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ā˜Žāœ”šŸ‘Œāœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ā˜Žāœ”šŸ‘Œāœ” Whatsapp Hard And Sexy Vip Call
Ā 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
Ā 
Call Girls in Sarai Kale Khan Delhi šŸ’Æ Call Us šŸ”9205541914 šŸ”( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi šŸ’Æ Call Us šŸ”9205541914 šŸ”( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi šŸ’Æ Call Us šŸ”9205541914 šŸ”( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi šŸ’Æ Call Us šŸ”9205541914 šŸ”( Delhi) Escorts S...
Ā 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
Ā 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
Ā 
BDSMāš”Call Girls in Mandawali Delhi >ą¼’8448380779 Escort Service
BDSMāš”Call Girls in Mandawali Delhi >ą¼’8448380779 Escort ServiceBDSMāš”Call Girls in Mandawali Delhi >ą¼’8448380779 Escort Service
BDSMāš”Call Girls in Mandawali Delhi >ą¼’8448380779 Escort Service
Ā 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Ā 

• 15. Scoring product-pairs (continued)

product1   | product2   | weight
--------------------------------
B001T9NUFS | B003AVEU6G | 0.5
0000031895 | B002R0FA24 | 0.5
B007ZN5Y56 | B005C4Y4F6 | 0.5
0000031909 | B00538F5OK | 1.0
B00CYBULSO | B00B608000 | 1.0
B004FOEEHC | B00D9C32NI | 1.2

Table 2. Product-pairs and weights (sample)
* Assume relationships are symmetrical
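A minimal sketch of this scoring step, assuming a pandas DataFrame shaped like Table 1; the deck does not specify how a pair with multiple relationships is aggregated, so keeping the maximum weight here is an illustrative choice:

import pandas as pd

# Relationship weights from the slide above
RELATIONSHIP_WEIGHTS = {'bought_together': 1.2, 'also_bought': 1.0, 'also_viewed': 0.5}

# `pairs` mirrors Table 1: product1 | product2 | relationship
pairs = pd.DataFrame({
    'product1': ['B001T9NUFS', '0000031909', 'B004FOEEHC'],
    'product2': ['B003AVEU6G', 'B00538F5OK', 'B00D9C32NI'],
    'relationship': ['also_viewed', 'also_bought', 'bought_together'],
})

# Map each relationship to its weight, then keep one weight per product-pair
pairs['weight'] = pairs['relationship'].map(RELATIONSHIP_WEIGHTS)
scored = pairs.groupby(['product1', 'product2'], as_index=False)['weight'].max()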
• 16.
                | Electronics | Books
---------------------------------------------
Unique products | 418,749     | 1,948,370
Product-pairs   | 4,005,262   | 26,595,848
Sparsity        | 0.9999      | 0.9999

Sparsity = 1 − Count(nonzero elements) / Count(total elements)

Table 3. Unique products and sparsity for electronics and books
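As a quick check, the sparsity figure can be reproduced from Table 3, assuming the denominator is the full item-item matrix (total elements = unique products squared):

n_products = 418_749   # electronics, from Table 3
n_pairs = 4_005_262

sparsity = 1 - n_pairs / (n_products ** 2)
print(round(sparsity, 6))  # ~0.999977, reported as 0.9999 above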
  • 17. Train-Validation Split Or how to create negative samples (at scale)
• 18–19. Splitting the data § Random split: 2/3 train, 1/3 validation § Easy, right? § Not so fast! Our dataset only has positive product-pairs—how do we validate?
• 20–24. Creating negative samples § Direct approach: Random sampling - To create 1 million negative product-pairs, call random 2 million times—very slow! § Hack: Add the products to an array, shuffle, and slice to sample; re-shuffle when exhausted—fast! (sketched below) Each consecutive slice of the shuffled array yields one negative product-pair:

products
----------
B001T9NUFS
0000031895
B007ZN5Y56
0000031909
B00CYBULSO
B004FOEEHC
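A minimal sketch of the shuffle-and-slice trick; names are illustrative, and in practice any sampled pair that collides with a known positive pair would be discarded:

import random

def negative_pair_generator(products):
    # Shuffle once, then walk the array two items at a time;
    # re-shuffle only when the pool is exhausted
    pool = list(products)
    while True:
        random.shuffle(pool)
        for i in range(0, len(pool) - 1, 2):
            yield pool[i], pool[i + 1]

pairs = negative_pair_generator(
    ['B001T9NUFS', '0000031895', 'B007ZN5Y56', '0000031909', 'B00CYBULSO', 'B004FOEEHC'])
negative_pair = next(pairs)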
• 26–27. Batch MF § Common approach 1: Load matrix in memory; apply Python package (e.g., scipy.svd, surprise, etc.) § Common approach 2: Run on cluster with SparkML Alternating Least Squares § Very resource intensive! - Is there a smarter way, given the sparse data?
• 28. Iterative MF § Only load (or read from disk) product-pairs, instead of the entire matrix, which is mostly zeros § Matrix factorization by iterating through each product-pair
• 29–32. Iterative MF (numeric labels)

for product_pair, label in train_set:
    product1, product2 = product_pair
    # Get embedding for each product
    product1_emb = embedding(product1)
    product2_emb = embedding(product2)
    # Predict product-pair score (interaction term and sum)
    prediction = sum(product1_emb * product2_emb, dim=1)
    # Minimize loss
    loss = MeanSquaredErrorLoss(prediction, label)
    loss.backward()
    optimizer.step()
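The pseudocode above translates to PyTorch roughly as follows; this is a sketch with illustrative names, not the deck's exact code (embedding count taken from Table 3):

import torch
import torch.nn as nn

class MatrixFactorization(nn.Module):
    # One embedding table shared by both sides of a product-pair;
    # the score is the dot product of the two product embeddings
    def __init__(self, n_products, emb_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(n_products, emb_dim)

    def forward(self, product1, product2):
        emb1 = self.embedding(product1)
        emb2 = self.embedding(product2)
        return torch.sum(emb1 * emb2, dim=1)  # interaction term and sum

model = MatrixFactorization(n_products=418_749)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.MSELoss()  # for binary labels, swap in nn.BCEWithLogitsLoss()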
• 33. Iterative MF (binary labels)

for product_pair, label in train_set:
    product1, product2 = product_pair
    # Get embedding for each product
    product1_emb = embedding(product1)
    product2_emb = embedding(product2)
    # Predict product-pair score (sigmoid of interaction term and sum)
    prediction = sigmoid(sum(product1_emb * product2_emb, dim=1))
    # Minimize loss
    loss = BinaryCrossEntropyLoss(prediction, label)
    loss.backward()
    optimizer.step()
• 34. Regularize!

for product_pair, label in train_set:
    product1, product2 = product_pair
    # Get embedding for each product
    product1_emb = embedding(product1)
    product2_emb = embedding(product2)
    # Predict product-pair score (sigmoid of interaction term and sum)
    prediction = sigmoid(sum(product1_emb * product2_emb, dim=1))
    # L2 regularization (reg_lambda is the regularization strength;
    # `lambda` itself is a reserved word in Python)
    l2_reg = reg_lambda * sum(embedding.weight ** 2)
    # Minimize loss
    loss = BinaryCrossEntropyLoss(prediction, label)
    loss += l2_reg
    loss.backward()
    optimizer.step()
  • 35. Training Schedule Figure 2. Cosine Annealing training schedule
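Figure 2's restart pattern can be reproduced with PyTorch's built-in scheduler; `model` and T_0 below are stand-ins, not values from the deck:

import torch
import torch.nn as nn

model = nn.Embedding(10, 8)  # stand-in for the MF model above
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Learning rate follows a cosine decay and resets every T_0 steps
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=1000)

for step in range(5000):
    # ... forward pass, loss.backward(), optimizer.step() ...
    scheduler.step()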
• 36–39. Results (MF) Binary labels: AUC-ROC = 0.8083; Continuous labels: AUC-ROC = 0.9225; Time for 5 epochs = 45 min. Figure 3a and 3b. Precision-recall curves for matrix factorization; note the "cliff of death" in the binary-label curve.
• 40. Learning curve (MF) Figure 4. AUC-ROC across epochs for matrix factorization; each time the learning rate is reset, the model seems to "forget", causing AUC-ROC to revert to ~0.5. Also, a single epoch seems sufficient.
  • 41. Matrix Factorization + bias Incremental improvement on the baseline
• 42. Adding bias § What if a product is generally popular or unpopular? § Learn a bias factor (i.e., a single number per product), as sketched below
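A sketch of the bias extension, again with illustrative names; each product contributes a learned scalar on top of the dot product:

import torch
import torch.nn as nn

class MFWithBias(nn.Module):
    def __init__(self, n_products, emb_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(n_products, emb_dim)
        self.bias = nn.Embedding(n_products, 1)  # one number per product

    def forward(self, product1, product2):
        dot = torch.sum(self.embedding(product1) * self.embedding(product2), dim=1)
        # Add each product's popularity bias to the interaction score
        return dot + self.bias(product1).squeeze(1) + self.bias(product2).squeeze(1)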
• 43–45. Results (MF-bias) Binary labels: AUC-ROC = 0.7951; Continuous labels: AUC-ROC = 0.8319; Time for 5 epochs = 45 min. Figure 5a and 5b. Precision-recall curves for matrix factorization with bias; the curves decline more gradually, making them more "production friendly".
• 46. Off the Beaten Path Natural language processing ("NLP") and Graphs in RecSys
• 47. Word2Vec § In 2013, two seminal papers by Tomas Mikolov introduced Word2Vec ("w2v") § Demonstrated w2v could learn semantic and syntactic word vector representations § TL;DR: Converts words into numbers (arrays)
• 48. DeepWalk § Unsupervised learning of representations of nodes (i.e., vertices) in a social network § Generate sequences from random walks on (social) graph § Learn vector representations of nodes (e.g., profiles, content)
• 50–53. How do NLP and Graphs matter? § Create graph from product-pairs + weights § Generate sequences from graph (via random walk) § Learn product embeddings (via word2vec) § Recommend based on embedding similarity (e.g., cosine similarity, dot product)
• 55. Creating a product graph § We have product-pairs and weights - these are our graph edges § Create a weighted graph with networkx - each graph edge is given a numerical weight, instead of all edges having the same weight

product1   | product2   | weight
--------------------------------
B001T9NUFS | B003AVEU6G | 0.5
0000031895 | B002R0FA24 | 0.5
B007ZN5Y56 | B005C4Y4F6 | 0.5
0000031909 | B00538F5OK | 1.0
B00CYBULSO | B00B608000 | 1.0
B004FOEEHC | B00D9C32NI | 1.2

Table 2. Product-pairs and weights
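A sketch of the graph construction with networkx (edge list abbreviated from Table 2):

import networkx as nx

edges = [
    ('B001T9NUFS', 'B003AVEU6G', 0.5),
    ('0000031909', 'B00538F5OK', 1.0),
    ('B004FOEEHC', 'B00D9C32NI', 1.2),
]
graph = nx.Graph()
graph.add_weighted_edges_from(edges)  # stores weights under the 'weight' key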
• 56–57. Random Walks § Direct approach: Traverse networkx graph - For 10 sequences of length 10 per starting node, need to traverse 100 times - 2 mil nodes for books graph = 200 mil queries - Very slow and memory intensive § Hack: Work directly on transition probabilities
• 59. Random Walks (weighted-adjacency matrix)

         | Product1 | Product2 | Product3 | Product4 | Product5
---------------------------------------------------------------
Product1 |          | 1        | 1        | 3        |
Product2 | 1        |          |          |          | 1
Product3 | 1        |          |          | 2        |
Product4 | 3        |          | 2        |          |
Product5 |          | 1        |          |          |

• 60–61. Random Walks (transition matrix; each row holds one product's transition probabilities, e.g., the Product3 row is Transition-probability(Product3))

         | Product1 | Product2 | Product3 | Product4 | Product5
---------------------------------------------------------------
Product1 |          | .2       | .2       | .6       |
Product2 | .5       |          |          |          | .5
Product3 | .33      |          |          | .67      |
Product4 | .6       |          | .4       |          |
Product5 |          | 1.0      |          |          |
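A sketch of sampling walks directly from the transition probabilities above; the node-to-(neighbors, probabilities) layout is an illustrative choice:

import numpy as np

transition_probs = {
    'Product1': (['Product2', 'Product3', 'Product4'], [0.2, 0.2, 0.6]),
    'Product2': (['Product1', 'Product5'], [0.5, 0.5]),
    'Product3': (['Product1', 'Product4'], [0.33, 0.67]),
    'Product4': (['Product1', 'Product3'], [0.6, 0.4]),
    'Product5': (['Product2'], [1.0]),
}

def random_walk(start, length=10):
    # Draw the next node from the current node's transition probabilities
    walk = [start]
    for _ in range(length - 1):
        neighbors, probs = transition_probs[walk[-1]]
        walk.append(str(np.random.choice(neighbors, p=probs)))
    return walk

# 10 samples per node, walks of length 10 (as on the next slide)
walks = [random_walk(node) for node in transition_probs for _ in range(10)]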
• 62. Example sequences generated from random walks (each row is one walk), e.g., B001T9NUFS -> B003AVEU6G -> B005C4Y4F6 -> B007ZN5Y56 -> ... Length of each sequence: 10; number of sequences: no. of nodes (420k) × samples per node (10)
  • 63. Pre-canned Node2Vec Readily available open-sourced implementations
• 64. Node2Vec § Seemed to work out of the box - Just need to provide edges - Uses networkx and gensim under the hood § But very memory intensive and slow - Could not run to completion even on 64gb ram https://github.com/aditya-grover/node2vec
• 65. Gensim Word2Vec Using a trusted package as a baseline
• 66. Gensim w2v § Very easy to use (see the sketch below) - Takes in a list of sequences - Can be multithreaded - CPU-only § Fastest to complete 5 epochs
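A sketch of the gensim call, assuming `walks` is the list of random-walk sequences from the previous section; parameter names follow gensim 4.x (vector_size was `size` in 3.x), and the hyperparameter values are illustrative:

from gensim.models import Word2Vec

model = Word2Vec(sentences=walks, vector_size=128, window=5, min_count=1,
                 sg=1, negative=5, workers=4, epochs=5)  # sg=1: skip-gram

# Recommend by embedding similarity
similar = model.wv.most_similar('B001T9NUFS', topn=10)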
• 67–69. Results (gensim w2v) All products: AUC-ROC = 0.9082; Seen products only: AUC-ROC = 0.9735; Time for 5 epochs = 2.58 min. Figure 6a and 6b. Precision-recall curves for gensim.word2vec; the all-products curve suffers from unseen products without embeddings.
  • 70. Building w2v from Scratch To plot learning curves and extend it
• 71. Data Loader § Input sequences instead of product-pairs § Implements two features from the w2v papers - Subsampling of frequent words - Negative sampling
• 72–73. Data Loader (sub-sampling) § Drop out words of higher frequency - Frequency of 0.0026 = 0.0 dropout - Frequency of 0.00746 = 0.5 dropout - Frequency of 1.0 = 0.977 dropout § Accelerated learning and improved vectors of rare words

P(dropout | word) = 1 − (√(Freq(word) / 0.001) + 1) × (0.001 / Freq(word))
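The same formula as a quick sanity check; t = 0.001 is the subsampling threshold (as in the word2vec C implementation), and the result is clipped at zero since the expression goes slightly negative for rare words:

import math

def dropout_prob(freq, t=0.001):
    return max(0.0, 1 - (math.sqrt(freq / t) + 1) * (t / freq))

print(dropout_prob(0.0026))   # ~0.0
print(dropout_prob(0.00746))  # ~0.5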
• 74–75. Data Loader (negative sampling) § Original skip-gram ends with SoftMax - If vocab = 10k words and embedding dim = 128, that's 1.28 million weights to update—expensive! - In RecSys, the "vocab" is in the millions § Negative sampling - Only modify weights of the negative pair samples - With 6 pairs (1 pos, 5 neg) and 1 mil products, only 0.0006% of weights are updated—very efficient!
• 76–80. PyTorch Word2Vec (skip-gram with negative sampling)

import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGram(nn.Module):
    def __init__(self, emb_size, emb_dim):
        super().__init__()
        self.center_embeddings = nn.Embedding(emb_size, emb_dim, sparse=True)
        self.context_embeddings = nn.Embedding(emb_size, emb_dim, sparse=True)

    def forward(self, center, context, neg_context):
        # Look up embeddings for center, context, and negative-context products
        emb_center = self.center_embeddings(center)
        emb_context = self.context_embeddings(context)
        emb_neg_context = self.context_embeddings(neg_context)

        # Score for positive pairs (interaction term and sum)
        score = torch.sum(emb_center * emb_context, dim=1)
        score = -F.logsigmoid(score)

        # Score for negative pairs (batch interaction term and sum)
        neg_score = torch.bmm(emb_neg_context, emb_center.unsqueeze(2)).squeeze()
        neg_score = -torch.sum(F.logsigmoid(-neg_score), dim=1)

        # Return combined loss
        return torch.mean(score + neg_score)
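The SkipGram module above returns the loss directly, so training reduces to a plain loop. One detail worth noting: because the embeddings are created with sparse=True, their gradients are sparse tensors, which plain Adam does not accept; torch.optim.SparseAdam does. `train_loader` is an assumed DataLoader yielding (center, context, negative-context) batches:

import torch

model = SkipGram(emb_size=418_749, emb_dim=128)
optimizer = torch.optim.SparseAdam(model.parameters())

for center, context, neg_context in train_loader:
    optimizer.zero_grad()
    loss = model(center, context, neg_context)
    loss.backward()
    optimizer.step()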
• 81–82. Results (w2v) All products: AUC-ROC = 0.9554; Seen products only: AUC-ROC = 0.9855; Time for 5 epochs = 23.63 min. Figure 7a and 7b. Precision-recall curves for PyTorch Word2Vec.
  • 83. Learning curve (w2v) Figure 8. AUC-ROC across epochs for word2vec; a single epoch seems sufficient
• 84. Overall results so far § Improvement on gensim.word2vec and the Alibaba paper

               | All products | Seen products only
--------------------------------------------------
PyTorch MF     | 0.7951       | -
Gensim w2v     | 0.9082       | 0.9735
PyTorch w2v    | 0.9554       | 0.9855
Alibaba paper* | 0.9327       | -

Table 4. AUC-ROC across various implementations
* Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (https://arxiv.org/abs/1803.02349)
  • 85. Adding side info to w2v To help solve the cold start problem
• 86–89. Extending w2v § For each product, we have information like category, brand, price group, etc. - Why not add this when learning embeddings? § The Alibaba paper reported an AUC-ROC improvement from 0.9327 to 0.9575

B001T9NUFS -> B003AVEU6G -> B007ZN5Y56 -> ...
Television    Sound bar     Lamp          Standing Fan
Sony          Sony          Phillips      Dyson
500–600       200–300       50–75         300–400
• 90. Weighting side info § Two versions were implemented § 1: Equal-weighted average of embeddings § 2: Learn a weightage for each embedding and apply a weighted average (sketched below)
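A sketch of version 2, loosely following the weighted-average idea in the Alibaba paper; field names, sizes, and the softmax weighting are illustrative choices, not the deck's exact implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SideInfoEmbedding(nn.Module):
    # One embedding table per field (product id, category, brand, price group)
    # plus a learned scalar weight per field
    def __init__(self, field_sizes, emb_dim=128):
        super().__init__()
        self.fields = nn.ModuleList(nn.Embedding(n, emb_dim) for n in field_sizes)
        self.field_weights = nn.Parameter(torch.ones(len(field_sizes)))

    def forward(self, ids):
        # ids: (batch, n_fields) -> per-field embeddings: (batch, n_fields, dim)
        embs = torch.stack([emb(ids[:, i]) for i, emb in enumerate(self.fields)], dim=1)
        w = F.softmax(self.field_weights, dim=0)  # learned weightage per field
        return (embs * w.view(1, -1, 1)).sum(dim=1)  # weighted average

For version 1, replace the learned weights with a plain mean over the field dimension.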
  • 91. Learning curve (w2v with side info) Figure 9. AUC-ROC across epochs for word2vec with side information
• 92–93. Why doesn't it work?! § Perhaps due to sparsity of metadata - Of 418,749 electronics, metadata was available for 162,023 (39%); of these, brand was 51% empty § But I assumed the weights of the (useless) embeddings would be learnt—¯\_(ツ)_/¯ § An example of more data ≠ better
  • 94. Why w2v > MF? Is it skip-gram? Or sequences?
• 95. Mixing it up to pull it apart § Why does w2v perform so much better? § For the fun of it, let's use the MF-bias model with the sequence data (used in w2v)
  • 96. Results & learning curve Figure 10a and 10b. Precision recall curve and learning curve for PyTorch MF-bias with sequences All products AUC-ROC = 0.9320 Time for 5 epochs = 70.39 min
  • 97. Further Extensions What Airbnb, Facebook, and Uber are doing
• 98. Embed everything § Building user embeddings in the same vector space as products (Airbnb) - Train user embeddings based on interactions with products (e.g., click, ignore, purchase) § Embed all discrete features and just learn similarities (Facebook) § Graph Neural Networks for embeddings; node neighbors as representation (Uber Eats)
  • 99. Key Takeaways Last two tables, I promise
• 100. Overall results (electronics)

                           | All products | Seen products only | Runtime (min)
-------------------------------------------------------------------------------
PyTorch MF                 | 0.7951       | -                  | 45
Gensim w2v                 | 0.9082       | 0.9735             | 2.58
PyTorch w2v                | 0.9554       | 0.9855             | 23.63
PyTorch w2v with side info | NA           | NA                 | NA
PyTorch MF with sequences  | 0.9320       | -                  | 70.39
Alibaba paper*             | 0.9327       | -                  | -

Table 5. AUC-ROC across various implementations (electronics)
* Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (https://arxiv.org/abs/1803.02349)
• 101. Overall results (books)

                           | All products | Seen products only | Runtime (min)
-------------------------------------------------------------------------------
PyTorch MF                 | 0.4996       | -                  | 1353.12
Gensim w2v                 | 0.9701       | 0.9892             | 16.24
PyTorch w2v                | 0.9775       | -                  | 122.66
PyTorch w2v with side info | NA           | NA                 | NA
PyTorch MF with sequences  | 0.7196       | -                  | 1393.08

Table 6. AUC-ROC across various implementations (books)
• 102–105. § Don't just look at numeric metrics—plot some curves! - Especially if you need some arbitrary threshold (i.e., classification) § Matrix Factorization is an okay-ish baseline § Word2vec is a great baseline § Training on sequences is epic
  • 107. References McAuley, J., Targett, C., Shi, Q., & Van Den Hengel, A. (2015, August). Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 43-52). ACM. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119). Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. Perozzi, B., Al-Rfou, R., & Skiena, S. (2014, August). Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 701-710). ACM. Grover, A., & Leskovec, J. (2016, August). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 855-864). ACM.
  • 108. References Wang, J., Huang, P., Zhao, H., Zhang, Z., Zhao, B., & Lee, D. L. (2018, July). Billion-scale commodity embedding for e-commerce recommendation in alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 839-848). ACM. Grbovic, M., & Cheng, H. (2018, July). Real-time personalization using embeddings for search ranking at airbnb. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 311-320). ACM. Wu, L. Y., Fisch, A., Chopra, S., Adams, K., Bordes, A., & Weston, J. (2018, April). Starspace: Embed all the things!. In Thirty-Second AAAI Conference on Artificial Intelligence. Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations, https://eng.uber.com/uber-eats-graph-learning/, retrieved 10 Jan 2020