Embedding Based Frequently Bought
Together Recommendations:
A Production Use Case
Agenda
Mehmet Selman Sezgin
Senior Data Engineer at Hepsiburada
Ulukbek Attokurov
Data Scientist at Hepsiburada
Content
● Embedding Based Recommendations
● Modeling (Frequently Bought Together)
● Arithmetic operations on Embeddings
● Architecture Overview
● Serving Layer
● Experimental UI
● Online Metrics
● Conclusion
▪ 40+ categories
▪ 200M+ visitors per month
▪ 30M+ products
https://developers.google.com/machine-learning/crash-course/embeddings
An embedding is a relatively low-
dimensional space into which you can
translate high-dimensional vectors.
Co-occurrence vs Embeddings
Co-occurrence statistics
▪ Uses raw co-occurrence statistics (Salton89, which is a TF-IDF based metric)
▪ Uses behavior data (product views, orders, add-to-cart)
▪ Generates item-based recommendations
▪ Cannot project users and items into the same space
Embeddings
▪ Uses advanced methods (ResNet, Inception, VGG, Word2Vec, Doc2Vec, AutoEncoders, BERT etc.)
▪ Generates item- and user-based recommendations
▪ Uses content information such as product image, product description, product name and product attributes
▪ Image, text and behavior embeddings can be projected into the same space
Co-occurrence vs Embeddings
Co-occurrence statistics
▪ Products are not recommended if they do not appear in the same context
▪ Context information, such as products that appeared in the same session or transaction, is not employed
▪ Content information (image, text etc.) is not used
Embeddings
▪ Similarity metrics can be calculated
▪ Can be used as features in unsupervised and supervised methods to optimize a business metric such as a propensity score
▪ Can be used as features in neural networks such as LSTMs to model customer behavior over time
▪ Can be used as features in KNN to recommend the most similar items
Frequently Bought Together
▪ Goal: building recommendations to offer complementary products to
our customers
▪ Challenges:
▪ Orders might contain products from diverse categories
▪ Generating recommendations using 30M+ products distributed over 40+ categories
▪ Tip: “bought together” does not mean that the items which co-occur in a sequence are similar
▪ Our model choice: Word2Vec
Word2Vec
▪ Easy to use
▪ Easy to train
▪ Simple format of training samples
▪ User friendly libraries like Gensim
▪ A few parameters to optimize
▪ A lot of practical use cases
Data Preparation
NLP
▪ Sentence
▪ Bag-of-Words
▪ “I am attending a conference”
▪ [“I”, “attending”, “conference”]
Frequently Bought Together
▪ User behavior (views, purchases etc.)
▪ Set of purchased items
▪ Orders: Keyboard, Computer, Mouse
▪ [“Keyboard”, “Computer”, “Mouse”]
Data Preparation - Context Separation
Sequence
▪ Sequences may contain products from diverse categories
▪ [“Keyboard”, “Mouse”, “Shoes”, “Socks”]
Sub-sequence
▪ Sub-sequences may be created depending on labels such as category, brand etc.
▪ [“Keyboard”, “Mouse”] and [“Shoes”, “Socks”]
Code Sample for Data Preparation
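The code shown on this slide did not survive extraction. As a stand-in, here is a minimal sketch of the context-separation step, assuming each order is a list of (product, label) pairs. The function name `split_by_label`, the `min_len` cut-off and the sample order are hypothetical and only serve as illustration.

```python
from collections import defaultdict

def split_by_label(order, min_len=2):
    """Split one order into sub-sequences that share a label (here: category).

    `order` is a list of (product_id, label) pairs. Sub-sequences shorter than
    `min_len` are dropped (an assumption of this sketch): a single item has no
    context to learn from.
    """
    groups = defaultdict(list)
    for product_id, label in order:
        groups[label].append(product_id)
    return [items for items in groups.values() if len(items) >= min_len]

# Hypothetical order mixing two categories, as in the slide example.
order = [
    ("Keyboard", "Electronics"),
    ("Mouse", "Electronics"),
    ("Shoes", "Fashion"),
    ("Socks", "Fashion"),
]
sentences = split_by_label(order)
print(sentences)  # [['Keyboard', 'Mouse'], ['Shoes', 'Socks']]
```

Each resulting sub-sequence can then be used as one training "sentence" for Word2Vec.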
Word2Vec Parameters
▪ Random search is applied to restrict the parameter search space
▪ Grid search is applied to select the optimal parameters
▪ The following Word2Vec parameters are optimized
▪ min_count: it is preferable to set it low; otherwise coverage will decrease
▪ sample: the most frequent items dominate the sequences, which might yield noisy embeddings and is computationally inefficient
▪ window: the context length is set to the maximum sequence length, since the order of items in a sequence is random
▪ size: a trade-off between network size, storage and computational cost; it is set as small as possible without losing recommendation quality
▪ iter: the default value is very low, so it is set between 50 and 80; the model is not trained well when iter is set to low values
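The two-stage search described above (random search to narrow the space, then grid search inside the narrowed space) could be sketched as follows. This is an illustration under assumptions, not the authors' code: the candidate values, the helper names and `toy_score` are hypothetical. In practice the score function would train Word2Vec with the given parameters and return an offline metric. `window` is left out because the slide fixes it to the maximum sequence length.

```python
import itertools
import random

# Parameter names follow the slide; the candidate values are assumptions.
SPACE = {
    "min_count": [1, 2, 5],
    "sample": [1e-3, 1e-4, 1e-5],
    "size": [32, 64, 128],
    "iter": [50, 60, 70, 80],
}

def random_search(space, score, n_trials, seed=0):
    """Score n_trials random configurations, best first."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        trials.append((score(params), params))
    return sorted(trials, key=lambda t: t[0], reverse=True)

def narrow_space(trials, top_k):
    """Keep only the parameter values seen in the top_k random trials."""
    narrowed = {}
    for _, params in trials[:top_k]:
        for name, value in params.items():
            narrowed.setdefault(name, set()).add(value)
    return {name: sorted(values) for name, values in narrowed.items()}

def grid_search(space, score):
    """Score every combination of a (small) space and return the best one."""
    names = list(space)
    best = None
    for combo in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, combo))
        result = score(params)
        if best is None or result > best[0]:
            best = (result, params)
    return best

def toy_score(params):
    """Stand-in for 'train the model, then compute an offline metric'."""
    return -abs(params["iter"] - 70) - abs(params["size"] - 64) / 10 - params["min_count"]

trials = random_search(SPACE, toy_score, n_trials=20)
narrowed = narrow_space(trials, top_k=5)
best_score, best_params = grid_search(narrowed, toy_score)
print(best_params)
```

Because the narrowed grid contains every value from the best random trials, the grid search result is never worse than the best random trial.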
Similarity Functions
▪ KNN algorithm is employed to find the most similar items
▪ Different similarity metrics are used: Euclidean distance, cosine similarity
▪ Euclidean distance measures the distance between two points and is affected by the length of the vectors. Thus, vectors need to be normalized in order to obtain more accurate results.
▪ In cosine similarity, the angle between two vectors determines how similar they are.
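A small sketch of why normalization matters for Euclidean distance, using made-up 2-D vectors. On raw vectors, Euclidean distance prefers a short vector pointing in a different direction. After L2 normalization it agrees with cosine similarity, since for unit vectors d² = 2 - 2·cos.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative vectors (not real product embeddings).
query = [1.0, 1.0]
same_direction = [10.0, 10.0]   # same angle as the query, much longer
other_direction = [1.0, 0.0]    # 45 degrees away from the query, short

# Raw Euclidean distance ranks the differently oriented vector first.
raw_prefers_other = (
    euclidean_distance(query, other_direction)
    < euclidean_distance(query, same_direction)
)

# After normalization the vector with the same direction is the closest.
q, s, o = (l2_normalize(v) for v in (query, same_direction, other_direction))
normalized_prefers_same = euclidean_distance(q, s) < euclidean_distance(q, o)

print(raw_prefers_other, normalized_prefers_same)  # True True
```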
Offline Metrics
▪ We need simple statistical metrics to be able to check the
performance of the model and to tune parameters
▪ Precision@k
▪ (# of recommended items @k that are relevant) / (# of recommended items @k)
▪ Recall@k
▪ (# of recommended items @k that are relevant) / (total # of relevant items)
▪ HitRate@k
▪ (# of hits @k recommendations ) / (total # of test users)
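The three definitions above translate directly into code. This is a minimal sketch. The function names and the sample data are hypothetical, and a "hit" is assumed to mean at least one relevant item among a user's top-k recommendations.

```python
def precision_at_k(recommended, relevant, k):
    """(# of recommended items @k that are relevant) / (# of recommended items @k)."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / len(top_k)

def recall_at_k(recommended, relevant, k):
    """(# of recommended items @k that are relevant) / (total # of relevant items)."""
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)

def hit_rate_at_k(users, k):
    """(# of users with at least one relevant item @k) / (total # of test users)."""
    hits = sum(
        1 for recommended, relevant in users
        if any(item in relevant for item in recommended[:k])
    )
    return hits / len(users)

# Hypothetical test data: (ranked recommendations, items actually bought together).
users = [
    (["mouse", "monitor", "socks", "webcam"], {"mouse", "webcam", "headset"}),
    (["socks", "belt", "hat", "scarf"], {"shoes"}),
]
recommended, relevant = users[0]
print(precision_at_k(recommended, relevant, 4))  # 0.5
print(hit_rate_at_k(users, 4))                   # 0.5
```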
MLflow Tracks the Model
▪ It is easy to visually inspect the parameters
▪ Evaluation metrics can be investigated graphically
▪ It is easy to integrate into the source code
▪ It is effective for team collaboration through the central server
Word2Vec Hyperparameter Tuning
Arithmetic Operations on Embeddings
▪ Is it possible to create new business dimensions using simple arithmetic on existing product embeddings?
▪ Similarity( AVG(Adidas_Shoes) , AVG(Nike_Shoes)) ≃ 1 ?
▪ Similarity( AVG(Camping tents) , AVG(Outdoor chairs)) ≃ 1 ?
▪ 1_Adidas_Shoe - Adidas_Brand + Nike_Brand ≃ 1_Similar_Nike_Shoe ?
▪ Relevance decreases when entities at higher levels of the hierarchy, such as categories (Sport, Baby, Women Clothes etc.), are represented using low-level entities such as products.
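The operations in question can be sketched in a few lines. The 2-D vectors below are invented purely to show the mechanics and say nothing about how real product embeddings behave. A brand vector is assumed here to be the average of its product vectors.

```python
import math

def mean_vector(vectors):
    """Element-wise average, e.g. AVG(Adidas_Shoes)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(item, minus, plus):
    """item - minus + plus, e.g. Adidas_Shoe - Adidas_Brand + Nike_Brand."""
    return [i - m + p for i, m, p in zip(item, minus, plus)]

# Invented toy embeddings (assumption: one vector per product).
products = {
    "adidas_shoe_1": [1.0, 0.2],
    "adidas_shoe_2": [0.9, 0.3],
    "nike_shoe_1": [0.2, 1.0],
    "nike_shoe_2": [0.3, 0.9],
}
adidas_brand = mean_vector([products["adidas_shoe_1"], products["adidas_shoe_2"]])
nike_brand = mean_vector([products["nike_shoe_1"], products["nike_shoe_2"]])

# Brand-level similarity: Similarity(AVG(Adidas_Shoes), AVG(Nike_Shoes)).
brand_similarity = cosine(adidas_brand, nike_brand)

# Product-level analogy, then a nearest-neighbour lookup among Nike products.
target = analogy(products["adidas_shoe_1"], adidas_brand, nike_brand)
candidates = ["nike_shoe_1", "nike_shoe_2"]
nearest = max(candidates, key=lambda name: cosine(target, products[name]))
print(nearest)  # nike_shoe_2
```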
Arithmetic Operations on Embeddings
▪ Brand similarity is relevant if a brand contains products that are homogeneous in terms of categories (Upper body clothes, Lower body clothes etc.).
Architecture Overview
Implementation Tips
▪ PySpark
▪ Makes it possible to work with any Python modelling library through Spark-to-pandas DataFrame conversion
▪ Pandas UDFs are very useful for parallelization
▪ Conversion from a Spark DataFrame to a pandas DataFrame is still costly in terms of memory, even when using Arrow
Implementation Tips
▪ Model Quality
▪ Offline metrics, experimental UI and online metrics should be used for quality analysis
▪ Process
▪ Notebooks are useful in the experimental stage, but it is preferable not to use them in production
▪ The transition from the experimental stage to production should have minimum cost
▪ Metric validation should be a part of the flow, not a background analysis in the production phase
Model Serving Layer
▪ Approximate Nearest Neighbour Search Algorithms
▪ Annoy, Faiss, Hnswlib, ScaNN and many others
▪ Choose the library considering
▪ Open source benchmarks
▪ Programming language
▪ Similarity functions
▪ Distributed Index
▪ Incremental item insertion / deletion
▪ Ability to customize
▪ Our choice
▪ Hnswlib + Custom Post-Processing Layer
http://ann-benchmarks.com/
Model Serving Layer - HNSWLIB
▪ Trade-off between hierarchical navigable small world graph construction and search parameters
▪ Simple graph, weak search: less indexing time, less memory, less CPU usage, low recall
▪ Simple graph, strong search: less indexing time, less memory, more CPU usage, acceptable recall
▪ Complex graph, weak search: more indexing time, more memory, less CPU usage, high recall
▪ Complex graph, strong search: more indexing time, more memory, high CPU usage (waste), high recall
▪ Consider the following metrics to select optimal parameters
▪ Index size / Memory consumption
▪ Build time
▪ CPU usage
▪ Queries per second
▪ Recall
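Of the metrics listed, recall is the one that needs a ground truth: the approximate results are compared with an exact brute-force search. The following is a library-independent sketch of that measurement. `crude_search` is a deliberately weak stand-in, not HNSW; in practice `search_fn` would wrap a query against the index being tuned. All names and the random data are hypothetical.

```python
import random

def exact_top_k(query, vectors, k):
    """Brute-force k nearest neighbours by squared Euclidean distance."""
    def dist(v):
        return sum((q - x) ** 2 for q, x in zip(query, v))
    return sorted(range(len(vectors)), key=lambda i: dist(vectors[i]))[:k]

def recall_at_k(approx_ids, exact_ids):
    """Share of the true neighbours that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def mean_recall(search_fn, queries, vectors, k):
    """Average recall of search_fn over a set of queries."""
    total = 0.0
    for query in queries:
        total += recall_at_k(search_fn(query), exact_top_k(query, vectors, k))
    return total / len(queries)

# Random data in place of real embeddings.
rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(200)]
queries = [[rng.random() for _ in range(8)] for _ in range(10)]
k = 5

def crude_search(query):
    # Stand-in for a weakly tuned index: it only looks at half of the items.
    return exact_top_k(query, vectors[:100], k)

def full_search(query):
    return exact_top_k(query, vectors, k)

print(mean_recall(full_search, queries, vectors, k))   # 1.0
print(mean_recall(crude_search, queries, vectors, k))  # somewhere between 0 and 1
```

Running the same measurement for each candidate combination of construction and search parameters, alongside build time and memory, gives the data needed to pick a point on the trade-off.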
Model Serving Layer - Post Processing
▪ Similarity search alone will not be enough
▪ You will need to do some post-processing after retrieving the results
▪ Implement your custom solution
▪ Do the post-processing in the consuming service
▪ Use a solution that supports metadata and post-processing
▪ e.g. opendistro-for-elasticsearch, which supports hnswlib indexes and brings post-processing functions
▪ Every solution has its own pros and cons. We implemented our own custom solution, which enhances the index with metadata and lets you inject any filtering or ranking method that you need.
Post Filtering Validation Methods
Experimental UI
▪ Reveal what you need
▪ Variant level exclusions
▪ Category level restrictions and exclusions
▪ Brand level restrictions and exclusions
▪ Price aware filters
▪ Gender filters
▪ Top-N category diverse ranking
▪ Etc.
▪ Implement in serving layer
▪ Experiment again
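A few of the filters listed above can be sketched as one pass over the nearest-neighbour results. This is an illustration only, not the custom post-processing layer described in the talk. The field names (`parent_id`, `category`, `price`, `score`), the parameters and the sample items are all hypothetical.

```python
def post_process(candidates, anchor, excluded_categories=frozenset(),
                 max_price_ratio=None, max_per_category=1, top_n=3):
    """Filter and re-rank similarity search results for one anchor product."""
    results = []
    per_category = {}
    for item in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if item["parent_id"] == anchor["parent_id"]:
            continue  # variant-level exclusion: same product, other colour/size
        if item["category"] in excluded_categories:
            continue  # category-level exclusion
        if max_price_ratio is not None and item["price"] > max_price_ratio * anchor["price"]:
            continue  # price-aware filter
        if per_category.get(item["category"], 0) >= max_per_category:
            continue  # category-diverse ranking: cap items per category
        per_category[item["category"]] = per_category.get(item["category"], 0) + 1
        results.append(item["id"])
        if len(results) == top_n:
            break
    return results

anchor = {"id": "phone-black", "parent_id": "phone", "category": "Phones", "price": 500.0}
candidates = [
    {"id": "phone-white", "parent_id": "phone", "category": "Phones", "price": 500.0, "score": 0.95},
    {"id": "case-1", "parent_id": "case-1", "category": "Cases", "price": 20.0, "score": 0.90},
    {"id": "case-2", "parent_id": "case-2", "category": "Cases", "price": 25.0, "score": 0.85},
    {"id": "tv-1", "parent_id": "tv-1", "category": "TVs", "price": 2000.0, "score": 0.80},
    {"id": "charger-1", "parent_id": "charger-1", "category": "Chargers", "price": 30.0, "score": 0.70},
    {"id": "gift-card", "parent_id": "gift-card", "category": "Gift Cards", "price": 50.0, "score": 0.60},
    {"id": "earbuds-1", "parent_id": "earbuds-1", "category": "Audio", "price": 80.0, "score": 0.50},
]
recommendations = post_process(
    candidates, anchor, excluded_categories={"Gift Cards"}, max_price_ratio=1.0
)
print(recommendations)  # ['case-1', 'charger-1', 'earbuds-1']
```

Because the index returns more candidates than are finally shown, it is usually necessary to over-fetch neighbours so that enough items survive the filters.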
Model Serving Layer - Performance
▪ Single instance
▪ 8K requests per second
▪ Under 1ms (~400µs)
▪ Using assembly code instead of the default distance function implementations may improve indexing and query performance considerably (vectorization)
Model Serving Layer - Results on Production
Two FBT Examples on Production (Shown after add to cart action)
Online Metrics
Key Metrics
▪ CTR
▪ CR
▪ Coverage
▪ Diversity
▪ Revenue
▪ Usage Ratio
▪ Order Ratio
Dimensions
▪ Placement Title
▪ Placement Location
▪ Position in Placement
▪ Category Levels
▪ Channel
▪ Time of Week/Day
▪ Gender
Online Metrics
▪ Calculate your overall impact
▪ Make your detailed analysis to increase domain knowledge which leads
to improvement of your recommendations
▪ If you only rely on CTR and CR you may lose the big picture
▪ Popular products and their relatively higher CTRs may put you in a
vicious circle in a narrow space.
▪ You should interpret CR metric differently for different categories.
Take Aways
▪ Use embedding representations in recommendation domain as much
as possible
▪ Word2Vec is easy to use and train (without using GPUs), but tune parameters wisely and assess offline metrics taking into account your business requirements.
▪ Be careful when applying arithmetic operations on embeddings
▪ Follow small cycles during the experimental and production stages
▪ Design serving layer considering your scale
▪ Use experimental UI and apply post-filtering for more relevant results
▪ Track online metrics to understand real impact of your solution

More Related Content

What's hot

Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsBenjamin Le
 
Recommendation system
Recommendation system Recommendation system
Recommendation system Vikrant Arya
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsJustin Basilico
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsJustin Basilico
 
Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Faisal Siddiqi
 
Netflix Recommendations - Beyond the 5 Stars
Netflix Recommendations - Beyond the 5 StarsNetflix Recommendations - Beyond the 5 Stars
Netflix Recommendations - Beyond the 5 StarsXavier Amatriain
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectiveXavier Amatriain
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender modelsParmeshwar Khurd
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorialAlexandros Karatzoglou
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectiveJustin Basilico
 
Recommendation Systems - Why How and Real Life Applications
Recommendation Systems - Why How and Real Life ApplicationsRecommendation Systems - Why How and Real Life Applications
Recommendation Systems - Why How and Real Life ApplicationsLiron Zighelnic
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender SystemsDavid Zibriczky
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at NetflixJustin Basilico
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNNŞeyda Hatipoğlu
 
Machine Learning for retail and ecommerce
Machine Learning for retail and ecommerceMachine Learning for retail and ecommerce
Machine Learning for retail and ecommerceAndrei Lopatenko
 

What's hot (20)

Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender Systems
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
 
Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019
 
Netflix Recommendations - Beyond the 5 Stars
Netflix Recommendations - Beyond the 5 StarsNetflix Recommendations - Beyond the 5 Stars
Netflix Recommendations - Beyond the 5 Stars
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
 
User behavior analytics
User behavior analyticsUser behavior analytics
User behavior analytics
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
Recommendation Systems - Why How and Real Life Applications
Recommendation Systems - Why How and Real Life ApplicationsRecommendation Systems - Why How and Real Life Applications
Recommendation Systems - Why How and Real Life Applications
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender Systems
 
Artwork Personalization at Netflix
Artwork Personalization at NetflixArtwork Personalization at Netflix
Artwork Personalization at Netflix
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Entity2rec recsys
Entity2rec recsysEntity2rec recsys
Entity2rec recsys
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNN
 
Developing Movie Recommendation System
Developing Movie Recommendation SystemDeveloping Movie Recommendation System
Developing Movie Recommendation System
 
Machine Learning for retail and ecommerce
Machine Learning for retail and ecommerceMachine Learning for retail and ecommerce
Machine Learning for retail and ecommerce
 

Similar to Frequently Bought Together Recommendations Based on Embeddings

Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDBMongoDB
 
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!Richard Robinson
 
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Databricks
 
Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)Klas Berlič Fras
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
Building A Product Assortment Recommendation Engine
Building A Product Assortment Recommendation EngineBuilding A Product Assortment Recommendation Engine
Building A Product Assortment Recommendation EngineDatabricks
 
Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016Niko Neugebauer
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j
 
Pay pal paypal continuous performance as a self-service with fully-automated...
Pay pal  paypal continuous performance as a self-service with fully-automated...Pay pal  paypal continuous performance as a self-service with fully-automated...
Pay pal paypal continuous performance as a self-service with fully-automated...Dynatrace
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 
Metadata Modeling Best Practices with IBM Cognos Framework Manager
Metadata Modeling Best Practices with IBM Cognos Framework ManagerMetadata Modeling Best Practices with IBM Cognos Framework Manager
Metadata Modeling Best Practices with IBM Cognos Framework ManagerSenturus
 
Using Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your ClusterUsing Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your ClusterMongoDB
 
Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems MongoDB
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning ModelsTash Bickley
 
Deep dive time series anomaly detection with different Azure Data Services
Deep dive time series anomaly detection with different Azure Data ServicesDeep dive time series anomaly detection with different Azure Data Services
Deep dive time series anomaly detection with different Azure Data ServicesMarco Parenzan
 
How To Implement Your Online Search Quality Evaluation With Kibana
How To Implement Your Online Search Quality Evaluation With KibanaHow To Implement Your Online Search Quality Evaluation With Kibana
How To Implement Your Online Search Quality Evaluation With KibanaSease
 
Cloud-based Test Microservices JavaOne 2014
Cloud-based Test Microservices JavaOne 2014Cloud-based Test Microservices JavaOne 2014
Cloud-based Test Microservices JavaOne 2014Shelley Lambert
 
How To Implement Your Online Search Quality Evaluation With Kibana
How To Implement Your Online Search Quality Evaluation With KibanaHow To Implement Your Online Search Quality Evaluation With Kibana
How To Implement Your Online Search Quality Evaluation With KibanaSease
 

Similar to Frequently Bought Together Recommendations Based on Embeddings (20)

Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
 
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
 
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
 
Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Building A Product Assortment Recommendation Engine
Building A Product Assortment Recommendation EngineBuilding A Product Assortment Recommendation Engine
Building A Product Assortment Recommendation Engine
 
Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016Columnstore improvements in SQL Server 2016
Columnstore improvements in SQL Server 2016
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael Moore
 
Pay pal paypal continuous performance as a self-service with fully-automated...
Pay pal  paypal continuous performance as a self-service with fully-automated...Pay pal  paypal continuous performance as a self-service with fully-automated...
Pay pal paypal continuous performance as a self-service with fully-automated...
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Metadata Modeling Best Practices with IBM Cognos Framework Manager
Metadata Modeling Best Practices with IBM Cognos Framework ManagerMetadata Modeling Best Practices with IBM Cognos Framework Manager
Metadata Modeling Best Practices with IBM Cognos Framework Manager
 
Using Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your ClusterUsing Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your Cluster
 
Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
ATP 2014
ATP 2014ATP 2014
ATP 2014
 
Deep dive time series anomaly detection with different Azure Data Services
Deep dive time series anomaly detection with different Azure Data ServicesDeep dive time series anomaly detection with different Azure Data Services
Deep dive time series anomaly detection with different Azure Data Services
 
L19 Application Architecture
L19 Application ArchitectureL19 Application Architecture
L19 Application Architecture
 
How To Implement Your Online Search Quality Evaluation With Kibana
How To Implement Your Online Search Quality Evaluation With KibanaHow To Implement Your Online Search Quality Evaluation With Kibana
How To Implement Your Online Search Quality Evaluation With Kibana
 
Cloud-based Test Microservices JavaOne 2014
Cloud-based Test Microservices JavaOne 2014Cloud-based Test Microservices JavaOne 2014
Cloud-based Test Microservices JavaOne 2014
 
How To Implement Your Online Search Quality Evaluation With Kibana
How To Implement Your Online Search Quality Evaluation With KibanaHow To Implement Your Online Search Quality Evaluation With Kibana
How To Implement Your Online Search Quality Evaluation With Kibana
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Frequently Bought Together Recommendations Based on Embeddings

  • 1. Embedding Based Frequently Bought Together Recommendations: A Production Use Case
  • 2. Agenda
      ▪ Mehmet Selman Sezgin, Senior Data Engineer at Hepsiburada
      ▪ Ulukbek Attokurov, Data Scientist at Hepsiburada
  • 3. Content
      ● Embedding Based Recommendations
      ● Modeling (Frequently Bought Together)
      ● Arithmetic operations on Embeddings
      ● Architecture Overview
      ● Serving Layer
      ● Experimental UI
      ● Online Metrics
      ● Conclusion
  • 4. ▪ 40+ categories ▪ 200M+ visitors per month ▪ 30M+ products
  • 5. An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. (Source: https://developers.google.com/machine-learning/crash-course/embeddings)
  • 6. Co-occurrence vs Embeddings
      Co-occurrence statistics
          ▪ Uses raw co-occurrence statistics (Salton89, a TF-IDF based metric)
          ▪ Uses behavior data (product views, orders, add-to-cart)
          ▪ Generates item-based recommendations
          ▪ Cannot project users and items into the same space
      Embeddings
          ▪ Uses advanced methods (ResNet, Inception, VGG, Word2Vec, Doc2Vec, autoencoders, BERT, etc.)
          ▪ Generates item-based and user-based recommendations
          ▪ Uses content information such as product image, product description, product name and product attributes
          ▪ Image, text and behavior embeddings can be projected into the same space
  • 7. Co-occurrence vs Embeddings
      Co-occurrence statistics
          ▪ Products are not recommended if they do not appear in the same context
          ▪ Context information, such as products that appeared in the same session or transaction, is not employed
          ▪ Content information (image, text, etc.) is not used
      Embeddings
          ▪ Similarity metrics can be calculated
          ▪ Can be used as features in unsupervised and supervised methods to optimize a business metric such as a propensity score
          ▪ Can be used as features in neural networks such as LSTMs to model customer behavior over time
          ▪ Can be used as features in KNN to recommend the most similar items
  • 8. Frequently Bought Together
      ▪ Goal: build recommendations that offer complementary products to our customers
      ▪ Challenges:
          ▪ Orders might contain products from diverse categories
          ▪ Generating recommendations using 30M+ products distributed over 40+ categories
      ▪ Tip: "bought together" does not mean that the items which co-occur in a sequence are similar
      ▪ Our model choice: Word2Vec
  • 9. Word2Vec
      ▪ Easy to use
      ▪ Easy to train
      ▪ Simple format of training samples
      ▪ User-friendly libraries like Gensim
      ▪ Few parameters to optimize
      ▪ A lot of practical use cases
  • 10. Data Preparation
      NLP
          ▪ Sentence
          ▪ Bag-of-Words
          ▪ “I am attending a conference”
          ▪ [“I”, “attending”, “conference”]
      Frequently Bought Together
          ▪ User behavior (views, purchases, etc.)
          ▪ Set of purchased items
          ▪ Orders: Keyboard, Computer, Mouse
          ▪ [“Keyboard”, “Computer”, “Mouse”]
  • 11. Data Preparation - Context Separation
      Sequence
          ▪ Sequences may contain products from diverse categories
          ▪ [“Keyboard”, “Mouse”, “Shoes”, “Socks”]
      Sub-sequence
          ▪ Sub-sequences may be created depending on labels such as category, brand, etc.
          ▪ [“Keyboard”, “Mouse”] and [“Shoes”, “Socks”]
  • 12. Code Sample for Data Preparation
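The code shown on this slide is not part of the transcript. As a rough illustration of the context separation described on slide 11, the sketch below splits one order into per-category sub-sequences. The function name `split_by_category`, the `category_of` lookup and the rule of dropping single-item sub-sequences are assumptions made for this example, not the presenters' implementation.

```python
from collections import defaultdict


def split_by_category(order_items, category_of):
    """Split one order into sub-sequences, one per category label.

    `category_of` is a hypothetical item -> category lookup.
    """
    groups = defaultdict(list)
    for item in order_items:
        groups[category_of[item]].append(item)
    # Assumption: a sub-sequence with a single item carries no context
    # for Word2Vec, so it is dropped.
    return [items for items in groups.values() if len(items) >= 2]


# Toy data mirroring the example on slide 11; category names are made up.
category_of = {
    "Keyboard": "Electronics",
    "Mouse": "Electronics",
    "Shoes": "Fashion",
    "Socks": "Fashion",
}
order = ["Keyboard", "Mouse", "Shoes", "Socks"]
sub_sequences = split_by_category(order, category_of)
print(sub_sequences)
```

Each resulting sub-sequence can then be treated as one "sentence" in the Word2Vec training corpus.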
  • 13. Word2Vec Parameters
      ▪ Random Search is applied to restrict the parameter search space
      ▪ Grid Search is applied to select the optimal parameters
      ▪ The following Word2Vec parameters are optimized:
          ▪ min_count: it is preferable to set it low, otherwise coverage will decrease
          ▪ sample: the most frequent items dominate sequences, which might yield noisy embeddings and is not computationally efficient
          ▪ window: the context length is set to the maximum sequence length, since the order of items in a sequence is random
          ▪ size: a tradeoff between network size, storage and computational cost; it is set as small as possible without losing recommendation quality
          ▪ iter: the default value is very low, so it is set between 50 and 80; the model is not trained well when iter is set to low values
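The two-stage search described above can be sketched in plain Python. Only the 50 to 80 range for `iter` comes from the slide; every other value in `SEARCH_SPACE` is a placeholder, and the helper names are hypothetical. Scoring a candidate (training a model and computing offline metrics) is deliberately left out. Note that Gensim 4.x names the `size` and `iter` arguments `vector_size` and `epochs`.

```python
import itertools
import random

# Hypothetical search space; only the `iter` range (50-80) is from the slide.
SEARCH_SPACE = {
    "min_count": [1, 2, 5],
    "sample": [1e-5, 1e-4, 1e-3],
    "size": [32, 64, 128],
    "iter": [50, 60, 70, 80],
}


def random_candidates(space, n, seed=0):
    """Stage 1: draw n random parameter combinations from a wide space."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n)]


def restrict_space(top_candidates):
    """Keep only the values that appear in the best random-search candidates."""
    space = {}
    for candidate in top_candidates:
        for name, value in candidate.items():
            space.setdefault(name, set()).add(value)
    return {name: sorted(values) for name, values in space.items()}


def grid_candidates(space):
    """Stage 2: enumerate every combination of the (restricted) space."""
    names = list(space)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(space[name] for name in names))]


def window_for(sequences):
    """Window is set to the longest sequence, since item order is random."""
    return max(len(sequence) for sequence in sequences)
```

In practice the candidates from stage 1 would be ranked by an offline metric before calling `restrict_space` on the best few.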
  • 14. Similarity Functions
      ▪ The KNN algorithm is employed to find the most similar items
      ▪ Different similarity metrics are used: Euclidean distance and cosine similarity
      ▪ Euclidean distance measures the distance between two points and is affected by the length of the vectors, so vectors need to be normalized to obtain more accurate results
      ▪ In cosine similarity, the angle between two vectors determines their similarity
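A small sketch of why normalization matters for Euclidean distance, using toy vectors that are not from the talk. For unit-length vectors the two metrics are tied together by d² = 2 - 2·cos, so they produce the same neighbor ranking.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def normalize(a):
    """Scale a vector to unit length."""
    length = norm(a)
    return [x / length for x in a]


def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors: same direction, very different lengths.
a = [1.0, 2.0]
b = [10.0, 20.0]

print(cosine_similarity(a, b))                        # angle only: ignores length
print(euclidean_distance(a, b))                       # large, driven by length
print(euclidean_distance(normalize(a), normalize(b))) # near zero after normalization
```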
  • 15. Offline Metrics
      ▪ We need simple statistical metrics to check the performance of the model and to tune parameters
      ▪ Precision@k
          ▪ (# of recommended items @k that are relevant) / (# of recommended items @k)
      ▪ Recall@k
          ▪ (# of recommended items @k that are relevant) / (total # of relevant items)
      ▪ HitRate@k
          ▪ (# of hits @k recommendations) / (total # of test users)
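The three formulas above translate directly into code. This is a minimal sketch with made-up item names; it assumes a "hit" means at least one relevant item appears in a user's top-k list, which the slide does not spell out.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / len(top_k)


def recall_at_k(recommended, relevant, k):
    """Share of all relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)


def hit_rate_at_k(recs_by_user, relevant_by_user, k):
    """Share of test users with at least one relevant item in their top-k.

    Assumption: a hit is "one or more relevant items in the top-k list".
    """
    hits = sum(
        1
        for user, recs in recs_by_user.items()
        if any(item in relevant_by_user.get(user, set()) for item in recs[:k])
    )
    return hits / len(recs_by_user)


# Toy example; the items are hypothetical.
recommended = ["mouse", "socks", "monitor"]
relevant = {"mouse", "monitor", "webcam"}
print(precision_at_k(recommended, relevant, 2))
print(recall_at_k(recommended, relevant, 2))
```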
  • 16. MLFlow Tracks the Model
      ▪ It is easy to visually inspect the parameters
      ▪ Evaluation metrics can be investigated graphically
      ▪ It is easy to integrate into the source code
      ▪ It is effective for team collaboration through the central server
  • 19. Arithmetic Operations on Embeddings
      ▪ Is it possible to create new business dimensions using simple arithmetic on existing product embeddings?
          ▪ Similarity( AVG(Adidas_Shoes) , AVG(Nike_Shoes)) ≃ 1 ?
          ▪ Similarity( AVG(Camping tents) , AVG(Outdoor chairs)) ≃ 1 ?
          ▪ 1_Adidas_Shoe - Adidas_Brand + Nike_Brand ≃ 1_Similar_Nike_Shoe ?
      ▪ Relevancy decreases when entities at higher levels of the hierarchy, such as categories (Sport, Baby, Women Clothes, etc.), are represented using low-level entities such as products
  • 20. Arithmetic Operations on Embeddings
      ▪ Brand similarity is relevant if a brand contains products that are homogeneous in terms of categories (upper body clothes, lower body clothes, etc.)
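The questions on slide 19 can be made concrete with a toy sketch. All vectors below are invented for illustration, and a brand embedding is assumed to be the plain average of its product embeddings; real embeddings will behave far less cleanly, which is the point of the caveats above.

```python
import math


def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Made-up 3-dimensional product embeddings.
products = {
    "adidas_shoe_1": [0.9, 0.1, 0.3],
    "adidas_shoe_2": [0.8, 0.2, 0.4],
    "nike_shoe_1": [0.85, 0.15, 0.2],
    "nike_shoe_2": [0.75, 0.25, 0.3],
}

# Assumption: brand vector = average of the brand's product vectors.
adidas = average([products["adidas_shoe_1"], products["adidas_shoe_2"]])
nike = average([products["nike_shoe_1"], products["nike_shoe_2"]])
brand_similarity = cosine(adidas, nike)

# 1_Adidas_Shoe - Adidas_Brand + Nike_Brand, then look for the nearest Nike shoe.
query = [p - a + n for p, a, n in zip(products["adidas_shoe_1"], adidas, nike)]
closest = max(("nike_shoe_1", "nike_shoe_2"),
              key=lambda name: cosine(query, products[name]))
print(brand_similarity, closest)
```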
  • 22. Implementation Tips
      ▪ PySpark
          ▪ Makes it possible to work with any Python modelling library through Spark-to-pandas DataFrame conversion
          ▪ Pandas UDFs are very useful for parallelization
          ▪ Conversion from a Spark DataFrame to a pandas DataFrame is still costly in terms of memory, even when using Arrow
  • 23. Implementation Tips
      ▪ Model Quality
          ▪ Offline metrics, the experimental UI and online metrics should all be used for quality analysis
      ▪ Process
          ▪ Notebooks are useful in the experimental stage, but it is preferable not to use them in production
          ▪ The transition from the experimental stage to production should have minimum cost
          ▪ Metric validation should be part of the flow, not a background analysis in the production phase
  • 24. Model Serving Layer
      ▪ Approximate Nearest Neighbour Search Algorithms
          ▪ Annoy, Faiss, Hnswlib, ScaNN and many others
      ▪ Choose the library considering
          ▪ Open source benchmarks (http://ann-benchmarks.com/)
          ▪ Programming language
          ▪ Similarity functions
          ▪ Distributed index
          ▪ Incremental item insertion / deletion
          ▪ Ability to customize
      ▪ Our choice
          ▪ Hnswlib + Custom Post-Processing Layer
  • 25. Model Serving Layer - HNSWLIB
      ▪ Trade-off between hierarchical navigable small world graph construction and search parameters
          ▪ Simple tree, weak search: less indexing time, less memory, less CPU usage, low recall
          ▪ Simple tree, strong search: less indexing time, less memory, more CPU usage, acceptable recall
          ▪ Complex tree, weak search: more indexing time, more memory, less CPU usage, high recall
          ▪ Complex tree, complex search: more indexing time, more memory, high CPU usage (waste), high recall
      ▪ Consider the following metrics to select optimal parameters
          ▪ Index size / memory consumption
          ▪ Build time
          ▪ CPU usage
          ▪ Queries per second
          ▪ Recall
  • 26. Model Serving Layer - Post Processing
      ▪ Similarity search alone will not be enough
      ▪ You will need to do some post-processing after retrieving the results
          ▪ Implement your custom solution
          ▪ Do the post-processing in the consuming service
          ▪ Use a solution that supports metadata and post-processing
              ▪ e.g. opendistro-for-elasticsearch, which supports an hnswlib index and provides post-processing functions
      ▪ Every solution has its own pros and cons. We implemented a custom solution which enhances the index with metadata, so that any filtering or ranking method you need can be injected.
  • 28. Experimental UI
      ▪ Reveal what you need
          ▪ Variant level exclusions
          ▪ Category level restrictions and exclusions
          ▪ Brand level restrictions and exclusions
          ▪ Price aware filters
          ▪ Gender filters
          ▪ Top-N category diverse ranking
          ▪ Etc.
      ▪ Implement in serving layer
      ▪ Experiment again
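A few of the filters listed above (category exclusion, a price-aware filter and top-N category-diverse ranking) can be sketched as a post-processing step over nearest-neighbor results. This is not the presenters' custom layer: the function `post_process`, its parameters and the metadata fields `category` and `price` are hypothetical.

```python
def post_process(candidates, metadata, source_id, excluded_categories=(),
                 max_price=None, per_category=1, top_n=5):
    """Filter and diversify nearest-neighbor results.

    candidates: list of (item_id, score), sorted by score descending.
    metadata:   hypothetical item_id -> {"category": ..., "price": ...} lookup.
    """
    taken_per_category = {}
    results = []
    for item_id, _score in candidates:
        if item_id == source_id:
            continue  # never recommend the product itself
        meta = metadata[item_id]
        if meta["category"] in excluded_categories:
            continue  # category level exclusion
        if max_price is not None and meta["price"] > max_price:
            continue  # price aware filter
        taken = taken_per_category.get(meta["category"], 0)
        if taken >= per_category:
            continue  # category diverse ranking: cap items per category
        taken_per_category[meta["category"]] = taken + 1
        results.append(item_id)
        if len(results) == top_n:
            break
    return results


# Toy data; items, categories and prices are made up.
metadata = {
    "laptop": {"category": "Computers", "price": 900},
    "mouse": {"category": "Accessories", "price": 20},
    "keyboard": {"category": "Accessories", "price": 40},
    "laptop_bag": {"category": "Bags", "price": 30},
    "socks": {"category": "Fashion", "price": 5},
}
candidates = [("laptop", 1.0), ("mouse", 0.9), ("keyboard", 0.8),
              ("laptop_bag", 0.7), ("socks", 0.6)]
print(post_process(candidates, metadata, "laptop",
                   excluded_categories={"Fashion"}, top_n=3))
```

Because filtering shrinks the candidate list, a serving layer built this way usually needs to request more neighbors from the index than it finally returns.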
  • 29. Model Serving Layer - Performance
      ▪ Single instance
          ▪ 8K requests per second
          ▪ Under 1 ms (~400 µs)
      ▪ Using assembly code instead of the default distance function implementations may improve indexing and query performance considerably (vectorization)
  • 30. Model Serving Layer - Results on Production Two FBT Examples on Production (Shown after add to cart action)
  • 31. Online Metrics
      Key Metrics
          ▪ CTR
          ▪ CR
          ▪ Coverage
          ▪ Diversity
          ▪ Revenue
          ▪ Usage Ratio
          ▪ Order Ratio
      Dimensions
          ▪ Placement Title
          ▪ Placement Location
          ▪ Position in Placement
          ▪ Category Levels
          ▪ Channel
          ▪ Time of Week/Day
          ▪ Gender
  • 32. Online Metrics
      ▪ Calculate your overall impact
      ▪ Do detailed analysis to increase your domain knowledge, which leads to better recommendations
      ▪ If you rely only on CTR and CR you may lose the big picture
          ▪ Popular products and their relatively higher CTRs may trap you in a vicious circle within a narrow space
          ▪ You should interpret the CR metric differently for different categories
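Breaking CTR and CR down by one of the dimensions above can be sketched as a simple aggregation. The numbers are toy values, the row layout is hypothetical, and the definitions used here (CTR = clicks / impressions, CR = orders / clicks) are one common convention rather than the ones confirmed by the talk.

```python
from collections import defaultdict


def ctr_cr_by(rows, dimension):
    """Aggregate CTR and CR per value of one dimension (e.g. category)."""
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0, "orders": 0})
    for row in rows:
        bucket = totals[row[dimension]]
        for key in bucket:
            bucket[key] += row[key]
    report = {}
    for value, t in totals.items():
        # Assumed definitions: CTR = clicks / impressions, CR = orders / clicks.
        ctr = t["clicks"] / t["impressions"] if t["impressions"] else 0.0
        cr = t["orders"] / t["clicks"] if t["clicks"] else 0.0
        report[value] = {"ctr": ctr, "cr": cr}
    return report


# Toy log aggregates; all values are invented.
rows = [
    {"category": "Electronics", "channel": "web", "impressions": 1000, "clicks": 50, "orders": 5},
    {"category": "Electronics", "channel": "app", "impressions": 1000, "clicks": 30, "orders": 3},
    {"category": "Fashion", "channel": "web", "impressions": 500, "clicks": 50, "orders": 1},
]
print(ctr_cr_by(rows, "category"))
print(ctr_cr_by(rows, "channel"))
```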
  • 33. Takeaways
      ▪ Use embedding representations in the recommendation domain as much as possible
      ▪ Word2Vec is easy to use and train (without GPUs), but tune its parameters wisely and assess offline metrics with your business requirements in mind
      ▪ Be careful when applying arithmetic operations on embeddings
      ▪ Follow small cycles during the experimental and production stages
      ▪ Design the serving layer with your scale in mind
      ▪ Use an experimental UI and apply post-filtering for more relevant results
      ▪ Track online metrics to understand the real impact of your solution