Arun Agarwal
Vaibhav Srivastava
Just In Time Personalization
Flipkart confidential - For Internal use only. Not to be shared externally.
Agenda
● Why personalize - The curious case of White Car
● How do we do it
● Just in Time personalization
● User view @ Flipkart
● System requirements for such scale
● The Architecture - Systems view - Scale
● Learnings on the way
A typical shopping activity
Online Shoppers
The most-sold car, so statistically “popular”
Looking for
White Car
Who is really behind the app
Online Shoppers
Funky ‘Dudes’
different taste with huge purchase power
very limited in quantities
Professionals, Service
Common taste with limited purchase power - but masses
Executives
Refined taste with enormous purchase power - but no time
Above average availability in quantities
Strong Brand and Price Affinities
Be intelligent - cater to the person
Looking for
White Car
Looking for
White Car
User Activities - such as navigating on Flipkart, product page views, card clicks, purchases and payments
User Insights Engine - Training Time & User Path Identification
Search Engine - Run Time (scoring time)
Price buckets: 1 (Inexpensive Category) to 5 (Expensive Category)
Example Affinity Calculation and Usage - Price Affinity
A typical example
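As a rough illustration of the price-affinity idea on this slide, the sketch below turns a user's activity events into a normalized affinity over the five price buckets. The event names, the per-event weights, and the function `price_affinity` are assumptions made for illustration; the slides do not say how the User Insights Engine actually weights activities.

```python
from collections import defaultdict

# Hypothetical weights per activity type (not from the talk).
EVENT_WEIGHTS = {"page_view": 1.0, "card_click": 2.0, "purchase": 5.0}
NUM_BUCKETS = 5  # 1 = inexpensive ... 5 = expensive


def price_affinity(events):
    """Return a dict bucket -> affinity; the values sum to 1.

    `events` is an iterable of (event_type, price_bucket) pairs.
    Unknown event types and out-of-range buckets are ignored.
    """
    totals = defaultdict(float)
    for event_type, bucket in events:
        weight = EVENT_WEIGHTS.get(event_type)
        if weight is None or not 1 <= bucket <= NUM_BUCKETS:
            continue
        totals[bucket] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No usable history: fall back to a uniform affinity.
        return {b: 1.0 / NUM_BUCKETS for b in range(1, NUM_BUCKETS + 1)}
    return {b: totals[b] / grand_total for b in range(1, NUM_BUCKETS + 1)}


# A user who mostly interacts with expensive products.
events = [("page_view", 4), ("page_view", 5), ("card_click", 5), ("purchase", 5)]
affinity = price_affinity(events)
top_bucket = max(affinity, key=affinity.get)
```

At scoring time, a search engine could read `top_bucket` (or the whole distribution) for the user and prefer products from the matching price bucket.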
So what more can we do?
OK... so?
B B B O B B B B O B B B
Chronology of Events...
Various People, Varied Intent
What do we need next?
We need to identify, and quickly, whom the current session is about
[Figure: timeline from -Inf to Now. Labels: Few hours; BigData; ML Algos; Historical activities of account]
In-Session Affinities - Example 1 - Store
Same User Account
Long Term History
Gender - Male
Size - 11(UK)
Brand - Nike
Gender - Female
Size - 7(UK)
Brand - Adidas
Color - Pink
Male identified looking for - “Camera”
Session Affinities Learnings
[Figure: result panels labelled For You, Historical, and Popular]
In-Session Affinities - Example 2 - Gender
Same User Account
Long Term History
Gender - Male
Size - 11(UK)
Brand - Nike
Gender - Female
Size - 7(UK)
Brand - Adidas
Color - Pink
Female identified looking for - “Shoes”
Session Affinities Learnings
In-Session Affinities - Example 3 - Brand
Same User Account
Long Term History
Gender - Kid Girl
Age - 5
Brand - Disney
Is Parent - True
Female identified looking for - “Toys”
Session Affinities Learnings
In-Session Location - Example 4 - GeoSensitivity
Same User Account
Long Term History
Gender - Kid Girl
Age - 5
Brand - Disney
Location
Location identified is Bangalore
Session Affinities Learnings
Delhi - 0.5
Bangalore - 0.3
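One way to picture how an in-session signal can override long-term history is a weighted blend of the two. The sketch below is only an assumption about how such a blend might look: it treats the Delhi and Bangalore numbers above as long-term location scores, and the linear mixing rule, the `session_weight` of 0.7, and the function name `blend_affinities` are all hypothetical.

```python
def blend_affinities(historical, session, session_weight=0.7):
    """Linearly mix long-term scores with in-session scores.

    Keys missing from either side count as 0.0.
    """
    keys = set(historical) | set(session)
    return {
        k: (1 - session_weight) * historical.get(k, 0.0)
        + session_weight * session.get(k, 0.0)
        for k in keys
    }


# Long-term history favours Delhi; the current session is identified as Bangalore.
historical = {"Delhi": 0.5, "Bangalore": 0.3}
session = {"Bangalore": 1.0}

blended = blend_affinities(historical, session)
best_location = max(blended, key=blended.get)
```

With these numbers the session signal outweighs the history, so `best_location` is Bangalore even though the account's long-term score is higher for Delhi.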
User view @ Flipkart
Store Affinity
Price Affinity
Brand Affinity
Gender
Parent
Browse Profile
Order Profile
Behaviour
Gender
JIT - Gender
Location
Home page
Push notification
Advertising
Search
Returns
Flipkart Pay Later
Trust & Safety
RFM
Age
Married
Student
Kids
Age/Gender
Name
LTV
JIT Store Affinity
JIT Age
Category
Diversity
Demographics
Behavioural
Live Aggregates
Just In Time (JIT)
Capabilities Needed
Understand the historical events of a user to identify their areas of interest and affinities
Need: Large volumes of data need to be computed
Real-time systems: the user needs a response within a few seconds
Need: Ability to serve the user path at scale with low latency
Machine learning models help get predictions
Need: Predict the output in near real time
Glean the current intent of the user very fast
Need: High scale of events, computed fast
Win-Win formula for customer reflection*
What makes “all this” challenging -
100 Million products X 100 Million Users
And a challenge for system designers and problem solvers:
*reflection on various dimensions, e.g. price, brand, gender
Why is this so hard?
How do we do it - The Architecture v1
[Architecture diagram, steps numbered 1 to 8. Components: User Interactions (Search/Home/Order); App Servers (x5); Flipkart Data Platform (Persisted Raw & Learnt Data); Insights computation in Batch mode; Live Cache with Insights]
How do we do it - The Architecture v2
[Architecture diagram, steps numbered 1 to 8. Components: User Interactions (Search/Home/Order); App Servers (x5); Flipkart Data Platform (Persisted Raw & Learnt Data); Fstream Platform; Machine Learning Platform; Live Cache with Insights]
Fstream Platform
- In-stream joins over large windows
- Time-partition aggregates
- Compute over time range
Freshness
Aggregation
Noise
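The idea of time-partitioned aggregates that can be combined over a time range can be sketched in a few lines. This is not Fstream's API, which the slides do not show; the class `TimePartitionedCounter`, the 60-second partition size, and the in-memory dictionary are assumptions chosen to keep the example self-contained.

```python
from collections import defaultdict

PARTITION_SECONDS = 60  # hypothetical partition size


class TimePartitionedCounter:
    """Counts events per (time partition, key) and sums them over a range."""

    def __init__(self):
        # (partition_start, key) -> count
        self._counts = defaultdict(int)

    def add(self, timestamp, key):
        # Round the timestamp down to the start of its partition.
        partition = int(timestamp) // PARTITION_SECONDS * PARTITION_SECONDS
        self._counts[(partition, key)] += 1

    def count_over(self, key, start, end):
        """Count events for `key` in partitions whose start lies in [start, end)."""
        return sum(
            n
            for (partition, k), n in self._counts.items()
            if k == key and start <= partition < end
        )


counter = TimePartitionedCounter()
for ts, store in [(5, "shoes"), (50, "shoes"), (70, "camera"), (130, "shoes")]:
    counter.add(ts, store)

recent_shoes = counter.count_over("shoes", 60, 180)
all_shoes = counter.count_over("shoes", 0, 180)
```

Keeping one small aggregate per partition means a query over any time range only has to add up partitions instead of rescanning raw events, which is one way to balance freshness against aggregation cost.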
Systems view
[Architecture diagram, steps numbered 1 to 8. Components: App Servers (x5); Flipkart Data Platform; Streaming platform: converts Raw Events -> Signals -> Insights; Train; Predict; Features; Live Cache with Insights]
Static Storage of Quantiles/Ranks - Type 1
● Relies on the User Insights Engine as the fundamental capability for customer price affinities.
● The rank of each document in each bin is precomputed.
● Ranks are kept in the index, where they can be used in sorting; this makes the request path the User Insights caller.
● Complex analog filters are used for scoring and for mixing cross-bucket results.
● Can scale to millions of documents in price-sensitive ranking.
Search Interaction with User Insights - I
Thousands RPS - BAU
Hundreds of Thousands RPS - peak
Static Storage of Quantiles/Ranks - Type 1 - Example
Search Interaction with User Insights - I
                         Affinity Segment
Product                  1 (Inexpensive)   2     3     4     5 (Expensive)
P1 (Bucket 1 Product)    10                8     6     7     6
P2                       9                 10    8     8     7
P3                       8                 9     10    9     8
P4                       7                 8     8     10    9
P5 (Bucket 5 Product)    6                 7     6     8     10
Request time path:
User Affinity returned, Bucket 1 Buyer - Ranking - P1, P2, P3, P4, P5
User Affinity returned, Bucket 5 Buyer - Ranking - P5, P4, P3, P2, P1
The trade-off for precision is the amount of data stored and indexed.
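The request-time path for Type 1 amounts to picking one precomputed column and sorting by it. The sketch below reproduces the example table in memory; the dictionary `RANK_TABLE` and the function `rank_for_segment` are illustrative names, and a real index would store these values as sortable fields rather than a Python dict.

```python
# Precomputed scores from the example table:
# product -> score for affinity segments 1..5.
RANK_TABLE = {
    "P1": [10, 8, 6, 7, 6],
    "P2": [9, 10, 8, 8, 7],
    "P3": [8, 9, 10, 9, 8],
    "P4": [7, 8, 8, 10, 9],
    "P5": [6, 7, 6, 8, 10],
}


def rank_for_segment(segment):
    """Order products by their precomputed score for one affinity segment."""
    if not 1 <= segment <= 5:
        raise ValueError("segment must be between 1 and 5")
    column = segment - 1
    return sorted(RANK_TABLE, key=lambda p: RANK_TABLE[p][column], reverse=True)


bucket_1_ranking = rank_for_segment(1)  # ['P1', 'P2', 'P3', 'P4', 'P5']
bucket_5_ranking = rank_for_segment(5)  # ['P5', 'P4', 'P3', 'P2', 'P1']
```

No scoring happens at request time: the only per-request input is the user's affinity segment, which is why the stored data grows with the number of segments.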
Dynamic Re-ranking Based on Quantiles - Type 2
● Relies on the User Insights Engine as the fundamental capability for customer price affinities.
● Real-time, price-sensitive ranking: sensitive to offers, discounts, and sale events.
● No precomputation of rank; ranks are computed on demand and results are re-ranked.
● Ranking of documents happens in real time on a smaller batch of products.
● Can't scale to the entire recall set; this makes the search response path the User Insights caller.
Search Interaction with User Insights - II
Thousands RPS - BAU
Hundreds of Thousands RPS - peak
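A minimal sketch of the Type 2 idea, under stated assumptions: only the first `k` results are re-ranked, and each product's price bucket is derived from its current price at request time, so a discount can move it into a cheaper bucket. The bucket boundaries in `BUCKET_BOUNDS`, the helper names, and the shape of `user_affinity` are hypothetical and not taken from the talk.

```python
import bisect

# Hypothetical upper price bounds for buckets 1..4; anything above is bucket 5.
BUCKET_BOUNDS = [500, 1000, 2000, 5000]


def price_bucket(price):
    """Map a current (possibly discounted) price to a bucket 1..5."""
    return bisect.bisect_left(BUCKET_BOUNDS, price) + 1


def rerank_top_k(results, user_affinity, k=3):
    """Re-rank only the first k results by the user's affinity for each
    product's current price bucket; the tail keeps its default order."""
    head, tail = results[:k], results[k:]
    head = sorted(
        head,
        key=lambda r: user_affinity.get(price_bucket(r["price"]), 0.0),
        reverse=True,
    )
    return head + tail


results = [
    {"id": "A", "price": 4500},  # bucket 4
    {"id": "B", "price": 450},   # bucket 1, e.g. after a discount
    {"id": "C", "price": 1500},  # bucket 3
    {"id": "D", "price": 300},   # bucket 1, but outside the top k
]
user_affinity = {1: 0.7, 3: 0.2, 4: 0.1}

reranked_ids = [r["id"] for r in rerank_top_k(results, user_affinity)]
```

Here `reranked_ids` is `['B', 'C', 'A', 'D']`: product D matches the user's preferred bucket but stays last because it is outside the re-ranked batch, which illustrates why this approach cannot cover the entire recall set.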
The Fallback
In Search-side design - before consuming any affinities
Q: What happens if the User Insights Engine degrades in production (during a sale?) or if the confidence scores are low?
A: In both cases we played conservative, and the intelligence of affinities had no play. We also used a Hystrix circuit breaker with a 10 ms timeout on the UIE affinities call; on timeout, the default ordering is rendered as the fallback.
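The fallback behaviour can be sketched as a timeout-guarded call that returns the default ordering whenever the insights call is slow, fails, or reports low confidence. This is not Hystrix (a Java library) and it has no open or half-open circuit state; it only illustrates the timeout-plus-fallback part of the pattern. The function names, the `(affinity, confidence)` return shape, and the `min_confidence` threshold are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)


def rank_with_fallback(results, fetch_affinity, rerank,
                       timeout_s=0.010, min_confidence=0.5):
    """Re-rank with user affinities, or keep the default order on any doubt."""
    future = _executor.submit(fetch_affinity)
    try:
        affinity, confidence = future.result(timeout=timeout_s)
    except Exception:
        # Timeout or any failure in the insights call: default ordering.
        return results
    if confidence < min_confidence:
        # Low-confidence affinities are treated like no affinities.
        return results
    return rerank(results, affinity)


def slow_fetch():
    time.sleep(0.2)  # simulates a degraded insights service
    return {"bucket": 5}, 0.9


def fast_fetch():
    return {"bucket": 5}, 0.9


def low_confidence_fetch():
    return {"bucket": 5}, 0.1


def reverse_rerank(results, affinity):
    # Stand-in for a real affinity-based re-ranker.
    return list(reversed(results))


default_order = ["P1", "P2", "P3"]
degraded_order = rank_with_fallback(default_order, slow_fetch, reverse_rerank)
```

In this sketch `degraded_order` equals `default_order`, because the simulated call takes far longer than the 10 ms budget. A production circuit breaker would additionally stop calling the degraded service for a while instead of timing out on every request.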
Scale we hit
Insights Computation Requirements
  Interaction ingestion rate    Billion+ requests per day
  Ingested data size            10s of TBs of data per day
  Processed data size           100s of GBs of data
  Compute pipeline time         p99 -> 10s, p95 -> 5s
Insights Serving Requirements
  User Path Cache serving       <10ms, scaling to serve 100s of K rps
Serving and Ranking on Search Requirements
  Search QPS                    10s of thousands of requests per second at peak
  Search latency numbers        p99 under 3 seconds
Challenges & Learnings
The Domain Challenges

Challenge: Cold start problems, no insight
Why it was important: New users don't have a lot of history
Our approach:
● Location-centric insights act as a proxy
● Device-centric insights act as a proxy
● JIT ensures the scores start soon
● Fallback solution

Challenge: Borderline bucket problems
Why it was important: Same or similar affinity in two price/store/brand buckets
Our approach:
● Believe the data; the product solve is to show products from both buckets

Challenge: Store labeling for Gender
Why it was important: A labelled dataset for stores was needed to be able to predict Gender JIT
Our approach:
● Analytical approach
● Hand-curated set
● Product labeling
● Look at previous sessions
● Used MAD for label propagation

Challenge: Broad Queries
Why it was important: Broad queries don't have a store path, so mapping them to a user's JIT store affinity or JIT gender is hard
Our approach:
● Look for other signals
● Filters applied
● Products seen and reverse map, though it adds one hop
Challenges & Learnings
The Scale / Tech Challenges

Challenge: Large volume of data
Why it was important: The MR job took 18 hours, which means insights refresh only once a day
Our approach:
● Trim down the data
● Remove outliers
● Address top stores only (80-20 rule)
● Move from HBase to HDFS writes

Challenge: Low latency requirements on JIT Insights
Why it was important: Session times on ecommerce sites are short; unless the user is identified in the first few seconds, targeting them is impossible
Our approach:
● Build lean systems
● Tune Kafka
● Tune Spark & reduce micro-batching
● Scaled prediction systems
● Direct write to cache

Challenge: See N buy M
Why it was important: People like to see a variety of brands/price ranges and purchase one of them, so affinities can get skewed
Our approach:
● Improve the ML algorithm
● AB testing hypothesis
● Incorporate learnings back to ML
Q&A

Slash n 2018 - Just In Time Personalization

  • 1. Arun Agarwal Vaibhav Srivastava Just In Time Personalization
  • 2. Flipkart confidential - For Internal use only. Not to be shared externally. Agenda ● Why personalize - The curious case of White Car ● How do we do it ● Just in Time personalization ● User view @ Flipkart ● System requirements for such scale ● The Architecture - Systems view - Scale ● Learnings on the way
  • 3. Flipkart confidential - For Internal use only. Not to be shared externally. A typical shopping activity Online Shoppers Most sold car so statistically - “popular” Looking for White Car
  • 4. Flipkart confidential - For Internal use only. Not to be shared externally. Who is really behind the app Online Shoppers Funky ‘Dudes’ different taste with huge purchase power very limited in quantities Professionals, Service Common taste with limited purchase power - but masses Executives Refined taste with enormous purchase power - but no time Above average availability in quantities Strong Brand and Price Affinities
  • 5. Flipkart confidential - For Internal use only. Not to be shared externally. Be intelligent - cater to the person Looking for White Car Looking for White Car
  • 6. Flipkart confidential - For Internal use only. Not to be shared externally. User Activities - Such as Navigating on Flipkart, Product Page Views, Card Clicks, Purchases and Payments 1 2 3 4 5 User Insights Engine - Training Time & User Path Identification Search Engine - Run Time (scoring time) Inexpensive Category Expensive Category Example Affinity Calculation and Usage - Price Affinity A typical example
  • 7. Flipkart confidential - For Internal use only. Not to be shared externally. So what can we do more ? Ok...so ? B B B O B B B B O B B B Chronology of Events... Various PeopleVaried Intent
  • 8. Flipkart confidential - For Internal use only. Not to be shared externally. What do we need next? We need to identify (and quickly) for whom is the current session all about -Inf Now Few hours BigData ML Algos Historical activities of account
  • 9. In-Session Affinities - Example 1 - Store. Same User Account, Long Term History: (Gender - Male, Size - 11(UK), Brand - Nike); (Gender - Female, Size - 7(UK), Brand - Adidas, Color - Pink). Session Affinities Learnings: Male identified, looking for “Camera”. Panels: For You, Historical, Popular
  • 10. In-Session Affinities - Example 2 - Gender. Same User Account, Long Term History: (Gender - Male, Size - 11(UK), Brand - Nike); (Gender - Female, Size - 7(UK), Brand - Adidas, Color - Pink). Session Affinities Learnings: Female identified, looking for “Shoes”
  • 11. In-Session Affinities - Example 3 - Brand. Same User Account, Long Term History: Gender - Kid Girl, Age - 5, Brand - Disney, Is Parent - True. Session Affinities Learnings: Female identified, looking for “Toys”
  • 12. In-Session Location - Example 4 - GeoSensitivity. Same User Account, Long Term History: Gender - Kid Girl, Age - 5, Brand - Disney, Location (Delhi - 0.5, Bangalore - 0.3). Session Affinities Learnings: Location identified is Bangalore
  • 13. User view @ Flipkart Store Affinity Price Affinity Brand Affinity Gender Parent Browse Profile Order Profile Behaviour Gender JIT - Gender Location Home page Push notification Advertising Search Returns Flipkart Pay Later Trust & Safety RFM Age Married Student Kids Age/Gender Name LTV JIT Store Affinity JIT Age Category Diversity Demographics Behavioural Live Aggregates Just In Time (JIT)
  • 14. Capabilities Needed. Understand the historical events of a user to identify their areas of interest and affinities. Need: large volumes of data must be processed. Real-time systems: the user needs a response within a few seconds. Need: ability to serve the user path at scale with low latency. Machine learning models help get predictions. Need: predict the output in near real time. Glean the current intent of the user very fast. Need: handle a high rate of events and compute fast.
  • 15. Why is this so hard? Win-Win formula for customer reflection*. What makes “all this” challenging: 100 Million products x 100 Million users. And a challenge for System Designers and Problem Solvers. (*reflection on various dimensions, e.g. price, brand, gender)
  • 16. How do we do it - The Architecture v1. Diagram components: User Interactions (Search/Home/Order), App Servers, Flipkart Data Platform, Persisted Raw & Learnt Data, Insights computation in Batch mode, Live Cache with Insights. Diagram step labels: 1 to 8.
  • 17. How do we do it - The Architecture v2. Diagram components: User Interactions (Search/Home/Order), App Servers, Flipkart Data Platform, Persisted Raw & Learnt Data, Fstream Platform (in-stream joins over large windows, time-partition aggregates, compute over time range), Machine Learning Platform, Live Cache with Insights. Diagram step labels: 1 to 8. Callouts: Freshness, Aggregation, Noise.
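The "time-partition aggregates" and "compute over time range" ideas named on this slide can be illustrated with a minimal Python sketch. It is not the Fstream API, which the slides do not show; the class name `TimePartitionedCounter`, the 60-second partition size, and the brand keys are all hypothetical. Events are counted into fixed time partitions, and a time-range query sums the partitions it touches.

```python
from collections import defaultdict


class TimePartitionedCounter:
    """Counts events per key in fixed-width time partitions (hypothetical sketch)."""

    def __init__(self, partition_seconds=60):
        self.partition_seconds = partition_seconds
        # partition index -> key -> count
        self._counts = defaultdict(lambda: defaultdict(int))

    def add(self, timestamp, key, amount=1):
        """Record an event for `key` at `timestamp` (seconds)."""
        partition = int(timestamp // self.partition_seconds)
        self._counts[partition][key] += amount

    def total(self, key, start, end):
        """Sum counts for `key` over every partition touched by [start, end].

        The answer has partition granularity: a partition is included whole
        if the range overlaps it at all.
        """
        first = int(start // self.partition_seconds)
        last = int(end // self.partition_seconds)
        return sum(
            counts.get(key, 0)
            for partition, counts in self._counts.items()
            if first <= partition <= last
        )


counter = TimePartitionedCounter(partition_seconds=60)
counter.add(5, "nike")
counter.add(65, "nike")
counter.add(130, "nike", amount=2)
counter.add(70, "adidas")
print(counter.total("nike", 0, 119))   # partitions 0 and 1 -> 2
print(counter.total("nike", 60, 180))  # partitions 1 to 3 -> 3
```

Keeping per-partition counts means old partitions can be dropped or merged without replaying raw events, which is one common reason to aggregate this way.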
  • 18. Systems view. Diagram components: App Servers, Flipkart Data Platform, Streaming platform (converts Raw Events -> Signals -> Insights), Train, Predict, Features, Live Cache with Insights. Diagram step labels: 1 to 8.
  • 19. Search Interaction with User Insights - I: Static Storage of Quantiles/Ranks - Type 1
    ● Relies on the User Insights Engine as the fundamental capability around Customer Price Affinities.
    ● The rank per document for each bin is precomputed.
    ● Ranks are kept in indexes where they can be used in sorting, which makes Search a request-path caller of User Insights.
    ● Complex Analog filters are used for scoring and mixing cross-bucket results.
    ● Can scale to millions of documents in price-sensitive ranking.
    Load: thousands of RPS at BAU (business as usual), hundreds of thousands of RPS at peak.
  • 20. Search Interaction with User Insights - I: Static Storage of Quantiles/Ranks - Type 1 - Example
    Precomputed rank per product, by affinity segment (Segment 1 = Inexpensive, Segment 5 = Expensive):
    Product                  Seg 1   Seg 2   Seg 3   Seg 4   Seg 5
    P1 (Bucket 1 Product)    10      8       6       7       6
    P2                       9       10      8       8       7
    P3                       8       9       10      9       8
    P4                       7       8       8       10      9
    P5 (Bucket 5 Product)    6       7       6       8       10
    Request time path:
    ● User Affinity returned, Bucket 1 Buyer - Ranking - P1, P2, P3, P4, P5
    ● User Affinity returned, Bucket 5 Buyer - Ranking - P5, P4, P3, P2, P1
    The trade-off with precision is the amount of data stored and indexed.
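The request-time path on this slide can be reproduced with a short Python sketch: the rank values are copied from the slide, while the names `RANKS` and `rank_for_segment` are hypothetical and stand in for whatever the search index actually does. Given the affinity segment returned for the user, products are sorted by their precomputed rank in that segment's column.

```python
# Precomputed rank of each product per affinity segment (values from the slide).
# List position 0 is Segment 1 (Inexpensive), position 4 is Segment 5 (Expensive).
RANKS = {
    "P1": [10, 8, 6, 7, 6],
    "P2": [9, 10, 8, 8, 7],
    "P3": [8, 9, 10, 9, 8],
    "P4": [7, 8, 8, 10, 9],
    "P5": [6, 7, 6, 8, 10],
}


def rank_for_segment(ranks, segment):
    """Order product ids by precomputed rank for a 1-based affinity segment."""
    if not 1 <= segment <= 5:
        raise ValueError("segment must be between 1 and 5")
    column = segment - 1
    # Higher precomputed rank sorts first; no scoring happens at request time.
    return sorted(ranks, key=lambda product: ranks[product][column], reverse=True)


print(rank_for_segment(RANKS, 1))  # ['P1', 'P2', 'P3', 'P4', 'P5']
print(rank_for_segment(RANKS, 5))  # ['P5', 'P4', 'P3', 'P2', 'P1']
```

The sketch also makes the stated trade-off visible: every extra segment adds one stored rank per product.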
  • 21. Search Interaction with User Insights - II: Dynamic Re-ranking Based on Quantiles - Type 2
    ● Relies on the User Insights Engine as the fundamental capability around Customer Price Affinities.
    ● Real-time price-sensitive ranking, sensitive to offers/discounts/sale events.
    ● No pre-computation of rank; ranks are computed on demand and results are re-ranked.
    ● Ranking of documents happens in real time on a smaller batch of products.
    ● Can't scale to the entire recall set, which makes Search a response-path caller of User Insights.
    Load: thousands of RPS at BAU (business as usual), hundreds of thousands of RPS at peak.
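A minimal Python sketch of on-demand re-ranking over a small batch follows. The slides do not give the scoring formula, so everything here is an assumption: the function name `rerank_by_price_affinity`, the five buckets, the `top_k` cutoff, and the rule of ordering by distance between a product's price bucket and the user's affinity bucket. Buckets are derived from current prices within the batch, so a discount changes a product's bucket without any precomputation.

```python
def rerank_by_price_affinity(products, user_bucket, num_buckets=5, top_k=10):
    """Re-rank the first `top_k` results by closeness to the user's price bucket.

    products: list of (product_id, current_price) in default order.
    user_bucket: 1-based price-affinity bucket for the user (assumed input).
    Results beyond `top_k` keep their default order.
    """
    head, tail = products[:top_k], products[top_k:]
    if not head:
        return list(products)

    # Assign each product in the batch a 1-based bucket from its price rank
    # within the batch, computed at request time from current prices.
    by_price = sorted(range(len(head)), key=lambda i: head[i][1])
    bucket = {}
    for position, index in enumerate(by_price):
        bucket[index] = 1 + (position * num_buckets) // len(head)

    # Closest bucket first; ties keep the default order.
    order = sorted(
        range(len(head)),
        key=lambda i: (abs(bucket[i] - user_bucket), i),
    )
    return [head[i] for i in order] + tail


results = [("a", 100), ("b", 500), ("c", 200), ("d", 400), ("e", 300)]
print([p for p, _ in rerank_by_price_affinity(results, user_bucket=5)])
# ['b', 'd', 'e', 'c', 'a']
```

Limiting the work to `top_k` items reflects the slide's point that this approach cannot cover the entire recall set.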
  • 22. The Fallback. In the Search-side design, before consuming any affinities. Q: What happens if the User Insights Engine (UIE) degrades in production (during a sale?) or if the confidence scores are low? A: In both cases we played it conservative and the affinity intelligence was not applied. We also used a Hystrix circuit breaker with a 10 ms timeout on the UIE affinities call, which rendered the default ordering as the fallback.
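The fallback behaviour can be sketched in Python as below. This is not Hystrix (a Java library) and it implements only the timeout-and-fallback part, not circuit-breaker state. The function name `rank_with_fallback`, the `fetch_affinity` and `rerank` callables, and the `min_confidence` threshold of 0.5 are hypothetical; only the 10 ms timeout and the default-ordering fallback come from the slide.

```python
from concurrent.futures import ThreadPoolExecutor

# Shared worker pool for affinity lookups (pool size is an arbitrary choice).
_POOL = ThreadPoolExecutor(max_workers=4)


def rank_with_fallback(default_order, fetch_affinity, rerank,
                       timeout_s=0.010, min_confidence=0.5):
    """Apply affinity-based re-ranking only when the lookup is fast and confident.

    fetch_affinity: callable returning (affinity, confidence).
    rerank: callable taking (default_order, affinity) and returning a new order.
    Any timeout, error, or low confidence returns the default ordering.
    """
    future = _POOL.submit(fetch_affinity)
    try:
        affinity, confidence = future.result(timeout=timeout_s)
    except Exception:
        # Covers the lookup timing out as well as the lookup raising.
        return list(default_order)
    if confidence < min_confidence:
        return list(default_order)
    return rerank(default_order, affinity)
```

A caller would pass the default search ordering plus a function that queries the insights service; if that call takes longer than `timeout_s`, the default ordering is returned unchanged. Note that the timed-out lookup keeps running in its worker thread, which a real circuit breaker would address by short-circuiting further calls.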
  • 23. Scale we hit
    Insights Computation Requirements
      Interaction ingestion rate: Billion+ requests per day
      Ingested data size: 10s of TB of data per day
      Processed data size: 100s of GB of data
      Compute pipeline time: p99 -> 10 s, p95 -> 5 s
    Insights Serving Requirements
      User Path Cache serving: <10 ms, scaling to serve 100s of K RPS
    Serving and Ranking on Search Requirements
      Search QPS: 10s of thousands of requests per second at peak
      Search latency: p99 under 3 seconds
  • 24. Challenges & Learnings - The Domain Challenges (challenge / why it was important / our approach)
    Cold start problems, no insight
      Why: New users don't have a lot of history.
      Approach: ● Location-centric insights act as proxy ● Device-centric insights act as proxy ● JIT ensures the scores start soon ● Fallback solution
    Borderline bucket problems
      Why: Same/similar affinity in two price/store/brand buckets.
      Approach: ● Believe the data; product solve to show products from both buckets
    Store labeling for Gender
      Why: A labelled dataset for stores was needed to be able to predict Gender JIT.
      Approach: ● Analytical approach ● Hand-curated set ● Product labeling ● Look at previous sessions ● Used MAD for label propagation
    Broad Queries
      Why: Broad queries don't have a store path, so mapping them to the user's JIT store affinity and JIT gender is hard.
      Approach: ● Look for other signals ● Filters applied ● Products seen and reverse map, though it adds one hop
  • 25. Challenges & Learnings - The Scale / Tech Challenges (challenge / why it was important / our approach)
    Large volume of data
      Why: The MR job took 18 hours, which means insights refresh only once a day.
      Approach: ● Trim down the data ● Remove outliers ● Address top stores only (80-20 rule) ● Move from HBase to HDFS writes
    Low latency requirements on JIT Insights
      Why: Session times on ecommerce sites are short; unless users are identified in the first few seconds, targeting them is impossible.
      Approach: ● Build lean systems ● Tune Kafka ● Tune Spark & reduce micro-batching ● Scaled prediction systems ● Direct write to cache
    See N buy M
      Why: People like to see a variety of brands/price ranges and purchase one of them. Affinities can get skewed.
      Approach: ● Improve the ML algorithm ● AB testing hypothesis ● Incorporate learnings back into ML
  • 26. Q&A