What’s in a Query?
Understanding query intent
Bharat Thakarar
Subhadeep Maji
Mohit Kumar
Flipkart confidential - For Internal use only. Not to be shared externally.
E-commerce Search
Query: rectangle room mat
● Search over structured product catalog
○ Products belong to a ‘store’
■ Eg: ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’
○ Products have key-value attributes
■ Eg: Shape: ‘Rectangle’; Style: ‘Iranian’; Place of use: ‘Living room’
● Intent of a query: ‘rectangle room mat’
○ Store: ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’
○ Attribute Tagging: <shape>: ‘rectangle’ <place of use>: ‘living room’ <store>: ‘mat’
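One way to picture the intent of ‘rectangle room mat’ is as a small record holding the store path and the attribute tags. This is a minimal sketch only: the class name `QueryIntent` and its fields are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class QueryIntent:
    """Hypothetical container for the two parts of a query's intent."""
    query: str
    # Store path from the top-level store down to the leaf store.
    store_path: Tuple[str, ...]
    # (attribute, value) pairs produced by attribute tagging.
    tags: List[Tuple[str, str]] = field(default_factory=list)

    def leaf_store(self) -> str:
        """Return the most specific store in the path."""
        return self.store_path[-1]

    def as_dict(self) -> Dict[str, str]:
        """Return the tags as an attribute -> value mapping."""
        return dict(self.tags)


# The example from the slide, written out as data.
intent = QueryIntent(
    query="rectangle room mat",
    store_path=("Home Furnishing", "Floor Coverings", "Carpets & Rugs"),
    tags=[("shape", "rectangle"), ("place of use", "living room"), ("store", "mat")],
)
```

Here `intent.leaf_store()` gives the store used for redirection, and `intent.as_dict()` gives the key-value filters that could be matched against product attributes.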
Life of a query - simplified view
Ranking
- Relevance
- Query independent signals
- ...
Augmentation
- Normalisation
- Spell Correction
- Phrasing
- Stemming
- Synonymization
- ...
Intent Understanding
- Store identification
- Intent Tagging
- …
Query to Store identification: Why? (Customer Focused)
● Lifestyle: Bigger Images, Less Text
● Mobiles & Large: Spec heavy
● Furniture: Aspect Ratio, Swatches
Query to Store identification: Why? (Internal)
● Establishes context for the query attribute tagging
○ Restricts labeling space
● Backend efficiency
● ...
Query to Store identification: Statistical approach (baseline)
● Source: click-log data (query -> products clicked -> stores)
● Statistical aggregation of click measure
● Empirically determined confidence level for redirection
○ Sample data: ‘rectangle room mat’ : ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’ : 95% confidence
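The baseline above can be sketched as a lookup table built from click logs. This is an illustration under assumptions, not the production logic: the function name, the `min_confidence` and `min_clicks` thresholds, and the store name ‘Bath Mats’ are all hypothetical.

```python
from collections import Counter, defaultdict


def build_store_table(click_log, min_confidence=0.9, min_clicks=20):
    """Map each query to its dominant store when the click share is high enough.

    click_log is an iterable of (query, store_of_clicked_product) pairs.
    Both thresholds are illustrative values, not the ones used in production.
    """
    counts = defaultdict(Counter)
    for query, store in click_log:
        counts[query][store] += 1

    table = {}
    for query, store_counts in counts.items():
        total = sum(store_counts.values())
        store, clicks = store_counts.most_common(1)[0]
        confidence = clicks / total
        if total >= min_clicks and confidence >= min_confidence:
            table[query] = (store, confidence)
    return table


# Toy log: 19 of 20 clicks for the query land in one store.
log = [("rectangle room mat", "Carpets & Rugs")] * 19
log += [("rectangle room mat", "Bath Mats")]
log += [("cotton", "Bedsheets")] * 10 + [("cotton", "Shirts")] * 10

table = build_store_table(log)
```

The table only answers for exact query strings it has seen with enough clicks, so `table.get("rectangle floor mat")` returns `None` even though the query is close to a known one.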
Statistical approach - Challenges
● Works on exact queries, memorises (no generalization)
● Cannot learn anything useful for verticals where query volume and product clicks are low
L1 level store identification
● ML problem setup
○ Short text multi-label multi-class classification
○ On the order of tens of L1 classes
● Model: Linear SVM (One vs All)
● Feature sets
○ BOW features (tf.idf)
○ Store name overlap features (tf.idf)
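The two feature sets can be sketched in plain Python as follows. This is a hedged illustration: the smoothed idf formula, the reading of "store name overlap" as the summed tf-idf weight of query tokens that occur in a store's name, and the store names ‘Car Accessories’ and ‘Televisions’ are assumptions, and the one-vs-all linear SVM that would consume these features is not shown.

```python
import math
from collections import Counter


def fit_idf(queries):
    """Compute a smoothed inverse document frequency per token."""
    n = len(queries)
    df = Counter(tok for q in queries for tok in set(q.lower().split()))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf(text, idf):
    """Bag-of-words vector weighted by tf * idf (unknown tokens are dropped)."""
    tf = Counter(tok for tok in text.lower().split() if tok in idf)
    return {tok: count * idf[tok] for tok, count in tf.items()}


def featurize(query, idf, store_names):
    """Combine BOW tf-idf features with one overlap feature per store name."""
    vec = tfidf(query, idf)
    feats = {"bow:" + tok: w for tok, w in vec.items()}
    for store in store_names:
        name_tokens = set(store.lower().split())
        feats["overlap:" + store] = sum(
            w for tok, w in vec.items() if tok in name_tokens
        )
    return feats


queries = ["canvas car body covers", "t-series led tv", "led tv 32 inch", "car seat covers"]
stores = ["Car Accessories", "Televisions"]  # hypothetical L1 store names
idf = fit_idf(queries)
feats = featurize("canvas car body covers", idf, stores)
```

Each query becomes a sparse dictionary of named features, which is the usual input shape for a linear classifier trained one-vs-all over the L1 stores.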
L1 level store identification: Results
Query: canvas car body covers
[Figure: search results, Before / After]
Query: T-Series led tv
[Figure: search results, Before / After]
L1 level store identification: Impact
● Backend metrics
○ Nearly 40% drop in queries without store (saving valuable compute resources)
● First user path deployment of ML platform’s modelhost
[Figure: Backend requests without stores]
Leaf level store identification
● ML problem setup
○ Short text *multi-label* multi-class classification
○ On the order of 1000s of leaf stores
● Challenges in extending L1 model:
○ Data sparsity
■ Linear SVM (One vs All) scaling for 1000s of classes
○ BOW features (no generalisation, no sharing)
Leaf level store identification
● Approach: fastText
● Key idea(s):
○ Leverage the word2vec (CBOW) model, predicting the label instead of the target word
○ Hierarchical softmax - scaling to a large number of classes
fastText: https://github.com/facebookresearch/fastText
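The key idea can be sketched in a few lines of plain Python: average the word embeddings of the query (as in CBOW) and predict the store label from that average. This is a toy illustration and not the fastText library: it uses a plain softmax where fastText would use hierarchical softmax for thousands of classes, and the function names, hyperparameters, example queries and store labels are all assumptions. The optional `seed_vectors` argument is a hypothetical hook for pre-trained embeddings.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hidden_vector(tokens, emb, dim):
    """Average the embeddings of known tokens (the CBOW-style hidden layer)."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


def train(examples, dim=8, epochs=300, lr=0.2, seed=0, seed_vectors=None):
    """Train embeddings and a label layer with SGD on (query, label) pairs."""
    rng = random.Random(seed)
    vocab = sorted({t for text, _ in examples for t in text.split()})
    labels = sorted({label for _, label in examples})
    emb = {}
    for w in vocab:
        if seed_vectors and w in seed_vectors:
            emb[w] = list(seed_vectors[w])  # must have length `dim`
        else:
            emb[w] = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    out = [[0.0] * dim for _ in labels]
    for _ in range(epochs):
        for text, gold in examples:
            tokens = text.split()
            h = hidden_vector(tokens, emb, dim)
            probs = softmax([sum(o[k] * h[k] for k in range(dim)) for o in out])
            grad_h = [0.0] * dim
            for i, label in enumerate(labels):
                err = probs[i] - (1.0 if label == gold else 0.0)
                for k in range(dim):
                    grad_h[k] += err * out[i][k]   # uses the pre-update weight
                    out[i][k] -= lr * err * h[k]
            for t in tokens:
                for k in range(dim):
                    emb[t][k] -= lr * grad_h[k] / len(tokens)
    return emb, out, labels


def predict(text, model):
    """Return a label -> probability mapping for a query."""
    emb, out, labels = model
    dim = len(out[0])
    h = hidden_vector(text.split(), emb, dim)
    probs = softmax([sum(o[k] * h[k] for k in range(dim)) for o in out])
    return dict(zip(labels, probs))


EXAMPLES = [
    ("rectangle room mat", "carpets_rugs"),
    ("living room carpet", "carpets_rugs"),
    ("cotton door mat", "carpets_rugs"),
    ("samsung galaxy j7", "mobiles"),
    ("galaxy s7 edge", "mobiles"),
    ("led tv 32 inch", "televisions"),
    ("smart led tv", "televisions"),
]
```

Calling `model = train(EXAMPLES)` and then `predict("samsung galaxy j7", model)` returns one probability per store. Because every token has its own embedding shared across all labels, evidence learned for a word in one query carries over to other queries that contain it, which BOW features with one-vs-all classifiers do not provide.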
Leaf level store identification:
How were challenges addressed?
● Data sparsity
○ Using catalog data for seeding the embeddings
○ Helps learn with less labeled data
● BOW features (no generalisation, no sharing)
○ Embeddings help in the abstraction
Leaf level store identification - Impact
● Significant A/B metrics
○ +3 bps Search Conversion
○ +2 bps Visit Conversion
● SQA analysis (PBAGE): 8% improvement
Leaf level store identification: Some Learnings
● A classifier trained only on catalog space (a lot more labeled data) didn’t work well in query space as-is
● Seed embeddings trained with store context in catalog space work
Query Intent Tagging
Given a query, predict the attributes that best describe the terms (chunks) in the query
Query: kids party dress 4-5 years pack of 2
Tagging: <ideal_for>: kids <occasion>: party <store>: dress <size>: 4-5 years <pack_of>: pack of 2
Statistical Aggregation
● Use query-product click-through logs
● For each (query, clicked product) pair
○ Identify the attributes matched from product description to query tokens
○ Store the fraction of the match to attributes for each query token
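The aggregation above might look roughly like this. It is a sketch under assumptions: products are represented as attribute -> value dictionaries, a token "matches" an attribute when it appears as a whole word in that attribute's value, and the attribute names `fabric` and `material` are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_token_attributes(click_pairs):
    """For each query token, compute the fraction of matches per attribute.

    click_pairs is an iterable of (query, product) pairs, where product is a
    dict mapping attribute names to values of a clicked product.
    """
    counts = defaultdict(Counter)
    for query, product in click_pairs:
        for token in query.lower().split():
            for attr, value in product.items():
                if token in value.lower().split():
                    counts[token][attr] += 1

    fractions = {}
    for token, attr_counts in counts.items():
        total = sum(attr_counts.values())
        fractions[token] = {a: c / total for a, c in attr_counts.items()}
    return fractions


pairs = [
    ("kids party dress", {"ideal_for": "Kids", "occasion": "Party", "store": "Dress"}),
    ("party dress", {"occasion": "Party", "store": "Dress"}),
    ("cotton dress", {"fabric": "Cotton", "store": "Dress"}),
    ("cotton bedsheet", {"material": "Cotton", "store": "Bedsheet"}),
]
fractions = aggregate_token_attributes(pairs)
```

In this toy data ‘party’ maps cleanly to one attribute, while ‘cotton’ is split evenly between two, which is the kind of ambiguity a per-token table cannot resolve without looking at the rest of the query.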
Limitations
● Works on query token space, weak generalization
● Considers all clicks equally but clicks are noisy
● Cannot learn anything useful for verticals where query volume is low
Problem Complexity
● samsung galaxy j7
○ brand model_name model_name
● samsung galaxy j7 covers
○ designed_for designed_for category
Some Exploratory Analysis
● ~40% of catalog tokens cannot be identified unambiguously
● “Cotton” appears in the vocabulary of 23 attributes in “HomeFurnishing”
Is Sequence necessary?
● Attribute labelling at a position depends on tokens at other positions in the query
● Attributes have affinity: (brand, model_name) is more likely than (brand, color) in mobiles
Sequence Formulation
● Let X be the query s.t. X = {x_1, x_2, ..., x_n}, where x_j is a query token
● Let Y be the intent s.t. Y = {y_1, y_2, ..., y_n}, where y_j ∈ attributes
Supervised - Conditional Random Field
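For reference, the standard linear-chain CRF models the tagging of a query as a conditional distribution over label sequences. The slides do not give the exact model, so the feature functions f_k, the weights λ_k and the start symbol y_0 below are the textbook notation, stated here as an assumption about the setup:

```latex
P(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{j=1}^{n} \sum_{k} \lambda_k \, f_k(y_{j-1}, y_j, X, j) \right),
\qquad
Z(X) = \sum_{Y'} \exp\left( \sum_{j=1}^{n} \sum_{k} \lambda_k \, f_k(y'_{j-1}, y'_j, X, j) \right)
```

Each f_k can look at the whole query X and at the adjacent labels (y_{j-1}, y_j), which is how the model captures both the dependence on tokens at other positions and the affinity between neighbouring attributes.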
Feature Design
● looks_like_attribute
○ Attributes like brand, color, model_name
○ Multinomial NB to generate features
● Defined over window at each position in query
● Global features like is_alnum, is_shortword
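A window-based feature extractor along these lines could be sketched as below. Several things here are assumptions: a small hand-written lexicon stands in for the Multinomial NB scores behind looks_like_attribute, `is_alnum` is read as "mixes letters and digits", `is_shortword` as "at most two characters", and the function name and feature key format are hypothetical.

```python
def token_features(tokens, i, lexicon, window=1):
    """Build features for position i from a window of neighbouring tokens.

    lexicon maps an attribute name to a set of tokens that look like it;
    it is a stand-in for a learned looks_like_attribute scorer.
    """
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if j < 0 or j >= len(tokens):
            feats[f"{offset}:pad"] = 1.0  # window runs past the query boundary
            continue
        for attr, vocab in lexicon.items():
            if tokens[j] in vocab:
                feats[f"{offset}:looks_like_{attr}"] = 1.0

    tok = tokens[i]
    has_alpha = any(c.isalpha() for c in tok)
    has_digit = any(c.isdigit() for c in tok)
    feats["is_alnum"] = float(has_alpha and has_digit)
    feats["is_shortword"] = float(len(tok) <= 2)
    return feats


lexicon = {
    "brand": {"samsung"},
    "model_name": {"galaxy", "j7"},
    "category": {"covers"},
}
tokens = "samsung galaxy j7 covers".split()
features = [token_features(tokens, i, lexicon) for i in range(len(tokens))]
```

Because the features describe what a token and its neighbours look like rather than which exact token it is, a model trained on them can label queries whose tokens never appeared in the labeled data.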
What did we gain?
● Moving from query token space to attribute feature space improves generalization
● Can generate multiple partial labellings, better ranking of search results
What did we gain? Metrics
● Significant A/B metrics
○ +5 bps Search Conversion
○ +2 bps Visit Conversion
● SQA analysis (PBAGE): 4% improvement
Some Examples
Query: samsung galaxy s7 edge 2017
[Figure: search results, After / Before]
Query: Watches with steel belt with square dial
[Figure: search results, After / Before]
Why is CRF not enough?
● Low volume of high confidence labeled data in some verticals
● Click noise: users sometimes click randomly, especially for lifestyle
● The labeled data for CRF suffers from the above issues
Some Exploratory Analysis...
● Labeled data has low coverage of unique queries (~10%)
● A supervised model will fail to generalize for these stores
Weakly-Supervised Models
● Generative vs. a discriminative setting like CRF
● Learning from unlabeled queries
● Catalog and limited labeled data used as weak supervision
● WIP… research paper … production
Summary
● Pattern of solution evolution
○ Statistical -> Supervised -> Supervised ++ (side information)
● Common challenges
○ Not enough labeled data (side information / weak supervision)
○ Label/presentation bias
Query: ‘diamond ring’
Questions ?

More Related Content

What's hot

Instant search - A hands-on tutorial
Instant search  - A hands-on tutorialInstant search  - A hands-on tutorial
Instant search - A hands-on tutorialGanesh Venkataraman
 
Personalizing Session-based Recommendations with Hierarchical Recurrent Neura...
Personalizing Session-based Recommendations with Hierarchical Recurrent Neura...Personalizing Session-based Recommendations with Hierarchical Recurrent Neura...
Personalizing Session-based Recommendations with Hierarchical Recurrent Neura...Massimo Quadrana
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Xavier Amatriain
 
Learned Embeddings for Search and Discovery at Instacart
Learned Embeddings for  Search and Discovery at InstacartLearned Embeddings for  Search and Discovery at Instacart
Learned Embeddings for Search and Discovery at InstacartSharath Rao
 
Session-based recommendations with recurrent neural networks
Session-based recommendations with recurrent neural networksSession-based recommendations with recurrent neural networks
Session-based recommendations with recurrent neural networksZimin Park
 
Near RealTime search @Flipkart
Near RealTime search @FlipkartNear RealTime search @Flipkart
Near RealTime search @FlipkartUmesh Prasad
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsJaya Kawale
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectiveJustin Basilico
 
The Observability Graph; Knowledge Graphs for Automated Infrastructure Observ...
The Observability Graph; Knowledge Graphs for Automated Infrastructure Observ...The Observability Graph; Knowledge Graphs for Automated Infrastructure Observ...
The Observability Graph; Knowledge Graphs for Automated Infrastructure Observ...Homin Lee
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxElasticsearch
 
bm25 demystified
bm25 demystifiedbm25 demystified
bm25 demystifiedFan Robbin
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysisSunil Kandari
 
Content based recommendation systems
Content based recommendation systemsContent based recommendation systems
Content based recommendation systemsAravindharamanan S
 
M.S. Thesis Defense
M.S. Thesis DefenseM.S. Thesis Defense
M.S. Thesis Defensepbecker1987
 

What's hot (20)

Instant search - A hands-on tutorial
Instant search  - A hands-on tutorialInstant search  - A hands-on tutorial
Instant search - A hands-on tutorial
 
Personalizing Session-based Recommendations with Hierarchical Recurrent Neura...
Personalizing Session-based Recommendations with Hierarchical Recurrent Neura...Personalizing Session-based Recommendations with Hierarchical Recurrent Neura...
Personalizing Session-based Recommendations with Hierarchical Recurrent Neura...
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 
Learned Embeddings for Search and Discovery at Instacart
Learned Embeddings for  Search and Discovery at InstacartLearned Embeddings for  Search and Discovery at Instacart
Learned Embeddings for Search and Discovery at Instacart
 
Session-based recommendations with recurrent neural networks
Session-based recommendations with recurrent neural networksSession-based recommendations with recurrent neural networks
Session-based recommendations with recurrent neural networks
 
Near RealTime search @Flipkart
Near RealTime search @FlipkartNear RealTime search @Flipkart
Near RealTime search @Flipkart
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 
PYthon
PYthonPYthon
PYthon
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
Deep Learning Recommender Systems
Deep Learning Recommender SystemsDeep Learning Recommender Systems
Deep Learning Recommender Systems
 
The Observability Graph; Knowledge Graphs for Automated Infrastructure Observ...
The Observability Graph; Knowledge Graphs for Automated Infrastructure Observ...The Observability Graph; Knowledge Graphs for Automated Infrastructure Observ...
The Observability Graph; Knowledge Graphs for Automated Infrastructure Observ...
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
 
bm25 demystified
bm25 demystifiedbm25 demystified
bm25 demystified
 
Web Scraping Basics
Web Scraping BasicsWeb Scraping Basics
Web Scraping Basics
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Tutorial on Web Scraping in Python
Tutorial on Web Scraping in PythonTutorial on Web Scraping in Python
Tutorial on Web Scraping in Python
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysis
 
Content based recommendation systems
Content based recommendation systemsContent based recommendation systems
Content based recommendation systems
 
M.S. Thesis Defense
M.S. Thesis DefenseM.S. Thesis Defense
M.S. Thesis Defense
 

Similar to What’s in a Query? Understanding query intent

Actionable Insight Extraction from Reviews and Images - slash n 2019
Actionable Insight Extraction from Reviews and Images - slash n 2019Actionable Insight Extraction from Reviews and Images - slash n 2019
Actionable Insight Extraction from Reviews and Images - slash n 2019FlipkartStories
 
Slash n 2018 - Just In Time Personalization
Slash n  2018 - Just In Time Personalization Slash n  2018 - Just In Time Personalization
Slash n 2018 - Just In Time Personalization FlipkartStories
 
Flipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 repriseFlipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 repriseFlipkartStories
 
Emergency SEO: How To Recover When SERP Rankings Suddenly Drop
Emergency SEO: How To Recover When SERP Rankings Suddenly DropEmergency SEO: How To Recover When SERP Rankings Suddenly Drop
Emergency SEO: How To Recover When SERP Rankings Suddenly DropSearch Engine Journal
 
Deep Learning for Semantic Search in E-commerce​
Deep Learning for Semantic Search in E-commerce​Deep Learning for Semantic Search in E-commerce​
Deep Learning for Semantic Search in E-commerce​Somnath Banerjee
 
JAB2012 Smart Search Presentation
JAB2012 Smart Search PresentationJAB2012 Smart Search Presentation
JAB2012 Smart Search PresentationChris Davenport
 
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...Databricks
 
How to Use AI in Your Digital Marketing (1).pdf
How to Use AI in Your Digital Marketing (1).pdfHow to Use AI in Your Digital Marketing (1).pdf
How to Use AI in Your Digital Marketing (1).pdfVolume Nine
 
Course outline for affiliate with amazon and seo
Course outline for affiliate with amazon and seoCourse outline for affiliate with amazon and seo
Course outline for affiliate with amazon and seozameerulhasaann
 
Selling on Walmart.com: Navigating Through the Extensive Style Guide
Selling on Walmart.com: Navigating Through the Extensive Style GuideSelling on Walmart.com: Navigating Through the Extensive Style Guide
Selling on Walmart.com: Navigating Through the Extensive Style GuideTinuiti
 
Startup Secrets - Getting Behind the Perfect Investor Pitch
Startup Secrets - Getting Behind the Perfect Investor PitchStartup Secrets - Getting Behind the Perfect Investor Pitch
Startup Secrets - Getting Behind the Perfect Investor PitchMichael Skok
 
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOETHacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOETTanyaRaina3
 
ChatGPT For Business Use
ChatGPT For Business UseChatGPT For Business Use
ChatGPT For Business UseSanjay Willie
 
WWV2015: Findologic Matthias Heimbeckjan
WWV2015: Findologic Matthias HeimbeckjanWWV2015: Findologic Matthias Heimbeckjan
WWV2015: Findologic Matthias Heimbeckjanwebwinkelvakdag
 
How to Boost Your SEO using Structured Data
How to Boost Your SEO using Structured DataHow to Boost Your SEO using Structured Data
How to Boost Your SEO using Structured DataMartin Tang
 
10 Ways to Get More from Your Pardot Solution
10 Ways to Get More from Your Pardot Solution10 Ways to Get More from Your Pardot Solution
10 Ways to Get More from Your Pardot SolutionPardot
 

Similar to What’s in a Query? Understanding query intent (20)

Actionable Insight Extraction from Reviews and Images - slash n 2019
Actionable Insight Extraction from Reviews and Images - slash n 2019Actionable Insight Extraction from Reviews and Images - slash n 2019
Actionable Insight Extraction from Reviews and Images - slash n 2019
 
Slash n 2018 - Just In Time Personalization
Slash n  2018 - Just In Time Personalization Slash n  2018 - Just In Time Personalization
Slash n 2018 - Just In Time Personalization
 
Flipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 repriseFlipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 reprise
 
Emergency SEO: How To Recover When SERP Rankings Suddenly Drop
Emergency SEO: How To Recover When SERP Rankings Suddenly DropEmergency SEO: How To Recover When SERP Rankings Suddenly Drop
Emergency SEO: How To Recover When SERP Rankings Suddenly Drop
 
Deep Learning for Semantic Search in E-commerce​
Deep Learning for Semantic Search in E-commerce​Deep Learning for Semantic Search in E-commerce​
Deep Learning for Semantic Search in E-commerce​
 
JAB2012 Smart Search Presentation
JAB2012 Smart Search PresentationJAB2012 Smart Search Presentation
JAB2012 Smart Search Presentation
 
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr... Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
Using Spark-Solr at Scale: Productionizing Spark for Search with Apache Solr...
 
How to Use AI in Your Digital Marketing (1).pdf
How to Use AI in Your Digital Marketing (1).pdfHow to Use AI in Your Digital Marketing (1).pdf
How to Use AI in Your Digital Marketing (1).pdf
 
Timothy Resnik - Advanced Search Summit Napa 2021
Timothy Resnik - Advanced Search Summit Napa 2021Timothy Resnik - Advanced Search Summit Napa 2021
Timothy Resnik - Advanced Search Summit Napa 2021
 
Course outline for affiliate with amazon and seo
Course outline for affiliate with amazon and seoCourse outline for affiliate with amazon and seo
Course outline for affiliate with amazon and seo
 
Keyword research webinar 256
Keyword research   webinar 256Keyword research   webinar 256
Keyword research webinar 256
 
Selling on Walmart.com: Navigating Through the Extensive Style Guide
Selling on Walmart.com: Navigating Through the Extensive Style GuideSelling on Walmart.com: Navigating Through the Extensive Style Guide
Selling on Walmart.com: Navigating Through the Extensive Style Guide
 
Role of Data Science in eCommerce
Role of Data Science in eCommerceRole of Data Science in eCommerce
Role of Data Science in eCommerce
 
Startup Secrets - Getting Behind the Perfect Investor Pitch
Startup Secrets - Getting Behind the Perfect Investor PitchStartup Secrets - Getting Behind the Perfect Investor Pitch
Startup Secrets - Getting Behind the Perfect Investor Pitch
 
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOETHacktoberFestPune - DSC MESCOE x DSC PVGCOET
HacktoberFestPune - DSC MESCOE x DSC PVGCOET
 
ChatGPT For Business Use
ChatGPT For Business UseChatGPT For Business Use
ChatGPT For Business Use
 
WWV2015: Findologic Matthias Heimbeckjan
WWV2015: Findologic Matthias HeimbeckjanWWV2015: Findologic Matthias Heimbeckjan
WWV2015: Findologic Matthias Heimbeckjan
 
How to Boost Your SEO using Structured Data
How to Boost Your SEO using Structured DataHow to Boost Your SEO using Structured Data
How to Boost Your SEO using Structured Data
 
Amazon mp
Amazon mpAmazon mp
Amazon mp
 
10 Ways to Get More from Your Pardot Solution
10 Ways to Get More from Your Pardot Solution10 Ways to Get More from Your Pardot Solution
10 Ways to Get More from Your Pardot Solution
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)

What’s in a Query? Understanding query intent

  • 1. What’s in a Query? Understanding query intent
      Bharat Thakarar, Subhadeep Maji, Mohit Kumar
  • 2. E-commerce Search
      Query: rectangle room mat
  • 3. E-commerce Search
      ● Search over structured product catalog
        ○ Products belong to a ‘store’
          ■ E.g. ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’
  • 4. E-commerce Search
      ● Search over structured product catalog
        ○ Products belong to a ‘store’
          ■ E.g. ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’
        ○ Products have key-value attributes
          ■ E.g. Shape: ‘Rectangle’; Style: ‘Iranian’; Place of use: ‘Living room’
  • 5. E-commerce Search
      ● Search over structured product catalog
        ○ Products belong to a ‘store’
          ■ E.g. ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’
        ○ Products have key-value attributes
          ■ E.g. Shape: ‘Rectangle’; Style: ‘Iranian’; Place of use: ‘Living room’
      ● Intent of a query: ‘rectangle room mat’
        ○ Store: ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’
        ○ Attribute Tagging: <shape>: ‘rectangle’ <place of use>: ‘living room’ <store>: ‘mat’
  • 6. Life of a query - simplified view
      Ranking
        - Relevance
        - Query independent signals
        - ...
      Augmentation
        - Normalisation
        - Spell Correction
        - Phrasing
        - Stemming
        - Synonymization
        - ...
      Intent Understanding
        - Store identification
        - Intent Tagging
        - …
  • 7. Query to Store identification: Why? (Customer Focused)
  • 8. Query to Store identification: Why? (Customer Focused)
      Lifestyle: Bigger Images, Less Text
      Mobiles & Large: Spec heavy
      Furniture: Aspect Ratio, Swatches
  • 9. Query to Store identification: Why? (Internal)
      ● Establishes context for the query attribute tagging
        ○ Restricts labeling space
      ● Backend efficiency
      ● ...
  • 10. Query to Store identification: Statistical approach (baseline)
      ● Source: click-log data (query -> products clicked -> stores)
      ● Statistical aggregation of click measure
      ● Empirically determined confidence level for redirection
        ○ Sample data: ‘rectangle room mat’ : ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’ : 95% confidence
  • 11. Statistical approach - Challenges
      ● Works on exact queries, memorises (no generalization)
      ● Cannot learn anything useful for verticals where query volume and product clicks are low
  • 12. L1 level store identification
      ● ML problem setup
        ○ Short text multi-label multi-class classification
        ○ Order of 10s of L1 classes
      ● Model: Linear SVM (One vs All)
      ● Feature sets
        ○ BOW features (tf-idf)
        ○ Store name overlap features (tf-idf)
  • 13. L1 level store identification: Results
      Query: canvas car body covers
      [Screenshots: Before, After]
  • 14. L1 level store identification: Results
      Query: T-Series led tv
      [Screenshots: Before, After]
  • 15. L1 level store identification: Impact
      ● Backend metrics
        ○ Nearly 40% drop in queries without store (saving valuable compute resources)
      ● First user path deployment of ML platform’s modelhost
      [Chart: Backend requests without stores]
  • 16. Leaf level store identification
      ● ML problem setup
        ○ Short text *multi-label* multi-class classification
        ○ Order of 1000s of leaf stores
  • 17. Leaf level store identification
      ● ML problem setup
        ○ Short text *multi-label* multi-class classification
        ○ Order of 1000s of leaf stores
      ● Challenges in extending L1 model:
        ○ Data sparsity
          ■ Linear SVM (One vs All) scaling for 1000s of classes
        ○ BOW features (no generalisation, no sharing)
  • 18. Leaf level store identification
      ● Approach: fastText
      ● Key idea(s):
        ○ Reuse the word2vec (CBOW) model, predicting the label instead of the target word
        ○ Hierarchical softmax: scales to a large number of classes
      fastText: https://github.com/facebookresearch/fastText
  • 19. Leaf level store identification: How were challenges addressed?
      ● Data sparsity
        ○ Using catalog data for seeding the embeddings
        ○ Helps learn with less labeled data
      ● BOW features (no generalisation, no sharing)
        ○ Embeddings help in the abstraction
  • 20. Leaf level store identification - Impact
      ● Significant A/B metrics
        ○ +3 bps Search Conversion
        ○ +2 bps Visit Conversion
      ● SQA analysis (PBAGE): 8% improvement
  • 21. Leaf level store identification: Some Learnings
      ● Classifier trained only on catalog space (a lot more labeled data) didn’t work well in query space as-is
      ● Seed embeddings trained with store context in catalog space work
  • 22. Life of a query - simplified view
      Ranking
        - Relevance
        - Query independent signals
        - ...
      Augmentation
        - Normalisation
        - Spell Correction
        - Phrasing
        - Stemming
        - Synonymization
        - ...
      Intent Understanding
        - Store identification
        - Intent Tagging
        - …
  • 23. Query Intent Tagging
      Given a query, predict the attributes that best describe the terms (chunks) in the query
      Query: kids party dress 4-5 years pack of 2
      Tagging: <ideal_for>: kids <occasion>: party <store>: dress <size>: 4-5 years <pack_of>: pack of 2
  • 24. Statistical Aggregation
      ● Use query-product click-through logs
      ● For each (query, clicked product) pair
        ○ Identify the attributes matched from product description to query tokens
        ○ Store the fraction of the match to attributes for each query token
  • 25. Limitations
      ● Works on query token space, weak generalization
      ● Considers all clicks equally, but clicks are noisy
      ● Cannot learn anything useful for verticals where query volume is low
  • 26. Problem Complexity
      ● samsung galaxy j7
        ○ brand model_name model_name
      ● samsung galaxy j7 covers
        ○ designed_for designed_for category
  • 27. Some Exploratory Analysis
      ● ~40% of catalog tokens cannot be identified unambiguously
      ● “Cotton” appears in the vocabulary of 23 attributes in “HomeFurnishing”
  • 28. Is Sequence necessary?
      ● Attribute labelling at a position depends on tokens at other positions in the query
      ● Attributes have affinity: (brand, model_name) is more likely than (brand, color) in mobiles
  • 29. Sequence Formulation
      ● Let X be the query s.t. X = {x1, x2, . . . , xn} where xj is a query token
      ● Let Y be the intent s.t. Y = {y1, y2, . . . , yn} where yj ∈ attributes
  • 30. Supervised - Conditional Random Field
  • 31. Feature Design
      ● looks_like_attribute
        ○ Attributes like brand, color, model_name
        ○ Multinomial NB to generate features
      ● Defined over window at each position in query
      ● Global feature like is_alnum, is_shortword
  • 32. What did we gain?
      ● Moving from query token space to attribute feature space improves generalization
      ● Can generate multiple partial labellings, better ranking of search results
  • 33. What did we gain? Metrics
      ● Significant A/B metrics
        ○ +5 bps Search Conversion
        ○ +2 bps Visit Conversion
      ● SQA analysis (PBAGE): 4% improvement
  • 34. Some Examples
      Query: samsung galaxy s7 edge 2017
      [Screenshots: After, Before]
  • 35. Some Examples..
      Query: Watches with steel belt with square dial
      [Screenshots: After, Before]
  • 36. Why CRF is not enough?
      ● Low volume of high confidence labeled data in some verticals
      ● Click noise: users sometimes click randomly, especially for lifestyle
      ● The labeled data for CRF suffers from the above issues
  • 37. Some Exploratory Analysis...
      ● Labeled data has low coverage of unique queries (~10%)
      ● A supervised model will fail to generalize for these stores
  • 38. Weakly-Supervised Models
      ● Generative vs a Discriminative setting like CRF
      ● Learning from unlabeled queries
      ● Catalog and limited labeled data used as weak supervision
      ● WIP… research paper … production
  • 39. Summary
  • 40. Summary
      ● Pattern of solution evolution
        ○ Statistical -> Supervised -> Supervised ++ (side information)
      ● Common challenges
        ○ Not enough labeled data (side information / weak supervision)
        ○ Label/presentation bias
  • 41. Query: ‘diamond ring’
  • 42. Query: ‘diamond ring’
  • 43. Questions?
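
The statistical baseline on slide 10 (aggregate clicks per query, redirect when confidence is high enough) could be sketched roughly as below. This is a minimal illustration, not the production logic: the click-log rows, the `BATH_MATS` store path, the function names and the 0.9 threshold are all hypothetical, and "confidence" is assumed here to mean a store's share of the query's clicks.

```python
from collections import Counter, defaultdict

CARPETS = "Home Furnishing > Floor Coverings > Carpets & Rugs"
BATH_MATS = "Home Furnishing > Bath Linen > Bath Mats"  # hypothetical store path

# Hypothetical click log: one (query, store of the clicked product) row per click.
CLICK_LOG = [("rectangle room mat", CARPETS)] * 19 + [("rectangle room mat", BATH_MATS)]


def store_confidence(click_log):
    """Return {query: {store: share of that query's clicks}}."""
    counts = defaultdict(Counter)
    for query, store in click_log:
        counts[query][store] += 1
    return {
        query: {store: n / sum(c.values()) for store, n in c.items()}
        for query, c in counts.items()
    }


def redirect_store(query, confidence, threshold=0.9):
    """Return the top store if its click share clears the threshold, else None."""
    stores = confidence.get(query)
    if not stores:
        return None  # unseen query: the baseline only memorises exact queries
    store, share = max(stores.items(), key=lambda kv: kv[1])
    return store if share >= threshold else None


confidence = store_confidence(CLICK_LOG)
print(redirect_store("rectangle room mat", confidence))
print(redirect_store("room mat", confidence))
```

The second call illustrates the limitation listed on slide 11: a query that never appeared verbatim in the log gets no store at all.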
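
To make the fastText idea on slide 18 concrete, here is a rough sketch of the forward pass of such a classifier: average the word embeddings of the query, apply a linear layer, and normalise the scores. The 2-d embeddings, the label weights and the two leaf stores are made up for illustration, and a plain softmax is used in place of the hierarchical softmax mentioned on the slide, which only pays off with thousands of classes.

```python
import math

# Hypothetical 2-d word embeddings; a real model learns these from data.
EMBEDDINGS = {
    "rectangle": [0.6, 0.1],
    "room": [0.5, 0.2],
    "mat": [0.9, 0.0],
    "led": [0.1, 0.8],
    "tv": [0.0, 0.9],
}
# Hypothetical weight vector per leaf store (the label side of the model).
LABEL_WEIGHTS = {
    "Carpets & Rugs": [4.0, -1.0],
    "Televisions": [-1.0, 4.0],
}


def embed_query(query):
    """Average the embeddings of known tokens (the hidden representation)."""
    vectors = [EMBEDDINGS[t] for t in query.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def predict(query):
    """Return {store: probability} using a linear layer plus plain softmax."""
    hidden = embed_query(query)
    scores = {
        label: sum(w * h for w, h in zip(weights, hidden))
        for label, weights in LABEL_WEIGHTS.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


probs = predict("rectangle room mat")
print(max(probs, key=probs.get))
```

Because tokens share embeddings across stores, a query unseen in training can still land near related queries, which is the "abstraction" slide 19 credits to embeddings.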
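
The statistical aggregation for intent tagging on slide 24 might look something like the sketch below. It assumes that "fraction of the match" means, for each query token, the share of its attribute matches that went to each attribute; the click rows and attribute names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical click log: a query paired with the attributes of one clicked product.
CLICKS = [
    ("rectangle room mat", {"shape": "rectangle", "place_of_use": "living room", "store": "mat"}),
    ("rectangle room mat", {"shape": "rectangle", "material": "cotton", "store": "mat"}),
    ("rectangle mat", {"shape": "rectangle", "store": "door mat"}),
    ("cotton mat", {"material": "cotton", "type": "cotton mat", "store": "mat"}),
]


def token_attribute_fractions(clicks):
    """Return {token: {attribute: fraction of that token's matches}}."""
    counts = defaultdict(Counter)
    for query, attributes in clicks:
        for token in query.lower().split():
            for name, value in attributes.items():
                # A token "matches" an attribute if it appears in the attribute value.
                if token in value.lower().split():
                    counts[token][name] += 1
    return {
        token: {name: n / sum(c.values()) for name, n in c.items()}
        for token, c in counts.items()
    }


fractions = token_attribute_fractions(CLICKS)
for token, dist in fractions.items():
    print(token, dist)
```

Even in this toy log, "cotton" splits evenly between two attributes, a small-scale version of the ambiguity reported on slide 27.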
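
Slides 29 and 30 name the sequence formulation and the Conditional Random Field without writing the model out. For reference, the standard linear-chain CRF in the notation of slide 29 is shown below; the feature functions f_k, their weights λ_k and the start symbol y_0 are standard textbook notation assumed here, not taken from the slides.

```latex
p(Y \mid X) = \frac{1}{Z(X)}
  \exp\!\Big( \sum_{j=1}^{n} \sum_{k} \lambda_k \, f_k(y_{j-1}, y_j, X, j) \Big),
\qquad
Z(X) = \sum_{Y'} \exp\!\Big( \sum_{j=1}^{n} \sum_{k} \lambda_k \, f_k(y'_{j-1}, y'_j, X, j) \Big)
```

The dependence of each f_k on the pair (y_{j-1}, y_j) is what lets the model capture the attribute affinity from slide 28, such as (brand, model_name) being more likely than (brand, color).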
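
The feature design on slide 31 could be sketched as a per-token feature extractor like the one below. Several things are assumptions: a small hypothetical vocabulary lookup stands in for the Multinomial NB scorer, `is_alnum` is taken to mean "mixes letters and digits", `is_shortword` means three characters or fewer, and both are computed per token even though the slide calls them global features.

```python
# Hypothetical catalog vocabulary per attribute, standing in for the
# Multinomial NB model that the slide uses to produce looks_like_* features.
ATTRIBUTE_VOCAB = {
    "brand": {"samsung", "apple"},
    "model_name": {"galaxy", "j7"},
    "color": {"black", "gold"},
}


def looks_like(token):
    """Return one boolean looks_like_<attribute> feature per attribute."""
    return {
        f"looks_like_{attribute}": token in vocab
        for attribute, vocab in ATTRIBUTE_VOCAB.items()
    }


def token_features(tokens, i, window=1):
    """Features for position i, built over a window of neighbouring tokens."""
    features = {"bias": 1.0}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            for name, value in looks_like(tokens[j]).items():
                features[f"{offset:+d}:{name}"] = value
        else:
            features[f"{offset:+d}:pad"] = True  # window runs past the query edge
    token = tokens[i]
    features["is_alnum"] = (
        any(ch.isdigit() for ch in token) and any(ch.isalpha() for ch in token)
    )
    features["is_shortword"] = len(token) <= 3
    return features


tokens = "samsung galaxy j7".split()
for position in range(len(tokens)):
    print(tokens[position], token_features(tokens, position))
```

Dictionaries of this shape, one per token, are the usual input format for CRF toolkits, so the same features apply to any query whose tokens resemble known attribute values rather than only to memorised tokens.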