Rewind - Take a peek at the presentation deck from the talk by Flipkart data scientists Bharat Thakarar, Subhadeep Maji and Mohit Kumar at slash n 2018.
What’s in a Query? Understanding query intent
1. What’s in a Query? Understanding query intent
Bharat Thakarar
Subhadeep Maji
Mohit Kumar
2. Flipkart confidential - For Internal use only. Not to be shared externally.
E-commerce Search
Query: rectangle room mat
5. E-commerce Search
● Search over structured product catalog
○ Products belong to a ‘store’
■ Eg: ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’
○ Products have key-value attributes
■ Eg: Shape: ‘Rectangle’; Style: ‘Iranian’; Place of use: ‘Living room’
● Intent of a query: ‘rectangle room mat’
○ Store: ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’
○ Attribute Tagging: <shape>: ‘rectangle’ <place of use>: ‘living room’ <store>: ‘mat’
6. Life of a query - simplified view
Ranking
- Relevance
- Query independent signals
- ...
Augmentation
- Normalisation
- Spell Correction
- Phrasing
- Stemming
- Synonymization
- ...
Intent Understanding
- Store identification
- Intent Tagging
- ...
8. Query to Store identification: Why? (Customer Focused)
● Lifestyle: Bigger Images, Less Text
● Mobiles & Large: Spec heavy
● Furniture: Aspect Ratio, Swatches
9. Query to Store identification: Why? (Internal)
● Establishes context for the query attribute tagging
○ Restricts labeling space
● Backend efficiency
● ...
10. Query to Store identification: Statistical approach (baseline)
● Source: click-log data (query -> products clicked -> stores)
● Statistical aggregation of click measure
● Empirically determined confidence level for redirection
○ Sample data: ‘rectangle room mat’ : ‘Home Furnishing’ -> ‘Floor Coverings’ -> ‘Carpets & Rugs’ : 95% confidence
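The baseline above can be sketched roughly as follows. This is a minimal illustration, not the production logic: the `build_query_store_table` helper, the `min_clicks` and `min_confidence` thresholds, and the second store path in the toy log are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_query_store_table(click_log, min_clicks=20, min_confidence=0.9):
    """Aggregate clicks per exact query and keep only confident store mappings.

    click_log: iterable of (query, store_path) pairs, one per product click.
    min_clicks and min_confidence are hypothetical thresholds.
    """
    clicks = defaultdict(Counter)
    for query, store in click_log:
        clicks[query.strip().lower()][store] += 1

    table = {}
    for query, store_counts in clicks.items():
        total = sum(store_counts.values())
        store, count = store_counts.most_common(1)[0]
        confidence = count / total  # share of clicks landing in the top store
        if total >= min_clicks and confidence >= min_confidence:
            table[query] = (store, confidence)
    return table

# Toy click log: 19 of 20 clicks land in the same store (illustrative data).
CARPETS = "Home Furnishing > Floor Coverings > Carpets & Rugs"
OTHER = "Home Furnishing > Bath Linen > Bath Mats"  # hypothetical store path
log = [("rectangle room mat", CARPETS)] * 19 + [("rectangle room mat", OTHER)]
table = build_query_store_table(log)
```

Because the table is keyed on the exact query string, a query that never appeared in the click log gets no mapping at all.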
11. Statistical approach - Challenges
● Works on exact queries, memorises (no generalization)
● Cannot learn anything useful for verticals where query volume and product clicks are low
12. L1 level store identification
● ML problem setup
○ Short text multi-label multi-class classification
○ Order of 10s of L1 classes
● Model: Linear SVM (One vs All)
● Feature sets
○ BOW features (tf-idf)
○ Store name overlap features (tf-idf)
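The two feature sets can be illustrated with a small standard-library sketch. The slides do not give the exact weighting or the overlap definition, so the smoothed idf formula, the helper names, and the store names below are assumptions. The resulting sparse vectors are the kind of input a One-vs-All linear SVM would be trained on (training is not shown here).

```python
import math
from collections import Counter

def fit_idf(documents):
    """Smoothed inverse document frequency for every token seen in training queries."""
    n_docs = len(documents)
    doc_freq = Counter(tok for doc in documents for tok in set(doc.split()))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}

def bow_features(query, idf):
    """Sparse tf-idf bag-of-words vector; tokens unseen at fit time are dropped."""
    tf = Counter(query.split())
    return {f"bow={tok}": count * idf[tok] for tok, count in tf.items() if tok in idf}

def store_overlap_features(query, store_names, idf):
    """One feature per store: summed idf weight of query tokens found in the store name."""
    tokens = set(query.split())
    feats = {}
    for store in store_names:
        shared = tokens & set(store.lower().split())
        if shared:
            feats[f"overlap={store}"] = sum(idf.get(tok, 1.0) for tok in shared)
    return feats

queries = ["led tv", "car body covers", "tv stand furniture"]
idf = fit_idf(queries)
L1_STORES = ["Furniture", "Automotive", "Home Entertainment"]  # hypothetical store names
features = {
    **bow_features("tv furniture", idf),
    **store_overlap_features("tv furniture", L1_STORES, idf),
}
```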
13. L1 level store identification: Results
Query: canvas car body covers
[Figure: search results before and after]
14. L1 level store identification: Results
Query: T-Series led tv
[Figure: search results before and after]
15. L1 level store identification: Impact
● Backend metrics
○ Nearly 40% drop in queries without store (saving valuable compute resources)
● First user path deployment of ML platform’s modelhost
[Figure: Backend requests without stores]
17. Leaf level store identification
● ML problem setup
○ Short text *multi-label* multi-class classification
○ Order of 1000s of leaf stores
● Challenges in extending L1 model:
○ Data sparsity
■ Linear SVM (One vs All) scaling for 1000s of classes
○ BOW features (no generalisation, no sharing)
18. Leaf level store identification
● Approach: fastText
● Key idea(s):
○ Leverage the word2vec (CBOW) model, predicting the label instead of the target word
○ Hierarchical softmax - scaling to large number of classes
fastText: https://github.com/facebookresearch/fastText
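A rough sketch of how such a model could be set up with the fastText Python bindings. The helper names, the label naming, and the hyperparameter values are assumptions for illustration; the talk does not show the actual training configuration. `fasttext` is imported lazily so the formatting helper runs without the package installed.

```python
def to_fasttext_line(query, stores):
    """Format one training example; several __label__ prefixes express multi-label targets."""
    labels = " ".join("__label__" + s.replace(" ", "_") for s in stores)
    return f"{labels} {query.strip().lower()}"

def train_store_classifier(train_path, seed_vectors_path=""):
    """Train a supervised fastText model with hierarchical softmax.

    Requires the `fasttext` package. dim and epoch are placeholder values;
    seed_vectors_path, if given, must point to a .vec file whose dimension equals dim.
    """
    import fasttext  # lazy import: only needed when training

    return fasttext.train_supervised(
        input=train_path,
        loss="hs",  # hierarchical softmax, for a large number of labels
        dim=100,
        epoch=10,
        pretrainedVectors=seed_vectors_path,
    )

line = to_fasttext_line("Rectangle room mat", ["Carpets & Rugs"])
```

The `pretrainedVectors` argument is one way to seed the embeddings from vectors learned elsewhere, for example on catalog text.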
19. Leaf level store identification: How were challenges addressed?
● Data sparsity
○ Using catalog data for seeding the embeddings
○ Helps learn with less labeled data
● BOW features (no generalisation, no sharing)
○ Embeddings help in the abstraction
20. Leaf level store identification - Impact
● Significant A/B metrics
○ +3 bps Search Conversion
○ +2 bps Visit Conversion
● SQA analysis (PBAGE): 8% improvement
21. Leaf level store identification: Some Learnings
● Classifier trained only on catalog space (a lot more labeled data) didn’t work well in query space as-is
● Seed embeddings trained with store context in catalog space work
22. Life of a query - simplified view
Ranking
- Relevance
- Query independent signals
- ...
Augmentation
- Normalisation
- Spell Correction
- Phrasing
- Stemming
- Synonymization
- ...
Intent Understanding
- Store identification
- Intent Tagging
- ...
23. Query Intent Tagging
Given a query, predict the attributes that best describe the terms (chunks) in the query
Query: kids party dress 4-5 years pack of 2
Tagging: <ideal_for>: kids <occasion>: party <store>: dress <size>: 4-5 years <pack_of>: pack of 2
24. Statistical Aggregation
● Use query-product click-through logs
● For each (query, clicked product) pair
○ Identify the attributes matched from product description to query tokens
○ Store the fraction of the match to attributes for each query token
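One way to read the aggregation above, as a minimal sketch. Whole-token matching against attribute values, the `aggregate_token_attributes` helper, and the toy product attributes are assumptions; the slides do not specify how matches are computed.

```python
from collections import Counter, defaultdict

def aggregate_token_attributes(click_pairs):
    """For each query token, compute the fraction of its matches that fall on each attribute.

    click_pairs: iterable of (query, product_attributes), one per click, where
    product_attributes maps attribute name -> attribute value of the clicked product.
    """
    counts = defaultdict(Counter)
    for query, attributes in click_pairs:
        for token in query.lower().split():
            for name, value in attributes.items():
                if token in value.lower().split():
                    counts[token][name] += 1

    return {
        token: {name: n / sum(by_attr.values()) for name, n in by_attr.items()}
        for token, by_attr in counts.items()
    }

# Illustrative clicks: two on a phone, one on a phone cover.
clicks = [
    ("samsung galaxy j7", {"brand": "Samsung", "model_name": "Galaxy J7"}),
    ("samsung galaxy j7", {"brand": "Samsung", "model_name": "Galaxy J7"}),
    ("samsung galaxy j7 covers", {"designed_for": "Samsung Galaxy J7", "category": "Covers"}),
]
fractions = aggregate_token_attributes(clicks)
```

In this toy log the token "samsung" ends up split between `brand` and `designed_for`, because the statistics live at the token level and ignore the rest of the query.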
25. Limitations
● Works on query token space, weak generalization
● Considers all clicks equally but clicks are noisy
● Cannot learn anything useful for verticals where query volume is low
26. Problem Complexity
● samsung galaxy j7
○ brand model_name model_name
● samsung galaxy j7 covers
○ designed_for designed_for category
27. Some Exploratory Analysis
● ~40% of catalog tokens cannot be identified unambiguously
● “Cotton” appears in the vocabulary of 23 attributes in “HomeFurnishing”
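A statistic of this kind could be computed along the following lines. This sketch assumes that "ambiguous" means a token occurs in the value vocabulary of more than one attribute; the helper name, attribute names, and values are made up for illustration.

```python
def ambiguous_token_fraction(attribute_vocab):
    """Share of distinct tokens that occur in the vocabulary of more than one attribute."""
    attrs_per_token = {}
    for attribute, values in attribute_vocab.items():
        for value in values:
            for token in value.lower().split():
                attrs_per_token.setdefault(token, set()).add(attribute)
    ambiguous = [t for t, attrs in attrs_per_token.items() if len(attrs) > 1]
    return len(ambiguous) / len(attrs_per_token), attrs_per_token

# Hypothetical attribute vocabularies for one store.
vocab = {
    "material": ["Cotton", "Polyester"],
    "fabric_care": ["Cotton wash"],
    "color": ["Red"],
}
fraction, attrs_per_token = ambiguous_token_fraction(vocab)
```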
28. Is Sequence necessary?
● Attribute labelling at a position depends on tokens at other positions in the query
● Attributes have affinity: (brand, model_name) is more likely than (brand, color) in mobiles
29. Sequence Formulation
● Let X be the query s.t. X = {x_1, x_2, ..., x_n} where x_j is a query token
● Let Y be the intent s.t. Y = {y_1, y_2, ..., y_n} where y_j ∈ attributes
30. Supervised - Conditional Random Field
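For reference, the standard linear-chain CRF over a token sequence X and label sequence Y can be written as below. The slides do not state the exact model form, so treat this as the textbook formulation rather than the deployed one; the feature functions f_k, weights λ_k, and the start symbol y_0 are the usual notation, not names from the talk.

```latex
p(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{j=1}^{n} \sum_{k} \lambda_k \, f_k(y_{j-1}, y_j, X, j) \right),
\qquad
Z(X) = \sum_{Y'} \exp\left( \sum_{j=1}^{n} \sum_{k} \lambda_k \, f_k(y'_{j-1}, y'_j, X, j) \right)
```

The dependence of f_k on both y_{j-1} and y_j is what lets the model capture affinities between adjacent attributes, and the dependence on all of X lets a label depend on tokens at other positions.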
31. Feature Design
● looks_like_attribute
○ Attributes like brand, color, model_name
○ Multinomial NB to generate features
● Defined over window at each position in query
● Global features like is_alnum, is_shortword
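A possible shape for these features, sketched with the standard library only. The token-level Naive Bayes over catalog attribute values, the window size, the definitions of `is_alnum` (mixed letters and digits) and `is_shortword` (at most two characters), and the toy catalog are all assumptions; the slides name the features but not their definitions.

```python
from collections import Counter

class LooksLikeAttribute:
    """Token-level multinomial Naive Bayes over catalog attribute values (Laplace smoothed)."""

    def __init__(self, attribute_values, alpha=1.0):
        self.alpha = alpha
        self.token_counts = {
            attr: Counter(tok for v in values for tok in v.lower().split())
            for attr, values in attribute_values.items()
        }
        self.vocab = {tok for c in self.token_counts.values() for tok in c}
        total_values = sum(len(v) for v in attribute_values.values())
        self.prior = {a: len(v) / total_values for a, v in attribute_values.items()}

    def posterior(self, token):
        """P(attribute | token), normalised over attributes."""
        scores = {}
        for attr, counts in self.token_counts.items():
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            scores[attr] = self.prior[attr] * (counts[token] + self.alpha) / denom
        norm = sum(scores.values())
        return {attr: s / norm for attr, s in scores.items()}

def token_features(tokens, j, nb, window=1):
    """Feature dict for position j: NB scores over a window, plus simple surface flags."""
    feats = {}
    for offset in range(-window, window + 1):
        i = j + offset
        if 0 <= i < len(tokens):
            for attr, p in nb.posterior(tokens[i]).items():
                feats[f"looks_like_{attr}[{offset:+d}]"] = p
    tok = tokens[j]
    feats["is_alnum"] = float(tok.isalnum() and not tok.isalpha() and not tok.isdigit())
    feats["is_shortword"] = float(len(tok) <= 2)
    return feats

# Toy catalog vocabulary (illustrative values).
nb = LooksLikeAttribute({
    "brand": ["Samsung", "Apple"],
    "model_name": ["Galaxy J7", "Galaxy S7 Edge"],
    "color": ["Black", "Gold"],
})
tokens = "samsung galaxy j7".split()
feats = token_features(tokens, 1, nb)
```

Feature dicts of this form, one per position, are what a CRF implementation would consume; the scores describe how much a token resembles each attribute rather than which token it is.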
32. What did we gain?
● Moving from query token space to attribute feature space improves generalization
● Can generate multiple partial labellings, better ranking of search results
33. What did we gain? Metrics
● Significant A/B metrics
○ +5 bps Search Conversion
○ +2 bps Visit Conversion
● SQA analysis (PBAGE): 4% improvement
34. Some Examples
Query: samsung galaxy s7 edge 2017
[Figure: search results before and after]
35. Some Examples..
Query: Watches with steel belt with square dial
[Figure: search results before and after]
36. Why CRF is not enough?
● Low volume of high confidence labeled data in some verticals
● Click noise: users sometimes click randomly, especially for lifestyle
● The labeled data for CRF suffers from the above issues
37. Some Exploratory Analysis...
● Labeled data has low coverage of unique queries (~10%)
● A supervised model will fail to generalize for these stores
38. Weakly-Supervised Models
● Generative vs a Discriminative setting like CRF
● Learning from unlabeled queries
● Catalog and limited labeled data used as weak supervision
● WIP… research paper … production
40. Summary
● Pattern of solution evolution
○ Statistical -> Supervised -> Supervised ++ (side information)
● Common challenges
○ Not enough labeled data (side information / weak supervision)
○ Label/presentation bias
41. Query: ‘diamond ring’