Search @ Flipkart
Umesh Prasad
Thejus VM
Empowering consumers to discover and find products
Solr/Lucene Meetup 2 @ Bangalore
Date: July 27, 2013
Outline
● Search Architecture @ Flipkart
● Challenges for E-commerce
○ Diverse Catalogue
○ Availability, Uptime and performance
○ High frequency updates
● Solutions
○ Caching and warm up
○ External Source Fields (Sort, Facet, Filter)
○ Relevance optimizations
Flipkart Search Architecture
Technologies Used
The E-commerce Search Challenge
● Diverse catalogue
○ ~13 million products, ~900 categories
○ What fields to Search
○ How to rank (within a category / across categories)? How to rank facets?
○ tf-idf and the vector space model don't help
● Performance
○ 99.99 % availability
○ ~1000 qps
○ ~75 ms for Search, ~5 ms for Autosuggest
○ Prefetching data (conflicts with data freshness)
● High rate of updates
○ Multiple data sources (aggregate, index, commit, replicate)
○ Temporal fields (Price/Availability/SLAs/Offers)
○ Lucene doesn't support partial updates
Addressing - Performance / Latency
● Make Search Faster
○ Use filters, score only if needed, lazy field loads, smaller indexes (aka sharding)
● Caching
○ Solr caches (Type/Sizing/Tuning/Warming)
○ Custom caches
○ Cache warmup on replication and startup
Solr Search Flow and High Latency Cache
Cache hit is 10X - 50X faster.
Solr Caches
● QueryResultCache
○ Key = <Lucene Query, Filters, SortFields>
○ Value = DocList (ordered doc IDs, optionally with scores)
○ Caches only a window of results
○ Use: Pagination / repeat queries
● FilterCache
○ Key = Query
○ Value = DocSet (bitset of up to maxDoc bits)
○ Matching / Faceting
● FieldValueCache
○ Key = FieldName
○ Value = <Term,DocSet>
○ Faceting
● DocumentCache
○ Key = docId
○ Value = Fields
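
To make the FilterCache entry concrete, here is a minimal Python sketch (not Solr's implementation) in which each filter query maps to a bitset with one bit per document, and combining filters is a bitwise AND. The class name `FilterCacheSketch`, the matcher callbacks, and the four-document catalogue are assumptions made for illustration.

```python
from typing import Callable, Dict, List


class FilterCacheSketch:
    """Toy filter cache: key = filter query string, value = bitset over doc IDs."""

    def __init__(self, docs: List[dict]):
        self.docs = docs
        self.cache: Dict[str, int] = {}  # a Python int serves as an arbitrary-length bitset
        self.misses = 0

    def docset(self, key: str, matcher: Callable[[dict], bool]) -> int:
        """Return the cached bitset for `key`, computing it on the first request."""
        if key not in self.cache:
            self.misses += 1
            bits = 0
            for doc_id, doc in enumerate(self.docs):
                if matcher(doc):
                    bits |= 1 << doc_id
            self.cache[key] = bits
        return self.cache[key]


def doc_ids(bits: int) -> List[int]:
    """Expand a bitset into the list of doc IDs whose bit is set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Hypothetical catalogue: the list position is the doc ID.
docs = [
    {"category": "mobiles", "in_stock": True},
    {"category": "books", "in_stock": True},
    {"category": "mobiles", "in_stock": False},
    {"category": "mobiles", "in_stock": True},
]

cache = FilterCacheSketch(docs)
mobiles = cache.docset("category:mobiles", lambda d: d["category"] == "mobiles")
in_stock = cache.docset("in_stock:true", lambda d: d["in_stock"])

# Intersecting two cached filters is a single bitwise AND.
matching = doc_ids(mobiles & in_stock)

# A repeated filter is served from the cache and does not rescan the documents.
cache.docset("category:mobiles", lambda d: d["category"] == "mobiles")
```

Because each entry costs up to one bit per document in the index, the number of cached filters drives memory use, which is why cache sizing appears as a tuning concern.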
Expensive Features
● Facet on Queries
○ facet.query
● Grouping
○ ngroups (counting the number of groups)
○ Facet counting of groups (makes a 2nd query)
○ No cache for grouping
● Solution : High Latency Cache
○ Key = All Request Params
○ Value = Full response object
○ Re-generate
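
One possible reading of the High Latency Cache bullets, as a hedged Python sketch: the key is built from all request parameters, the value is the full response, and every entry can be recomputed when the index changes. `ResponseCache`, `cache_key`, and `fake_search` are hypothetical names; the real cache sits inside the Solr request flow and is not shown in the slides.

```python
from typing import Callable, Dict, List, Mapping, Tuple

Key = Tuple[Tuple[str, str], ...]


def cache_key(params: Mapping[str, str]) -> Key:
    """Order-independent key built from all request parameters."""
    return tuple(sorted(params.items()))


class ResponseCache:
    """Caches the full response object per distinct request."""

    def __init__(self, backend: Callable[[Mapping[str, str]], dict]):
        self.backend = backend
        self.entries: Dict[Key, dict] = {}

    def get(self, params: Mapping[str, str]) -> dict:
        key = cache_key(params)
        if key not in self.entries:
            self.entries[key] = self.backend(dict(key))
        return self.entries[key]

    def regenerate(self) -> None:
        """Recompute every cached response, e.g. after a new index is replicated."""
        for key in list(self.entries):
            self.entries[key] = self.backend(dict(key))


# Stand-in for an expensive grouped/faceted search (assumption for the demo).
calls: List[dict] = []
index_version = {"v": 1}


def fake_search(params: Mapping[str, str]) -> dict:
    calls.append(dict(params))
    return {"version": index_version["v"], "q": params["q"]}


cache = ResponseCache(fake_search)
first = cache.get({"q": "phone", "group": "true"})
second = cache.get({"group": "true", "q": "phone"})  # same params, different order

index_version["v"] = 2  # pretend a new index version arrived
cache.regenerate()
third = cache.get({"q": "phone", "group": "true"})
```

Regenerating entries in the background rather than simply dropping them keeps the expensive grouping and facet queries off the user-facing request path.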
How Replication Impacts Caching
Challenge 3 : High Rate of Updates
● Two Solutions
○ Near real time Indexing / Searching
○ External Fields
● NRT Indexing and searching
○ Soft commits => Solr caches invalidated
○ Lots of churn: documents are deleted and re-added
○ No autowarm for document cache
● External Fields
○ Resonates with horizontal partitioning (document-level partitioning)
○ Great for ephemeral fields (price / availability / SLAs)
○ Supports faceting / filtering / sorting
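
The external-field idea can be sketched in Python as follows: slowly changing fields live in the index, while ephemeral values such as price and availability sit in a separate store keyed by product ID and are read at query time. All names and data here are hypothetical, and the slides do not describe how Flipkart's actual external fields are implemented.

```python
from typing import Dict, List

# "Index": fields that change rarely and would need a reindex to update.
index: Dict[str, dict] = {
    "p1": {"title": "phone a"},
    "p2": {"title": "phone b"},
    "p3": {"title": "phone c"},
}

# External stores: updated in place, with no reindex or commit.
external_price: Dict[str, float] = {"p1": 9999.0, "p2": 7499.0, "p3": 12999.0}
external_in_stock: Dict[str, bool] = {"p1": True, "p2": True, "p3": False}


def search(term: str) -> List[str]:
    """Match against indexed text only."""
    return [pid for pid, doc in index.items() if term in doc["title"]]


def filter_and_sort(pids: List[str]) -> List[str]:
    """Filter on availability and sort by price, both read from the external stores."""
    available = [pid for pid in pids if external_in_stock.get(pid, False)]
    return sorted(available, key=lambda pid: external_price[pid])


before = filter_and_sort(search("phone"))

# A price drop and a restock arrive; only the external stores change.
external_price["p1"] = 6999.0
external_in_stock["p3"] = True

after = filter_and_sort(search("phone"))
```

The indexed documents are untouched between the two queries, so the churn of deleting and re-adding documents is avoided for these updates.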
External Fields and Relevance Tuning
Sorting on 500 plus Dynamic Fields
● 10 million products * 4 bytes = 38.1 MB per field
● 38.1 MB * 500 fields = 18.6 GB of heap memory
● On replication: 18.6 * 2 ≈ 37 GB of heap for just the FieldCache
BOOM
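
The estimate can be rechecked with a few lines of Python. With binary units (1 MB = 1024² bytes), one field costs about 38.1 MB and 500 fields about 18.6 GB. Treating replication as a doubling assumes that the old and new searchers each hold their own copy while the new one warms up.

```python
PRODUCTS = 10_000_000
BYTES_PER_VALUE = 4  # one 4-byte value per document per sort field
FIELDS = 500

MIB = 1024 ** 2
GIB = 1024 ** 3

per_field_bytes = PRODUCTS * BYTES_PER_VALUE
per_field_mib = per_field_bytes / MIB
all_fields_gib = per_field_bytes * FIELDS / GIB

# Assumption: two searchers coexist during replication, each with its own copy.
during_replication_gib = 2 * all_fields_gib

print(f"per field:          {per_field_mib:.1f} MiB")
print(f"{FIELDS} fields:         {all_fields_gib:.1f} GiB")
print(f"during replication: {during_replication_gib:.1f} GiB")
```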
External Fields
Relevance and Scoring
● Search Page (query-based scoring)
○ Handcrafted boosts to capture retail-specific signals
○ User-feedback-based ranking
○ Turn off query norm, tf and idf on specific fields
● Browse Page (non-query-based scoring)
○ Challenge: how do we rank to maximize diversity and still show relevant products?
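
A small Python sketch of why tf and idf might be turned off on a field such as the product title: with tf-idf, a title that repeats the query term outranks a cleaner one, whereas a flat match score leaves the ordering to retail boosts. The tf-idf formula is a simplified version of Lucene's classic similarity, and the boost values and the `final_score` combination are assumptions for illustration.

```python
import math
from typing import Dict, List


def tfidf_score(term: str, tokens: List[str], doc_freq: int, num_docs: int) -> float:
    """Simplified classic tf-idf: sqrt(term frequency) * idf."""
    tf = math.sqrt(tokens.count(term))
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))
    return tf * idf


def flat_score(term: str, tokens: List[str], weight: float = 1.0) -> float:
    """tf and idf turned off: a match scores a constant, however often the term repeats."""
    return weight if term in tokens else 0.0


titles: Dict[str, List[str]] = {
    "a": "phone phone phone cover".split(),  # keyword-stuffed accessory title
    "b": "samsung phone".split(),
}

# Hypothetical retail boosts (e.g. derived from sales or availability).
retail_boost = {"a": 0.1, "b": 0.9}

tfidf = {pid: tfidf_score("phone", toks, doc_freq=100, num_docs=1000)
         for pid, toks in titles.items()}
flat = {pid: flat_score("phone", toks) for pid, toks in titles.items()}


def final_score(pid: str) -> float:
    """Flat text match scaled by the retail boost."""
    return flat[pid] * (1.0 + retail_boost[pid])


ranked = sorted(titles, key=final_score, reverse=True)
```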
Query Classification
● Rank category for a given query
● Signals
○ Text Scoring
○ Retail signals
○ Click stream data
● Rules specified over classifications for a better customer experience
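
As a rough illustration of ranking categories for a query, the Python sketch below combines the three signal types with a weighted sum. The weights, the signal values, and the linear combination are assumptions; the slides do not say how the signals are actually combined.

```python
from typing import Dict, List

# Hypothetical weights per signal type.
WEIGHTS: Dict[str, float] = {"text": 0.5, "retail": 0.2, "clicks": 0.3}

# Hypothetical normalized signals (0..1) per category for the query "phone".
signals: Dict[str, Dict[str, float]] = {
    "mobiles": {"text": 0.9, "retail": 0.8, "clicks": 0.7},
    "mobile_accessories": {"text": 0.9, "retail": 0.5, "clicks": 0.2},
    "books": {"text": 0.1, "retail": 0.3, "clicks": 0.05},
}


def category_score(category_signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the available signals for one category."""
    return sum(weights[name] * value for name, value in category_signals.items())


def rank_categories(all_signals: Dict[str, Dict[str, float]],
                    weights: Dict[str, float]) -> List[str]:
    """Return categories ordered from most to least likely for the query."""
    return sorted(all_signals,
                  key=lambda cat: category_score(all_signals[cat], weights),
                  reverse=True)


ranked = rank_categories(signals, WEIGHTS)
```

Text scoring alone cannot separate "mobiles" from "mobile_accessories" in this example; the retail and click signals break the tie.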
Q & A
