SlideShare a Scribd company logo
1 of 33
OCTOBER 11-14, 2016 • BOSTON, MA
Near Real time Indexing
Building Real Time Search Index For E-Commerce
Umesh Prasad
Tech Lead @ Flipkart
Thejus V M
Data Architect @ Flipkart
Agenda
• Search @ Flipkart
• Need for Real Time Search
• SolrCloud Solution
• Our approach
• Q & A
Traffic @ Flipkart
• Peak Traffic
– ~ 800K active users
– ~ 160K requests per second
• Search Traffic
– ~ 40K searches per second (Service)
– ~ 10K searches per second (Solr )
• Latency
– Median : 11 ms
– 99th percentile : 1.1 second
Search @ Flipkart
• Catalogue
– ~ 50 main categories
– ~ 5000 sub-categories
– ~ 231 million documents
– ~ 90 million SKUs
– ~ 160 million listings
• E-commerce Marketplace
– ~ 100K Sellers
– Local Sellers
– Regional Availability
– Logistics Constraints
E-commerce Search
• Heavy usage of drill down filters
• Heavy usage of faceting
• Only top results matter
• Results grouped/collapsed by products
• Serviceability and delivery experience MATTERS
Agenda
• Search @ Flipkart
• Need for Real Time Search
• SolrCloud Solution
• Our approach
• Q & A
Sorry, Stock Over !!?
Damn !! Is Offer Over ??
What !! All Steal Deals Gone ??
Product /Listing: Important Attributes
Seller
Rating
Service
catalogue
service
Promise
Service
Availability
Service
Offer
Service
Pricing
Service
Product aka SKU
Listings
Summary : Lucene Document
• Product/SKU (Parent Document)
– Listing (Child Document)
• Query : Mostly SKU Attributes (Free Text)
• Filters : SKU + Listing Attributes (Drill Down)
• Ranking : SKU + Listing Attributes
(Explicit/Relevance)
• Index Time Join aka Block Join (Best
Performance)
Out Of Stock, but Why Show?
Index has Stale
Availability Data
234K
Products
Challenge 1 : High Update Rates
updates / sec updates /hr
normal Peak
text / catalogue ~10 ~100 ~100K
pricing ~100 ~1K ~10 million
availability ~100 ~10K ~10 million
offer ~100 ~10K ~10 million
seller rating ~10 ~1K ~1 million
signal 6 ~10 ~100 ~1 million
signal 7 ~100 ~10K ~10 million
signal 8 ~100 ~10K ~10 million
Challenge 2 : Micro Services
Ingestion pipeline
Catalogue Pricing Availability Offers ...
Document Builder
Solr/Lucene
Change
Propagation
Documents
{L1,L2 … P1}
Updates Stream 1
Updates Stream 2
Updates Stream 3
● Lucene doesn’t support Partial Updates
● Update = Delete + Add
Agenda
• Search @ Flipkart
• Need for Real Time Search
• SolrCloud Solution
• Our approach
• Q & A
SolrCloud for NRT
Shard
Replica
Shard
Replica
Shard
Replica
Shard
Replica
Shard
Replica
Shard
Replica
Re-open
searcher
Re-open
searcher
Re-open
searcher
Re-open
searcher
Re-open
searcher
Re-open
searcher
Ingestion pipeline Shard
Leader
Auto commit
Soft Commit
Batch of
documents
For Document
Versioning
Update Log
Forward to Replica
SolrCloud Evaluation
• Update = Delete + Add
– Block Join Index ⇒ Update Whole Block (Product + Listings)
• Updated Document gets streamed to all replicas in sync
– Reduces indexing throughput
• Soft commit is Not Free
– Soft commit ⇒ In Memory Segment
– Lots of Merges
– Huge document churn / deletes
– All caches still need to be re-generated
– Filter Cache miss specially hurts performance
Agenda
• Search @ Flipkart
• Need for Real Time Index
• SolrCloud Solution
•Our approach
• Q & A
ProductA
brand : Apple
availability : T
price : 45000
ProductB
brand : Samsung
availability : T
price : 23000
ProductC
brand : Apple
availability : F
price : 5000
Document ID
Mappings
Posting List
(Inverted Index)
DocValues
(columunar data)
Lucene Segment
Lucene Index
0 ProductA
1 ProductB
2 ProductC
45000 23000 5000Price
availability : T
brand : Samsung
brand : Apple 0 , 2
1
0 , 1
Terms
Sparse
Bitsets
A Typical Search Flow
Query Rewrite
Results
Query
Matching
Ranking Faceting
Stats
Posting List
Doc Values
Other
Components
Lucene Segment
Inverted Index
Forward Index
NRT Store
samsung mobiles
Offer : exchange offer
price desc
category : mobiles
brand : samsung
Offer : exchange offer
NRT Forward Index - Considerations
● Lookup efficiency
– 50th percentile : ~10K matches
– 99th percentile : ~1 million matches
● Data on Java heap
– Memory efficiency
NRT Forward Index - Naive Implementation
NRT Forward IndexLucene Segment
Lookup Engine
0 ProductB
1 ProductA
2 ProductC
3 ProductD
ProductD
ProductA
ProductB
ProductC
ProductD
True
False
False
True
100
150
200
250
ProductId(3) <ProductD,price>
DocId : 3
field: price
250
ProductId Availability Price
Latency : ~10 secs for ~1 Million
lookups
NRT Store - Forward Index Optimized
Lookup Engine
Lucene Segment
0 ProductB
1 ProductA
2 ProductC
3 ProductD
DocId : 3
Field : price
250
DocId - NrtId
0
1
2
3
3
0
1
2
NrtId(3)
2
Price(2
)
NRT Forward Index (Segment Independent)
100 200 250 150Price
0 ProductA
1 ProductC
2 ProductD
3 ProductB
Availability T F F T
Status 01 10 01 00
Latency : ~100 ms for ~1 Million lookups
NRT Store Filter - PostFilter
PostFilter(Price:[100 TO 150])
Lucene Segment
0 ProductB
1 ProductA
2 ProductC
3 ProductD
DocId : 3
Don’t
Delegate
DocId - NrtId
0
1
2
3
3
0
1
2
NrtId(3)
2
Price(2
)
NRT Forward Index (Segment Independent)
100 200 250 150Price
0 ProductA
1 ProductC
2 ProductD
3 ProductB
Availability T F F T
Status 01 10 01 00
NRT Filter
NRT Store - Invert index
NRT Forward StoreNRT Inverter
Lucene Segment
0 ProductB
1 ProductA
2 ProductC
3 ProductD
NRT DocIdSet Cache
Availability : T 0 3
Offer : O1 2 3
Offer:O1 DocIdSet
Solr Integration Points
• ValueSources
• Filtering
– Custom Filter Implementation for cached DocIdSet
– Custom PostFilter
• Query
– Wrapper over Filter
• Custom FacetComponent
Near Real Time Solr Architecture
Solr
Kafka
Ingestion pipeline
NRT Forward
Index
Ranking
Matching
Faceting
Redis
Bootstrap
NRT Inverted
store
Solr Master
NRT Updates
Lucene Updates
Catalogue
Pricing
Availability
Offers
Seller
Quality
Commit
+
Replicate
+
Reopen
Lucene
Others
Accomplishments
• Real time sorting
• Real time filtering : PostFilter
– Higher latency
• Near real time filtering : cached DocIdSet
– No consistency between lookup and filtering
• Independent of lucene commits
• Query latency comparable to DocValues
– Consistent 99% performance
Accomplishments @ Flipkart
● Real time consumption for ~150 Signals
● Reduction in shown out of stock products by 2X
● Production instances of ~50K updates/second real time
Thank you
&
Questions

More Related Content

What's hot

Twitter Search Architecture
Twitter Search Architecture Twitter Search Architecture
Twitter Search Architecture Ramez Al-Fayez
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Sease
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberXiang Fu
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
 
Elasticsearch - under the hood
Elasticsearch - under the hoodElasticsearch - under the hood
Elasticsearch - under the hoodSmartCat
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchIsmaeel Enjreny
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Sumo Logic QuickStart
Sumo Logic QuickStartSumo Logic QuickStart
Sumo Logic QuickStartSumo Logic
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Lucidworks
 
DSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstreamDSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstream4Science
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaSpringPeople
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginnersNeil Baker
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101Data Con LA
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearchhypto
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.Jurriaan Persyn
 

What's hot (20)

Twitter Search Architecture
Twitter Search Architecture Twitter Search Architecture
Twitter Search Architecture
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Elasticsearch - under the hood
Elasticsearch - under the hoodElasticsearch - under the hood
Elasticsearch - under the hood
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Taxonomies for Users
Taxonomies for UsersTaxonomies for Users
Taxonomies for Users
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Sumo Logic QuickStart
Sumo Logic QuickStartSumo Logic QuickStart
Sumo Logic QuickStart
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
DSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstreamDSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstream
 
Elastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & KibanaElastic - ELK, Logstash & Kibana
Elastic - ELK, Logstash & Kibana
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginners
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 

Similar to Near RealTime search @Flipkart

near real time search in e-commerce
near real time search in e-commerce  near real time search in e-commerce
near real time search in e-commerce Umesh Prasad
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Peter Bakas
 
Data monstersrealtimeetl new
Data monstersrealtimeetl newData monstersrealtimeetl new
Data monstersrealtimeetl newGreenM
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSAmazon Web Services
 
Drinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsDrinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsSamantha Quiñones
 
TAUS Machine Translation Showcase, Machine Translation at eBay, eBay, 2014
TAUS Machine Translation Showcase, Machine Translation at eBay, eBay, 2014TAUS Machine Translation Showcase, Machine Translation at eBay, eBay, 2014
TAUS Machine Translation Showcase, Machine Translation at eBay, eBay, 2014TAUS - The Language Data Network
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Lucidworks
 
Stream Processing in Uber
Stream Processing in UberStream Processing in Uber
Stream Processing in UberC4Media
 
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...Amazon Web Services
 
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...Lucidworks
 
#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitterTwitter Developers
 
2 one spot redshift bigdatacamp 1.02
2 one spot redshift bigdatacamp 1.022 one spot redshift bigdatacamp 1.02
2 one spot redshift bigdatacamp 1.02Valerie Akinson Brown
 
2 one spot redshift bigdatacamp 1.02
2 one spot redshift bigdatacamp 1.022 one spot redshift bigdatacamp 1.02
2 one spot redshift bigdatacamp 1.02BigDataCamp
 
2 one spot redshift bigdatacamp 1.02
2 one spot redshift bigdatacamp 1.022 one spot redshift bigdatacamp 1.02
2 one spot redshift bigdatacamp 1.02Valerie Akinson Brown
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitinbloomreacheng
 
Real Time Insights for Advertising Tech
Real Time Insights for Advertising TechReal Time Insights for Advertising Tech
Real Time Insights for Advertising TechApache Apex
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Nitin S
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationNitin Sharma
 
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScaleSeunghyun Lee
 

Similar to Near RealTime search @Flipkart (20)

near real time search in e-commerce
near real time search in e-commerce  near real time search in e-commerce
near real time search in e-commerce
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
 
Data monstersrealtimeetl new
Data monstersrealtimeetl newData monstersrealtimeetl new
Data monstersrealtimeetl new
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
Drinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsDrinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time Metrics
 
TAUS Machine Translation Showcase, Machine Translation at eBay, eBay, 2014
TAUS Machine Translation Showcase, Machine Translation at eBay, eBay, 2014TAUS Machine Translation Showcase, Machine Translation at eBay, eBay, 2014
TAUS Machine Translation Showcase, Machine Translation at eBay, eBay, 2014
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
 
Stream Processing in Uber
Stream Processing in UberStream Processing in Uber
Stream Processing in Uber
 
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
AWS re:Invent 2016: How DataXu scaled its Attribution System to handle billio...
 
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
Thoth - Real-time Solr Monitor and Search Analysis Engine: Presented by Damia...
 
#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter
 
2 one spot redshift bigdatacamp 1.02
2 one spot redshift bigdatacamp 1.022 one spot redshift bigdatacamp 1.02
2 one spot redshift bigdatacamp 1.02
 
2 one spot redshift bigdatacamp 1.02
2 one spot redshift bigdatacamp 1.022 one spot redshift bigdatacamp 1.02
2 one spot redshift bigdatacamp 1.02
 
2 one spot redshift bigdatacamp 1.02
2 one spot redshift bigdatacamp 1.022 one spot redshift bigdatacamp 1.02
2 one spot redshift bigdatacamp 1.02
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
 
Real Time Insights for Advertising Tech
Real Time Insights for Advertising TechReal Time Insights for Advertising Tech
Real Time Insights for Advertising Tech
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin Presentation
 
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
 
Scalable IoT platform
Scalable IoT platformScalable IoT platform
Scalable IoT platform
 

Recently uploaded

Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptrcbcrtm
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 

Recently uploaded (20)

Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.ppt
 
Odoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting ServiceOdoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting Service
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 

Near RealTime search @Flipkart

  • 1. OCTOBER 11-14, 2016 • BOSTON, MA
  • 2. Near Real time Indexing Building Real Time Search Index For E-Commerce Umesh Prasad Tech Lead @ Flipkart Thejus V M Data Architect @ Flipkart
  • 3. Agenda • Search @ Flipkart • Need for Real Time Search • SolrCloud Solution • Our approach • Q & A
  • 4.
  • 5.
  • 6. Traffic @ Flipkart • Peak Traffic – ~ 800K active users – ~ 160K requests per second • Search Traffic – ~ 40K searches per second (Service) – ~ 10K searches per second (Solr ) • Latency – Median : 11 ms – 99th percentile : 1.1 second
  • 7. Search @ Flipkart • Catalogue – ~ 50 main categories – ~ 5000 sub-categories – ~ 231 million documents – ~ 90 million SKUs – ~ 160 million listings • E-commerce Marketplace – ~ 100K Sellers – Local Sellers – Regional Availability – Logistics Constraints
  • 8. E-commerce Search • Heavy usage of drill down filters • Heavy usage of faceting • Only top results matter • Results grouped/collapsed by products • Serviceability and delivery experience MATTERS
  • 9. Agenda • Search @ Flipkart • Need for Real Time Search • SolrCloud Solution • Our approach • Q & A
  • 11. Damn !! Is Offer Over ??
  • 12. What !! All Steal Deals Gone ??
  • 13. Product /Listing: Important Attributes Seller Rating Service catalogue service Promise Service Availability Service Offer Service Pricing Service Product aka SKU Listings
  • 14. Summary : Lucene Document • Product/SKU (Parent Document) – Listing (Child Document) • Query : Mostly SKU Attributes (Free Text) • Filters : SKU + Listing Attributes (Drill Down) • Ranking : SKU + Listing Attributes (Explicit/Relevance) • Index Time Join aka Block Join (Best Performance)
  • 15. Out Of Stock, but Why Show? Index has Stale Availability Data 234K Products
  • 16. Challenge 1 : High Update Rates updates / sec updates /hr normal Peak text / catalogue ~10 ~100 ~100K pricing ~100 ~1K ~10 million availability ~100 ~10K ~10 million offer ~100 ~10K ~10 million seller rating ~10 ~1K ~1 million signal 6 ~10 ~100 ~1 million signal 7 ~100 ~10K ~10 million signal 8 ~100 ~10K ~10 million
  • 17. Challenge 2 : Micro Services Ingestion pipeline Catalogue Pricing Availability Offers ... Document Builder Solr/Lucene Change Propagation Documents {L1,L2 … P1} Updates Stream 1 Updates Stream 2 Updates Stream 3 ● Lucene doesn’t support Partial Updates ● Update = Delete + Add
  • 18. Agenda • Search @ Flipkart • Need for Real Time Search • SolrCloud Solution • Our approach • Q & A
  • 20. SolrCloud Evaluation • Update = Delete + Add – Block Join Index ⇒ Update Whole Block (Product + Listings) • Updated Document gets streamed to all replicas in sync – Reduces indexing throughput • Soft commit is Not Free – Soft commit ⇒ In Memory Segment – Lots of Merges – Huge document churn / deletes – All caches still need to be re-generated – Filter Cache miss specially hurts performance
  • 21. Agenda • Search @ Flipkart • Need for Real Time Index • SolrCloud Solution •Our approach • Q & A
  • 22. ProductA brand : Apple availability : T price : 45000 ProductB brand : Samsung availability : T price : 23000 ProductC brand : Apple availability : F price : 5000 Document ID Mappings Posting List (Inverted Index) DocValues (columunar data) Lucene Segment Lucene Index 0 ProductA 1 ProductB 2 ProductC 45000 23000 5000Price availability : T brand : Samsung brand : Apple 0 , 2 1 0 , 1 Terms Sparse Bitsets
  • 23. A Typical Search Flow Query Rewrite Results Query Matching Ranking Faceting Stats Posting List Doc Values Other Components Lucene Segment Inverted Index Forward Index NRT Store samsung mobiles Offer : exchange offer price desc category : mobiles brand : samsung Offer : exchange offer
  • 24. NRT Forward Index - Considerations ● Lookup efficiency – 50th percentile : ~10K matches – 99th percentile : ~1 million matches ● Data on Java heap – Memory efficiency
  • 25. NRT Forward Index - Naive Implementation NRT Forward IndexLucene Segment Lookup Engine 0 ProductB 1 ProductA 2 ProductC 3 ProductD ProductD ProductA ProductB ProductC ProductD True False False True 100 150 200 250 ProductId(3) <ProductD,price> DocId : 3 field: price 250 ProductId Availability Price Latency : ~10 secs for ~1 Million lookups
  • 26. NRT Store - Forward Index Optimized Lookup Engine Lucene Segment 0 ProductB 1 ProductA 2 ProductC 3 ProductD DocId : 3 Field : price 250 DocId - NrtId 0 1 2 3 3 0 1 2 NrtId(3) 2 Price(2 ) NRT Forward Index (Segment Independent) 100 200 250 150Price 0 ProductA 1 ProductC 2 ProductD 3 ProductB Availability T F F T Status 01 10 01 00 Latency : ~100 ms for ~1 Million lookups
  • 27. NRT Store Filter - PostFilter PostFilter(Price:[100 TO 150]) Lucene Segment 0 ProductB 1 ProductA 2 ProductC 3 ProductD DocId : 3 Don’t Delegate DocId - NrtId 0 1 2 3 3 0 1 2 NrtId(3) 2 Price(2 ) NRT Forward Index (Segment Independent) 100 200 250 150Price 0 ProductA 1 ProductC 2 ProductD 3 ProductB Availability T F F T Status 01 10 01 00
  • 28. NRT Filter NRT Store - Invert index NRT Forward StoreNRT Inverter Lucene Segment 0 ProductB 1 ProductA 2 ProductC 3 ProductD NRT DocIdSet Cache Availability : T 0 3 Offer : O1 2 3 Offer:O1 DocIdSet
  • 29. Solr Integration Points • ValueSources • Filtering – Custom Filter Implementation for cached DocIdSet – Custom PostFilter • Query – Wrapper over Filter • Custom FacetComponent
  • 30. Near Real Time Solr Architecture Solr Kafka Ingestion pipeline NRT Forward Index Ranking Matching Faceting Redis Bootstrap NRT Inverted store Solr Master NRT Updates Lucene Updates Catalogue Pricing Availability Offers Seller Quality Commit + Replicate + Reopen Lucene Others
  • 31. Accomplishments • Real time sorting • Real time filtering : PostFilter – Higher latency • Near real time filtering : cached DocIdSet – No consistency between lookup and filtering • Independent of lucene commits • Query latency comparable to DocValues – Consistent 99% performance
  • 32. Accomplishments @ Flipkart ● Real time consumption for ~150 Signals ● Reduction in shown out of stock products by 2X ● Production instances of ~50K updates/second real time