Solr Compute Cloud – An Elastic
Solr Infrastructure
Nitin Sharma
- Member of technical staff, BloomReach
- nitin.sharma@bloomreach.com
Abstract
Scaling search platforms is an extremely hard problem
• Serving hundreds of millions of documents
• Low latency
• High throughput workloads
• Optimized cost.
At BloomReach, we have implemented SC2, an elastic Solr infrastructure for big data applications
that:
• Supports heterogeneous workloads while hosted in the cloud.
• Dynamically grows/shrinks search servers
• Provides application- and pipeline-level isolation, plus near-real-time (NRT) search and indexing.
• Offers latency guarantees and application-specific performance tuning.
• Provides high-availability features like cluster replacement, cross-data center support, disaster
recovery etc.
About Us
BloomReach
BloomReach has developed a personalized discovery platform that features applications that analyze
big data to make our customers’ digital content more discoverable, relevant and profitable.
Myself
I work on search platform scaling for BloomReach’s big data. My relevant experience and background
includes scaling real-time services for latency-sensitive applications and building performance and
search-quality metrics infrastructure for personalization platforms.
The BloomReach Personalized Discovery Platform
BloomReach’s Applications
Content understanding
Organic Search
What it does: Content optimization, management and measurement
Benefit: Enhanced discoverability and customer acquisition in organic search
SNAP
What it does: Personalized onsite search and navigation across devices
Benefit: Relevant and consistent onsite experiences for new and known users
Compass
What it does: Merchandising tool that understands products and identifies opportunities
Benefit: Prioritize and optimize online merchandising
Agenda
• BloomReach search use cases and architecture
• Old architecture and issues
• Scaling challenges
• Elastic SolrCloud architecture and benefits
• Lessons learned
BloomReach Search Use Cases
1. Front-end (serving) queries – Uptime and Latency sensitive
2. Batch search pipelines – Throughput sensitive
3. Time bound indexing requirements – Customer Specific
4. Time bound Solr config updates
BloomReach Search Architecture
Diagram: a single Solr cluster with a Zookeeper ensemble serves MapReduce pipelines 1..n (reads), indexing pipelines 1..n, and search traffic from the public API. Legend: heavy load, moderate load, light load.
Throughput Issues…
Diagram: the shared Solr cluster and Zookeeper ensemble, with pipelines 1..n, indexing 1..n, and public API search traffic.
● Heterogeneous read workload
● Same collection, but different pipelines, different query patterns, different schedules
● Cache tuning is virtually impossible
● Larger pipelines starve the smaller ones
● Machine utilization determines the throughput and stability of a pipeline at any point
● No isolation among jobs
Stability and Uptime Issues…
Diagram: the shared Solr cluster and Zookeeper ensemble, with pipelines 1..n, indexing 1..n, and public API search traffic.
● Bad clients bring down the cluster or degrade performance
● Bad queries (with heavy load) render nodes unresponsive
● Garbage collection issues
● ZK stability issues (as we scale collections)
● CPU/load issues
● The higher the number of concurrent pipelines, the higher the number of issues
Indexing Issues…
Diagram: the shared Solr cluster and Zookeeper ensemble, with pipelines 1..n, indexing 1..n, and public API search traffic.
● Commit frequencies vary with indexer types
● An indexer running during another pipeline hurts performance
● Indexer client leaks
● Too many stored fields
● Non-batch updates
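The last indexing issue, non-batch updates, is the simplest to illustrate. The sketch below groups documents into fixed-size batches before sending them. It assumes a hypothetical `send_batch` callable standing in for whatever client call posts documents to Solr, and the default batch size of 500 is an arbitrary placeholder rather than a figure from the talk.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_in_batches(
    docs: Iterable[dict],
    send_batch: Callable[[List[dict]], None],  # hypothetical update call
    size: int = 500,
) -> int:
    """Send documents in batches and return the number of requests made."""
    requests = 0
    for chunk in batched(docs, size):
        send_batch(chunk)  # one update request per batch, not per document
        requests += 1
    return requests
```

With 1,050 documents and a batch size of 500, this makes three requests instead of 1,050.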
Rethinking…
• Shared cluster for pipelines does not scale.
• Guaranteeing an uptime of 99.99%+ is non-trivial.
• Every job runs great in isolation. When you put them together, they fail.
• Running index-heavy and read-heavy loads on the same cluster causes performance issues.
• Any direct access to the production cluster puts cluster stability at risk (client leaks, bad queries, etc.).
What if every pipeline had its own cluster?
Solr Compute Cloud (SC2)
• Elastic Infrastructure – Provision Solr Clusters on demand, on-the-fly.
• Create, Use, Terminate Model - Create a temporary cluster with necessary data, use it and throw it away.
• Technologies behind SC2 (built in-house):
Cluster Management API - Dynamic cluster provisioning and resource allocation.
Solr HAFT – High availability and data management library for SolrCloud.
• Isolation - Pipelines get their own cluster. One cannot disrupt another.
• Dynamic Scaling – Every pipeline can state its own replication requirements.
• Production Safeguard - No direct access. Safeguards from bad clients/access patterns.
• Cost Saving – Provision for the average; withstand peak with elastic growth.
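The cost-saving point can be made concrete with a little arithmetic. The numbers below are entirely hypothetical (they are not BloomReach figures); the sketch only compares node-hours for a cluster sized for peak all day against one sized for the average that grows during peak hours.

```python
def node_hours_static(peak_nodes: int, hours: int) -> int:
    """Node-hours when the cluster is sized for peak the whole time."""
    return peak_nodes * hours


def node_hours_elastic(base_nodes: int, peak_nodes: int, hours: int, peak_hours: int) -> int:
    """Node-hours when the extra nodes exist only during peak hours."""
    extra_nodes = peak_nodes - base_nodes
    return base_nodes * hours + extra_nodes * peak_hours


# Hypothetical day: 10 nodes cover the average load, 40 are needed for 3 peak hours.
static = node_hours_static(peak_nodes=40, hours=24)
elastic = node_hours_elastic(base_nodes=10, peak_nodes=40, hours=24, peak_hours=3)
print(static, elastic)  # 960 330
```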
Solr Compute Cloud
Diagram: Pipeline 1 sends Request: {Collection: A, Replica: 6} to the Solr Compute Cloud API, which provisions a Solr cluster (Collection A, Replicas: 6). The Solr HAFT Service reads from the production Solr cluster (with its Zookeeper ensemble) and replicates to the new cluster.
1. Read pipeline requests collection and desired replicas from SC2 API.
2. SC2 API provisions cluster dynamically with needed setup (and streams Solr data).
3. SC2 calls HAFT service to replicate data from production to provisioned cluster.
4. Pipeline uses this cluster to run job.
Solr Compute Cloud…
Diagram: Pipeline 1 sends Terminate: {Cluster} to the Solr Compute Cloud API, which shuts down the provisioned Solr cluster (Collection A, Replicas: 6).
1. Pipeline finishes running the job.
2. Pipeline calls SC2 API to terminate the cluster.
3. SC2 terminates the cluster.
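The create, use, terminate lifecycle maps naturally onto a context manager. The sketch below is a minimal illustration, not the SC2 API: `FakeSC2`, `provision`, `terminate`, and `temporary_cluster` are hypothetical names, the "cluster" is just an in-memory record, and the data replication step is omitted.

```python
import itertools
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class FakeSC2:
    """In-memory stand-in for a cluster provisioning API (hypothetical)."""
    clusters: Dict[str, dict] = field(default_factory=dict)
    _ids: Iterator[int] = field(default_factory=itertools.count)

    def provision(self, collection: str, replicas: int) -> str:
        cluster_id = f"cluster-{next(self._ids)}"
        self.clusters[cluster_id] = {"collection": collection, "replicas": replicas}
        return cluster_id

    def terminate(self, cluster_id: str) -> None:
        self.clusters.pop(cluster_id, None)


@contextmanager
def temporary_cluster(api: FakeSC2, collection: str, replicas: int):
    """Create a cluster, hand it to the caller, and always terminate it."""
    cluster_id = api.provision(collection, replicas)
    try:
        yield cluster_id
    finally:
        api.terminate(cluster_id)  # runs even if the job raises


api = FakeSC2()
with temporary_cluster(api, collection="A", replicas=6) as cid:
    live_during_job = cid in api.clusters  # the job would run here
live_after_job = cid in api.clusters
```

Putting termination in `finally` means a failed job still releases its cluster, which complements the API-side cleanup of runaway clusters.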
Solr Compute Cloud – Read Pipeline View
Diagram: each read pipeline requests its own cluster from the Solr Compute Cloud API, and the Solr HAFT Service populates each one from the production Solr cluster (with its Zookeeper ensemble).
Pipeline      Request                        Provisioned Solr cluster
Pipeline 1    {Collection: A, Replica: 6}    Collection A, Replicas: 6
Pipeline 2    {Collection: B, Replica: 2}    Collection B, Replicas: 2
Pipeline n    {Collection: C, Replica: 1}    Collection C, Replicas: 1
Solr Compute Cloud – Indexing
Diagram: the indexing pipeline sends Request: {Collection: A, Replica: 2} to the Solr Compute Cloud API, which provisions a Solr cluster (labeled Collection A, Replicas: 6). The Solr HAFT Service reads from that cluster and replicates to the production Solr cluster (with its Zookeeper ensemble).
1. Indexing pipeline requests collection and desired replicas from SC2 API.
2. SC2 API provisions cluster dynamically with needed setup (and streams Solr data).
3. Indexer uses this cluster to index the data.
4. Indexer calls HAFT service to replicate the index from dynamic cluster to production.
5. HAFT service reads data from dynamic cluster and replicates to production Solr.
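The key property of the indexing flow is that production only changes at the replication step. The toy model below illustrates that ordering. It treats a cluster as a plain dictionary, and `index_documents` and `replicate` are hypothetical stand-ins for the indexer and the HAFT replication call. It also assumes replication replaces the collection wholesale, which the slides do not specify.

```python
from typing import Dict, List

Cluster = Dict[str, List[dict]]  # collection name -> indexed documents


def index_documents(cluster: Cluster, collection: str, docs: List[dict]) -> None:
    """The indexer writes only to the dynamically provisioned cluster."""
    cluster.setdefault(collection, []).extend(docs)


def replicate(source: Cluster, target: Cluster, collection: str) -> None:
    """Stand-in for the replication step: copy the finished index over."""
    target[collection] = list(source[collection])


production: Cluster = {"A": [{"id": 1}]}
scratch: Cluster = {}  # the dynamic cluster

index_documents(scratch, "A", [{"id": 1}, {"id": 2}, {"id": 3}])
before = len(production["A"])  # production is untouched while indexing runs
replicate(scratch, production, "A")
after = len(production["A"])
print(before, after)  # 1 3
```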
Solr Compute Cloud – Global View
Diagram: indexing pipelines 1..n and read pipelines 1..n send Provision: {Cluster} and Terminate: {Cluster} requests to the Solr Compute Cloud API and run their jobs on elastic clusters. The Solr HAFT Service replicates indexes between the elastic clusters and the production Solr cluster (with its Zookeeper ensemble).
Solr Compute Cloud API
1. API to provision clusters on demand.
2. Dynamic cluster and resource allocation (includes cost optimization)
3. Track request state, cluster performance and cost.
4. Terminate long-running, runaway clusters.
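Item 4 implies some bookkeeping on the API side. One way to detect runaway clusters, sketched under assumptions: each provisioned cluster carries a start time and a time budget, and a periodic check lists the ones that have overstayed. `ClusterRecord`, `max_age_s`, and `find_runaways` are hypothetical names, not part of SC2.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterRecord:
    cluster_id: str
    started_at: float  # seconds since epoch
    max_age_s: float   # hypothetical per-request time budget


def find_runaways(records: List[ClusterRecord], now: float) -> List[str]:
    """Return ids of clusters that have outlived their time budget."""
    return [r.cluster_id for r in records if now - r.started_at > r.max_age_s]


records = [
    ClusterRecord("read-1", started_at=0.0, max_age_s=3600.0),
    ClusterRecord("index-7", started_at=1000.0, max_age_s=7200.0),
]
print(find_runaways(records, now=4000.0))  # ['read-1']
```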
Solr HAFT Service
1. High availability and fault tolerance
2. Home-grown technology
3. Open source (work in progress)
4. Features
• One push disaster recovery
• High availability operations
• Replace node
• Add replicas
• Repair collection
• Collection versioning
• Cluster backup operations
• Dynamic replica creation
• Cluster clone
• Cluster swap
• Cluster state reconstruction
Solr HAFT Service – Functional View
Diagram: action groups are Index Management Actions, High Availability Actions, and Cluster Backup Operations, alongside Verification and Monitoring. Boxes shown: Clone Alias, Clone Collections, Custom Commit, Lucene Segment Optimize, Node Replacement, Node Repair, Clone Cluster, Collection Versioning, Black Box Recording, Solr Metadata, Zookeeper Metadata, Dynamic Replica Creation, Cluster Clone, Cluster Swap, Cluster State Reconstruction.
Disaster Recovery in New Architecture
Diagram: the old production Solr cluster and a new Solr cluster, each with its own Zookeeper ensemble. The brave soul on pager duty presses the push-button recovery control on the Solr HAFT Service, and DNS is switched to the new cluster.
1. The brave soul on pager duty clicks the recovery button.
2. Solr HAFT Service triggers:
• Cluster setup
• State reconstruction
• Cluster clone
• Cluster swap
3. Production DNS is pointed at the new cluster.
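The recovery sequence is an ordered runbook: each step must succeed before the next one starts. A minimal sketch of that control flow follows. The step names come from the slide, while `run_recovery` and the placeholder lambdas are assumptions standing in for the real operations.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_recovery(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"recovery halted at step: {name}")
        completed.append(name)
    return completed


# Each lambda is a placeholder for the real operation.
steps: List[Step] = [
    ("cluster setup", lambda: True),
    ("state reconstruction", lambda: True),
    ("cluster clone", lambda: True),
    ("cluster swap", lambda: True),
    ("point production DNS at new cluster", lambda: True),
]
print(run_recovery(steps))
```

Halting on the first failure keeps DNS pointed at the old cluster unless every earlier step has completed.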
SC2 vs Non-SC2 (Stability Features)
Property Non-SC2 SC2
Linear Scalability for Heterogeneous
Workload
Pipeline Level Isolation
Dynamic Collection Scaling
Prevention from Bad Clients
Pipeline Specific Performance
No Direct Access to Production Cluster
Can Sleep at night? 
SC2 vs Non-SC2 (Availability Features)
Property Non-SC2 SC2
Cross Data-Center Support
Cluster Cloning
Collection Versioning
One-Push Disaster Recovery
Repair API for Nodes/Collections
Node Replacement
Lessons Learned
1. Solr is a search platform. Do not use it as a database (for scans and lookups).
Evaluate your stored fields.
2. Understand access patterns, QPS and queries in detail. Be careful when tuning
caches.
3. Have access control for large-scale jobs that directly talk to your cluster. (Internal
DDOS attacks are hard to track.)
4. Instrument every piece of infrastructure and collect metrics.
5. Build automated disaster recovery. (You will need it.)
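Lesson 3 does not say how access was controlled. One common option is to throttle each large job at the client or gateway. The sketch below uses a token bucket, which is a swapped-in technique rather than anything described in the talk; the class and its parameters are illustrative. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Allow about `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=2.0, capacity=2.0)
decisions = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 0.5)]
print(decisions)  # [True, True, False, True]
```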
Questions?
Thank You!
Nitin Sharma
nitin.sharma@bloomreach.com
https://www.linkedin.com/in/knitinsharma

 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 

Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin

  • 1.
  • 2. Solr Compute Cloud – An Elastic Solr Infrastructure Nitin Sharma - Member of technical staff, BloomReach - nitin.sharma@bloomreach.com
  • 3. Abstract Scaling search platforms is an extremely hard problem • Serving hundreds of millions of documents • Low latency • High throughput workloads • Optimized cost. At BloomReach, we have implemented SC2, an elastic Solr infrastructure for big data applications that: • Supports heterogeneous workloads while hosted in the cloud. • Dynamically grows/shrinks search servers • Application and Pipeline level isolation, NRT search and indexing. • Offers latency guarantees and application-specific performance tuning. • Provides high-availability features like cluster replacement, cross-data center support, disaster recovery etc.
  • 4. About Us BloomReach BloomReach has developed a personalized discovery platform that features applications that analyze big data to make our customers’ digital content more discoverable, relevant and profitable. Myself I work on search platform scaling for BloomReach’s big data. My relevant experience and background includes scaling real-time services for latency-sensitive applications and building performance and search-quality metrics infrastructure for personalization platforms.
  • 6. BloomReach’s Applications Organic Search Content understanding What it does Content optimization, management and measurement Benefit Enhanced discoverability and customer acquisition in organic search What it does Personalized onsite search and navigation across devices Benefit Relevant and consistent onsite experiences for new and known users What it does Merchandising tool that understands products and identifies opportunities Benefit Prioritize and optimize online merchandising SNAP Compass
  • 7. Agenda • BloomReach search use cases and architecture • Old architecture and issues • Scaling challenges • Elastic SolrCloud architecture and benefits • Lessons learned
  • 8. BloomReach Search Use Cases 1. Front-end (serving) queries – Uptime and Latency sensitive 2. Batch search pipelines – Throughput sensitive 3. Time bound indexing requirements – Customer Specific 4. Time bound Solr config updates
  • 9. BloomReach Search Architecture Solr Cluster Zookeeper Ensemble Map Reduce Pipelines (Reads) Indexing Pipelines Pipeline 1 Pipeline 2 Pipeline n Indexing 1 Indexing 2 Indexing n Heavy Load Moderate Load Light Load Legend Public API Search Traffic Search Traffic
  • 10. Throughput Issues… Solr Cluster Zookeeper Ensemble Pipeline 1 Pipeline 2 Pipeline n Indexing 1 Indexing 2 Indexing n Public API Search Traffic ● Heterogeneous read workload ● Same collection - different pipelines, different query patterns, different schedule ● Cache tuning is virtually impossible ● Larger pipeline starving the small ones ● Machine utilization determines throughput and stability of a pipeline at any point ● No isolation among jobs
  • 11. Stability and Uptime Issues… Solr Cluster Zookeeper Ensemble Pipeline 1 Pipeline 2 Pipeline n Indexing 1 Indexing 2 Indexing n Public API Search Traffic ● Bad clients – bring down the cluster/degrade performance ● Bad queries (with heavy load) – render nodes unresponsive ● Garbage collection issues ● ZK stability issues (as we scale collections) ● CPU /Load Issues ● Higher number of concurrent pipelines, higher number of issues
  • 12. Indexing Issues… Solr Cluster Zookeeper Ensemble Pipeline 1 Pipeline 2 Pipeline n Indexing 1 Indexing 2 Indexing n Public API Search Traffic ● Commit frequencies vary with indexer types ● Indexer run during another pipeline – performance ● Indexer client leaks ● Too many stored fields ● Non-batch updates
  • 13. Rethinking… • Shared cluster for pipelines does not scale. • Guaranteeing an uptime of 99.99+ - non trivial • Every job runs great in isolation. When you put them together, they fail. • Running index-heavy load and read-heavy load - cluster performance issues. • Any direct access to production cluster – cluster stability (client leaks, bad queries etc.). What if every pipeline had its own cluster?
  • 14. Solr Compute Cloud (SC2) • Elastic Infrastructure – Provision Solr Clusters on demand, on-the-fly. • Create, Use, Terminate Model - Create a temporary cluster with necessary data, use it and throw it away. • Technologies behind SC2 (built in House) Cluster Management API - Dynamic cluster provisioning and resource allocation. Solr HAFT – High availability and data management library for SolrCloud. • Isolation - Pipelines get their own cluster. One cannot disrupt another. • Dynamic Scaling – Every pipeline can state its own replication requirements. • Production Safeguard - No direct access. Safeguards from bad clients/access patterns. • Cost Saving – Provision for the average; withstand peak with elastic growth.
  • 15. Solr Compute Cloud Solr Cluster Zookeeper Ensemble Pipeline 1 Solr Compute Cloud API Solr Cluster Collection A Replicas: 6 1. Read pipeline requests collection and desired replicas from SC2 API. 2. SC2 API provisions cluster dynamically with needed setup (and streams Solr data). 3. SC2 calls HAFT service to replicate data from production to provisioned cluster. 4. Pipeline uses this cluster to run job. 1 4 Request: {Collection: A, Replica: 6} 2 Solr HAFT Service 3 3 Read Replicate
  • 16. Solr Compute Cloud… Solr Cluster Zookeeper Ensemble Pipeline 1 Solr Compute Cloud API Solr Cluster Collection A Replicas: 6 1. Pipeline finishes running the job. 2. Pipeline calls SC2 API to terminate the cluster. 3. SC2 terminates the cluster. 2 Terminate: {Cluster} 3 Solr HAFT Service 1
  • 17. Solr Compute Cloud – Read Pipeline View Zookeeper Ensemble Pipeline 1 Solr Compute Cloud API Solr Cluster Collection A Replicas: 6 Request: {Collection: A, Replica: 6} Pipeline 2 Solr Cluster Collection B Replicas: 2 Request: {Collection: B, Replica: 2} Pipeline n Solr Cluster Collection C Replicas: 1 Request: {Collection: C, Replica: 1} Solr HAFT Service Production Solr Cluster
  • 18. Solr Compute Cloud – Indexing Production Solr Cluster Zookeeper Ensemble Indexing Solr Compute Cloud API Solr Cluster Collection A Replicas: 6 1. Read pipeline requests collection and desired replicas from SC2 API. 2. SC2 API provisions cluster dynamically with needed setup (and streams Solr data). 3. Indexer uses this cluster to index the data. 4. Indexer calls HAFT service to replicate the index from dynamic cluster to production. 5. HAFT service reads data from dynamic cluster and replicates to production Solr. 1 3 Request: {Collection: A, Replica: 2} 2 Replicate Solr HAFT Service 4 5 Read
  • 19. Solr Compute Cloud – Global View Zookeeper Ensemble Solr Compute Cloud API Solr HAFT Service Production Solr Cluster Indexing Pipelines 1 Elastic Clusters Read Pipelines 1 Read Pipelines n Indexing Pipelines n Provision: {Cluster} Terminate: {Cluster} Replicate Index Replicate Index Run Job
  • 20. Solr Compute Cloud API 1. API to provision clusters on demand. 2. Dynamic cluster and resource allocation (includes cost optimization) 3. Track request state, cluster performance and cost. 4. Terminate long-running, runaway clusters.
  • 21. Solr HAFT Service 1. High availability and fault tolerance 2. Home-grown technology 3. Open Source -  (Work in progress) 4. Features • One push disaster recovery • High availability operations • Replace node • Add replicas • Repair collection • Collection versioning • Cluster backup operations • Dynamic replica creation • Cluster clone • Cluster swap • Cluster state reconstruction
  • 22. Solr HAFT Service Clone Alias Clone Collections Custom Commit Node Replacement Node Repair Clone Cluster Collection Versioning Black Box Recording Lucene Segment Optimize Index Management Actions High Availability Actions Cluster Backup Operations Solr Metadata Zookeeper Metadata Verification Monitoring Solr HAFT Service – Functional View Dynamic Replica Creation Cluster Clone Cluster Swap Cluster State Reconstruction
  • 23. Disaster Recovery in New Architecture Old Production Solr Cluster Zookeeper Ensemble New Solr Cluster Zookeeper Ensemble Solr HAFT Service Push Button Recovery Brave Soul on Pager Duty 1 2 DNS 3 1. Guy on Pager clicks the recovery button 2. Solr HAFT Service triggers Cluster Setup State Reconstruction Cluster Clone Cluster Swap 3. Production DNS – New Cluster
  • 24. SC2 vs Non-SC2 (Stability Features) Property Non-SC2 SC2 Linear Scalability for Heterogeneous Workload Pipeline Level Isolation Dynamic Collection Scaling Prevention from Bad Clients Pipeline Specific Performance No Direct Access to Production Cluster Can Sleep at night? 
  • 25. SC2 vs Non-SC2 (Availability Features) Property Non-SC2 SC2 Cross Data-Center Support Cluster Cloning Collection Versioning One-Push Disaster Recovery Repair API for Nodes/Collections Node Replacement
  • 26. Lessons Learned 1. Solr is a search platform. Do not use it as a database (for scans and lookups). Evaluate your stored fields. 2. Understand access patterns, QPS and queries in detail. Be careful when tuning caches. 3. Have access control for large-scale jobs that directly talk to your cluster. (Internal DDoS attacks are hard to track.) 4. Instrument every piece of infrastructure and collect metrics. 5. Build automated disaster recovery. (You will need it.)
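
The create, use, terminate model from slides 14 to 16 can be sketched in a few lines. This is a minimal illustration, not BloomReach's implementation: the SC2 API is not public in this deck, so `FakeSc2Api`, `provision`, `terminate`, `temporary_cluster` and `run_pipeline` are all hypothetical names, and the "cluster" is just an in-memory record. The point it shows is that the terminate call runs even when the job fails, so no cluster is left behind.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeSc2Api:
    """In-memory stand-in for the SC2 API. Every name here is hypothetical."""
    clusters: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def provision(self, collection: str, replicas: int) -> str:
        # Slide 15, steps 1-3: accept {Collection, Replica}, bring up a
        # cluster, and (in the real system) have HAFT replicate the data.
        cluster_id = f"cluster-{next(self._ids)}"
        self.clusters[cluster_id] = {"collection": collection,
                                     "replicas": replicas}
        return cluster_id

    def terminate(self, cluster_id: str) -> None:
        # Slide 16, steps 2-3: the pipeline asks for termination.
        self.clusters.pop(cluster_id, None)


@contextmanager
def temporary_cluster(api: FakeSc2Api, collection: str, replicas: int):
    """Provision a cluster, hand it to the job, always terminate it."""
    cluster_id = api.provision(collection, replicas)
    try:
        yield cluster_id
    finally:
        api.terminate(cluster_id)


def run_pipeline(api: FakeSc2Api) -> int:
    # Slide 15, step 4: the pipeline runs its job against its own cluster.
    with temporary_cluster(api, "A", 6) as cluster_id:
        return api.clusters[cluster_id]["replicas"]
```

Wrapping the lifecycle in a context manager is one way to keep a crashed job from leaking a cluster; slide 20 describes a second line of defense, where the API itself terminates long-running, runaway clusters.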

Editor's Notes

  1. GM. Thanks for making it to the session. I am Nitin… The talk is about SC2, which was built inside BloomReach to scale to our search use cases. If things go as planned, we should have a few minutes for questions. If not, I will be more than happy to talk offline. Please post your questions in the activity feed.
  2. Typical search platforms have low-latency requirements as we scale the number of collections and documents. Performance, availability, scalability and stability. Job-level isolation with latency and SLA guarantees. Disaster recovery platform. This presentation will describe an innovative implementation of scaling Solr in an elastic fashion.
  3. BloomReach is a big-data-based marketing platform. We offer products that make our customers’ digital content more relevant.
  4. What kind of search use cases does BloomReach have? A year's worth of work and home-grown technologies. Staying at a high level. Glad to go over details offline.
  5. Latency-sensitive front-end applications (incoming customer queries through the API). Huge batch (MapReduce-based) jobs constantly analyzing and figuring out the relevant content. Time-sensitive index reads and writes.
  6. A picture is worth a thousand words. Search traffic – different sets of queries (utilizing all sorts of features: faceting, sorting, custom ranking). ETL pipelines, product-based pipelines. Pipelines running at different schedules, different QPS and different throughput requirements. Indexer (partial and full indexing). Let's go over what kind of issues we encountered with this setup.
  7. Red circle indicates the issues. Multiple products, multiple read pipelines. They run at different schedules. One can starve the other. One can screw up the latencies of the other. The setup is very static. Uneven load. Mem/CPU usage varies and so does the machine utilization. OOM. PermGen issues (#collections). Old Gen issues. High CPU. Bad queries. Even sharding. Unused features. Highlighting. Random jars.
  8. Bad clients – bring down the cluster/degrade performance (any misbehaving job should fail instead of bringing the cluster down?). Bad queries (with heavy load) – render nodes unresponsive. PermGen, Old Gen, Young Gen (Parallel GC vs CMS). ZK txn log and data dirs on different drives. We have developed multiple custom components. They might cause exceptions during query time.
  9. Auto commits did help to a certain extent, but not too much. No job-level isolation. Client leaks. No rate limiting.
  10. Summary of issues Ok how do we achieve that? Introducing SC2
  11. Spend More time… Linearly Scalable with Dynamic Scaling No direct access to production. Any misbehaving job “Fails” instead of bringing cluster down? Cost Saving:
  12. Instead of replicating from production, you replicate to production. Access to production is only through replication and not through any other means.
  13. Spend More Time…
  14. Old vs New architecture
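
The notes list "No rate limiting" among the problems of the old shared cluster, and the lessons call for access control on large jobs. As an illustration only, here is a minimal token-bucket limiter that a client wrapper could check before sending a query. The talk does not say how (or whether) BloomReach rate-limits clients, so the `TokenBucket` class and its parameters are assumptions made for this sketch.

```python
import time


class TokenBucket:
    """Hypothetical client-side limiter: at most `rate_per_sec` requests
    per second on average, with bursts of up to `capacity` requests."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock            # injectable so tests can control time
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        # Refill in proportion to elapsed time, capped at the bucket size.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0        # spend one token for this request
            return True
        return False                  # caller should back off or fail the job
```

A job that gets `False` back can sleep and retry or fail fast, which matches the goal in the notes that a misbehaving job should fail rather than bring the cluster down.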