About Me 
• Lucene/Solr committer. Work for Lucidworks; focus on hardening 
SolrCloud, devops, big data architecture / deployments 
• Operated smallish cluster in AWS for Dachis Group (1.5 years ago, 18 
shards ~900M docs) 
• Solr Scale Toolkit: Fabric/boto framework for deploying and managing 
clusters in EC2 
• Co-author of Solr In Action with Trey Grainger
Agenda 
1. Quick review of the SolrCloud architecture 
2. Indexing & Query performance tests 
3. Solr Scale Toolkit (quick overview) 
4. Q & A
Solr in the wild … 
https://twitter.com/bretthoerner/status/476830302430437376
SolrCloud distilled 
Subset of optional features in Solr to enable and 
simplify horizontal scaling of a search index using 
sharding and replication. 
Goals 
performance, scalability, high-availability, 
simplicity, elasticity, and 
community-driven!
Collection == distributed index 
A collection is a distributed index defined by: 
• named configuration stored in ZooKeeper 
• number of shards: documents are distributed across N partitions of the index 
• document routing strategy: how documents get assigned to shards 
• replication factor: how many copies of each document in the collection 
Collections API: 
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=logstash4solr&replicationFactor=2&numShards=2&collection.configName=logs"
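The same CREATE request can be assembled programmatically. A minimal Python sketch, assuming the base URL and parameter values from the curl example above; the helper name `create_collection_url` is made up for illustration:

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, config_name):
    """Build a Collections API CREATE URL (hypothetical helper, not part of Solr)."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("replicationFactor", replication_factor),
        ("numShards", num_shards),
        ("collection.configName", config_name),
    ]
    # urlencode escapes values and joins them with '&'
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


url = create_collection_url("http://localhost:8983/solr", "logstash4solr", 2, 2, "logs")
print(url)
```

Building the query string with `urlencode` avoids the line-wrapping and quoting problems of a hand-typed URL.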
SolrCloud High-level Architecture
ZooKeeper 
• Is a very good thing ... clusters are a zoo! 
• Centralized configuration management 
• Cluster state management 
• Leader election (shard leader and overseer) 
• Overseer distributed work queue 
• Live Nodes 
• Ephemeral znodes used to signal a server is gone 
• Needs at least 3 nodes for quorum in production
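Why 3 nodes: a ZooKeeper ensemble stays available only while a strict majority of its servers can talk to each other. A small sketch of that arithmetic (function names are illustrative):

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be lost while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(n, "servers: quorum", quorum_size(n), "tolerates", tolerated_failures(n), "failure(s)")
```

A single server tolerates no failures, 3 servers tolerate 1, and a 4th server adds nothing over 3, which is why odd ensemble sizes are the usual choice.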
ZooKeeper: State Management 
• Keep track of live nodes in the /live_nodes znode 
• ephemeral nodes 
• ZooKeeper client timeout 
• Collection metadata and replica state in /clusterstate.json 
• Every Solr node has watchers for /live_nodes and /clusterstate.json 
• Leader election 
• ZooKeeper sequence number on ephemeral znodes
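The sequence-number idea can be illustrated with a toy in-memory simulation. This is a sketch of the general ZooKeeper election recipe (lowest sequence number wins, ephemeral entries vanish when a session ends), not Solr's actual implementation; the class and method names are invented for the example:

```python
class ElectionSim:
    """Toy stand-in for ephemeral sequential znodes under an election path."""

    def __init__(self):
        self._next_seq = 0
        self._candidates = {}  # sequence number -> node name

    def join(self, node_name):
        """Register a candidate; it receives the next sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = node_name
        return seq

    def session_expired(self, node_name):
        """Ephemeral entries disappear when the owner's session ends."""
        for seq, name in list(self._candidates.items()):
            if name == node_name:
                del self._candidates[seq]

    def leader(self):
        """The candidate holding the lowest sequence number is the leader."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


sim = ElectionSim()
for name in ("solr1", "solr2", "solr3"):
    sim.join(name)
print(sim.leader())          # first node to join leads
sim.session_expired("solr1")
print(sim.leader())          # next-lowest sequence number takes over
```

A node that rejoins after losing its session gets a new, higher sequence number, so it queues behind the current leader instead of displacing it.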
Scalability Highlights 
• No split-brain problems (b/c of ZooKeeper) 
• All nodes in cluster perform indexing and execute queries; no master node 
• Distributed indexing: No SPoF, high throughput via direct updates to 
leaders, automated failover to new leader 
• Distributed queries: Add replicas to scale-out qps; parallelize complex query 
computations; fault-tolerance 
• Indexing / queries continue so long as there is 1 healthy replica per shard
Cluster sizing 
How many servers do I need to index X docs? 
... shards ... ? 
... replicas ... ? 
I need N queries per second over M docs, how many 
servers do I need? 
It depends!
Testing Methodology 
• Transparent repeatable results 
• Ideally hoping for something owned by the community 
• Synthetic docs ~ 1K each on disk, mix of field types 
• Data set created using code borrowed from PigMix 
• English text fields generated using a Zipfian distribution 
• Java 1.7u67, Amazon Linux, r3.2xlarge nodes 
• enhanced networking enabled, placement group, same AZ 
• Stock Solr (cloud) 4.10 
• Using custom GC tuning parameters and auto-commit settings 
• Use Elastic MapReduce to generate indexing load 
• As many nodes as I need to drive Solr!
Indexing Performance 
Cluster Size  # of Shards  # of Replicas  Reducers  Time (secs)  Docs / sec
10            10           1              48        1762         73,780
10            10           2              34        3727         34,881
10            20           1              48        1282         101,404
10            20           2              34        3207         40,536
10            30           1              72        1070         121,495
10            30           2              60        3159         41,152
15            15           1              60        1106         117,541
15            15           2              42        2465         52,738
15            30           1              60        827          157,195
15            30           2              42        2129         61,062
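A quick way to read the table: time × docs/sec gives the number of documents indexed per run, and dividing the 2-replica rate by the 1-replica rate for the same cluster size and shard count shows the cost of replication. The sketch below only redoes arithmetic on the table's own numbers; note that the reducer counts differ between paired runs, so the ratios are indicative rather than a controlled comparison.

```python
# (cluster size, shards, replicas, time in secs, docs/sec), copied from the table
RUNS = [
    (10, 10, 1, 1762, 73780),
    (10, 10, 2, 3727, 34881),
    (10, 20, 1, 1282, 101404),
    (10, 20, 2, 3207, 40536),
    (10, 30, 1, 1070, 121495),
    (10, 30, 2, 3159, 41152),
    (15, 15, 1, 1106, 117541),
    (15, 15, 2, 2465, 52738),
    (15, 30, 1, 827, 157195),
    (15, 30, 2, 2129, 61062),
]


def total_docs(run):
    """Documents indexed in a run = elapsed seconds * docs per second."""
    return run[3] * run[4]


def replication_ratios(runs):
    """Throughput with 2 replicas divided by throughput with 1, per (cluster, shards)."""
    rates = {}
    for cluster, shards, replicas, _secs, rate in runs:
        rates.setdefault((cluster, shards), {})[replicas] = rate
    return {key: by_rep[2] / by_rep[1] for key, by_rep in rates.items()}


for run in RUNS:
    print(run[:3], "indexed about", round(total_docs(run) / 1e6), "M docs")
for key, ratio in sorted(replication_ratios(RUNS).items()):
    print(key, "2-replica throughput is", round(ratio * 100), "% of 1-replica")
```

Every row works out to roughly 130 million documents, and adding a second replica leaves between about a third and a half of the single-replica throughput.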
Visualize Server Performance
Direct Updates to Leaders
Replication
Indexing Performance Lessons 
• Solr has no built-in throttling support – will accept work until it falls over; need to build this into 
your indexing application logic 
• Oversharding helps parallelize indexing work and gives you an easy way to add more 
hardware to your cluster 
• GC tuning is critical (more below) 
• Auto-hard commit to keep transaction logs manageable 
• Auto soft-commit to see docs as they are indexed 
• Replication is expensive! (more work needed here)
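Since throttling has to live in the indexing client, one simple option is to pace batches to a target docs/sec rate. A minimal sketch, assuming a caller-supplied `send_batch` function (hypothetical; it stands in for whatever client code actually posts documents to Solr):

```python
import time


class RateLimiter:
    """Pace work so that, on average, at most `rate` documents are sent per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        self._clock = clock
        self._sleep = sleep
        self._next_free = clock()  # earliest time the next batch may start

    def acquire(self, n_docs):
        """Block until a batch of n_docs may be sent; return the seconds waited."""
        now = self._clock()
        wait = max(0.0, self._next_free - now)
        if wait > 0:
            self._sleep(wait)
        start = max(now, self._next_free)
        # Reserve the time slice this batch "costs" at the target rate.
        self._next_free = start + n_docs / self.rate
        return wait


def index_all(batches, send_batch, limiter):
    """Send every batch through the caller-supplied send_batch, paced by limiter."""
    sent = 0
    for batch in batches:
        limiter.acquire(len(batch))
        send_batch(batch)
        sent += len(batch)
    return sent
```

The clock and sleep functions are injectable so the pacing logic can be tested without real delays. A production client would likely also cap in-flight requests and back off on errors, which this sketch does not attempt.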
GC Tuning 
• Stop-the-world GC pauses can lead to ZooKeeper session expiration (which is bad) 
• More JVMs with smaller heap sizes are better! (12-16GB max per JVM ~ less if you can) 
• MMapDirectory relies on sufficient memory available to the OS cache (off-heap) 
• GC activity during Solr indexing is stable and generally doesn’t cause any stop-the-world 
collections … queries are a different story 
• Enable verbose GC logging (even in prod) so you can troubleshoot issues: 
-verbose:gc -Xloggc:gc.log -XX:+PrintHeapAtGC -XX:+PrintGCDetails  
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps  
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime  
-XX:+PrintGCApplicationConcurrentTime
GC Flags I use with Solr 
-Xss256k  
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC  
-XX:MaxTenuringThreshold=8 -XX:NewRatio=3  
-XX:CMSInitiatingOccupancyFraction=40  
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4  
-XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90  
-XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=12m  
-XX:CMSFullGCsBeforeCompaction=1  
-XX:+UseCMSInitiatingOccupancyOnly  
-XX:CMSTriggerPermRatio=80  
-XX:CMSMaxAbortablePrecleanTime=6000  
-XX:+CMSParallelRemarkEnabled  
-XX:+ParallelRefProcEnabled  
-XX:+UseLargePages -XX:+AggressiveOpts
Sizing GC Spaces 
http://kumarsoablog.blogspot.com/2013/02/jvm-parameter-survivorratio_7.html
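To see what -XX:NewRatio=3 and -XX:SurvivorRatio=4 imply for a given heap, the approximate arithmetic is: old:young = NewRatio:1, and eden:one survivor space = SurvivorRatio:1 with two survivor spaces. A sketch of that calculation; it ignores permgen and the JVM's own rounding and alignment, so treat the numbers as estimates, and the 12 GB heap is just an example value:

```python
def size_generations(heap_mb, new_ratio=3, survivor_ratio=4):
    """Approximate generation sizes (in MB) implied by NewRatio and SurvivorRatio."""
    young = heap_mb / (new_ratio + 1)        # old is new_ratio times young
    old = heap_mb - young
    survivor = young / (survivor_ratio + 2)  # young = eden + 2 survivor spaces
    eden = young - 2 * survivor
    return {"old": old, "young": young, "eden": eden, "survivor": survivor}


sizes = size_generations(12 * 1024)  # hypothetical 12 GB heap
for name, mb in sizes.items():
    print(name, round(mb), "MB")
```

For a 12 GB heap this gives roughly 9 GB old generation, 3 GB young generation, 2 GB eden and two 512 MB survivor spaces.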
Query Performance 
• Still a work in progress! 
• Measure sustained QPS and 99th-percentile execution time (Coda Hale Metrics is good for this) 
• Stable: ~5,000 QPS with a 99th percentile of 300 ms while indexing ~10,000 docs / sec 
• Using the TermsComponent to build queries based on the terms in each field. 
• Harder to accurately simulate user queries over synthetic data 
• Need mix of faceting, paging, sorting, grouping, boolean clauses, range queries, boosting, filters (some 
cached, some not), etc ... 
• Does the randomness in your test queries model (expected) user behavior? 
• Start with one server (1 shard) to determine baseline query performance. 
• Look for inefficiencies in your schema and other config settings
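For reference, a 99th-percentile figure can be computed from raw latency samples with the nearest-rank method. This is a generic sketch, not the estimator a metrics library uses (those typically work on sampled reservoirs):

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


latencies_ms = [12, 15, 11, 240, 18, 14, 310, 16, 13, 17]  # made-up sample data
print("median:", percentile(latencies_ms, 50), "ms")
print("99th:", percentile(latencies_ms, 99), "ms")
```

Reporting a high percentile alongside QPS matters because a mean hides the slow tail that users actually notice.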
Query Performance, cont. 
• Higher risk of full GC pauses (facets, filters, sorting) 
• Use optimized data structures (DocValues) for facet / sort fields, Trie-based numeric fields for 
range queries, facet.method=enum for low cardinality fields 
• Check sizing of caches, esp. filterCache in solrconfig.xml 
• Add more replicas; load-balance; Solr can set HTTP headers to work with caching proxies like 
Squid 
• -Dhttp.maxConnections=## (default = 5, increase to accommodate more threads sending 
queries) 
• Avoid increasing the ZooKeeper client timeout; ~15000 ms (15 seconds) is about right 
• Don’t just keep throwing more memory at Java! -Xmx128G
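When checking filterCache sizing, a rough upper bound helps: a cached filter that matches many documents is held as a bitset with one bit per document in the index, so the worst case is about maxDoc / 8 bytes per entry. A sketch of that estimate, assuming every entry is a full bitset (small result sets are stored more compactly, so real usage is usually lower); the numbers are illustrative:

```python
def filter_cache_upper_bound_bytes(max_doc, cache_size):
    """Worst-case filterCache memory if every entry is a full bitset over max_doc docs."""
    bytes_per_entry = (max_doc + 7) // 8  # one bit per document, rounded up
    return bytes_per_entry * cache_size


# Hypothetical core with 50M documents and a filterCache of 512 entries
worst_case = filter_cache_upper_bound_bytes(50_000_000, 512)
print(round(worst_case / 1e9, 1), "GB worst case")
```

That is several gigabytes of heap for a single core's cache, which is one way heaps grow until full GC pauses become a problem.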
Call me maybe - Jepsen 
https://github.com/aphyr/jepsen 
• Solr tests being developed by Lucene/Solr committer Shalin 
Mangar (@shalinmangar) 
• Prototype in place: 
• No ack’d writes were lost! 
• No un-ack’d writes succeeded 
See: https://github.com/LucidWorks/jepsen/tree/solr-jepsen
Solr Scale Toolkit 
• Open source: https://github.com/LucidWorks/solr-scale-tk 
• Fabric (Python) toolset for deploying and managing SolrCloud clusters in the cloud 
• Code to support benchmark tests (Pig script for data generation / indexing, JMeter samplers) 
• EC2 for now, more cloud providers coming soon via Apache libcloud 
• Contributors welcome! 
• More info: http://searchhub.org/2014/06/03/introducing-the-solr-scale-toolkit/
Provisioning cluster nodes 
fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge 
• Custom built AMI (one for PV instances and one for HVM instances) – 
Amazon Linux 
• Block device mapping 
• dedicated disk per Solr node 
• Launch and then poll status until they are live 
• verify SSH connectivity 
• Tag each instance with a cluster ID and username
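The "launch and then poll until live" step follows a common pattern: check, sleep, repeat until a deadline. A generic sketch of that loop, not the toolkit's actual code; `check` is any caller-supplied function (for example an instance-state lookup or an SSH connectivity test):

```python
import time


def wait_until(check, timeout_secs, poll_secs=5.0, clock=time.monotonic, sleep=time.sleep):
    """Call check() every poll_secs until it returns True or timeout_secs elapse."""
    deadline = clock() + timeout_secs
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_secs)


# Usage with a stand-in check that succeeds on the third poll
answers = iter([False, False, True])
ok = wait_until(lambda: next(answers), timeout_secs=60, poll_secs=0.0)
print("instances live:", ok)
```

Returning False on timeout, rather than looping forever, lets the provisioning script fail fast when an instance never comes up.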
Deploy ZooKeeper ensemble 
fab new_zk_ensemble:zk1,n=3 
• Two options: 
• provision 1 to N nodes when you launch Solr cluster 
• use existing named ensemble 
• Fabric command simply creates the myid files and zoo.cfg file for the 
ensemble 
• and some cron scripts for managing snapshots 
• Basic health checking of ZooKeeper status: 
echo srvr | nc localhost 2181
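The same health check can be scripted without nc by sending the four-letter command over a plain socket and reading the reply. A minimal sketch; the function names are illustrative, and the sample text used to exercise the parser is made up to resemble srvr output:

```python
import socket


def zk_four_letter(host, port, command=b"srvr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_output):
    """Extract the value of the 'Mode:' line (e.g. leader, follower, standalone)."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


sample = "Zookeeper version: 3.4.6\nMode: follower\nNode count: 4\n"  # illustrative reply
print(parse_mode(sample))
# Against a live server: parse_mode(zk_four_letter("localhost", 2181))
```

Checking the Mode line on each server is a quick way to confirm the ensemble has exactly one leader.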
Deploy SolrCloud cluster 
fab new_solrcloud:test1,zk=zk1,nodesPerHost=2 
• Uses bin/solr in Solr 4.10 to control Solr nodes 
• Set system props: jetty.port, host, zkHost, JVM opts 
• One or more Solr nodes per machine 
• JVM mem opts dependent on instance type and # of Solr nodes 
per instance 
• Optionally configure log4j.properties to append messages to 
RabbitMQ for SiLK integration
Automate day-to-day cluster management tasks 
• Deploy a configuration directory to ZooKeeper 
• Create a new collection 
• Attach a local JConsole/VisualVM to a remote JVM 
• Rolling restart (with Overseer awareness) 
• Build Solr locally and patch remote 
• Use a relay server to scp the JARs to Amazon network once and then scp them to other nodes 
from within the network 
• Put/get files 
• Grep over all log files (across the cluster)
Wrap-up and Q & A 
• LucidWorks: http://www.lucidworks.com -- We’re hiring! 
• Solr Scale Toolkit: https://github.com/LucidWorks/solr-scale-tk 
• SiLK: http://www.lucidworks.com/lucidworks-silk/ 
• Solr In Action: http://www.manning.com/grainger/ 
• Connect: @thelabdude / tim.potter@lucidworks.com
Benchmarking Solr Performance at Scale

Benchmarking Solr Performance at Scale

  • 1.
  • 2. About Me • Lucene/Solr committer. Work for Lucidworks; focus on hardening SolrCloud, devops, big data architecture / deployments • Operated smallish cluster in AWS for Dachis Group (1.5 years ago, 18 shards ~900M docs) • Solr Scale Toolkit: Fabric/boto framework for deploying and managing clusters in EC2 • Co-author of Solr In Action with Trey Grainger
  • 3. Agenda 1. Quick review of the SolrCloud architecture 2. Indexing & Query performance tests 3. Solr Scale Toolkit (quick overview) 4. Q & A
  • 4. Solr in the wild … https://twitter.com/bretthoerner/status/476830302430437376
  • 5. SolrCloud distilled Subset of optional features in Solr to enable and simplify horizontal scaling a search index using sharding and replication. Goals performance, scalability, high-availability, simplicity, elasticity, and community-driven!
  • 6. Collection == distributed index A collection is a distributed index defined by: • named configuration stored in ZooKeeper • number of shards: documents are distributed across N partitions of the index • document routing strategy: how documents get assigned to shards • replication factor: how many copies of each document in the collection Collections API: curl "http://localhost:8983/solr/admin/collections? action=CREATE&name=logstash4solr&replicationFactor=2& numShards=2&collection.configName=logs"
  • 8. ZooKeeper • Is a very good thing ... clusters are a zoo! • Centralized configuration management • Cluster state management • Leader election (shard leader and overseer) • Overseer distributed work queue • Live Nodes • Ephemeral znodes used to signal a server is gone • Needs at least 3 nodes for quorum in production
  • 9. ZooKeeper: State Management • Keep track of live nodes /live_nodes znode • ephemeral nodes • ZooKeeper client timeout • Collection metadata and replica state in /clusterstate.json • Every Solr node has watchers for /live_nodes and /clusterstate.json • Leader election • ZooKeeper sequence number on ephemeral znodes
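The last two bullets on this slide refer to the common ZooKeeper election recipe: each candidate creates an ephemeral sequential znode, and the candidate holding the lowest sequence number leads. The sketch below simulates that rule in plain Python with no ZooKeeper client. The znode and node names are hypothetical, and the slide does not say whether Solr follows the recipe step for step.

```python
def elect_leader(candidates):
    """Return the owner of the znode with the lowest sequence suffix.

    `candidates` maps ephemeral-sequential znode names to the node that owns them.
    """
    def sequence(znode):
        # The sequence number is the zero-padded suffix after the last dash.
        return int(znode.rsplit("-", 1)[1])

    winner = min(candidates, key=sequence)
    return candidates[winner]


# Hypothetical election znodes; the names are illustrative, not Solr's layout.
election = {
    "n-0000000002": "solr-node-b",
    "n-0000000000": "solr-node-a",
    "n-0000000001": "solr-node-c",
}
leader = elect_leader(election)

# solr-node-a's session expires, so its ephemeral znode disappears
# and the next-lowest sequence number takes over.
del election["n-0000000000"]
new_leader = elect_leader(election)
print(leader, "->", new_leader)
```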
  • 10. Scalability Highlights • No split-brain problems (b/c of ZooKeeper) • All nodes in cluster perform indexing and execute queries; no master node • Distributed indexing: No SPoF, high throughput via direct updates to leaders, automated failover to new leader • Distributed queries: Add replicas to scale-out qps; parallelize complex query computations; fault-tolerance • Indexing / queries continue so long as there is 1 healthy replica per shard
  • 11. Cluster sizing How many servers do I need to index X docs? ... shards ... ? ... replicas ... ? I need N queries per second over M docs, how many servers do I need? It depends!
  • 12. Testing Methodology • Transparent repeatable results • Ideally hoping for something owned by the community • Synthetic docs ~ 1K each on disk, mix of field types • Data set created using code borrowed from PigMix • English text fields generated using a Zipfian distribution • Java 1.7u67, Amazon Linux, r3.2xlarge nodes • enhanced networking enabled, placement group, same AZ • Stock Solr (cloud) 4.10 • Using custom GC tuning parameters and auto-commit settings • Use Elastic MapReduce to generate indexing load • As many nodes as I need to drive Solr!
  • 13. Indexing Performance
        Cluster Size  # of Shards  # of Replicas  Reducers  Time (secs)  Docs / sec
        10            10           1              48        1762          73,780
        10            10           2              34        3727          34,881
        10            20           1              48        1282         101,404
        10            20           2              34        3207          40,536
        10            30           1              72        1070         121,495
        10            30           2              60        3159          41,152
        15            15           1              60        1106         117,541
        15            15           2              42        2465          52,738
        15            30           1              60         827         157,195
        15            30           2              42        2129          61,062
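Two things can be read off the numbers on this slide with a little arithmetic. Multiplying time by throughput gives the same document count for every run, and the second replica more than halves throughput. The sketch below does that arithmetic. The roughly 130M document total is inferred from the multiplication and is not stated on the slide, and the reducer count also differs between paired runs, so the ratio is not a pure measure of replication cost.

```python
# (cluster size, shards, replicas, reducers, seconds, docs/sec), copied from the slide
RESULTS = [
    (10, 10, 1, 48, 1762, 73780),
    (10, 10, 2, 34, 3727, 34881),
    (10, 20, 1, 48, 1282, 101404),
    (10, 20, 2, 34, 3207, 40536),
    (10, 30, 1, 72, 1070, 121495),
    (10, 30, 2, 60, 3159, 41152),
    (15, 15, 1, 60, 1106, 117541),
    (15, 15, 2, 42, 2465, 52738),
    (15, 30, 1, 60, 827, 157195),
    (15, 30, 2, 42, 2129, 61062),
]


def implied_docs(row):
    """Documents indexed in a run: elapsed seconds times docs per second."""
    return row[4] * row[5]


def replication_slowdown(results):
    """Map (cluster size, shards) to 1-replica throughput / 2-replica throughput."""
    by_key = {}
    for size, shards, replicas, _reducers, _secs, rate in results:
        by_key.setdefault((size, shards), {})[replicas] = rate
    return {key: rates[1] / rates[2] for key, rates in by_key.items()}


for key, ratio in sorted(replication_slowdown(RESULTS).items()):
    print(key, round(ratio, 2))
```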
  • 15. Direct Updates to Leaders
  • 17. Indexing Performance Lessons • Solr has no built-in throttling support – will accept work until it falls over; need to build this into your indexing application logic • Oversharding helps parallelize indexing work and gives you an easy way to add more hardware to your cluster • GC tuning is critical (more below) • Auto-hard commit to keep transaction logs manageable • Auto soft-commit to see docs as they are indexed • Replication is expensive! (more work needed here)
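Since Solr will not push back on its own, the throttle has to live in the indexing client. One common way to do this is a token bucket that caps documents per second. This is a sketch of that general technique, not the approach used in the benchmark. The class name, the rate, and the `send_batch` call in the comment are all assumptions.

```python
import time


class TokenBucket:
    """Client-side rate limiter: at most `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self, n=1):
        """Block until n tokens are available, then consume them."""
        if n > self.capacity:
            raise ValueError("batch larger than bucket capacity")
        while True:
            now = self.clock()
            # Refill in proportion to elapsed time, never beyond capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return
            # Sleep just long enough for the shortfall to refill.
            self.sleep((n - self.tokens) / self.rate)


# Hypothetical usage: cap a client at 10,000 docs/sec in batches of 500.
# bucket = TokenBucket(rate=10000, capacity=500)
# for batch in batches:
#     bucket.acquire(len(batch))
#     send_batch(batch)  # send_batch is a placeholder for your own indexing call
```

The clock and sleep functions are injectable so the limiter can be tested without waiting on real time.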
  • 18. GC Tuning • Stop-the-world GC pauses can lead to ZooKeeper session expiration (which is bad) • More JVMs with smaller heap sizes are better! (12-16GB max per JVM ~ less if you can) • MMapDirectory relies on sufficient memory available to the OS cache (off-heap) • GC activity during Solr indexing is stable and generally doesn’t cause any stop-the-world collections … queries are a different story • Enable verbose GC logging (even in prod) so you can troubleshoot issues: -verbose:gc -Xloggc:gc.log -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime
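With `-XX:+PrintGCApplicationStoppedTime` enabled, the GC log records how long application threads were paused, which is the number that matters for ZooKeeper session expiration. The sketch below scans log lines for pauses at or above a threshold. The sample lines are illustrative and not real measurements, and the exact line layout can vary between JVM versions, so treat the pattern as an assumption to check against your own logs.

```python
import re

# Phrase written by -XX:+PrintGCApplicationStoppedTime (layout may vary by JVM version).
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(lines, threshold_secs):
    """Return every stop-the-world pause, in seconds, at or above the threshold."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match and float(match.group(1)) >= threshold_secs:
            pauses.append(float(match.group(1)))
    return pauses


# Illustrative sample lines, not output from a real run.
SAMPLE_LOG = [
    "2014-11-13T10:15:01.123+0000: 12.345: Total time for which application threads were stopped: 0.0021340 seconds",
    "2014-11-13T10:15:09.456+0000: 20.678: Total time for which application threads were stopped: 16.2041230 seconds",
    "2014-11-13T10:15:10.000+0000: 21.222: Application time: 0.5437770 seconds",
]

# 15 seconds matches the ZooKeeper client timeout suggested later in the deck.
print(long_pauses(SAMPLE_LOG, 15.0))
```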
  • 19. GC Flags I use with Solr -Xss256k -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:MaxTenuringThreshold=8 -XX:NewRatio=3 -XX:CMSInitiatingOccupancyFraction=40 -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=12m -XX:CMSFullGCsBeforeCompaction=1 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSTriggerPermRatio=80 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
  • 20. Sizing GC Spaces http://kumarsoablog.blogspot.com/2013/02/jvm-parameter-survivorratio_7.html
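To make the ratio flags concrete, the sketch below works out the generation sizes implied by `-XX:NewRatio=3` and `-XX:SurvivorRatio=4` from the previous slide. The 12 GB heap is an assumed example taken from the low end of the range recommended earlier. The calculation also assumes a fixed heap size with no explicit young-generation flags, and the JVM may round sizes for alignment.

```python
def gc_spaces(heap_mb, new_ratio, survivor_ratio):
    """Approximate generation sizes in MB from the HotSpot ratio flags.

    NewRatio is old:young, so young = heap / (NewRatio + 1).
    SurvivorRatio is eden:one survivor space, and there are two survivor spaces,
    so each survivor = young / (SurvivorRatio + 2).
    """
    young = heap_mb / (new_ratio + 1)
    old = heap_mb - young
    survivor = young / (survivor_ratio + 2)
    eden = young - 2 * survivor
    return {"young": young, "old": old, "eden": eden, "survivor_each": survivor}


sizes = gc_spaces(12 * 1024, new_ratio=3, survivor_ratio=4)
for name, megabytes in sizes.items():
    print(name, megabytes)
```

Under these assumptions a 12 GB heap splits into a 3 GB young generation (2 GB eden plus two 512 MB survivor spaces) and a 9 GB old generation.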
  • 21. Query Performance • Still a work in progress! • Sustained QPS & Execution time of 99th Percentile (coda hale metrics is good for this) • Stable: ~5,000 QPS / 99th at 300ms while indexing ~10,000 docs / sec • Using the TermsComponent to build queries based on the terms in each field. • Harder to accurately simulate user queries over synthetic data • Need mix of faceting, paging, sorting, grouping, boolean clauses, range queries, boosting, filters (some cached, some not), etc ... • Does the randomness in your test queries model (expected) user behavior? • Start with one server (1 shard) to determine baseline query performance. • Look for inefficiencies in your schema and other config settings
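For readers who want to see what a "99th percentile" figure means, here is a minimal nearest-rank calculation over raw latency samples. It is a plain illustration and not how the Metrics library mentioned on the slide computes percentiles, since that library samples into a reservoir. The function name and the sample data are assumptions.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 0 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 1, 2, ..., 1000.
latencies_ms = list(range(1, 1001))
p99 = percentile(latencies_ms, 99)
p50 = percentile(latencies_ms, 50)
print(p50, p99)
```

Reporting the 99th percentile alongside sustained QPS shows tail latency that an average would hide.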
  • 22. Query Performance, cont. • Higher risk of full GC pauses (facets, filters, sorting) • Use optimized data structures (DocValues) for facet / sort fields, Trie-based numeric fields for range queries, facet.method=enum for low cardinality fields • Check sizing of caches, esp. filterCache in solrconfig.xml • Add more replicas; load-balance; Solr can set HTTP headers to work with caching proxies like Squid • -Dhttp.maxConnections=## (default = 5, increase to accommodate more threads sending queries) • Avoid increasing ZooKeeper client timeout ~ 15000 (15 seconds) is about right • Don’t just keep throwing more memory at Java! -Xmx128G
  • 23. Call me maybe - Jepsen https://github.com/aphyr/jepsen • Solr tests being developed by Lucene/Solr committer Shalin Mangar (@shalinmangar) • Prototype in place: • No ack’d writes were lost! • No un-ack’d writes succeeded • See: https://github.com/LucidWorks/jepsen/tree/solr-jepsen
  • 24. Solr Scale Toolkit • Open source: https://github.com/LucidWorks/solr-scale-tk • Fabric (Python) toolset for deploying and managing SolrCloud clusters in the cloud • Code to support benchmark tests (Pig script for data generation / indexing, JMeter samplers) • EC2 for now, more cloud providers coming soon via Apache libcloud • Contributors welcome! • More info: http://searchhub.org/2014/06/03/introducing-the-solr-scale-toolkit/
  • 25. Provisioning cluster nodes fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge • Custom built AMI (one for PV instances and one for HVM instances) – Amazon Linux • Block device mapping • dedicated disk per Solr node • Launch and then poll status until they are live • verify SSH connectivity • Tag each instance with a cluster ID and username
  • 26. Deploy ZooKeeper ensemble fab new_zk_ensemble:zk1,n=3 • Two options: • provision 1 to N nodes when you launch Solr cluster • use existing named ensemble • Fabric command simply creates the myid files and zoo.cfg file for the ensemble • and some cron scripts for managing snapshots • Basic health checking of ZooKeeper status: echo srvr | nc localhost 2181
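The `echo srvr | nc localhost 2181` check on this slide can be done from Python by sending the same four-letter command over a socket and reading the reply. This is a minimal sketch. The function names are made up, the sample reply is illustrative and not captured from a real ensemble, and the live call is left commented out because it needs a running ZooKeeper server.

```python
import socket


def zk_four_letter(host, port, cmd="srvr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


def parse_mode(srvr_output):
    """Extract the server role (leader, follower, or standalone) from a srvr reply."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


# Illustrative reply, trimmed; not captured from a real server.
SAMPLE_REPLY = "Zookeeper version: 3.4.6\nOutstanding: 0\nMode: follower\nNode count: 4\n"
print(parse_mode(SAMPLE_REPLY))

# Against a live server:
# print(parse_mode(zk_four_letter("localhost", 2181)))
```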
  • 27. Deploy SolrCloud cluster fab new_solrcloud:test1,zk=zk1,nodesPerHost=2 • Uses bin/solr in Solr 4.10 to control Solr nodes • Set system props: jetty.port, host, zkHost, JVM opts • One or more Solr nodes per machine • JVM mem opts dependent on instance type and # of Solr nodes per instance • Optionally configure log4j.properties to append messages to RabbitMQ for SiLK integration
  • 28. Automate day-to-day cluster management tasks • Deploy a configuration directory to ZooKeeper • Create a new collection • Attach a local JConsole/VisualVM to a remote JVM • Rolling restart (with Overseer awareness) • Build Solr locally and patch remote • Use a relay server to scp the JARs to Amazon network once and then scp them to other nodes from within the network • Put/get files • Grep over all log files (across the cluster)
  • 29. Wrap-up and Q & A • LucidWorks: http://www.lucidworks.com -- We’re hiring! • Solr Scale Toolkit: https://github.com/LucidWorks/solr-scale-tk • SiLK: http://www.lucidworks.com/lucidworks-silk/ • Solr In Action: http://www.manning.com/grainger/ • Connect: @thelabdude / tim.potter@lucidworks.com

Editor's Notes

  1. Brett is at Spredfast (ATX), 12-hr sharding scheme (180 shards)
  2. ZooKeeper: Distributed coordination service that provides centralized configuration, cluster state management, and leader election
     Node: JVM process bound to a specific port on a machine; hosts the Solr web application
     Collection: Search index distributed across multiple nodes; each collection has a name, shard count, and replication factor
     Replication Factor: Number of copies of a document in a collection
     Shard: Logical slice of a collection; each shard has a name, hash range, leader, and replication factor. Documents are assigned to one and only one shard per collection using a hash-based document routing strategy.
     Replica: Solr index that hosts a copy of a shard in a collection; behind the scenes, each replica is implemented as a Solr core
     Leader: Replica in a shard that assumes special duties needed to support distributed indexing in Solr; each shard has one and only one leader at any time and leaders are elected using ZooKeeper
  3. You’re not going to tune your way out of every query problem!
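Note 2 says each shard owns a hash range and every document lands in exactly one shard. The sketch below illustrates that idea by splitting a 32-bit hash space into equal ranges and routing by document ID. It uses `zlib.crc32` as a stand-in hash, which is an assumption made for a self-contained example, so the shard it picks will not match what Solr picks. The function names and shard labels are also illustrative.

```python
import zlib


def shard_ranges(num_shards):
    """Split the unsigned 32-bit hash space into contiguous, equal-sized ranges."""
    span = 2 ** 32
    bounds = [span * i // num_shards for i in range(num_shards + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_shards)]


def route(doc_id, ranges):
    """Return the label of the one shard whose range contains the ID's hash."""
    # crc32 is a stand-in hash for illustration only.
    hashed = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
    for index, (low, high) in enumerate(ranges):
        if low <= hashed <= high:
            return "shard%d" % (index + 1)
    raise AssertionError("hash ranges must cover the full 32-bit space")


ranges = shard_ranges(2)
for doc_id in ("doc-1", "doc-2", "doc-3"):
    print(doc_id, route(doc_id, ranges))
```

Because the ranges are contiguous and cover the whole hash space, the same ID always routes to the same single shard.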