Search | Discover | Analyze 
Confidential and Proprietary © Copyright 2013 
Benchmarking Solr 
Performance 
June 18, 2014 
Timothy Potter
My SolrCloud Experience 
• At LucidWorks, mostly focused on hardening SolrCloud; Lucene/Solr 
committer 
• Operated a 36-node cluster in AWS for Dachis Group (1.5 years ago; 18 
shards, ~900M docs) 
• Built a Fabric/boto framework for deploying and managing a cluster in 
EC2 
• Co-author of Solr In Action 
Agenda 
• Indexing performance tests 
• Solr Scale Toolkit 
• Next steps 
Cluster sizing 
How many servers do I need to index X docs? 
... shards ... ? 
... replicas ... ? 
I need N queries per second over 
M docs, how many servers do I need? 
It depends?!? 
Methodology 
• Transparent repeatable results 
– Ideally hoping for something owned by the community 
• Synthetic docs ~ 1K each on disk, mix of field types 
– Data set created using code borrowed from PigMix 
– English text fields generated using a Zipfian distribution 
• Java 1.7u55, Amazon Linux, r3.2xlarge nodes 
– enhanced networking enabled, placement group, same AZ 
• Stock Solr (cloud) 4.8.1 
– Using Shawn Heisey’s GC tuning parameters 
• Use Elastic MapReduce to generate load 
– As many nodes as I need to drive Solr! 
Indexing Results 
Cluster Size  # of Shards  # of Replicas  Reducers  Time (secs)  Docs / sec
10            10           1              48        1762          73,780
10            10           2              34        3727          34,881
10            20           1              48        1282         101,404
10            20           2              34        3207          40,536
10            30           1              72        1070         121,495
10            30           2              60        3159          41,152
15            15           1              60        1106         117,541
15            15           2              42        2465          52,738
15            30           1              60         827         157,195
15            30           2              42        2129          61,062
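The cost of replication can be read off the results table directly. The following is a minimal Python sketch that uses only the Docs / sec figures from the table and compares each run with 2 replicas against the run with 1 replica for the same cluster size and shard count. The names RESULTS and replication_slowdown are illustrative. Note that the reducer counts differ between the paired runs, so the ratios are only a rough indication.

```python
# (cluster size, shards, replicas) -> docs/sec, copied from the results table
RESULTS = {
    (10, 10, 1): 73780, (10, 10, 2): 34881,
    (10, 20, 1): 101404, (10, 20, 2): 40536,
    (10, 30, 1): 121495, (10, 30, 2): 41152,
    (15, 15, 1): 117541, (15, 15, 2): 52738,
    (15, 30, 1): 157195, (15, 30, 2): 61062,
}


def replication_slowdown(results):
    """Return {(nodes, shards): rate with 1 replica / rate with 2 replicas}."""
    out = {}
    for (nodes, shards, replicas), rate in results.items():
        if replicas == 1 and (nodes, shards, 2) in results:
            out[(nodes, shards)] = rate / results[(nodes, shards, 2)]
    return out


if __name__ == "__main__":
    for (nodes, shards), factor in sorted(replication_slowdown(RESULTS).items()):
        print(f"{nodes} nodes, {shards} shards: "
              f"{factor:.2f}x slower with 2 replicas")
```

In every pairing the run with 2 replicas indexes at less than half the rate of the run with 1 replica, which is worth keeping in mind when sizing a cluster for indexing throughput.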
Direct Updates 
[Diagram: an indexing client uses CloudSolrServer (SolrJ), which watches /clusterstate.json in ZooKeeper, computes the shard assignment for each <doc> in a batch on the client, and sends each <doc> directly to the leader of Shard 1, Shard 2, or Shard 3.]
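The client-side routing on the Direct Updates slide can be sketched as follows. This is an illustration of the idea only: it assumes each shard owns a contiguous slice of a 32-bit hash space and substitutes zlib.crc32 for Solr's own hash function, so it will not reproduce the assignments a real SolrCloud cluster makes. The function names and the shard naming scheme are hypothetical.

```python
import zlib
from collections import defaultdict


def make_ranges(num_shards):
    """Split the unsigned 32-bit hash space into equal contiguous ranges."""
    step = 2**32 // num_shards
    ranges = []
    for i in range(num_shards):
        lo = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        hi = 2**32 - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((f"shard{i + 1}", lo, hi))
    return ranges


def shard_for(doc_id, ranges):
    """Pick the shard whose range contains the hash of the document id."""
    # crc32 is a stand-in hash (assumption), not what Solr uses.
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
    for name, lo, hi in ranges:
        if lo <= h <= hi:
            return name
    raise ValueError("hash fell outside all ranges")


def split_batch(docs, ranges):
    """Group a batch by target shard so each group can go to that leader."""
    groups = defaultdict(list)
    for doc in docs:
        groups[shard_for(doc["id"], ranges)].append(doc)
    return dict(groups)
```

Splitting the batch on the client means each sub-batch travels one hop to the right leader, instead of landing on an arbitrary node that then has to forward documents it does not own.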
Replication 
[Diagram: CloudSolrServer (SolrJ) watches /clusterstate.json in ZooKeeper and sends each <doc> to the leader of Shard 1, Shard 2, or Shard 3; each leader forwards the <doc> to its replica and blocks for the response from the replica(s).]
Don’t swamp your servers! 
Lessons Learned 
• Know what throughput your client side is capable of 
generating 
– If in MapReduce, index from reducers with speculative execution 
disabled 
• Don’t change Solr config without good reasons for doing 
so 
• Overshard (but not too much) 
• Near-linear scalability as I added nodes! 
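For the first lesson, one way to find out what the client side can generate is to time the indexing loop against a sink you control before pointing it at Solr. The sketch below is a generic harness, not code from the toolkit; send is a hypothetical callable that would wrap whatever client you use (for example a pysolr or SolrJ add call) and can be a no-op to measure the ceiling of document generation alone.

```python
import time


def batches(docs, batch_size):
    """Yield lists of up to batch_size docs from any iterable."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def measure_throughput(docs, send, batch_size=1000):
    """Push docs through send(batch); return (docs sent, docs per second)."""
    sent = 0
    start = time.perf_counter()
    for batch in batches(docs, batch_size):
        send(batch)
        sent += len(batch)
    elapsed = time.perf_counter() - start
    rate = sent / elapsed if elapsed > 0 else float("inf")
    return sent, rate


if __name__ == "__main__":
    # No-op sink: measures how fast this process can produce documents.
    synthetic = ({"id": str(i), "text_en": "lorem ipsum"} for i in range(100000))
    count, rate = measure_throughput(synthetic, send=lambda batch: None)
    print(f"generated {count} docs at {rate:,.0f} docs/sec")
```

If the rate with a no-op sink is not comfortably above the rate you hope to see from Solr, the client is the bottleneck and adding Solr nodes will not help.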
Query Performance Tests 
• All nodes in SolrCloud perform indexing and execute queries 
• Using the TermsComponent to build queries based on the terms in 
each field. 
• Harder to accurately simulate user queries over synthetic data 
– Need mix of faceting, paging, sorting, grouping, boolean clauses, range 
queries, boosting, filters (some cached, some not), etc ... 
• Does the randomness in your test queries model (expected) user 
behavior? 
• Start with one server (1 shard) to determine baseline query 
performance. 
– Look for inefficiencies in your schema and other config settings 
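Building test queries from TermsComponent output might look like the sketch below. It assumes the JSON response lists terms per field as a flat [term, count, term, count, ...] array (check the layout your Solr version returns), and it samples terms in proportion to their counts, which is one modeling choice among many. The sample response and all function names are made up for illustration.

```python
import random


def parse_terms(flat):
    """Turn [term, count, term, count, ...] into [(term, count), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


def random_queries(terms_response, n, rng=None):
    """Build n single-term field queries, weighting terms by their counts."""
    rng = rng or random.Random()
    by_field = {
        field: parse_terms(flat)
        for field, flat in terms_response["terms"].items()
        if flat
    }
    if not by_field:
        raise ValueError("no terms available to build queries from")
    fields = sorted(by_field)
    queries = []
    for _ in range(n):
        field = rng.choice(fields)
        terms, counts = zip(*by_field[field])
        term = rng.choices(terms, weights=counts, k=1)[0]
        # Terms containing quotes would need escaping; omitted in this sketch.
        queries.append(f'{field}:"{term}"')
    return queries


if __name__ == "__main__":
    # Illustrative response only, not real benchmark data.
    sample = {"terms": {"text_en": ["solr", 10, "cloud", 5], "title": ["bench", 2]}}
    for q in random_queries(sample, 5, random.Random(42)):
        print(q)
```

Single-term queries are only a starting point; a realistic mix would layer the facets, filters, sorting, and paging listed above on top of them.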
Solr Scale Toolkit 
• Fabric / Python based toolset for deploying and 
managing SolrCloud clusters 
• SolrJ-based client application useful for building 
tools that need access to cluster state information 
in ZooKeeper 
• Code to support benchmarks for Solr 
Python-based Tools 
boto – Python API for AWS (EC2, S3, etc) 
Fabric – Python-based tool for automating system admin tasks 
over SSH 
pysolr – Python library for Solr (sending commits, queries, ...) 
kazoo – Python client tools for ZooKeeper 
Supporting Cast: 
JMeter – run tests, generate reports 
collectd – system monitoring 
Logstash4Solr – log aggregation 
JConsole/VisualVM – monitor JVM during indexing / queries 
Solr Scale Toolkit: Demo 
• Launch a meta node 
– Log agg / basic monitoring using SiLK 
• Launch ZooKeeper Ensemble 
– 3 nodes to establish quorum 
– Set up a cron job to clean up snapshots 
• Launch SolrCloud cluster 
• Create new collection and index some docs 
– Attach JConsole while indexing 
• Run a healthcheck on the collection 
• Check out the Banana dashboard 
• Backup / Restore 
– Requires patch for SOLR-5956 
– Use fab patch_jars to update jars and do a rolling restart 
Provisioning machines 
fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge 
• Custom built AMI? 
• Block device mapping 
– dedicated disk per Solr node 
• Launch and then poll status until they are live 
– verify SSH connectivity 
• Tag each instance with a cluster ID and username 
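The "launch and then poll status until they are live" step is a plain polling loop. The sketch below is generic and is not the toolkit's implementation: get_state is a hypothetical callable that returns the current state string for an instance id (with boto it would wrap whatever call refreshes the instance and reads its state), and "running" is the EC2 state name for a live instance. The sleep and clock parameters exist only to make the loop testable.

```python
import time


def wait_until_live(instance_ids, get_state, timeout=300.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until every instance reports 'running' or the timeout expires."""
    pending = set(instance_ids)
    deadline = clock() + timeout
    while pending:
        # Keep only the instances that are still not running.
        pending = {i for i in pending if get_state(i) != "running"}
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"still not running: {sorted(pending)}")
        sleep(interval)
    return True
```

A "running" state does not guarantee that sshd is accepting connections yet, which is why the slide lists verifying SSH connectivity as a separate step.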
ZooKeeper 
fab new_zk_ensemble:zk1,n=3 
• Two options: 
– provision 1 to N nodes when you launch Solr cluster 
– use existing named ensemble 
• Fabric command simply creates the myid 
files and zoo.cfg file for the ensemble 
– and some cron scripts for managing snapshots 
• Basic health checking of ZooKeeper status: 
– echo srvr | nc localhost 2181 
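The files the Fabric command creates, and the srvr health check, can be sketched in Python as below. This is an illustration rather than the toolkit's code: the data directory is a hypothetical path, the timing values are the ones from ZooKeeper's sample configuration, and 2888/3888 are the conventional quorum and election ports. zk_srvr does the same thing as `echo srvr | nc localhost 2181`.

```python
import socket


def render_zoo_cfg(hosts, data_dir="/var/lib/zookeeper", client_port=2181):
    """Build zoo.cfg text for an ensemble; data_dir is a hypothetical path."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    # server.N must match the integer stored in the myid file on host N.
    for myid, host in enumerate(hosts, start=1):
        lines.append(f"server.{myid}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def myid_files(hosts):
    """Map each host to the contents of its dataDir/myid file."""
    return {host: f"{i}\n" for i, host in enumerate(hosts, start=1)}


def zk_srvr(host="localhost", port=2181, timeout=5.0):
    """Send the four-letter 'srvr' command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def zk_mode(srvr_output):
    """Extract the 'Mode:' value (leader, follower, standalone) if present."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

For a healthy 3-node ensemble, calling zk_mode(zk_srvr(host)) on each host should report one leader and two followers.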
SolrCloud 
fab new_solrcloud:test1,zk=zk1,nodesPerHost=2 
• Upload a BASH script that starts/stops Solr 
• Set system props: jetty.port, host, zkHost, JVM 
opts 
• One or more Solr nodes per machine 
• JVM mem opts dependent on instance type and 
# of Solr nodes per instance 
• Optionally configure log4j.properties to append 
messages to RabbitMQ for Logstash4Solr 
integration 
solr-ctl.sh 
• BASH script that implements: 
– start/stop Solr nodes on each EC2 instance 
– sets JVM memory options, system properties 
(jetty.port), enable remote JMX, etc 
– backup log files before restarting nodes 
– ensure JVM is killed correctly before restarting 
• Environment variables in: 
solr-ctl-env.sh 
Miscellaneous Utility Tasks 
• Deploy a configuration directory to ZooKeeper 
• Create a new collection 
• Attach a local JConsole/VisualVM to a remote JVM 
• Rolling restart (with Overseer awareness) 
• Build Solr locally and patch remote 
– Use a relay server to scp the JARs to Amazon network once and then 
scp them to other nodes from within the network 
• Put/get files 
• Grep over all log files (across the cluster) 
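One plausible reading of "rolling restart (with Overseer awareness)" is to restart the node currently acting as Overseer last, so the cluster goes through a single Overseer re-election instead of several. The sketch below illustrates that ordering only; it is an assumption about the approach, not the toolkit's implementation. restart and wait_healthy are hypothetical callables that would wrap the remote stop/start commands and a health probe.

```python
def restart_order(nodes, overseer):
    """Return nodes with the current Overseer node moved to the end."""
    others = [n for n in nodes if n != overseer]
    return others + ([overseer] if overseer in nodes else [])


def rolling_restart(nodes, overseer, restart, wait_healthy):
    """Restart one node at a time, stopping at the first that fails to recover."""
    done = []
    for node in restart_order(nodes, overseer):
        restart(node)
        if not wait_healthy(node):
            raise RuntimeError(
                f"{node} did not recover; restarted so far: {done}")
        done.append(node)
    return done
```

Stopping at the first node that fails to come back keeps the rest of the cluster serving traffic while the problem is investigated.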
Other useful stuff ... 
• fab mine: See clusters I’m running (or for other users too) 
• fab kill_mine: Terminate all instances I’m running 
– Use termination protection in production 
• fab ssh_to: Quick way to SSH to one of the nodes in a 
cluster 
• fab stop/recover/kill: Basic commands for controlling 
specific Solr nodes in the cluster 
• fab jmeter: Execute a JMeter test plan against your cluster 
– Example test plan and Java sampler is included with the source 
SolrCloud Tools (SolrJ client app) 
./tools.sh -tool healthcheck 
• Java-based command-line application that uses SolrJ’s 
CloudSolrServer to perform advanced cluster 
management operations: 
– healthcheck: collect metadata and health information from all 
replicas for a collection from ZooKeeper 
– backup: create a snapshot of each shard in a collection for 
backing up to remote storage (S3) 
• Framework for building complex tools that benefit from 
having access to cluster state information in ZooKeeper 
SiLK Integration 
• SiLK: Solr integrated with Logstash and Kibana 
– Index time-series data, such as log data (collectd, Solr logs, ...) 
– Build cool dashboards with Banana (fork of Kibana) 
• Easily aggregate all WARN and more severe log 
messages from all Solr servers into logstash4solr 
• Send collectd metrics to logstash4solr 
What’s Next? 
• Migrate to using Apache libcloud instead of using boto 
directly 
• Benchmark mixed workloads (queries and indexing) 
• SiLK is improving rapidly! 
• Chaos monkey tests 
– integrate jepsen? 
• Open source so please kick the tires! 
Wrap-up 
• Solr Scale Toolkit: https://github.com/LucidWorks/solr-scale-tk 
• LucidWorks: http://www.lucidworks.com 
• SiLK: http://www.lucidworks.com/lucidworks-silk/ 
• Solr In Action: http://www.manning.com/grainger/ 
• Connect: @thelabdude / tim.potter@lucidworks.com 
Questions? 

More Related Content

What's hot

Building an Observability platform with ClickHouse
Building an Observability platform with ClickHouseBuilding an Observability platform with ClickHouse
Building an Observability platform with ClickHouseAltinity Ltd
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseAltinity Ltd
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiGrowth Intelligence
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesAltinity Ltd
 
Boost Your Neo4j with User-Defined Procedures
Boost Your Neo4j with User-Defined ProceduresBoost Your Neo4j with User-Defined Procedures
Boost Your Neo4j with User-Defined ProceduresNeo4j
 
Introduction to InfluxDB and TICK Stack
Introduction to InfluxDB and TICK StackIntroduction to InfluxDB and TICK Stack
Introduction to InfluxDB and TICK StackAhmed AbouZaid
 
Logging using ELK Stack for Microservices
Logging using ELK Stack for MicroservicesLogging using ELK Stack for Microservices
Logging using ELK Stack for MicroservicesVineet Sabharwal
 
Observability and its application
Observability and its applicationObservability and its application
Observability and its applicationThao Huynh Quang
 
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaFast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaAltinity Ltd
 
[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouseVianney FOUCAULT
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackRich Lee
 
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...confluent
 
Cilium + Istio with Gloo Mesh
Cilium + Istio with Gloo MeshCilium + Istio with Gloo Mesh
Cilium + Istio with Gloo MeshChristian Posta
 
Apache Pulsar Development 101 with Python
Apache Pulsar Development 101 with PythonApache Pulsar Development 101 with Python
Apache Pulsar Development 101 with PythonTimothy Spann
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes Minio
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...Altinity Ltd
 

What's hot (20)

Building an Observability platform with ClickHouse
Building an Observability platform with ClickHouseBuilding an Observability platform with ClickHouse
Building an Observability platform with ClickHouse
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouse
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with Luigi
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
 
Boost Your Neo4j with User-Defined Procedures
Boost Your Neo4j with User-Defined ProceduresBoost Your Neo4j with User-Defined Procedures
Boost Your Neo4j with User-Defined Procedures
 
Introduction to InfluxDB and TICK Stack
Introduction to InfluxDB and TICK StackIntroduction to InfluxDB and TICK Stack
Introduction to InfluxDB and TICK Stack
 
Logging using ELK Stack for Microservices
Logging using ELK Stack for MicroservicesLogging using ELK Stack for Microservices
Logging using ELK Stack for Microservices
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Observability and its application
Observability and its applicationObservability and its application
Observability and its application
 
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache KafkaFast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
Fast Insight from Fast Data: Integrating ClickHouse and Apache Kafka
 
[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
 
Cilium + Istio with Gloo Mesh
Cilium + Istio with Gloo MeshCilium + Istio with Gloo Mesh
Cilium + Istio with Gloo Mesh
 
Apache Pulsar Development 101 with Python
Apache Pulsar Development 101 with PythonApache Pulsar Development 101 with Python
Apache Pulsar Development 101 with Python
 
Apache ZooKeeper
Apache ZooKeeperApache ZooKeeper
Apache ZooKeeper
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes
 
LDAP
LDAPLDAP
LDAP
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 

Viewers also liked

Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...Lucidworks
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Shalin Shekhar Mangar
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersSematext Group, Inc.
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudthelabdude
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studyCharlie Hull
 
Lucene gosenの紹介 solr勉強会第7回
Lucene gosenの紹介 solr勉強会第7回Lucene gosenの紹介 solr勉強会第7回
Lucene gosenの紹介 solr勉強会第7回Jun Ohtani
 
Hotspot Garbage Collection - The Useful Parts
Hotspot Garbage Collection - The Useful PartsHotspot Garbage Collection - The Useful Parts
Hotspot Garbage Collection - The Useful PartsjClarity
 
Elasticsearchベースの全文検索システムFess
Elasticsearchベースの全文検索システムFessElasticsearchベースの全文検索システムFess
Elasticsearchベースの全文検索システムFessShinsuke Sugaya
 
How SolrCloud Changes the User Experience In a Sharded Environment
How SolrCloud Changes the User Experience In a Sharded EnvironmentHow SolrCloud Changes the User Experience In a Sharded Environment
How SolrCloud Changes the User Experience In a Sharded Environmentlucenerevolution
 
第10回solr勉強会 solr cloudの導入事例
第10回solr勉強会 solr cloudの導入事例第10回solr勉強会 solr cloudの導入事例
第10回solr勉強会 solr cloudの導入事例Ken Hirose
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance TuningMinh Hoang
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Nitin S
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Electionravikgiitk
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkitthelabdude
 

Viewers also liked (20)

Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
High Performance Solr
High Performance SolrHigh Performance Solr
High Performance Solr
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
 
Scaling Solr with Solr Cloud
Scaling Solr with Solr CloudScaling Solr with Solr Cloud
Scaling Solr with Solr Cloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
 
Lucene gosenの紹介 solr勉強会第7回
Lucene gosenの紹介 solr勉強会第7回Lucene gosenの紹介 solr勉強会第7回
Lucene gosenの紹介 solr勉強会第7回
 
Hotspot Garbage Collection - The Useful Parts
Hotspot Garbage Collection - The Useful PartsHotspot Garbage Collection - The Useful Parts
Hotspot Garbage Collection - The Useful Parts
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance Tuning
 
Elasticsearchベースの全文検索システムFess
Elasticsearchベースの全文検索システムFessElasticsearchベースの全文検索システムFess
Elasticsearchベースの全文検索システムFess
 
How SolrCloud Changes the User Experience In a Sharded Environment
How SolrCloud Changes the User Experience In a Sharded EnvironmentHow SolrCloud Changes the User Experience In a Sharded Environment
How SolrCloud Changes the User Experience In a Sharded Environment
 
第10回solr勉強会 solr cloudの導入事例
第10回solr勉強会 solr cloudの導入事例第10回solr勉強会 solr cloudの導入事例
第10回solr勉強会 solr cloudの導入事例
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance Tuning
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
 
Apache SolrCloud
Apache SolrCloudApache SolrCloud
Apache SolrCloud
 
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale ToolkitDeploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
Deploying and managing SolrCloud in the cloud using the Solr Scale Toolkit
 

Similar to Benchmarking Solr Performance

SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scaleAnshum Gupta
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCLucidworks (Archived)
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Lucidworks
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationNitin Sharma
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesRahul Singh
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesAnant Corporation
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Lucidworks
 
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...Lucidworks
 
Meetup on Apache Zookeeper
Meetup on Apache ZookeeperMeetup on Apache Zookeeper
Meetup on Apache ZookeeperAnshul Patel
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitinbloomreacheng
 
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptxBSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptxJasonOstrom1
 
(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in AlfrescoAngel Borroy López
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systexJames Chen
 
Developing on SQL Azure
Developing on SQL AzureDeveloping on SQL Azure
Developing on SQL AzureIke Ellis
 
How to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experienceHow to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experienceDocker, Inc.
 

Similar to Benchmarking Solr Performance (20)

SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scale
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin Presentation
 
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant ClustersEffective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
 
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
 
Meetup on Apache Zookeeper
Meetup on Apache ZookeeperMeetup on Apache Zookeeper
Meetup on Apache Zookeeper
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
 
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptxBSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
BSidesDFW2022-PurpleTeam_Cloud_Identity.pptx
 
(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
 
Spark etl
Spark etlSpark etl
Spark etl
 
Developing on SQL Azure
Developing on SQL AzureDeveloping on SQL Azure
Developing on SQL Azure
 
How to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experienceHow to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experience
 
Clocker Evolution
Clocker EvolutionClocker Evolution
Clocker Evolution
 

Benchmarking Solr Performance

  • 1. Benchmarking Solr Performance
      Timothy Potter, June 18, 2014
      Search | Discover | Analyze
      Confidential and Proprietary © Copyright 2013
  • 2. My SolrCloud Experience
      • At LucidWorks, mostly focused on hardening SolrCloud; Lucene/Solr committer
      • Operated a 36-node cluster in AWS for Dachis Group (1.5 years ago, 18 shards, ~900M docs)
      • Built a Fabric/boto framework for deploying and managing a cluster in EC2
      • Co-author of Solr In Action
  • 3. Agenda
      • Indexing performance tests
      • Solr Scale Toolkit
      • Next steps
  • 4. Cluster sizing
      How many servers do I need to index X docs?
      ... shards ... ?
      ... replicas ... ?
      I need N queries per second over M docs, how many servers do I need?
      It depends?!?
  • 5. Methodology
      • Transparent repeatable results
          – Ideally hoping for something owned by the community
      • Synthetic docs ~1K each on disk, mix of field types
          – Data set created using code borrowed from PigMix
          – English text fields generated using a Zipfian distribution
      • Java 1.7u55, Amazon Linux, r3.2xlarge nodes
          – Enhanced networking enabled, placement group, same AZ
      • Stock Solr (cloud) 4.8.1
          – Using Shawn Heisey’s GC tuning parameters
      • Use Elastic MapReduce to generate load
          – As many nodes as I need to drive Solr!
  • 6. Indexing Results

      Cluster Size   # of Shards   # of Replicas   Reducers   Time (secs)   Docs / sec
      10             10            1               48         1762           73,780
      10             10            2               34         3727           34,881
      10             20            1               48         1282          101,404
      10             20            2               34         3207           40,536
      10             30            1               72         1070          121,495
      10             30            2               60         3159           41,152
      15             15            1               60         1106          117,541
      15             15            2               42         2465           52,738
      15             30            1               60          827          157,195
      15             30            2               42         2129           61,062
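
The Indexing Results figures can be cross-checked with a little arithmetic. The sketch below is not from the talk: it multiplies time by throughput to estimate the documents indexed per run (the slides do not state that total, so the ~130M figure is derived) and compares the 1-replica and 2-replica rows for each cluster shape. Note that the reducer counts differ between the paired rows, so the ratios are only indicative.

```python
# Rows copied from the Indexing Results slide:
# (cluster size, shards, replicas, reducers, time in secs, docs/sec)
ROWS = [
    (10, 10, 1, 48, 1762, 73780),
    (10, 10, 2, 34, 3727, 34881),
    (10, 20, 1, 48, 1282, 101404),
    (10, 20, 2, 34, 3207, 40536),
    (10, 30, 1, 72, 1070, 121495),
    (10, 30, 2, 60, 3159, 41152),
    (15, 15, 1, 60, 1106, 117541),
    (15, 15, 2, 42, 2465, 52738),
    (15, 30, 1, 60, 827, 157195),
    (15, 30, 2, 42, 2129, 61062),
]


def implied_docs(row):
    """Documents indexed in a run, estimated as time * throughput."""
    return row[4] * row[5]


def replica_slowdown(rows):
    """Map (cluster size, shards) -> docs/sec with 1 replica / docs/sec with 2."""
    by_key = {}
    for size, shards, replicas, _reducers, _secs, rate in rows:
        by_key.setdefault((size, shards), {})[replicas] = rate
    return {key: rates[1] / rates[2] for key, rates in by_key.items()}


if __name__ == "__main__":
    for row in ROWS:
        print(row[:3], "~%.1fM docs" % (implied_docs(row) / 1e6))
    for key, ratio in sorted(replica_slowdown(ROWS).items()):
        print(key, "1 replica is %.2fx faster than 2" % ratio)
```

Every row works out to roughly 130M documents, and the 1-replica runs come out somewhere between about 2.1x and 2.95x faster than the matching 2-replica runs.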
  • 7. Direct Updates
      [Diagram: an indexing client uses CloudSolrServer (SolrJ), which watches /clusterstate.json in ZooKeeper, computes the shard assignment for each batch on the client, and sends each <doc> directly to the leader of Shard 1, Shard 2, or Shard 3.]
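
The idea behind direct updates, splitting a batch on the client so each document goes straight to its shard leader, can be sketched in a few lines. This is an illustration only: the hash below (`zlib.crc32` with a modulo) is a stand-in and is not the hashing or hash-range lookup that Solr or SolrJ actually performs, and the leader URLs are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, num_shards):
    """Pick a shard index from a document ID.

    NOTE: crc32 plus modulo is a stand-in for illustration only; it is
    not the hash or range lookup that Solr itself uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def split_batch(docs, leaders):
    """Group a batch of docs by the leader URL that should receive them."""
    per_leader = defaultdict(list)
    for doc in docs:
        leader = leaders[shard_for(doc["id"], len(leaders))]
        per_leader[leader].append(doc)
    return dict(per_leader)


if __name__ == "__main__":
    # Hypothetical leader URLs, as a client might read them from cluster state.
    leaders = [
        "http://host1:8983/solr",
        "http://host2:8983/solr",
        "http://host3:8983/solr",
    ]
    batch = [{"id": "doc-%d" % i} for i in range(10)]
    for leader, docs in sorted(split_batch(batch, leaders).items()):
        print(leader, [d["id"] for d in docs])
```

The point of the design is that no Solr node has to forward documents to another leader; the client pays the routing cost once per batch.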
  • 8. Replication
      [Diagram: CloudSolrServer (SolrJ) watches /clusterstate.json in ZooKeeper and sends each <doc> to the leader of Shard 1, Shard 2, or Shard 3. Each leader forwards the update to its replica and blocks for a response from the replica(s).]
  • 9. Don’t swamp your servers!
  • 10. Lessons Learned
      • Know what throughput your client side is capable of generating
          – If in MapReduce, index from reducers with speculative execution disabled
      • Don’t change Solr config without good reasons for doing so
      • Overshard (but not too much)
      • Near-linear scalability as I added nodes!
  • 11. Query Performance Tests
      • All nodes in SolrCloud perform indexing and execute queries
      • Using the TermsComponent to build queries based on the terms in each field
      • Harder to accurately simulate user queries over synthetic data
          – Need mix of faceting, paging, sorting, grouping, boolean clauses, range queries, boosting, filters (some cached, some not), etc ...
      • Does the randomness in your test queries model (expected) user behavior?
      • Start with one server (1 shard) to determine baseline query performance
          – Look for inefficiencies in your schema and other config settings
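
One way to build test queries from the TermsComponent is sketched below. It is not the talk's benchmark code: it assumes a `/terms` request handler is configured on the collection, that the JSON response uses the default flat `[term, count, term, count, ...]` layout, and it samples terms in proportion to their document counts so that frequent terms are queried more often. The function names are hypothetical.

```python
import random
from urllib.parse import urlencode


def terms_url(base_url, field, limit=100):
    """Build a TermsComponent request URL for the top terms of one field."""
    params = {"terms": "true", "terms.fl": field, "terms.limit": limit, "wt": "json"}
    return "%s/terms?%s" % (base_url.rstrip("/"), urlencode(params))


def parse_terms(response, field):
    """Turn the flat [term, count, term, count, ...] list into pairs."""
    flat = response["terms"][field]
    return list(zip(flat[0::2], flat[1::2]))


def sample_queries(pairs, field, n, rng):
    """Draw n single-term queries, weighting each term by its doc count.

    Terms are used as-is; real terms may need query-syntax escaping.
    """
    terms = [term for term, _count in pairs]
    weights = [count for _term, count in pairs]
    picked = rng.choices(terms, weights=weights, k=n)
    return ["%s:%s" % (field, term) for term in picked]


if __name__ == "__main__":
    # A made-up response in the assumed flat layout.
    response = {"terms": {"text_en": ["solr", 10, "cloud", 3, "shard", 1]}}
    pairs = parse_terms(response, "text_en")
    print(terms_url("http://localhost:8983/solr/collection1", "text_en", 3))
    print(sample_queries(pairs, "text_en", 5, random.Random(42)))
```

Changing the weighting (uniform versus frequency-based) is one knob for the "too random" versus "not random enough" problem mentioned in the notes.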
  • 12. Solr Scale Toolkit
      • Fabric / Python based toolset for deploying and managing SolrCloud clusters
      • SolrJ-based client application useful for building tools that need access to cluster state information in ZooKeeper
      • Code to support benchmarks for Solr
  • 13. Python-based Tools
      • boto – Python API for AWS (EC2, S3, etc)
      • Fabric – Python-based tool for automating system admin tasks over SSH
      • pysolr – Python library for Solr (sending commits, queries, ...)
      • kazoo – Python client tools for ZooKeeper
      Supporting Cast:
      • JMeter – run tests, generate reports
      • collectd – system monitoring
      • Logstash4Solr – log aggregation
      • JConsole/VisualVM – monitor JVM during indexing / queries
  • 14. Solr Scale Toolkit: Demo
      • Launch a meta node
          – Log aggregation / basic monitoring using SiLK
      • Launch ZooKeeper ensemble
          – 3 nodes to establish quorum
          – Set up a cron job to clean up snapshots
      • Launch SolrCloud cluster
      • Create new collection and index some docs
          – Attach JConsole while indexing
      • Run a healthcheck on the collection
      • Check out the Banana dashboard
      • Backup / Restore
          – Requires patch for SOLR-5956
          – Use fab patch_jars to update jars and do a rolling restart
  • 15. Provisioning machines
      fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge
      • Custom built AMI?
      • Block device mapping
          – Dedicated disk per Solr node
      • Launch and then poll status until they are live
          – Verify SSH connectivity
      • Tag each instance with a cluster ID and username
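
The "launch, then poll until live" step can be sketched as a small helper. This is a minimal sketch, not the toolkit's code: `get_state` and `is_reachable` are hypothetical callables that the caller supplies (for example, one that refreshes an EC2 instance's state through boto and one that probes the SSH port), and the timeout values are arbitrary.

```python
import socket
import time


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to the SSH port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_live(get_state, is_reachable, timeout=300.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the instance reports 'running' and answers on SSH.

    get_state and is_reachable are caller-supplied callables (hypothetical
    here). Returns True on success, False once the timeout has elapsed.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == "running" and is_reachable():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Simulated instance that needs two polls before it is running.
    states = iter(["pending", "pending", "running"])
    print(wait_until_live(lambda: next(states), lambda: True,
                          sleep=lambda seconds: None))
```

Checking SSH separately from the instance state matters because an instance can report `running` some time before its SSH daemon accepts connections.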
  • 16. ZooKeeper
      fab new_zk_ensemble:zk1,n=3
      • Two options:
          – Provision 1 to N nodes when you launch Solr cluster
          – Use existing named ensemble
      • Fabric command simply creates the myid files and zoo.cfg file for the ensemble
          – And some cron scripts for managing snapshots
      • Basic health checking of ZooKeeper status:
          – echo srvr | nc localhost 2181
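
What the Fabric command has to produce for an ensemble can be sketched as plain text generation. This is an assumption-laden sketch rather than the toolkit's implementation: the data directory, timing values, and host names are hypothetical, while the `server.N=host:2888:3888` lines and the one-number `myid` file follow the standard ZooKeeper configuration layout. `parse_mode` reads the `Mode:` line from the output of the `srvr` health check.

```python
def zoo_cfg(hosts, data_dir="/var/lib/zookeeper", client_port=2181):
    """Render a zoo.cfg for an ensemble; server IDs start at 1.

    data_dir and the timing values are illustrative defaults only.
    """
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        "dataDir=%s" % data_dir,
        "clientPort=%d" % client_port,
    ]
    for myid, host in enumerate(hosts, start=1):
        lines.append("server.%d=%s:2888:3888" % (myid, host))
    return "\n".join(lines) + "\n"


def myid_files(hosts):
    """Map each host to the contents of its myid file."""
    return {host: "%d\n" % myid for myid, host in enumerate(hosts, start=1)}


def parse_mode(srvr_output):
    """Extract the mode (leader, follower, standalone) from 'srvr' output."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


if __name__ == "__main__":
    hosts = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
    print(zoo_cfg(hosts))
    print(myid_files(hosts))
```

The number in each host's `myid` file must match that host's `server.N` line, which is why both are derived from the same enumeration.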
  • 17. SolrCloud
      fab new_solrcloud:test1,zk=zk1,nodesPerHost=2
      • Upload a Bash script that starts/stops Solr
      • Set system props: jetty.port, host, zkHost, JVM opts
      • One or more Solr nodes per machine
      • JVM mem opts dependent on instance type and # of Solr nodes per instance
      • Optionally configure log4j.properties to append messages to RabbitMQ for Logstash4Solr integration
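
As a rough illustration of the system properties involved, the sketch below builds one Jetty start command per Solr node on a host. The `jetty.port`, `host`, and `zkHost` properties come from the slide; the base port, the heap size, and the function name are hypothetical, and the toolkit's real start script may assemble its command differently.

```python
def solr_start_commands(host, zk_host, nodes_per_host=2, base_port=8984, heap="4g"):
    """Build one 'java ... -jar start.jar' argument list per Solr node.

    base_port and heap are illustrative values, not toolkit defaults.
    """
    commands = []
    for i in range(nodes_per_host):
        commands.append([
            "java",
            "-Xms" + heap,
            "-Xmx" + heap,
            "-Djetty.port=%d" % (base_port + i),
            "-Dhost=" + host,
            "-DzkHost=" + zk_host,
            "-jar", "start.jar",
        ])
    return commands


if __name__ == "__main__":
    zk = "zk1:2181,zk2:2181,zk3:2181"
    for cmd in solr_start_commands("ec2-host-1", zk):
        print(" ".join(cmd))
```

With more than one node per host, each node needs its own port and its own share of the machine's memory, which is why the heap setting depends on both the instance type and `nodesPerHost`.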
  • 18. solr-ctl.sh
      • Bash script that:
          – Starts/stops Solr nodes on each EC2 instance
          – Sets JVM memory options and system properties (jetty.port), enables remote JMX, etc
          – Backs up log files before restarting nodes
          – Ensures the JVM is killed correctly before restarting
      • Environment variables in: solr-ctl-env.sh
  • 19. Miscellaneous Utility Tasks
      • Deploy a configuration directory to ZooKeeper
      • Create a new collection
      • Attach a local JConsole/VisualVM to a remote JVM
      • Rolling restart (with Overseer awareness)
      • Build Solr locally and patch remote
          – Use a relay server to scp the JARs to Amazon network once and then scp them to other nodes from within the network
      • Put/get files
      • Grep over all log files (across the cluster)
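
The slides do not say what "Overseer awareness" involves. One plausible reading, assumed here and not confirmed by the talk, is that the node currently acting as Overseer is restarted last so that the Overseer role moves at most once during the rollout. The sketch below shows that ordering with hypothetical `restart` and `is_healthy` callables.

```python
def restart_order(nodes, overseer):
    """Return nodes with the current Overseer node moved to the end."""
    rest = [node for node in nodes if node != overseer]
    return rest + ([overseer] if overseer in nodes else [])


def rolling_restart(nodes, overseer, restart, is_healthy):
    """Restart one node at a time; abort at the first node that fails to recover.

    restart and is_healthy are caller-supplied callables (hypothetical here).
    """
    done = []
    for node in restart_order(nodes, overseer):
        restart(node)
        if not is_healthy(node):
            raise RuntimeError("node %s did not recover; aborting" % node)
        done.append(node)
    return done


if __name__ == "__main__":
    nodes = ["solr1", "solr2", "solr3"]
    print(rolling_restart(nodes, "solr1",
                          restart=lambda node: None,
                          is_healthy=lambda node: True))
```

Aborting on the first unhealthy node keeps a bad configuration or patch from taking down more than one node at a time.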
  • 20. Other useful stuff ...
      • fab mine: See clusters I’m running (or for other users too)
      • fab kill_mine: Terminate all instances I’m running
          – Use termination protection in production
      • fab ssh_to: Quick way to SSH to one of the nodes in a cluster
      • fab stop/recover/kill: Basic commands for controlling specific Solr nodes in the cluster
      • fab jmeter: Execute a JMeter test plan against your cluster
          – Example test plan and Java sampler are included with the source
  • 21. SolrCloud Tools (SolrJ client app)
      ./tools.sh -tool healthcheck
      • Java-based command-line application that uses SolrJ’s CloudSolrServer to perform advanced cluster management operations:
          – healthcheck: collect metadata and health information from all replicas for a collection from ZooKeeper
          – backup: create a snapshot of each shard in a collection for backing up to remote storage (S3)
      • Framework for building complex tools that benefit from having access to cluster state information in ZooKeeper
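
The kind of information a healthcheck gathers from cluster state can be illustrated in Python, even though the actual tool is a Java/SolrJ application. The sketch below is not that tool: it assumes the Solr 4.x `/clusterstate.json` layout (collection, then `shards`, then `replicas`, with `state` and a string-valued `leader` flag) and works on a JSON string that the caller has already fetched, for example with a ZooKeeper client such as kazoo.

```python
import json


def replica_health(clusterstate_json, collection):
    """List (shard, replica, state, is_leader) tuples for one collection.

    Assumes the Solr 4.x clusterstate.json layout described above.
    """
    state = json.loads(clusterstate_json)
    rows = []
    for shard, info in sorted(state[collection]["shards"].items()):
        for name, replica in sorted(info["replicas"].items()):
            is_leader = replica.get("leader") == "true"
            rows.append((shard, name, replica.get("state"), is_leader))
    return rows


def unhealthy(rows):
    """Return the replicas whose state is anything other than 'active'."""
    return [row for row in rows if row[2] != "active"]


if __name__ == "__main__":
    # A made-up cluster state with one recovering replica.
    sample = json.dumps({
        "test1": {"shards": {
            "shard1": {"replicas": {
                "core_node1": {"state": "active", "leader": "true"},
                "core_node2": {"state": "recovering"},
            }},
        }},
    })
    rows = replica_health(sample, "test1")
    print(rows)
    print(unhealthy(rows))
```

ZooKeeper only records the last published state, so a fuller check would also query each replica directly.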
  • 22. SiLK Integration
      • SiLK: Solr integrated with Logstash and Kibana
          – Index time-series data, such as log data (collectd, Solr logs, ...)
          – Build cool dashboards with Banana (fork of Kibana)
      • Easily aggregate all WARN and more severe log messages from all Solr servers into logstash4solr
      • Send collectd metrics to logstash4solr
  • 23. SiLK Integration
  • 24. What’s Next?
      • Migrate to using Apache libcloud instead of using boto directly
      • Benchmark mixed workloads (queries and indexing)
      • SiLK is improving rapidly!
      • Chaos monkey tests
          – Integrate jepsen?
      • Open source so please kick the tires!
  • 25. Wrap-up
      • Solr Scale Toolkit: https://github.com/LucidWorks/solr-scale-tk
      • LucidWorks: http://www.lucidworks.com
      • SiLK: http://www.lucidworks.com/lucidworks-silk/
      • Solr In Action: http://www.manning.com/grainger/
      • Connect: @thelabdude / tim.potter@lucidworks.com
      Questions?

Editor's Notes

  1. Yes, it does, but we need to start somewhere
  2. more shards == better indexing throughput (if your servers can handle it)
  3. Replication is only as fast as your slowest replica. Replicas have to re-analyze documents. In the future, I would like to send pre-analyzed docs between leader and replica, especially if text analysis is complex.
  4. Not much capacity available for running queries on this node
  5. In the first couple of passes, I saw either really bad performance (queries too random) or really fast performance (not random enough).
  6. Make it very easy to launch and manage SolrCloud clusters in Amazon, from 1 node to hundreds. The user has basic control over instance type, # of instances, # of nodes per instance, and the ZooKeeper ensemble, and doesn’t have to care about how to start each Solr node, how to connect it with ZooKeeper, host names / IPs, etc.
  7. Custom AMI that we just need to “start” stuff on is preferred to configuring a barebones instance each time