Solr Anti-Patterns
Rafał Kuć, Sematext Group, Inc.
About me
Sematext consultant & engineer co-founder
Father & husband
The (not so) perfect migration
From 3.1 to 4.10 (and hopefully not back)
March 2011 to September 2014
The lonely solrconfig.xml
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
<requestHandler name="/update/csv" class="solr.CSVRequestHandler" />
<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" />
<directoryFactory name="DirectoryFactory"
And faulty indexing
<?xml version="1.0" encoding="UTF-8"?>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">0</int>
<lst name="error">
<str name="msg">missing content stream</str>
<int name="code">400</int>
109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: missing content stream
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(
at org.apache.solr.handler.RequestHandlerBase.handleRequest(
at org.apache.solr.core.SolrCore.execute(
at org.apache.solr.servlet.SolrDispatchFilter.execute(
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(
at org.eclipse.jetty.servlet.ServletHandler.doHandle(
at org.eclipse.jetty.server.handler.ScopedHandler.handle(
at org.eclipse.jetty.server.session.SessionHandler.doHandle(
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(
at org.eclipse.jetty.servlet.ServletHandler.doScope(
at org.eclipse.jetty.server.session.SessionHandler.doScope(
at org.eclipse.jetty.server.handler.ContextHandler.doScope(
at org.eclipse.jetty.server.handler.ScopedHandler.handle(
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
at org.eclipse.jetty.server.handler.HandlerCollection.handle(
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
at org.eclipse.jetty.server.Server.handle(
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(
at org.eclipse.jetty.http.HttpParser.parseNext(
at org.eclipse.jetty.http.HttpParser.parseAvailable(
at org.eclipse.jetty.server.BlockingHttpConnection.handle(
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
at org.eclipse.jetty.util.thread.QueuedThreadPool$
at Source)
Let’s make that right
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/update/json" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="stream.contentType">application/json</str>
<directoryFactory name="DirectoryFactory"
<str name="dir">
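The error above only says that the update handler received no content stream, i.e. no request body it could parse. A minimal Python sketch of an update request that does carry one, assuming a core at the hypothetical URL `http://localhost:8983/solr/collection1` and documents posted as a JSON array to `/update` (in 4.x, `solr.UpdateRequestHandler` picks its parser from the `Content-Type` header):

```python
import json
import urllib.request


def build_json_update(base_url: str, docs: list) -> urllib.request.Request:
    """Build (but do not send) a JSON update request with a non-empty body.

    base_url is an assumption, e.g. a hypothetical core named collection1.
    """
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/update",
        data=body,  # the content stream; an empty request is what triggers the 400 above
        headers={"Content-Type": "application/json"},  # tells the handler to use its JSON loader
        method="POST",
    )
```

Sending it would be a matter of `urllib.request.urlopen(req)`; the sketch stops short of that so nothing touches a live server.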
The old schema.xml
<fieldType name="int" class="solr.IntField" omitNorms="true"/>
<fieldType name="long" class="solr.LongField" omitNorms="true"/>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
The new schema.xml
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
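The practical difference between `int` and `tint` above is `precisionStep`. Lucene's trie numeric fields index each value several times, shifted right by 0, `precisionStep`, 2 * `precisionStep`, ... bits, so a range query can match a few coarse terms instead of every distinct value; Solr treats `precisionStep="0"` as "full precision only". The Python sketch below counts those terms per value. It is a simplification: Lucene's sign-bit flipping and term encoding are left out, and only non-negative values are handled.

```python
def trie_shifts(bits: int, precision_step: int) -> list:
    """Shift amounts at which a numeric value gets indexed (simplified model)."""
    if precision_step <= 0:
        # precisionStep="0": a single, full-precision term per value
        return [0]
    return list(range(0, bits, precision_step))


def trie_terms(value: int, bits: int, precision_step: int) -> list:
    """(shift, prefix) pairs for a non-negative value; sign handling omitted."""
    return [(shift, value >> shift) for shift in trie_shifts(bits, precision_step)]


# Terms indexed per value for some of the field types in the new schema
# (dates are stored as 64-bit longs)
for name, bits, step in [("int", 32, 0), ("tint", 32, 8), ("tlong", 64, 8), ("tdate", 64, 6)]:
    print(name, len(trie_shifts(bits, step)))  # int 1, tint 4, tlong 8, tdate 11
```

More terms per value means a larger index in exchange for faster range queries, which is why the usual advice is to keep `precisionStep="0"` for fields that are only matched or sorted and use the `t*` types for range-queried ones.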
Threads? What threads?
<Set name="ThreadPool">
<New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
<Set name="minThreads">10</Set>
<Set name="maxThreads">200</Set>
<Set name="detailedDump">false</Set>
I see deadlocks
OK, so now we can actually run queries
<Set name="ThreadPool">
<New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
<Set name="minThreads">10</Set>
<Set name="maxThreads">10000</Set>
<Set name="detailedDump">false</Set>
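Why would a bigger pool cure deadlocks? One plausible mechanism, assuming SolrCloud-style distributed search: a top-level request holds a container thread while it waits for sub-requests to other shards, and those sub-requests need threads from the same kind of pool. The Python sketch below models this with a single `ThreadPoolExecutor` standing in for Jetty's pool; it is an analogy, not Jetty's actual scheduling, and the 0.5 s timeout exists only so the demo terminates.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def handle_query(pool: ThreadPoolExecutor, timeout: float = 0.5) -> str:
    """A top-level request that fans out one sub-request to the same pool."""

    def sub_request() -> str:
        return "shard-result"

    def top_level() -> str:
        future = pool.submit(sub_request)  # needs a free worker in the same pool
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            # Every worker is busy waiting, so the sub-request never got a thread
            return "starved"

    return pool.submit(top_level).result()


def simulate(max_threads: int) -> str:
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return handle_query(pool)
```

With one worker, the top-level task holds the only thread while its sub-request sits in the queue, so without the timeout it would wait forever; with two workers both run. In a real cluster, a burst of concurrent top-level requests plays the role of the single worker.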
The ZooKeeper
The ZooKeeper – production
Let’s cache everything
<filterCache class="solr.LRUCache"
<queryResultCache class="solr.LRUCache"
autowarmCount="524288"/>
<documentCache class="solr.LRUCache"
And now let’s look at the warmup times
OK, show us the way, "Mr. Consultant"
<filterCache class="solr.FastLRUCache"
<queryResultCache class="solr.LRUCache"
autowarmCount="8000"/>
<documentCache class="solr.LRUCache"
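Each autowarmed entry is a query the new searcher re-executes before it replaces the old one, so `autowarmCount="524288"` can mean hundreds of thousands of queries after every commit. A simplified Python model of an LRU cache warmed from its predecessor; `regenerate` is a hypothetical stand-in for re-running a cached query against the new index, and the class is a sketch, not Solr's `LRUCache`:

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUCache:
    """Toy LRU cache modelling how a Solr cache is autowarmed on a new searcher."""

    def __init__(self, size: int, autowarm_count: int):
        self.size = size
        self.autowarm_count = autowarm_count
        self._data: "OrderedDict[Hashable, object]" = OrderedDict()

    def get(self, key: Hashable):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        return None

    def put(self, key: Hashable, value: object) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.size:
            self._data.popitem(last=False)  # evict the least recently used entry

    def warm_from(self, old: "LRUCache", regenerate: Callable[[Hashable], object]) -> int:
        """Re-run up to autowarm_count of the old cache's most recent keys; return how many."""
        if self.autowarm_count <= 0:
            return 0
        keys = list(old._data.keys())[-self.autowarm_count:]
        for key in keys:  # oldest to newest, so the newest ends up most recently used
            self.put(key, regenerate(key))
        return len(keys)
```

The number of regenerated entries is capped only by what the old cache holds, so an oversized `autowarmCount` effectively means "re-run everything", and that cost lands in the warmup times.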
Let’s look at the warmup times again
Bulks are for noobs
[Diagram: three applications, each sending single documents (Doc)]
But let’s use bulks, just in case
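The difference between sending single documents and sending bulks is mostly the number of HTTP round trips. A minimal Python sketch of client-side batching; `send` is a hypothetical callable that would POST one JSON array to `/update`, and the batch size of 1000 is an arbitrary starting point to tune, not a value from the slides:

```python
import json
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group an arbitrary stream of documents into lists of at most batch_size."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partial batch
        yield batch


def index_in_bulk(docs: Iterable[dict], send: Callable[[bytes], None], batch_size: int = 1000) -> int:
    """Send documents as JSON arrays, one request per batch; return the request count."""
    requests = 0
    for batch in batches(docs, batch_size):
        send(json.dumps(batch).encode("utf-8"))
        requests += 1
    return requests
```

For 2500 documents this makes 3 requests instead of 2500.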
We need to refresh (soft commit) and hard commit
Maybe we should only refresh?
OK, let’s go easy with refreshing
But I really need all that data
curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
<?xml version="1.0" encoding="UTF-8"?>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">9418</int>
<lst name="params">
<str name="start">3000000</str>
<str name="q">*:*</str>
<str name="rows">100</str>
<result name="response" numFound="3284000" start="3000000">
But I really need all that data
curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
<?xml version="1.0" encoding="UTF-8"?>
<lst name="error">
<str name="msg">java.lang.OutOfMemoryError: Java heap space</str>
<str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.servlet.SolrDispatchFilter.sendError(
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
Caused by: java.lang.OutOfMemoryError: Java heap space
<int name="code">500</int>
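Why does `start=3000000` hurt? To return hits 3000000 to 3000099, top-N collection keeps a priority queue of `start + rows` hits, and in a distributed setup every shard sends its own `start + rows` candidates to the node that merges them. The Python sketch below mimics that with `heapq.nlargest`; scores are plain numbers and the 4-shard figure is a hypothetical example:

```python
import heapq


def page_by_offset(scores, start: int, rows: int):
    """Offset paging: keep the best (start + rows) hits, then skip the first `start`."""
    queue_size = start + rows
    top = heapq.nlargest(queue_size, scores)  # bounded queue, like a top-N collector
    return top[start:start + rows], queue_size


def merged_candidates(num_shards: int, start: int, rows: int) -> int:
    """Hits the coordinating node must merge when every shard returns start + rows."""
    return num_shards * (start + rows)


page, queue_size = page_by_offset(range(10_000), start=9_000, rows=100)
print(queue_size, page[0], page[-1])         # 9100 999 900
print(merged_candidates(4, 3_000_000, 100))  # 12000400
```

The work and memory grow with `start`, not with `rows`, so asking for 5 rows instead of 100 barely helps.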
Use the scroll, Luke
curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
<?xml version="1.0" encoding="UTF-8"?>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">189</int>
<lst name="params">
<str name="sort">score desc,id desc</str>
<str name="q">*:*</str>
<str name="cursorMark">*</str>
<result name="response" numFound="3284000" start="0">
<str name="nextCursorMark">AoIIP4AAACY5OTk5OTA=</str>
Use the scroll, Luke
curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc
<?xml version="1.0" encoding="UTF-8"?>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">184</int>
<lst name="params">
<str name="sort">score desc,id desc</str>
<str name="q">*:*</str>
<str name="cursorMark">AoIIP4AAACY5OTk5OTA=</str>
<result name="response" numFound="3284000" start="0">
<str name="nextCursorMark">AoIIP4AAACY5OTk5ODE=</str>
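The loop implied by the two requests above can be sketched in Python. The stopping rule comes from Solr's cursor semantics: once `nextCursorMark` comes back equal to the `cursorMark` that was sent, there are no more results. `fetch` is a hypothetical callable that runs the query with the given `cursorMark` (and a sort ending on the uniqueKey, as `id desc` does above) and returns the page's documents plus `nextCursorMark`:

```python
from typing import Callable, Dict, Iterator


def iterate_with_cursor(fetch: Callable[[str], Dict]) -> Iterator[dict]:
    """Walk the whole result set page by page using cursorMark."""
    cursor = "*"  # the initial cursor mark
    while True:
        page = fetch(cursor)
        yield from page["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # unchanged mark: nothing left to read
            break
        cursor = next_cursor
```

Each request only keeps `rows` hits in its queue, which is why QTime stays flat (189 ms and 184 ms above) instead of growing with how deep into the results you are.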
Limiting faceting, why bother?
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
<?xml version="1.0" encoding="UTF-8"?>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">9967</int>
<lst name="params">
<result name="response" numFound="3284000" start="0">
<lst name="facet_counts">
<lst name="facet_fields">
<lst name="tag">
Limiting faceting, why bother?
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
<?xml version="1.0" encoding="UTF-8"?>
<lst name="error">
<str name="msg">Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space</str>
<str name="trace">org.apache.solr.common.SolrException: Error while processing facet fields:
java.lang.OutOfMemoryError: Java heap space
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(
<int name="code">500</int>
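The request parameters are cut off above, so this only illustrates what `facet.limit` (default 100, `-1` for unlimited) and `facet.mincount` bound. In the Python sketch below, counting still touches every term; the limit only caps how many term/count pairs are sorted and returned. The un-inverted field data the trace points at (`getFieldCacheCounts`) is a separate memory cost that a limit alone does not remove. Solr's own `facet.mincount` default is 0; the sketch only sees terms that occur, so it starts at 1.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Tuple


def facet_counts(values_per_doc: Iterable[List[str]], limit: int = 100,
                 mincount: int = 1) -> List[Tuple[str, int]]:
    """Count terms per document, then return the top `limit` by count (ties by term)."""
    counts: Counter = Counter()
    for values in values_per_doc:
        counts.update(set(values))  # a document counts once per term
    eligible = ((term, n) for term, n in counts.items() if n >= mincount)
    order = lambda tn: (-tn[1], tn[0])
    if limit < 0:  # like facet.limit=-1: sort and return every term
        return sorted(eligible, key=order)
    return heapq.nsmallest(limit, eligible, key=order)  # bounded selection
```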
Now let’s look at performance
Magic happens with small changes
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
Monitoring in production
And remember…
Quick summary
We are hiring!
Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig Logging?
Dig working with and in open source?
We’re hiring worldwide!
Thank you!
Rafał Kuć

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks

Recently uploaded (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan

Solr Anti Patterns

  • 1.
  • 2. Solr Anti-patterns Rafał Kuć, Sematext Group, Inc. @kucrafal @sematext
  • 3. About me Sematext consultant & engineer co-founder Father & husband
  • 4. The (not so) perfect migration
  • 5. From 3.1 to 4.10 (and hopefully not back): March 2011 to September 2014
  • 6. The lonely solrconfig.xml <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" /> <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" /> <requestHandler name="/update/csv" class="solr.CSVRequestHandler" /> <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" /> <luceneMatchVersion>LUCENE_31</luceneMatchVersion> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  • 8. And faulty indexing <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">400</int> <int name="QTime">0</int> </lst> <lst name="error"> <str name="msg">missing content stream</str> <int name="code">400</int> </lst> </response> 109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: missing content stream at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(...) at org.apache.solr.handler.RequestHandlerBase.handleRequest(...) at org.apache.solr.core.SolrCore.execute(...) at org.apache.solr.servlet.SolrDispatchFilter.execute(...) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(...) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(...) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(...) at org.eclipse.jetty.servlet.ServletHandler.doHandle(...) at org.eclipse.jetty.server.handler.ScopedHandler.handle(...) at org.eclipse.jetty.server.session.SessionHandler.doHandle(...) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(...) at org.eclipse.jetty.servlet.ServletHandler.doScope(...) at org.eclipse.jetty.server.session.SessionHandler.doScope(...) at org.eclipse.jetty.server.handler.ContextHandler.doScope(...) at org.eclipse.jetty.server.handler.ScopedHandler.handle(...) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(...) at org.eclipse.jetty.server.handler.HandlerCollection.handle(...) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(...) at org.eclipse.jetty.server.Server.handle(...) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(...) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(...) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(...) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(...) at org.eclipse.jetty.http.HttpParser.parseNext(...) at org.eclipse.jetty.http.HttpParser.parseAvailable(...) at org.eclipse.jetty.server.BlockingHttpConnection.handle(...) . . . at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(...) . . .
  • 9. Let’s make that right <requestHandler name="/update" class="solr.UpdateRequestHandler" /> <requestHandler name="/update/json" class="solr.UpdateRequestHandler"> <lst name="defaults"> <str name="stream.contentType">application/json</str> </lst> </requestHandler> <luceneMatchVersion>LUCENE_4.10.0</luceneMatchVersion> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/> <nrtMode>true</nrtMode> <updateLog> <str name="dir"> ${solr.ulog.dir:} </str> </updateLog>
  • 10. The old schema.xml <fieldType name="int" class="solr.IntField" omitNorms="true"/> <fieldType name="long" class="solr.LongField" omitNorms="true"/> <fieldType name="float" class="solr.FloatField" omitNorms="true"/> <fieldType name="double" class="solr.DoubleField" omitNorms="true"/> <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
  • 11. The old schema.xml <fieldType name="int" class="solr.IntField" omitNorms="true"/> <fieldType name="long" class="solr.LongField" omitNorms="true"/> <fieldType name="float" class="solr.FloatField" omitNorms="true"/> <fieldType name="double" class="solr.DoubleField" omitNorms="true"/> <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
  • 12. The new schema.xml <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
  • 13. Threads? What threads? <Set name="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">200</Set> <Set name="detailedDump">false</Set> </New> </Set>
  • 15. Threads? What threads? <Set name="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">200</Set> <Set name="detailedDump">false</Set> </New> </Set>
  • 16. OK, so now we can actually run queries <Set name="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">10000</Set> <Set name="detailedDump">false</Set> </New> </Set>
  • 22. The ZooKeeper – production
  • 23. The ZooKeeper – production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  • 24. The ZooKeeper – production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  • 25. The ZooKeeper – production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  • 26. The ZooKeeper – production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  • 27. Let’s cache everything <filterCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/> <queryResultCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/> <documentCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="0"/>
  • 28. And now let’s look at the warmup times
  • 29. And now let’s look at the warmup times
  • 30. OK, show us the way, "Mr. Consultant" <filterCache class="solr.FastLRUCache" size="1024" initialSize="1024" autowarmCount="512"/> <queryResultCache class="solr.LRUCache" size="16000" initialSize="16000" autowarmCount="8000"/> <documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="0"/>
  • 31. Let’s look at the warmup times again
  • 32. Let’s look at the warmup times again
  • 33. Bulks are for noobs [diagram: Application ×3, Doc ×3]
  • 34. Bulks are for noobs [diagram: Application ×3, Doc ×3]
  • 35. But let’s use bulks, just in case
  • 36. But let’s use bulks, just in case
  • 37. We need to refresh and hard commit <autoCommit> <maxTime>1000</maxTime> <openSearcher>true</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>1000</maxTime> </autoSoftCommit>
  • 38. Maybe we should only refresh? <autoCommit> <maxTime>60000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>1000</maxTime> </autoSoftCommit>
  • 39. OK, let’s go easy with refreshing <autoCommit> <maxTime>60000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>30000</maxTime> </autoSoftCommit>
  • 40. But I really need all that data curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
  • 41. But I really need all that data curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">9418</int> <lst name="params"> <str name="start">3000000</str> <str name="q">*:*</str> <str name="rows">100</str> </lst> </lst> <result name="response" numFound="3284000" start="3000000"> . . . </result> </response>
  • 42. But I really need all that data curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">9418</int> <lst name="params"> <str name="start">3000000</str> <str name="q">*:*</str> <str name="rows">100</str> </lst> </lst> <result name="response" numFound="3284000" start="3000000"> . . . </result> </response> <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="error"> <str name="msg">java.lang.OutOfMemoryError: Java heap space</str> <str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.servlet.SolrDispatchFilter.sendError(...) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(...) . . . Caused by: java.lang.OutOfMemoryError: Java heap space . . . </str> <int name="code">500</int> </lst> </response>
  • 43. But I really need all that data Query
  • 44. But I really need all that data
  • 45. But I really need all that data
  • 46. But I really need all that data Response
  • 47. Use the scroll, Luke curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
  • 48. Use the scroll, Luke curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">189</int> <lst name="params"> <str name="sort">score desc,id desc</str> <str name="q">*:*</str> <str name="cursorMark">*</str> </lst> </lst> <result name="response" numFound="3284000" start="0"> <doc> ... </doc> . . . </result> <str name="nextCursorMark">AoIIP4AAACY5OTk5OTA=</str> </response>
  • 49. Use the scroll, Luke curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc&cursorMark=AoIIP4AAACY5OTk5OTA='
  • 50. Use the scroll, Luke curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc&cursorMark=AoIIP4AAACY5OTk5OTA=' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">184</int> <lst name="params"> <str name="sort">score desc,id desc</str> <str name="q">*:*</str> <str name="cursorMark">AoIIP4AAACY5OTk5OTA=</str> </lst> </lst> <result name="response" numFound="3284000" start="0"> <doc> ... </doc> . . . </result> <str name="nextCursorMark">AoIIP4AAACY5OTk5ODE=</str> </response>
  • 51. Limiting faceting, why bother? curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0'
  • 52. Limiting faceting, why bother? curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">9967</int> <lst name="params"> ... </lst> </lst> <result name="response" numFound="3284000" start="0"> . . . </result> <lst name="facet_counts"> <lst name="facet_fields"> <lst name="tag"> ... </lst> </lst> </lst> </response>
  • 53. Limiting faceting, why bother? curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0' <?xml version="1.0" encoding="UTF-8"?> <response> . . . <lst name="error"> <str name="msg">Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space</str> <str name="trace">org.apache.solr.common.SolrException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space . . . Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(...) . . . </str> <int name="code">500</int> </lst> </response>
  • 54. Now let’s look at performance
  • 55. Now let’s look at performance
  • 56. Now let’s look at performance
  • 57. Now let’s look at performance
  • 58. Now let’s look at performance
  • 59. Magic happens with small changes curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=100&facet.mincount=1'
  • 60. Magic happens with small changes
  • 61. Magic happens with small changes
  • 62. Magic happens with small changes
  • 63. Magic happens with small changes
  • 64. Magic happens with small changes
  • 65. Magic happens with small changes
  • 66. Magic happens with small changes
  • 70. We are hiring! Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig Logging? Dig working with and in open source? We’re hiring worldwide!
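A back-of-the-envelope sketch for the cache slides (27 to 32), showing why the "cache everything" settings hurt. It assumes, as a worst case, that every filterCache entry is a bitset with one bit per document in the index (Solr can store sparse filters more compactly), and it borrows the 3,284,000 documents from the numFound values in slides 41 to 52 as a hypothetical index size. Object overhead and cache keys are ignored.

```python
def filter_cache_worst_case_bytes(max_doc: int, entries: int) -> int:
    """Upper bound on filterCache heap use if every cached filter is a full bitset.

    Assumptions: one bit per document in the index (maxDoc), rounded up to
    whole bytes; per-entry object overhead and the cached query keys are ignored.
    """
    bytes_per_entry = (max_doc + 7) // 8
    return bytes_per_entry * entries


# Hypothetical index size, borrowed from the numFound values shown in the deck.
MAX_DOC = 3_284_000

for label, size in [("slide 27", 1_048_576), ("slide 30", 1_024)]:
    total = filter_cache_worst_case_bytes(MAX_DOC, size)
    print(f"{label}: {size} entries -> up to {total / 2**30:.1f} GiB")
```

Under these assumptions the slide 27 filterCache could grow to roughly 400 GiB, against about 0.4 GiB for slide 30. The autowarmCount matters just as much: it is the number of entries carried over from the old cache when a new searcher opens, which for the filterCache means re-executing up to 524,288 filters every time a commit opens a searcher.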
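The bulk slides (33 to 36) contrast sending documents one per request with sending them in batches. A minimal client-side sketch of batching, assuming a hypothetical `send_batch` hook that would POST each JSON array to a core's `/update` handler with your HTTP client of choice; the default batch size of 1000 is an arbitrary illustration, not a number from the deck.

```python
import json
from typing import Callable, Iterable, Iterator, List


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size`, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def index_in_bulks(docs: Iterable[dict],
                   send_batch: Callable[[str], None],
                   size: int = 1000) -> int:
    """Send documents as JSON arrays, one request per batch.

    `send_batch` is a hypothetical hook that would POST the body to the
    core's /update handler with Content-Type: application/json.
    Returns the number of requests made.
    """
    requests = 0
    for batch in batched(docs, size):
        send_batch(json.dumps(batch))  # e.g. '[{"id": "1"}, {"id": "2"}]'
        requests += 1
    return requests
```

With 2,500 documents and `size=1000`, the sketch makes three update requests instead of 2,500. The best batch size depends on document size and is worth measuring rather than fixing up front.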
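Slides 40 to 46 show why deep paging with `start` is expensive: to return one page, each shard has to collect and rank the top `start + rows` hits, and in a distributed request the coordinating node merges that many candidates from every shard. A small sketch of that arithmetic; the four-shard layout is a hypothetical example, not one from the deck.

```python
def deep_paging_cost(start: int, rows: int, shards: int = 1) -> dict:
    """Rough count of hits collected to serve one start/rows page.

    Each shard keeps a priority queue of the top start + rows hits; in a
    distributed request the coordinating node merges that many candidates
    from every shard, yet only `rows` documents are returned.
    """
    per_shard = start + rows
    return {
        "per_shard_queue": per_shard,
        "collected_across_shards": per_shard * shards,
        "returned": rows,
    }


print(deep_paging_cost(3_000_000, 100))
print(deep_paging_cost(3_000_000, 100, shards=4))  # 4 shards is a hypothetical layout
```

By contrast, the cursorMark approach from slides 47 to 50 keeps each shard's queue at roughly `rows` entries, because every request resumes from the position encoded in the cursor.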
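Slides 47 to 50 show a single cursorMark round trip. A minimal sketch of the full loop, assuming a hypothetical `fetch(params)` callable that sends the parameters to `/select` and returns the parsed response as a dict containing `response.docs` and `nextCursorMark` (as a `wt=json` response would). Per the Solr cursor rules, the sort must include the uniqueKey field (here `id`, as on the slides), `start` is not used, and iteration stops when `nextCursorMark` equals the cursorMark that was sent.

```python
from typing import Callable, Dict, Iterator


def iterate_with_cursor(fetch: Callable[[Dict[str, str]], dict],
                        query: str = "*:*",
                        sort: str = "score desc,id desc",
                        rows: int = 100) -> Iterator[dict]:
    """Yield every matching document using cursorMark paging.

    `fetch` is a hypothetical hook that performs the /select request and
    returns the decoded JSON response.
    """
    cursor = "*"  # the initial cursorMark value
    while True:
        params = {"q": query, "sort": sort, "rows": str(rows), "cursorMark": cursor}
        data = fetch(params)
        for doc in data["response"]["docs"]:
            yield doc
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor: no more results
            return
        cursor = next_cursor
```

Because the function is a generator, a caller can stream millions of documents while holding only one page of `rows` results in memory at a time.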
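Slides 51 to 66 show that bounding `facet.limit` and requiring `facet.mincount=1` turns an out-of-memory facet request into a cheap one. A minimal sketch that builds the bounded request from slide 59 with the standard library's `urllib.parse.urlencode`; the host and `/solr/select` path mirror the slides, the extra parameters the slides elide with "…" are left out, and rejecting unbounded values is a design choice of this sketch, not something Solr enforces.

```python
from urllib.parse import urlencode


def facet_query_url(field: str, limit: int = 100, mincount: int = 1,
                    base: str = "http://localhost:8983/solr/select") -> str:
    """Build a match-all query returning at most `limit` facet values for `field`.

    facet.limit=-1 (unbounded) and facet.mincount=0 (include zero counts) are
    the settings behind the out-of-memory error in the deck, so this sketch
    refuses them.
    """
    if limit < 1:
        raise ValueError("use a positive facet.limit instead of -1")
    if mincount < 1:
        raise ValueError("facet.mincount should be at least 1")
    params = {"q": "*:*", "facet": "true", "facet.field": field,
              "facet.limit": limit, "facet.mincount": mincount}
    return f"{base}?{urlencode(params)}"


print(facet_query_url("tag"))
```

`urlencode` percent-encodes `*:*` as `%2A%3A%2A`, which Solr decodes back to the match-all query.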