OCTOBER 13-16, 2016 • AUSTIN, TX
Solr at Scale for Time-Oriented Data
Brett Hoerner
@bretthoerner
Senior Platform Engineer, Rocana

• Local to Austin, TX
• Have used Solr(Cloud) since 4.0 (2012)
• Not a contributor, just a user
• Work for startups, typically focused on scalability & performance
• Generally (have to) handle operations in addition to development

• "Tuning Solr for Logs"
  Radu Gheorghe's talk at Lucene/Solr Revolution 2014
  bit.ly/tuning-solr-for-logs
Quick plug

• SaaS social media marketing research tool
• Access to full firehose for multiple networks
• Example SolrCloud collection:
  ~150+ billion documents spanning 1 year
  ~10k writes/second
  ~45-65 fields per document
  ~800 shards
  On 13 machines in EC2
  Engineering+Operations team of 1-2
Spredfast

• (Ro)ot (Ca)use A(na)lysis for complex IT operations (large datacenters)
• On-premises enterprise software (not SaaS)
• Monitors 10s or 100s of thousands of machines
• Customers care about 1TB/day on the low end
• Hadoop ecosystem
Rocana

• Each social post or log line becomes a Solr doc
• Almost always sort on time field (not TF-IDF)
• Queries almost always include facets
• Queries always include a time range
  "last 30 minutes"
  "last 30 days"
  "December 2014"
Time-Oriented Realtime Search
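The three example ranges on this slide can each be turned into a filter on the time field. Below is a minimal sketch, assuming the field is called `ts` and holds epoch seconds (as in the query examples later in the deck); the function names are made up for illustration.

```python
from datetime import datetime, timezone


def last_seconds_fq(now_ts: int, seconds: int, field: str = "ts") -> str:
    """Filter for a relative window such as "last 30 minutes"."""
    return f"{field}:[{now_ts - seconds} TO {now_ts}]"


def month_fq(year: int, month: int, field: str = "ts") -> str:
    """Filter for a whole calendar month (UTC) such as "December 2014"."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    end = datetime(next_year, next_month, 1, tzinfo=timezone.utc)
    # Range queries are inclusive, so stop one second before the next month.
    return f"{field}:[{int(start.timestamp())} TO {int(end.timestamp()) - 1}]"


print(last_seconds_fq(1444841789, 30 * 60))   # "last 30 minutes"
print(month_fq(2014, 12))                     # "December 2014"
```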

• Typically a part of a larger stream processing system
• Kafka, or something like it, is recommended
Time-Oriented Realtime Search
[Diagram: Firehose (×3) → Kafka (×3) → Solr Indexer (×3) → Solr (×6); S3 Writer → S3]

• Adjust...
  JVM heap (up to ~30GB)
  ramBufferSizeMB (up to ~512MB)
  solr.autoCommit.maxTime (multiple minutes)
  (and autoCommit openSearcher = false)
  solr.autoSoftCommit.maxTime (as high as possible)
  mergeFactor
• Batch writes! (by count and time)
Optimizing indexing
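"Batch writes (by count and time)" can be sketched as a small buffer that flushes when it holds enough documents or when its oldest document has waited too long. This is only an illustration: the `send` callable, the class name, and the default limits are assumptions, not part of the talk.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Buffers documents and flushes on a count limit or an age limit."""

    def __init__(self, send: Callable[[List[dict]], None],
                 max_docs: int = 1000, max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send = send            # hypothetical callable that posts one batch to Solr
        self.max_docs = max_docs
        self.max_age_s = max_age_s
        self.clock = clock
        self.buf: List[dict] = []
        self.first_at: Optional[float] = None  # arrival time of the oldest buffered doc

    def add(self, doc: dict) -> None:
        if not self.buf:
            self.first_at = self.clock()
        self.buf.append(doc)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Called from add(); also call it from a timer so quiet streams still flush."""
        if not self.buf:
            return
        too_many = len(self.buf) >= self.max_docs
        too_old = self.clock() - self.first_at >= self.max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if self.buf:
            batch, self.buf = self.buf, []
            self.first_at = None
            self.send(batch)
```

The time limit matters as much as the count limit: without it, a slow stream could leave documents sitting in the buffer indefinitely.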

• DocValues on any field you sort/facet on
• Warm on most common sort (time)
• Small filterCache, only use for time range
  fq=ts:[1444755392 TO 1444841789]
  q=text:happy+birthday
  OR at least cache separately
  fq=ts:[1444755392 TO 1444841789]
  fq=text:happy+birthday
  q=*:*
Optimizing queries
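The first query shape above (time range in `fq`, text in `q`) could be assembled as follows. A minimal sketch: the helper name is hypothetical, and the `sort` parameter is added only because the deck says results are almost always sorted on time.

```python
from urllib.parse import urlencode


def time_filtered_params(text_query: str, start_ts: int, end_ts: int,
                         time_field: str = "ts") -> str:
    """Keep the time range in fq (so it can use the filterCache) and the text in q."""
    params = [
        ("q", text_query),
        ("fq", f"{time_field}:[{start_ts} TO {end_ts}]"),
        ("sort", f"{time_field} desc"),
    ]
    return urlencode(params)


print(time_filtered_params("text:happy birthday", 1444755392, 1444841789))
```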

• By default, Solr hashes the unique field* of each document to decide which shard it belongs on.
  * uniqueKey in schema.xml
• The effect is that documents are evenly spread across *all* shards
Sharding by time

• This means every shard is actively writing and merging new segments all the time
• Your docs/sec per node is (total docs/sec) / (number of nodes), which is spreading writes pretty thin across shards if you're thinking of using, say, 500 shards
Sharding by time

• Even worse, on the read side this means *every* query must be sent to *every* shard
  (unless you're looking for a document by its unique field, which is a pretty poor use case for Solr...)
• Given 1 query and 500 shards:
  q=text:happy+timestamp:[37 TO 286]&sort=timestamp desc&rows=100
  sends 500 requests out
  searches/sorts your *entire* data set
  waits for 500 responses
  merges them
  and finally responds
Sharding by time
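To see why a time-range query has to fan out to every shard under hash routing, here is a small simulation. It uses a stand-in hash (MD5 modulo the shard count), which is not Solr's actual hashing; the document ids are made up.

```python
import hashlib


def hash_shard(doc_id: str, num_shards: int) -> int:
    """Stand-in for hash routing: a stable hash of the id, modulo the shard count."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_touched(doc_ids, num_shards: int) -> int:
    """Count how many distinct shards receive at least one of the given documents."""
    return len({hash_shard(d, num_shards) for d in doc_ids})


# 10,000 documents from one short time window, spread over 500 shards.
recent_ids = [f"event{i}" for i in range(10_000)]
touched = shards_touched(recent_ids, 500)
print(f"{touched} of 500 shards hold documents from this window")
```

Even a narrow window ends up on essentially all shards, so no shard can be skipped at query time.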

• The solution is to take full control of document routing
  /admin/collections?
  action=CREATE
  &name=my_collection
  &router.name=implicit
  &shards=1444780800,1444867200,1444953600,...
Sharding by time
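The shard names in the CREATE call are epoch seconds at UTC midnight, one per day. A sketch of generating them and building the URL, assuming a base URL of `http://localhost:8983/solr` and helper names that are purely illustrative; a real CREATE call will likely need further parameters (config set, replication, and so on).

```python
from datetime import date, datetime, timedelta, timezone
from urllib.parse import urlencode


def daily_shard_names(first_day: date, days: int) -> list:
    """One shard per UTC day, named by the epoch second at which the day starts."""
    names = []
    for i in range(days):
        d = first_day + timedelta(days=i)
        midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
        names.append(str(int(midnight.timestamp())))
    return names


def create_collection_url(base_url: str, name: str, shards: list) -> str:
    params = {
        "action": "CREATE",
        "name": name,
        "router.name": "implicit",
        "shards": ",".join(shards),
    }
    # safe="," keeps the shard list readable instead of %2C-escaping it
    return f"{base_url}/admin/collections?{urlencode(params, safe=',')}"


shards = daily_shard_names(date(2015, 10, 14), 3)
print(create_collection_url("http://localhost:8983/solr", "my_collection", shards))
```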
Sharding by time
[Diagram: Kafka → Solr Writer (×3) → my_collection (shards 1444780800, 1444867200, 1444953600, ...); example document sent by a writer:]
{
id: "event100",
body: "hello, world",
created_at: 1444965428,
_route_: 1444953600
}
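In the document above, `_route_` is the start of the UTC day that contains `created_at`. A writer could derive it like this; a sketch assuming daily shards and hypothetical helper names.

```python
SECONDS_PER_DAY = 86_400


def day_shard(epoch_seconds: int) -> str:
    """Name of the daily shard (UTC midnight epoch) that covers this timestamp."""
    return str(epoch_seconds - epoch_seconds % SECONDS_PER_DAY)


def with_route(doc: dict, time_field: str = "created_at") -> dict:
    """Return a copy of the doc with _route_ set from its timestamp."""
    routed = dict(doc)
    routed["_route_"] = day_shard(doc[time_field])
    return routed


doc = with_route({"id": "event100", "body": "hello, world", "created_at": 1444965428})
print(doc["_route_"])
```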
Sharding by time
[Diagram: my_collection with shards 1444780800, 1444867200, 1444953600, ...; the query below targets only two of them]
/solr/my_collection/select?
q=text:hello
&fq=created_at:[1444874953 TO 1444989225]
&shards=1444867200,1444953600
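The `shards` list in this query is just the set of daily shards that overlap the time range. A sketch of computing it on the client, assuming daily shards named by UTC midnight epoch; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def shards_for_range(start_ts: int, end_ts: int) -> list:
    """Daily shard names (UTC midnight epochs) overlapping [start_ts, end_ts]."""
    if end_ts < start_ts:
        raise ValueError("end_ts must not precede start_ts")
    first = start_ts - start_ts % SECONDS_PER_DAY
    last = end_ts - end_ts % SECONDS_PER_DAY
    return [str(day) for day in range(first, last + 1, SECONDS_PER_DAY)]


def select_params(text_query: str, start_ts: int, end_ts: int) -> dict:
    """Query parameters that restrict the fan-out to the shards that can match."""
    return {
        "q": text_query,
        "fq": f"created_at:[{start_ts} TO {end_ts}]",
        "shards": ",".join(shards_for_range(start_ts, end_ts)),
    }


print(select_params("text:hello", 1444874953, 1444989225)["shards"])
```

With one year of daily shards, a "last 30 minutes" query touches one or two shards instead of all of them.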

• Duplicate cluster that only holds more recent data
• ... but with more hardware per document
Cluster "layering"
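Choosing a layer per query can be as simple as checking whether the whole time range fits inside the recent cluster's window. A sketch under assumptions: the 30-day window comes from the slide's diagram, while the cluster names and the function are hypothetical.

```python
RECENT_WINDOW_S = 30 * 86_400  # the "30 days of data" layer


def pick_cluster(query_start_ts: int, now_ts: int) -> str:
    """Send a query to the recent layer only if its whole range fits inside it."""
    if query_start_ts >= now_ts - RECENT_WINDOW_S:
        return "recent"   # hypothetical name for the 30-day cluster with more hardware per document
    return "full"         # hypothetical name for the 12-month cluster


now = 1444989225
print(pick_cluster(now - 3600, now))          # a "last hour" query
print(pick_cluster(now - 120 * 86_400, now))  # a query reaching months back
```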
[Diagram: a 12-month cluster and a 30-day cluster; a query for "last June" goes to the former, a query for "last hour" to the latter]

• bit.ly/created-at-hack
• If we can make assumptions about what's in each shard, we can optimize the "sub" queries that are sent to each node
• Also optionally disable facet refinement
Hacks

• Solr on HDFS is one interesting option
  Can recover existing distributed indexes on another node (using the *same* directory!), see "autoAddReplicas" in Collections API CREATE.
• "Normal" replication was historically an issue (for us) at scale
• Apparently replication was made 100% faster in Solr 5.2
• Remember that replicas aren't backups
Replication

• So, you have your >100 billion document cluster running...
• Indexes are slowly created over the course of months/years by ingesting realtime data...

• But what if...
  We need to add new fields (to old docs)
  We need to remove unused fields
  We need to change fields (type, content)
  We decide we need to query further in the past
  We have catastrophic data loss
  We want to upgrade Solr (with no risk)

• Let's say:
  We index 5k docs/sec for a year
  That means 157,680,000,000 documents
  Say the cluster can ingest 50k docs/sec max
  It'd take 36.5 days to reindex a year
  ... for any/every change
  ... if nothing went wrong for 36.5 days
  ... and you need to write the code to do it
Timebomb
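The numbers on this slide follow directly from the two rates; a quick check (variable names are only for illustration):

```python
SECONDS_PER_DAY = 86_400

ingest_rate = 5_000        # docs/sec indexed in normal operation
reindex_rate = 50_000      # docs/sec the cluster can absorb at most

docs_per_year = ingest_rate * SECONDS_PER_DAY * 365
reindex_days = docs_per_year / reindex_rate / SECONDS_PER_DAY

print(f"{docs_per_year:,} documents in a year")
print(f"{reindex_days} days to reindex them")
```

Put differently, at a 10x reindex rate every year of data costs a tenth of a year to rebuild, and that is before anything goes wrong.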

• Hadoop to the rescue (?)
• Under Solr contrib
  github.com/apache/lucene-solr/tree/trunk/solr/contrib/map-reduce
• Given raw input data*, run a MapReduce job that generates Solr indexes (locally!)
  * this is one good reason to use something like Kafka and push all your raw data to HDFS/S3/etc in addition to Solr
MapReduceIndexerTool

• Amazon Elastic MapReduce works well for this
  Plus, you can use spot instances (cheap!)
• The trick is, you have to load the completed indexes yourself
  At that point it becomes an Ops problem. Some kind of orchestration like Chef comes in handy here, but it's not done for you or open source (yet?)
• Unless you run Solr on HDFS (GoLive)
MapReduceIndexerTool

• ~150 billion document collection spanning 1 year reindexed from scratch and running on a new cluster in ~6 days for ~$3k
  Bug/bribe Adam McElwee to open source it :) twitter.com/txlord
MapReduceIndexerTool

• Optimize like you would any Solr cluster
• Reduce caching, RAM is probably scarce and hits are probably low
• Shard based on time
• Be prepared to rebuild the entire collection so you can iterate on product/design
Conclusion

• Thanks!
  brett@bretthoerner.com
  twitter.com/bretthoerner
  rocana.com/careers
Fin

More Related Content

What's hot

Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationsTop 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationshadooparchbook
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and SparkLucidworks
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterLucidworks
 
Use r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrUse r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrDatabricks
 
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangExperiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangDatabricks
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Andra Lungu
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Petr Zapletal
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationDatabricks
 
Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and DatasetKazuaki Ishizaki
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsDatabricks
 
Spark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with SparkSpark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with SparkDatabricks
 
Building Operational Data Lake using Spark and SequoiaDB with Yang Peng
Building Operational Data Lake using Spark and SequoiaDB with Yang PengBuilding Operational Data Lake using Spark and SequoiaDB with Yang Peng
Building Operational Data Lake using Spark and SequoiaDB with Yang PengDatabricks
 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Databricks
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksDatabricks
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Apex
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Databricks
 
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0Databricks
 
Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbJen Aman
 

What's hot (20)

Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationsTop 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applications
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
 
Use r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrUse r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkr
 
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangExperiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Spark etl
Spark etlSpark etl
Spark etl
 
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
 
Demystifying DataFrame and Dataset
Demystifying DataFrame and DatasetDemystifying DataFrame and Dataset
Demystifying DataFrame and Dataset
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
 
Spark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with SparkSpark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with Spark
 
Building Operational Data Lake using Spark and SequoiaDB with Yang Peng
Building Operational Data Lake using Spark and SequoiaDB with Yang PengBuilding Operational Data Lake using Spark and SequoiaDB with Yang Peng
Building Operational Data Lake using Spark and SequoiaDB with Yang Peng
 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
 
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
 
Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At Airbnb
 

Viewers also liked

C1 keynote creating_your_enterprise_cloud_strategy
C1 keynote creating_your_enterprise_cloud_strategyC1 keynote creating_your_enterprise_cloud_strategy
C1 keynote creating_your_enterprise_cloud_strategyDr. Wilfred Lin (Ph.D.)
 
Data science unit introduction
Data science unit introductionData science unit introduction
Data science unit introductionGregg Barrett
 
Bim based process mining master thesis presentation
Bim based process mining master thesis presentation Bim based process mining master thesis presentation
Bim based process mining master thesis presentation Stijn van Schaijk
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
Grade 3 text structure assessment teaching guide
Grade 3 text structure assessment teaching guideGrade 3 text structure assessment teaching guide
Grade 3 text structure assessment teaching guideEmily Kissner
 
Experimental Photography Artist Research
Experimental Photography Artist ResearchExperimental Photography Artist Research
Experimental Photography Artist ResearchJaskirt Boora
 
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...Splunk
 
Understanding Camouflage
Understanding CamouflageUnderstanding Camouflage
Understanding CamouflageEmily Kissner
 
Legrand Group Belgium - Brochure Sfera
Legrand Group Belgium - Brochure SferaLegrand Group Belgium - Brochure Sfera
Legrand Group Belgium - Brochure SferaArchitectura
 
Digital transformation - DevOps Day - 02/02/2017
Digital transformation - DevOps Day - 02/02/2017Digital transformation - DevOps Day - 02/02/2017
Digital transformation - DevOps Day - 02/02/2017Clara Feuillet
 
Collaboration with Eclipse final
Collaboration with Eclipse finalCollaboration with Eclipse final
Collaboration with Eclipse finalKenu, GwangNam Heo
 
Brown Bag Lunch sur Hazelcast
Brown Bag Lunch sur HazelcastBrown Bag Lunch sur Hazelcast
Brown Bag Lunch sur HazelcastSylvain Wallez
 
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloudA1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloudDr. Wilfred Lin (Ph.D.)
 
Cleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VA
Cleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VACleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VA
Cleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VAClearedJobs.Net
 
Reference Architecture: EMC Hybrid Cloud with VMware
Reference Architecture: EMC Hybrid Cloud with VMwareReference Architecture: EMC Hybrid Cloud with VMware
Reference Architecture: EMC Hybrid Cloud with VMwareEMC
 
소셜 코딩 GitHub & branch & branch strategy
소셜 코딩 GitHub & branch & branch strategy소셜 코딩 GitHub & branch & branch strategy
소셜 코딩 GitHub & branch & branch strategyKenu, GwangNam Heo
 
Love Cloud: 28 June 2017
Love Cloud: 28 June 2017 Love Cloud: 28 June 2017
Love Cloud: 28 June 2017 Chloe Mustafa
 

Viewers also liked (20)

C1 keynote creating_your_enterprise_cloud_strategy
C1 keynote creating_your_enterprise_cloud_strategyC1 keynote creating_your_enterprise_cloud_strategy
C1 keynote creating_your_enterprise_cloud_strategy
 
Data science unit introduction
Data science unit introductionData science unit introduction
Data science unit introduction
 
Bim based process mining master thesis presentation
Bim based process mining master thesis presentation Bim based process mining master thesis presentation
Bim based process mining master thesis presentation
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
Grade 3 text structure assessment teaching guide
Grade 3 text structure assessment teaching guideGrade 3 text structure assessment teaching guide
Grade 3 text structure assessment teaching guide
 
Experimental Photography Artist Research
Experimental Photography Artist ResearchExperimental Photography Artist Research
Experimental Photography Artist Research
 
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
 
Understanding Camouflage
Understanding CamouflageUnderstanding Camouflage
Understanding Camouflage
 
Bennett raglinphotography
Bennett raglinphotographyBennett raglinphotography
Bennett raglinphotography
 
Legrand Group Belgium - Brochure Sfera
Legrand Group Belgium - Brochure SferaLegrand Group Belgium - Brochure Sfera
Legrand Group Belgium - Brochure Sfera
 
Unc plus delta
Unc plus deltaUnc plus delta
Unc plus delta
 
Digital transformation - DevOps Day - 02/02/2017
Digital transformation - DevOps Day - 02/02/2017Digital transformation - DevOps Day - 02/02/2017
Digital transformation - DevOps Day - 02/02/2017
 
Collaboration with Eclipse final
Collaboration with Eclipse finalCollaboration with Eclipse final
Collaboration with Eclipse final
 
Azure OMS
Azure OMSAzure OMS
Azure OMS
 
Brown Bag Lunch sur Hazelcast
Brown Bag Lunch sur HazelcastBrown Bag Lunch sur Hazelcast
Brown Bag Lunch sur Hazelcast
 
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloudA1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
 
Cleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VA
Cleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VACleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VA
Cleared Job Fair Job Seeker Handbook June 15, 2017, Dulles, VA
 
Reference Architecture: EMC Hybrid Cloud with VMware
Reference Architecture: EMC Hybrid Cloud with VMwareReference Architecture: EMC Hybrid Cloud with VMware
Reference Architecture: EMC Hybrid Cloud with VMware
 
소셜 코딩 GitHub & branch & branch strategy
소셜 코딩 GitHub & branch & branch strategy소셜 코딩 GitHub & branch & branch strategy
소셜 코딩 GitHub & branch & branch strategy
 
Love Cloud: 28 June 2017
Love Cloud: 28 June 2017 Love Cloud: 28 June 2017
Love Cloud: 28 June 2017
 

Similar to Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana

Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrlucenerevolution
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkGuido Schmutz
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksLucidworks
 
ELK stack introduction
ELK stack introduction ELK stack introduction
ELK stack introduction abenyeung1
 
Agile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsAgile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsDataWorks Summit
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBaseCarol McDonald
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Scality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup PresentationScality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup PresentationScality
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solrthelabdude
 
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...DataStax Academy
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsRussell Jurney
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Lucidworks
 
Webinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with FusionWebinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with FusionLucidworks
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Codemotion
 
DSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDeltares
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionSplunk
 

Similar to Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana (20)

Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solr
 
Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
 
ELK stack introduction
ELK stack introduction ELK stack introduction
ELK stack introduction
 
Agile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsAgile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics Applications
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Scality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup PresentationScality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup Presentation
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
 
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics Applications
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
Webinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with FusionWebinar: Faster Log Indexing with Fusion
Webinar: Faster Log Indexing with Fusion
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
 
DSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De Boer
 
Collecting 600M events/day
Collecting 600M events/dayCollecting 600M events/day
Collecting 600M events/day
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 

Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana

  • 1. October 13-16, 2016 • Austin, TX
  • 2. Solr at Scale for Time-Oriented Data Brett Hoerner @bretthoerner Senior Platform Engineer, Rocana
  • 3.
    • Local to Austin, TX
    • Have used Solr(Cloud) since 4.0 (2012)
    • Not a contributor, just a user
    • Work for startups, typically focused on scalability & performance
    • Generally (have to) handle operations in addition to development
  • 4. Quick plug
    • "Tuning Solr for Logs", Radu Gheorghe's talk at Lucene/Solr Revolution 2014
      bit.ly/tuning-solr-for-logs
  • 5. Spredfast
    • SaaS social media marketing research tool
    • Access to full firehose for multiple networks
    • Example SolrCloud collection:
      ~150+ billion documents spanning 1 year
      ~10k writes/second
      ~45-65 fields per document
      ~800 shards
      On 13 machines in EC2
      Engineering+Operations team of 1-2
  • 8. Rocana
    • (Ro)ot (Ca)use A(na)lysis for complex IT operations (large datacenters)
    • On-premises enterprise software (not SaaS)
    • Monitors 10s or 100s of thousands of machines
    • Customers care about 1TB/day on the low end
    • Hadoop ecosystem
  • 10. Time-Oriented Realtime Search
    • Each social post or log line becomes a Solr doc
    • Almost always sort on time field (not TF-IDF)
    • Queries almost always include facets
    • Queries always include a time range
      "last 30 minutes"
      "last 30 days"
      "December 2014"
  • 11. Time-Oriented Realtime Search
    • Typically a part of a larger stream processing system
    • Kafka, or something like it, is recommended
    [Diagram labels: Firehose (x3), Kafka (x3), Solr Indexer (x3), Solr (x6), S3 Writer, S3]
  • 12. Optimizing indexing
    • Adjust...
      JVM heap (up to ~30GB)
      ramBufferSizeMB (up to ~512MB)
      solr.autoCommit.maxTime (multiple minutes)
      (and autoCommit openSearcher = false)
      solr.autoSoftCommit.maxTime (as high as possible)
      mergeFactor
    • Batch writes! (by count and time)
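One way to read "batch writes by count and time" is a small buffer that flushes when it holds enough documents or when its oldest document has waited long enough. The sketch below is only an illustration of that idea: the `Batcher` class, its parameters, and the `flush` callback are hypothetical names, and the callback stands in for whatever actually sends a batch to Solr.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Buffer documents; flush on a size limit or an age limit (hypothetical helper)."""

    def __init__(self, flush: Callable[[List[dict]], None],
                 max_docs: int = 1000, max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush          # called with each completed batch
        self._max_docs = max_docs    # "by count"
        self._max_age_s = max_age_s  # "by time"
        self._clock = clock
        self._docs: List[dict] = []
        self._first_at: Optional[float] = None

    def add(self, doc: dict) -> None:
        if not self._docs:
            # Remember when the oldest document in this batch arrived.
            self._first_at = self._clock()
        self._docs.append(doc)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Flush if the batch is full or its oldest document is too old."""
        if not self._docs:
            return
        too_many = len(self._docs) >= self._max_docs
        too_old = self._clock() - self._first_at >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if self._docs:
            batch, self._docs = self._docs, []
            self._first_at = None
            self._flush(batch)
```

Note that the age limit is only checked when `add` or `maybe_flush` runs, so a quiet stream would need a periodic call to `maybe_flush` from the consumer loop.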
  • 13. Optimizing queries
    • DocValues on any field you sort/facet on
    • Warm on most common sort (time)
    • Small filterCache, only use for time range

      fq=ts:[1444755392 TO 1444841789]
      q=text:happy+birthday

      OR at least cache separately

      fq=ts:[1444755392 TO 1444841789]
      fq=text:happy+birthday
      q=*:*
  • 14. Sharding by time
    • By default, Solr hashes the unique field* of each document to decide which shard it belongs on.
      * uniqueKey in schema.xml
    • The effect is that documents are evenly spread across *all* shards
  • 15. Sharding by time
    • This means every shard is actively writing and merging new segments all the time
    • Each node's write rate is the total docs/sec divided by the number of nodes, which spreads writes pretty thin if you're thinking of using, say, 500 shards
  • 16. Sharding by time
    • Even worse, on the read side this means *every* query must be sent to *every* shard
      (unless you're looking for a document by its unique field, which is a pretty poor use case for Solr...)
    • Given 1 query and 500 shards:

      q=text:happy+timestamp:[37 TO 286]&sort=timestamp desc&rows=100

      sends 500 requests out
      searches/sorts your *entire* data set
      waits for 500 responses
      merges them
      and finally responds
  • 17. Sharding by time
    • The solution is to take full control of document routing

      /admin/collections?
      action=CREATE
      &name=my_collection
      &router.name=implicit
      &shards=1444780800,1444867200,1444953600,...
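The shard names on the slide are 86400 seconds apart, which suggests one shard per UTC day named by that day's starting epoch second. Under that assumption, the CREATE request could be assembled as in the sketch below. The helper names and the base URL `http://localhost:8983/solr` are hypothetical; only the query parameters come from the slide.

```python
from typing import List
from urllib.parse import urlencode

SECONDS_PER_DAY = 86400


def daily_shard_names(first_day: int, days: int) -> List[str]:
    """Return one shard name per day, each named by its day-start epoch second."""
    if first_day % SECONDS_PER_DAY != 0:
        raise ValueError("first_day must fall on a UTC day boundary")
    return [str(first_day + i * SECONDS_PER_DAY) for i in range(days)]


def create_collection_url(base: str, name: str, shards: List[str]) -> str:
    """Build a Collections API CREATE URL that uses the implicit router."""
    params = {
        "action": "CREATE",
        "name": name,
        "router.name": "implicit",
        "shards": ",".join(shards),
    }
    # Keep commas readable in the shard list instead of percent-encoding them.
    return f"{base}/admin/collections?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    shards = daily_shard_names(1444780800, 3)
    print(create_collection_url("http://localhost:8983/solr", "my_collection", shards))
```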
  • 18. Sharding by time
    [Diagram labels: Kafka, Solr Writer (x3), my_collection with shards 1444780800, 1444867200, 1444953600, ...]
    { id: "event100", body: "hello, world", created_at: 1444965428, _route_: 1444953600 }
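In the example document, `_route_` equals `created_at` rounded down to the start of its UTC day. A writer could derive it as sketched below, assuming day-sized shards named by their starting epoch second. The function names are hypothetical, and the route is returned as a string because it names a shard.

```python
SECONDS_PER_DAY = 86400


def route_for(created_at: int) -> str:
    """Name of the daily shard that contains this epoch-second timestamp."""
    return str(created_at - created_at % SECONDS_PER_DAY)


def with_route(doc: dict) -> dict:
    """Return a copy of the document with _route_ set from its created_at field."""
    routed = dict(doc)
    routed["_route_"] = route_for(doc["created_at"])
    return routed


if __name__ == "__main__":
    print(with_route({"id": "event100", "body": "hello, world", "created_at": 1444965428}))
```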
  • 19. Sharding by time
    [Diagram labels: my_collection with shards 1444780800, 1444867200, 1444953600, ...]

      /solr/my_collection/select?
      q=text:hello
      &fq=created_at:[1444874953 TO 1444989225]
      &shards=1444867200,1444953600
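The `shards` value in the query above can be computed from the time range itself: round both ends down to a day boundary and list every day in between. This is a sketch assuming day-sized shards named by their starting epoch second; `shards_for_range` is a hypothetical helper, not part of Solr.

```python
from typing import List

SECONDS_PER_DAY = 86400


def shards_for_range(start: int, end: int) -> List[str]:
    """List the daily shards that overlap the inclusive range [start, end]."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    first = start - start % SECONDS_PER_DAY
    last = end - end % SECONDS_PER_DAY
    return [str(day) for day in range(first, last + 1, SECONDS_PER_DAY)]


if __name__ == "__main__":
    # The time range from the slide's fq parameter.
    print(",".join(shards_for_range(1444874953, 1444989225)))
```

Only the shards that can contain matching documents receive the request, so a short time range touches a handful of shards instead of all of them.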
  • 20. Cluster "layering"
    • Duplicate cluster that only holds more recent data
    • ... but with more hardware per document
    [Diagram labels: 12 months of data; 30 days of data; Query for "last hour"; Query for "last June"]
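A query router for this layering could be as simple as checking whether the whole time range fits inside the recent cluster's window. The sketch below assumes a 30-day recent cluster and a 12-month full cluster, as labelled on the slide. The cluster names `"recent"` and `"full"` and the function itself are hypothetical.

```python
SECONDS_PER_DAY = 86400


def pick_cluster(query_start: int, now: int, recent_days: int = 30) -> str:
    """Send a query to the recent cluster only if its range starts inside the window."""
    cutoff = now - recent_days * SECONDS_PER_DAY
    return "recent" if query_start >= cutoff else "full"


if __name__ == "__main__":
    now = 1444953600
    print(pick_cluster(now - 3600, now))                    # a "last hour" query
    print(pick_cluster(now - 120 * SECONDS_PER_DAY, now))   # a query reaching months back
```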
  • 21. Hacks
    • bit.ly/created-at-hack
    • If we can make assumptions about what's in each shard, we can optimize the "sub" queries that are sent to each node
    • Also optionally disable facet refinement
  • 22. Replication
    • Solr on HDFS is one interesting option
      Can recover existing distributed indexes on another node (using the *same* directory!), see "autoAddReplicas" in Collection API CREATE.
    • "Normal" replication was historically an issue (for us) at scale
    • Apparently made 100% faster in Solr 5.2
    • Remember that replicas aren't backups
  • 23.
    • So, you have your >100 billion document cluster running...
    • Indexes are slowly created over the course of months/years by ingesting realtime data...
  • 24.
    • But what if...
      We need to add new fields (to old docs)
      We need to remove unused fields
      We need to change fields (type, content)
      We decide we need to query further in the past
      We have catastrophic data loss
      We want to upgrade Solr (with no risk)
  • 25. Timebomb
    • Let's say:

      We index 5k docs/sec for a year
      That means 157,680,000,000 documents

      Say the cluster can ingest 50k/sec max
      It'd take 36.5 days to reindex a year
      ... for any/every change
      ... if nothing went wrong for 36.5 days
      ... and you need to write the code to do it
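The arithmetic behind these numbers is easy to reproduce and to rerun with your own rates. The sketch below uses the slide's figures (5k docs/sec indexed for 365 days, 50k docs/sec maximum ingest); the function names are hypothetical.

```python
SECONDS_PER_DAY = 86400


def total_docs(docs_per_sec: int, days: int) -> int:
    """Documents accumulated at a steady indexing rate over a number of days."""
    return docs_per_sec * SECONDS_PER_DAY * days


def reindex_days(doc_count: int, max_docs_per_sec: int) -> float:
    """Days needed to re-ingest doc_count documents at the cluster's maximum rate."""
    return doc_count / max_docs_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    docs = total_docs(5_000, 365)
    print(f"{docs:,} documents")
    print(f"{reindex_days(docs, 50_000)} days to reindex")
```

Because the reindex rate here is ten times the live rate, a year of data takes a tenth of a year, and that only holds if the cluster sustains its maximum rate the whole time.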
  • 26. MapReduceIndexerTool
    • Hadoop to the rescue (?)
    • Under Solr contrib
      github.com/apache/lucene-solr/tree/trunk/solr/contrib/map-reduce
    • Given raw input data*, run a MapReduce job that generates Solr indexes (locally!)
      * this is one good reason to use something like Kafka and push all your raw data to HDFS/S3/etc in addition to Solr
  • 27. MapReduceIndexerTool
    • Amazon Elastic MapReduce works well for this
      Plus, you can use spot instances (cheap!)
    • The trick is, you have to load the completed indexes yourself
      At that point it becomes an Ops problem. Some kind of orchestration like Chef comes in handy here, but it's not done for you or open-source (yet?)
    • Unless you run Solr on HDFS (GoLive)
  • 28. MapReduceIndexerTool
    • ~150 billion document collection spanning 1 year reindexed from scratch and running on a new cluster in ~6 days for ~$3k
      Bug/bribe Adam McElwee to open source :) twitter.com/txlord
  • 29. Conclusion
    • Optimize like you would any Solr cluster
    • Reduce caching, RAM is probably scarce and hits are probably low
    • Shard based on time
    • Be prepared to rebuild the entire collection so you can iterate on product/design