ElasticSearch in Production: lessons learned




Anne Veling, ApacheCon EU, November 6, 2012
agenda
Introduction

ElasticSearch

Udini

Upcoming Tool

Lessons Learned
introduction
Anne Veling, @anneveling

Self-employed contractor
  Software Architect
  Agile process management
  Performance optimization
  Lucene/SOLR/ElasticSearch implementations & training
ElasticSearch
Apache Lucene

Started in 2010 by Shay Banon

Open Source – Apache License

A company was formed in 2012: ElasticSearch
  Training, support and development

Careful feature development
  vs. build because you can
ElasticSearch
Scalable
  Distributed, Node Discovery
  Automatic sharding
  Query distribution

RESTful, HTTP API
  With API wrappers for Ruby, Java, Scala, …
  JSON in, JSON out

Document Model
  Maps book.author.lastName
  “schemaless” -> field type recognition
  Keeps source, keeps “version” number
ElasticSearch
Integrated faceting
  With statistical aggregates (sum/avg/…) for free

Field types and analyzers
  String, numerics, geo, attachment, …
  Arrays, subdocuments, nested documents

Integrated sharding
  Routing and alias
  Cross-index searching / multi-document type
udini.proquest.com
ProQuest

The World’s Article Store

Stack
  Amazon EC2
  Scala with Unfiltered
  MongoDB, ElasticSearch
architecture
[Diagram: Fulfillment Providers, Udini, Summon; PDF pipeline, MongoDB, ElasticSearch, SOLR]
SOLR at Udini
Connecting to Summon API
  700M SOLR Cluster

In Udini, we serve a subset of 160M full text articles
  Including fulfillment mechanisms
  PDF and HTML5 viewing and annotation
ElasticSearch at Udini
Local index to search your articles

Many small user libraries, searching only locally
  User-id as sharding key
  Include key in all queries
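
A minimal sketch of the "user-id as sharding key" idea, assuming a hypothetical index called `libraries` and hypothetical fields `user_id` and `fulltext`. It only builds the request; `routing` is the search parameter that limits the request to the shard for that key. Because several users can share a shard, the query still filters on the user id. The exact query DSL varies between ElasticSearch versions, so treat the body as illustrative.

```python
import json
from urllib.parse import urlencode


def build_user_search(user_id, text, index="libraries"):
    """Return (url_path, json_body) for a search limited to one user's library.

    The index name and the `user_id` / `fulltext` field names are hypothetical.
    """
    # routing=<key> sends the search only to the shard that holds this key
    path = "/%s/_search?%s" % (index, urlencode({"routing": user_id}))
    # A shard can hold many users, so the user id is also a required clause.
    body = {
        "query": {
            "bool": {
                "must": [
                    {"match": {"fulltext": text}},
                    {"term": {"user_id": user_id}},
                ]
            }
        }
    }
    return path, json.dumps(body)


path, body = build_user_search("u42", "graphene")
print(path)
```

The same routing value has to be supplied at index time, otherwise documents land on a different shard than the one being searched.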
Exciting new product
Developing for ProQuest

Exciting new research tool for scientific researchers

Creating a large ElasticSearch index for journal article
canonicalization

Currently in private beta, launching in the coming months
Lessons Learned
   Very fast indexing

   Bulk indexing ftw
      Set up without replicas (replicas = 0, not 1)
      Play with bulk size
      Simply writing batches to disk and curl-ing them in is very fast
      1M records in 40s
for f in "${BATCH_DIR}"/batch-*.json
do
  echo "about to index $f"
  # POST the raw batch file to the bulk endpoint; --data-binary keeps the newlines intact
  curl --silent --show-error --request POST \
       --data-binary "@$f" localhost:9200/_bulk > /dev/null
  echo
done
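
The batch files fed to the loop above could be produced along these lines. This is a sketch, not the pipeline used at Udini: the index name `articles`, the type name `article`, and the `id` key are assumptions. The bulk format itself is one action line followed by one source line per document, each terminated by a newline. Newer ElasticSearch versions no longer use `_type`.

```python
import json
import os
import tempfile


def write_bulk_batches(docs, out_dir, batch_size=1000,
                       index="articles", doc_type="article"):
    """Write docs (dicts with an "id" key) as batch-NNNNN.json bulk files."""
    paths = []
    for start in range(0, len(docs), batch_size):
        path = os.path.join(out_dir, "batch-%05d.json" % (start // batch_size))
        with open(path, "w", encoding="utf-8") as out:
            for doc in docs[start:start + batch_size]:
                # action line: what to do and where (names are hypothetical)
                action = {"index": {"_index": index, "_type": doc_type,
                                    "_id": doc["id"]}}
                out.write(json.dumps(action) + "\n")
                # source line: the document itself, on a single line
                out.write(json.dumps(doc) + "\n")
        paths.append(path)
    return paths


# Five small documents in batches of two.
demo_dir = tempfile.mkdtemp()
demo_docs = [{"id": str(i), "title": "article %d" % i} for i in range(5)]
demo_paths = write_bulk_batches(demo_docs, demo_dir, batch_size=2)
print(len(demo_paths), "batch files written to", demo_dir)
```

"Play with bulk size" then comes down to varying `batch_size` and timing the curl loop.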
Lessons learned
Schema(less)?

Automatic field type recognition
   Can miss types
   Strict about types #duh

Mapping of subfields (doc.title vs doc.publication.title)
   Version dependent

In reality
   Schema still needed
   Mapping changes still non-trivial
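
A sketch of what "schema still needed" can look like in practice: an explicit mapping built up front rather than left to type recognition. The helper and the field names are hypothetical. The type name `string` matches ElasticSearch of that era (later versions use `text` and `keyword`), and how the mapping is wrapped when sent to the server is version dependent.

```python
import json


def build_mapping(fields):
    """Turn {"dotted.path": "type"} into a nested `properties` mapping dict."""
    root = {"properties": {}}
    for dotted, field_type in fields.items():
        node = root
        parts = dotted.split(".")
        # walk or create the sub-document levels, e.g. "publication"
        for part in parts[:-1]:
            child = node["properties"].setdefault(part, {})
            child.setdefault("properties", {})
            node = child
        node["properties"][parts[-1]] = {"type": field_type}
    return root


# title and publication.title are separate fields with separate mappings
mapping = build_mapping({
    "title": "string",
    "publication.title": "string",
    "publication.year": "integer",
})
print(json.dumps(mapping, indent=2))
```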
Lessons learned
Learn to trust ElasticSearch
  Analyzers: do not pretokenize queries yourself…

Difference between “term” and “text” type queries
  tokenized or not

ElasticSearch probably already does what you want it to
do
  Search for it
  Try it
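
The term/text difference can be illustrated with a toy inverted index. This is not how ElasticSearch or Lucene is implemented, and the analyzer here is a crude stand-in (lowercase, split on non-alphanumerics). It only shows why an un-analyzed term lookup can miss text that an analyzed query finds.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase and split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


class ToyIndex:
    def __init__(self):
        self.postings = {}  # token -> set of doc ids

    def add(self, doc_id, text):
        for tok in analyze(text):
            self.postings.setdefault(tok, set()).add(doc_id)

    def term_query(self, value):
        """Exact lookup: the value is NOT analyzed, like a "term" query."""
        return self.postings.get(value, set())

    def text_query(self, value):
        """Analyzed lookup: the value passes through the same analyzer first."""
        hits = set()
        for tok in analyze(value):
            hits |= self.postings.get(tok, set())
        return hits


idx = ToyIndex()
idx.add(1, "Quick Brown Fox")
print(idx.term_query("Quick"))       # no hit: the index holds "quick", not "Quick"
print(idx.text_query("Quick fox!"))  # hit: the query is analyzed to quick, fox
```

Pretokenizing or lowercasing queries in application code duplicates what the analyzed query already does, and can get out of step with the index analyzer.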
Lessons learned
Issues with automated testing and node discovery/startup

Start/stop hundreds of times during Jenkins test jobs or
development boxes
  Takes time
  Locally sometimes picks up previous versions

Memory issues: ElasticSearch manages a large part of its
memory outside of the heap
  Do not simply increase -Xmx
Lessons learned
New tools every month

waitForYellowStatus

Aliases, routing allow for clever control
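
For test jobs that start and stop nodes many times, waiting for yellow status can be approximated over HTTP as below. This is a sketch under assumptions: a node on `localhost:9200`, and helper names that are ours rather than part of any client library. The cluster health endpoint also accepts a `wait_for_status` parameter, which makes the server block instead of the client polling.

```python
import json
import time
import urllib.request

HEALTH_URL = "http://localhost:9200/_cluster/health"  # assumed local node


def fetch_health(url=HEALTH_URL):
    """GET the cluster health document and return it as a dict."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_yellow(fetch=fetch_health, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll until status is yellow or green; True on success, False if we give up."""
    for _ in range(attempts):
        try:
            if fetch().get("status") in ("yellow", "green"):
                return True
        except OSError:
            pass  # node is not accepting connections yet
        sleep(delay)
    return False
```

A test setup would call `wait_for_yellow()` after starting the node and fail fast if it returns False. The `fetch` and `sleep` arguments exist so the polling logic can be exercised without a running cluster.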
API
ElasticSearch is new, connection libraries still in infancy,
documentation growing

Issues using the Java API in Scala

Happy with Scalastic now
  synchronous
  asynchronous
  bulk prepare
https://github.com/bsadeh/scalastic
#nodb
ElasticSearch used as a full nosql datastore?

Using “version” and optimistic locking scheme

Could replace MongoDB in our setup



ElasticSearch is actually a store optimized for getting stuff
out, not for getting stuff in
  With free faceting
  Who needs multi-table transactions anyway?
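
The "version" and optimistic locking scheme can be sketched with an in-memory stand-in. This is not ElasticSearch itself; the class and method names are hypothetical. The idea it mimics: a writer passes the version it last read, and the write is rejected if the stored version has moved on in the meantime.

```python
class VersionConflict(Exception):
    """Raised when a write carries a stale version number."""


class ToyVersionedStore:
    """In-memory stand-in for a store that keeps a version per document."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if someone else updated the document first.
        if expected_version is not None and expected_version != current:
            raise VersionConflict("expected %d, found %d" % (expected_version, current))
        self._docs[doc_id] = (current + 1, dict(source))
        return current + 1


store = ToyVersionedStore()
store.put("a", {"count": 1})
version, doc = store.get("a")
store.put("a", {"count": doc["count"] + 1}, expected_version=version)
try:
    # a second writer still holding the old version loses
    store.put("a", {"count": 99}, expected_version=version)
except VersionConflict as err:
    print("conflict:", err)
```

On a conflict the usual response is to re-read the document, reapply the change, and retry.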
SOLR vs ElasticSearch
SOLR                                    ElasticSearch
  Well-known, many tools, extensions      New kid on the block
  Feels clunky to configure               Very easy to configure
  Manual document to lucene mapping       Handles document to lucene mapping
  Replication and indexing in a           Horizontally scalable
  cluster non-trivial                     Easy replication
                                          But: shard key

Old school ;-)                          New school
search evolution
     • Custom indexers

     • Inverted index
     • Segment merges

     • Custom analyzers
     • Faceting

     • Configuration of analyzers
     • Faceting, Geospatial

     • Document mapping
     • Sub-document queries
     • Replication

     • JSON document input
     • Faceting, complex queries just work
conclusions
ElasticSearch benefits
  Easy to set up
  Very clever architecture

Drawbacks
  Very new software, tool support limited
     But lots of movement
  Changing sharding in a full index is non-trivial

ElasticSearch
  Clever architecture, fast, stable
  Does exactly what you need
thank you


Are you still using Solr?
Come on, it’s 2012 already ;-)




                      anne@beyondtrees.com

                                 @anneveling
