SlideShare a Scribd company logo
1 of 22
ElasticSearch in
     Production
                  lessons
                  learned




Anne Veling, ApacheCon EU, November 6, 2012
agenda
Introduction

ElasticSearch

Udini

Upcoming Tool

Lessons Learned
introduction
Anne Veling, @anneveling

Self-employed contractor
  Software Architect
  Agile process management
  Performance optimization
  Lucene/SOLR/ElasticSearch implementations & training
ElasticSearch
Apache Lucene

Started in 2010 by Shay Banon

Open Source – Apache License

A company was formed in 2012: ElasticSearch
  Training, support and development

Careful feature development
  vs. build because you can
ElasticSearch
Scalable
  Distributed, Node Discovery
  Automatic sharding
  Query distribution

RESTful, HTTP API
  With API wrappers for Ruby, Java, Scala, …
  JSON in, JSON out

Document Model
  Maps book.author.lastName
  “schemaless” -> field type recognition
  Keeps source, keeps „version‟ number
ElasticSearch
Integrated faceting
  With statistical aggregates (sum/avg/…) for free

Field types and analyzers
  String, numerics, geo, attachment, …
  Arrays, subdocuments, nested documents

Integrated sharding
  Routing and alias
  Cross-index searching / multi-document type
udini.proquest.com
ProQuest

The World‟s Article Store

Stack
  Amazon EC2
  Scala with Unfiltered
  MongoDB, ElasticSearch
architecture



 Fulfillment
                          Udini         Summon
 Providers




PDF pipeline      mongo       Elastic
                                        SOLR
                   DB         Search
SOLR at Udini
Connecting to Summon API
  700M SOLR Cluster

In Udini, we serve a subset of 160M full text articles
  Including fulfillment mechanisms
  PDF and HTML5 viewing and annotation
ElasticSearch at Udini
Local index to search your articles

Many small user libraries, searching only locally
  User-id as sharding key
  Include key in all queries
Exciting new product
Developing for ProQuest

Exciting new research tool for scientific researchers

Creating a large ElasticSearch index for journal article
canonicalization

Currently in private beta, launching in the coming months
Lessons Learned
   Very fast indexing

   Bulk indexing ftw
      Set up without replicas (replicas = 0, not 1)
      Play with bulk size
      Simple write to disk and CURL it in, is very fast
      1M records in 40s
for f in ${BATCH_DIR}/batch-*.json
do
  echo "about to index $f"
  curl --silent --show-error --request POST
         --data-binary @$f localhost:9200/_bulk > /dev/null
  echo
done
Lessons learned
Schema(less)?

Automatic field type recognition
   Can miss types
   Strict about types #duh

Mapping of subfields (doc.title vs doc.publication.title)
   Version dependent

In reality
   Schema still needed
   Mapping changes still non trivial
Lessons learned
Learn to trust ElasticSearch
  Analyzers: do not pretokenize queries yourself…

Difference between “term” and “text” type queries
  tokenized or not

ElasticSearch probably already does what you want it to
do
  Search for it
  Try it
Lessons learned
Issues with automated testing and node discovery/startup

Start/stop hundreds of times during Jenkins test jobs or
development boxes
  Takes time
  Locally sometimes picks up previous versions

Memory issues: ElasticSearch manages a large part of its
memory outside of the heap
  Do not simply increase -Xmx
Lessons learned
New tools every month

waitForYellowStatus

Aliases, routing allow for clever control
API
ElasticSearch is new, connection libraries still in infancy,
documentation growing

Issues using the Java API in Scala

Happy with Scalastic now
  synchronous
  asynchronous
  bulk prepare          https://github.com/bsadeh/scalastic
#nodb
ElasticSearch used as a full nosql datastore?

Using “version” and optimistic locking scheme

Could replace MongoDb in our setup



ElasticSearch is actually a store optimized for getting stuff
out, not for getting stuff in
  With free faceting
  Who needs multi-table transactions anyway?
SOLR vs ElasticSearch
SOLR                      ElasticSearch
  Well-known, many          New kid on the block
  tools, extensions         Very easy to configure
  Feels clunky to           Handles document to
  configure                 lucene mapping
  Manual document to        Horizontally scalable
  lucene mapping
                               Easy replication
  Replication and
                               But: shard key
  indexing in a cluster
  non-trivial

                          New school
Old school ;-)
search evolution
     • Custom indexers

     • Inverted index
     • Segment merges

     • Custom analyzers
     • Faceting

     • Configuration of analyzers
     • Faceting, Geospatial

     • Document mapping
     • Sub-document queries
     • Replication

     • JSON document input
     • Faceting, complex queries just work
conclusions
ElasticSearch benefits
  Easy to setup
  Very clever architecture

Drawbacks
  Very new software, tool support limited
     But lots of movement
  Change sharding in a full index non-trivial

ElasticSearch
  Clever architecture, fast, stable
  Does exactly what you need
thank you


Are you still using Solr?
Come on, it’s 2012 already ;-)




                      anne@beyondtrees.com

                                 @anneveling

More Related Content

What's hot

Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Upfoundsearch
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in actionCodemotion
 
Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearchJoey Wen
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Vinay Kumar
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studyCharlie Hull
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchBo Andersen
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning ElasticsearchAnurag Patel
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic IntroductionMayur Rathod
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearchhypto
 
Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with ElasticsearchSamantha Quiñones
 
Search domain basics
Search domain basicsSearch domain basics
Search domain basicspmanvi
 
Philly PHP: April '17 Elastic Search Introduction by Aditya Bhamidpati
Philly PHP: April '17 Elastic Search Introduction by Aditya BhamidpatiPhilly PHP: April '17 Elastic Search Introduction by Aditya Bhamidpati
Philly PHP: April '17 Elastic Search Introduction by Aditya BhamidpatiRobert Calcavecchia
 
Scala and jvm_languages_praveen_technologist
Scala and jvm_languages_praveen_technologistScala and jvm_languages_praveen_technologist
Scala and jvm_languages_praveen_technologistpmanvi
 
Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionDavid Pilato
 
Real Time search using Spark and Elasticsearch
Real Time search using Spark and ElasticsearchReal Time search using Spark and Elasticsearch
Real Time search using Spark and ElasticsearchSigmoid
 

What's hot (20)

Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Up
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
 
Elastic search
Elastic searchElastic search
Elastic search
 
Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearch
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 
Elastic search
Elastic searchElastic search
Elastic search
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
Managing Your Content with Elasticsearch
Managing Your Content with ElasticsearchManaging Your Content with Elasticsearch
Managing Your Content with Elasticsearch
 
Search domain basics
Search domain basicsSearch domain basics
Search domain basics
 
Philly PHP: April '17 Elastic Search Introduction by Aditya Bhamidpati
Philly PHP: April '17 Elastic Search Introduction by Aditya BhamidpatiPhilly PHP: April '17 Elastic Search Introduction by Aditya Bhamidpati
Philly PHP: April '17 Elastic Search Introduction by Aditya Bhamidpati
 
Scala and jvm_languages_praveen_technologist
Scala and jvm_languages_praveen_technologistScala and jvm_languages_praveen_technologist
Scala and jvm_languages_praveen_technologist
 
Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English version
 
Real Time search using Spark and Elasticsearch
Real Time search using Spark and ElasticsearchReal Time search using Spark and Elasticsearch
Real Time search using Spark and Elasticsearch
 

Viewers also liked

Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for ElasticsearchFlorian Hopf
 
Searching Relational Data with Elasticsearch
Searching Relational Data with ElasticsearchSearching Relational Data with Elasticsearch
Searching Relational Data with Elasticsearchsirensolutions
 
Elasticsearch in Zalando
Elasticsearch in ZalandoElasticsearch in Zalando
Elasticsearch in ZalandoAlaa Elhadba
 
Elasticsearch Introduction to Data model, Search & Aggregations
Elasticsearch Introduction to Data model, Search & AggregationsElasticsearch Introduction to Data model, Search & Aggregations
Elasticsearch Introduction to Data model, Search & AggregationsAlaa Elhadba
 
Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문SeungHyun Eom
 
Logging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaLogging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaAmazee Labs
 

Viewers also liked (6)

Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for Elasticsearch
 
Searching Relational Data with Elasticsearch
Searching Relational Data with ElasticsearchSearching Relational Data with Elasticsearch
Searching Relational Data with Elasticsearch
 
Elasticsearch in Zalando
Elasticsearch in ZalandoElasticsearch in Zalando
Elasticsearch in Zalando
 
Elasticsearch Introduction to Data model, Search & Aggregations
Elasticsearch Introduction to Data model, Search & AggregationsElasticsearch Introduction to Data model, Search & Aggregations
Elasticsearch Introduction to Data model, Search & Aggregations
 
Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문
 
Logging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaLogging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & Kibana
 

Similar to ElasticSearch in Production: lessons learned

Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginnersNeil Baker
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overviewABC Talks
 
Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solrmacrochen
 
Elasticsearch and Spark
Elasticsearch and SparkElasticsearch and Spark
Elasticsearch and SparkAudible, Inc.
 
ElasticSearch Getting Started
ElasticSearch Getting StartedElasticSearch Getting Started
ElasticSearch Getting StartedOnuralp Taner
 
Search and analyze your data with elasticsearch
Search and analyze your data with elasticsearchSearch and analyze your data with elasticsearch
Search and analyze your data with elasticsearchAnton Udovychenko
 
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکی
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکیDeep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکی
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکیEhsan Asgarian
 
ApacheCon NA 2011 report
ApacheCon NA 2011 reportApacheCon NA 2011 report
ApacheCon NA 2011 reportKoji Kawamura
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 
Seravia in the Cloud
Seravia in the CloudSeravia in the Cloud
Seravia in the Cloudkidrane
 
PyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonPyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonChetan Giridhar
 
Elasticsearch python
Elasticsearch pythonElasticsearch python
Elasticsearch pythonvaliantval2
 
Wanna search? Piece of cake!
Wanna search? Piece of cake!Wanna search? Piece of cake!
Wanna search? Piece of cake!Alex Kursov
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBJustin Smestad
 
Using elasticsearch with rails
Using elasticsearch with railsUsing elasticsearch with rails
Using elasticsearch with railsTom Z Zeng
 

Similar to ElasticSearch in Production: lessons learned (20)

Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginners
 
Elastic search overview
Elastic search overviewElastic search overview
Elastic search overview
 
Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solr
 
Elastic search
Elastic searchElastic search
Elastic search
 
Elasticsearch and Spark
Elasticsearch and SparkElasticsearch and Spark
Elasticsearch and Spark
 
ElasticSearch Getting Started
ElasticSearch Getting StartedElasticSearch Getting Started
ElasticSearch Getting Started
 
Search and analyze your data with elasticsearch
Search and analyze your data with elasticsearchSearch and analyze your data with elasticsearch
Search and analyze your data with elasticsearch
 
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکی
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکیDeep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکی
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکی
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
ApacheCon NA 2011 report
ApacheCon NA 2011 reportApacheCon NA 2011 report
ApacheCon NA 2011 report
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
Seravia in the Cloud
Seravia in the CloudSeravia in the Cloud
Seravia in the Cloud
 
PyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in pythonPyCon India 2012: Rapid development of website search in python
PyCon India 2012: Rapid development of website search in python
 
Elasticsearch python
Elasticsearch pythonElasticsearch python
Elasticsearch python
 
Wanna search? Piece of cake!
Wanna search? Piece of cake!Wanna search? Piece of cake!
Wanna search? Piece of cake!
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Solr 8 interview
Solr 8 interview Solr 8 interview
Solr 8 interview
 
963
963963
963
 
Using elasticsearch with rails
Using elasticsearch with railsUsing elasticsearch with rails
Using elasticsearch with rails
 
Elastic pivorak
Elastic pivorakElastic pivorak
Elastic pivorak
 

Recently uploaded

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

ElasticSearch in Production: lessons learned

  • 1. ElasticSearch in Production lessons learned Anne Veling, ApacheCon EU, November 6, 2012
  • 3. introduction Anne Veling, @anneveling Self-employed contractor Software Architect Agile process management Performance optimization Lucene/SOLR/ElasticSearch implementations & training
  • 4. ElasticSearch Apache Lucene Started in 2010 by Shay Banon Open Source – Apache License A company was formed in 2012: ElasticSearch Training, support and development Careful feature development vs. build because you can
  • 5. ElasticSearch Scalable Distributed, Node Discovery Automatic sharding Query distribution RESTful, HTTP API With API wrappers for Ruby, Java, Scala, … JSON in, JSON out Document Model Maps book.author.lastName “schemaless” -> field type recognition Keeps source, keeps „version‟ number
  • 6. ElasticSearch Integrated faceting With statistical aggregates (sum/avg/…) for free Field types and analyzers String, numerics, geo, attachment, … Arrays, subdocuments, nested documents Integrated sharding Routing and alias Cross-index searching / multi-document type
  • 7. udini.proquest.com ProQuest The World‟s Article Store Stack Amazon EC2 Scala with Unfiltered MongoDB, ElasticSearch
  • 8. architecture Fulfillment Udini Summon Providers PDF pipeline mongo Elastic SOLR DB Search
  • 9. SOLR at Udini Connecting to Summon API 700M SOLR Cluster In Udini, we serve a subset of 160M full text articles Including fulfillment mechanisms PDF and HTML5 viewing and annotation
  • 10. ElasticSearch at Udini Local index to search your articles Many small user libraries, searching only locally User-id as sharding key Include key in all queries
  • 11. Exciting new product Developing for ProQuest Exciting new research tool for scientific researchers Creating a large ElasticSearch index for journal article canonicalization Currently in private beta, launching in the coming months
  • 12. Lessons Learned Very fast indexing Bulk indexing ftw Set up without replicas (replicas = 0, not 1) Play with bulk size Simple write to disk and CURL it in, is very fast 1M records in 40s for f in ${BATCH_DIR}/batch-*.json do echo "about to index $f" curl --silent --show-error --request POST --data-binary @$f localhost:9200/_bulk > /dev/null echo done
  • 13. Lessons learned Schema(less)? Automatic field type recognition Can miss types Strict about types #duh Mapping of subfields (doc.title vs doc.publication.title) Version dependent In reality Schema still needed Mapping changes still non trivial
  • 14. Lessons learned Learn to trust ElasticSearch Analyzers: do not pretokenize queries yourself… Difference between “term” and “text” type queries tokenized or not ElasticSearch probably already does what you want it to do Search for it Try it
  • 15. Lessons learned Issues with automated testing and node discovery/startup Start/stop hundreds of times during Jenkins test jobs or development boxes Takes time Locally sometimes picks up previous versions Memory issues: ElasticSearch manages a large part of its memory outside of the heap Do not simply increase -Xmx
  • 16. Lessons learned New tools every month waitForYellowStatus Aliases, routing allow for clever control
  • 17. API ElasticSearch is new, connection libraries still in infancy, documentation growing Issues using the Java API in Scala Happy with Scalastic now synchronous asynchronous bulk prepare https://github.com/bsadeh/scalastic
  • 18. #nodb ElasticSearch used as a full nosql datastore? Using “version” and optimistic locking scheme Could replace MongoDb in our setup ElasticSearch is actually a store optimized for getting stuff out, not for getting stuff in With free faceting Who needs multi-table transactions anyway?
  • 19. SOLR vs ElasticSearch SOLR ElasticSearch Well-known, many New kid on the block tools, extensions Very easy to configure Feels clunky to Handles document to configure lucene mapping Manual document to Horizontally scalable lucene mapping Easy replication Replication and But: shard key indexing in a cluster non-trivial New school Old school ;-)
  • 20. search evolution • Custom indexers • Inverted index • Segment merges • Custom analyzers • Faceting • Configuration of analyzers • Faceting, Geospatial • Document mapping • Sub-document queries • Replication • JSON document input • Faceting, complex queries just work
  • 21. conclusions ElasticSearch benefits Easy to setup Very clever architecture Drawbacks Very new software, tool support limited But lots of movement Change sharding in a full index non-trivial ElasticSearch Clever architecture, fast, stable Does exactly what you need
  • 22. thank you Are you still using Solr? Come on, it’s 2012 already ;-) anne@beyondtrees.com @anneveling