Elasticsearch 101 - Cluster setup and tuning

•

3 likes•1,914 views

Petar Djekic

Setting up, configuring and tuning your ElasticSearch cluster. From JVM setting to useful plugins

Technology

ElasticSearch 101
Setting up, configuring and tuning your ElasticSearch
cluster

Our ElasticSearch setup
Client
node
Client
node
Data
node
Data
node
Data
node
Apps
● 8 cores, 30GB
RAM, 2TB EBS
● Running in Docker
● Apache Mesos /
Marathon
● Dedicated DN
machines

https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-0
https://bugs.launchpad.net/ubuntu/+source/linux-lts-raring/+bug/1195474
It’s not a database
● You don’t get the same guarantees as
from databases (ACID)
● Writes acknowledged before flushed to
persistent storage
● Network partitions can lead to data loss:
- Long GC pauses
- Kernel bugs (!)
● Deletes take longer

Rolling your own ES Cluster (1)
● Name your cluster
● Disable multicast for discovery
● Set minimum master nodes (N/2+1) => Split-Brain!

● Check open file descriptors limit
● Disable swap (or mlockall)
● Configure gateway settings
○ recover_after_time
○ expected_nodes
● Avoid tribe nodes
Rolling your own ES Cluster (2)

Exhausting available JVM heap mem
Nodes will become
unresponsive!

Memory requirements
● Bottom peaks of the used
JVM heap after the GC
run mark the required
memory (add safety
buffer)
● At least 4GB per node
● 50% for JVM, 50% for FS
cache / Lucene

JVM settings
● Define heap memory (ES_HEAP_SIZE)
● Don’t tune JVM settings
● Don’t tune thread pool
■ In some case you might have to
■ Increasing will introduce memory pressure
● Don’t use G1 garbage collector

Indexing data
● Define data schemas and types ≠ Schemaless
○ Default: string mapping = analyzed = memory costly
○ Understand tokenizers and analyzers
● Prefer bulk indexing
● Refresh interval
● Time based indexes for log data

Querying for data
● Use filters as much as possible
● `Scan & scroll` for dumping large data, e.g. when
reindexing
● Transform data during indexing if possible
● ORMs make debugging a pain.
https://www.found.no/foundation/optimizing-elasticsearch-searches/
https://abhishek376.wordpress.com/2014/11/24/how-we-optimized-100-sec-elasticsearch-queries-to-be-under-a-sub-second/

Avoid high cardinality fields
● Aggregation => field data
● Often major consumer of heap
memory
● Use doc values (on disk field data)
● Avoid aggregation on analyzed
fields

More things to watch out
● Cluster health (duh!)
● Field data cache size
● Filter cache eviction
● Slow queries
● GC pauses
● Security settings
○ no authentication by default
● Backup

Tooling
● Use official SDKs
● For Go we use ElastiGo (not so great)
● Elastic HQ
● Inquisitor
● Sense

What's hot

Intro to elasticsearchJoey Wen

What I learnt: Elastic search & Kibana : introduction, installtion & configur...Rahul K Chauhan

ElasticSearch AJUG 2013Roy Russo

Elasticsearch - DevNexus 2015Roy Russo

Elasticsearch IntroductionRoopendra Vishwakarma

BigData, NoSQL & ElasticSearchSanura Hettiarachchi

Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch

ElasticSearch - DevNexus Atlanta - 2014Roy Russo

ElasticSearch in Production: lessons learnedBeyondTrees

Elastic SearchLukas Vlcek

Elasticsearch: You know, for search! and more!Philips Kokoh Prasetyo

ElasticSearch in actionCodemotion

Introduction to elasticsearchhypto

quick intro to elastic search medcl

Elastic searchNexThoughts Technologies

Elastic search apache_solrmacrochen

Introduction to ElasticsearchBo Andersen

Managing Your Content with ElasticsearchSamantha Quiñones

ElasticSearch Basic IntroductionMayur Rathod

What's hot (19)

Intro to elasticsearch

What I learnt: Elastic search & Kibana : introduction, installtion & configur...

ElasticSearch AJUG 2013

Elasticsearch - DevNexus 2015

Elasticsearch Introduction

BigData, NoSQL & ElasticSearch

Solr vs. Elasticsearch - Case by Case

ElasticSearch - DevNexus Atlanta - 2014

ElasticSearch in Production: lessons learned

Elastic Search

Elasticsearch: You know, for search! and more!

ElasticSearch in action

Introduction to elasticsearch

quick intro to elastic search

Elastic search

Elastic search apache_solr

Introduction to Elasticsearch

Managing Your Content with Elasticsearch

ElasticSearch Basic Introduction

Viewers also liked

Scaling massive elastic search clusters - Rafał Kuć - SematextRafał Kuć

03. ElasticSearch : Data In, Data OutOpenThink Labs

Elasticsearch cluster deep diveChristophe Marchal

Elastic Search Meetup Special - Yann Cluchey, Cogenta Internet World

Elasticsearch - OSDC France 2012David Pilato

Getting the Most Out of Your NoSQL DBBigstep

Perl and ElasticsearchDean Hamstead

The Five Stages of Chef Grief: My First 6 months with Chef, and Getting Aroun...DevOpsDays Austin 2014

2016 - IGNITE - An ElasticSearch Cluster Named George Armstrong Custerdevopsdaysaustin

06. ElasticSearch : Mapping and AnalysisOpenThink Labs

Elasticsearch 1.x Cluster Installation (VirtualBox)Amir Sedighi

ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr) Andreas Chatzakis

An Introduction to Elasticsearch for BeginnersAmir Sedighi

Sharding with MongoDB (Eliot Horowitz)MongoSF

Side by Side with Elasticsearch & Solr, Part 2Sematext Group, Inc.

Intro to ElasticsearchClifford James

Elastic search overviewABC Talks

Tuning Elasticsearch Indexing Pipeline for LogsSematext Group, Inc.

Apache ignite DatagridSurinder Mehra

[2D1]Elasticsearch 성능 최적화NAVER D2

Viewers also liked (20)

Scaling massive elastic search clusters - Rafał Kuć - Sematext

03. ElasticSearch : Data In, Data Out

Elasticsearch cluster deep dive

Elastic Search Meetup Special - Yann Cluchey, Cogenta

Elasticsearch - OSDC France 2012

Getting the Most Out of Your NoSQL DB

Perl and Elasticsearch

The Five Stages of Chef Grief: My First 6 months with Chef, and Getting Aroun...

2016 - IGNITE - An ElasticSearch Cluster Named George Armstrong Custer

06. ElasticSearch : Mapping and Analysis

Elasticsearch 1.x Cluster Installation (VirtualBox)

ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)

An Introduction to Elasticsearch for Beginners

Sharding with MongoDB (Eliot Horowitz)

Side by Side with Elasticsearch & Solr, Part 2

Intro to Elasticsearch

Elastic search overview

Tuning Elasticsearch Indexing Pipeline for Logs

Apache ignite Datagrid

[2D1]Elasticsearch 성능 최적화

Similar to Elasticsearch 101 - Cluster setup and tuning

Caching inRichardWarburton

MongoDB Operational Best Practices (mongosf2012)Scott Hernandez

Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez

Journey through high performance django applicationbangaloredjangousergroup

Managing Security At 1M Events a Second using ElasticsearchJoe Alex

Caching in (DevoxxUK 2013)RichardWarburton

Storage talkchristkv

WiredTiger In-Memory vs WiredTiger B-TreeSveta Smirnova

The Accidental DBAPostgreSQL Experts, Inc.

My Database Skills Killed the ServerColdFusionConference

Mongo nyc nyt + mongodbDeep Kapadia

Investigate SQL Server Memory Like Sherlock HolmesRichard Douglas

Elasticsearch Data AnalysesAlaa Elhadba

Deployment Strategies (Mongo Austin)MongoDB

Elasticsearch for Logs & Metrics - a deep diveSematext Group, Inc.

Oracle Performance On Linux X86 systems Baruch Osoveskiy

Deployment StrategyMongoDB

Loadays MySQLlefredbe

ScyllaDB: NoSQL at Ludicrous SpeedJ On The Beach

Deployment StrategiesMongoDB

Similar to Elasticsearch 101 - Cluster setup and tuning (20)

Caching in

MongoDB Operational Best Practices (mongosf2012)

Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...

Journey through high performance django application

Managing Security At 1M Events a Second using Elasticsearch

Caching in (DevoxxUK 2013)

Storage talk

WiredTiger In-Memory vs WiredTiger B-Tree

The Accidental DBA

My Database Skills Killed the Server

Mongo nyc nyt + mongodb

Investigate SQL Server Memory Like Sherlock Holmes

Elasticsearch Data Analyses

Deployment Strategies (Mongo Austin)

Elasticsearch for Logs & Metrics - a deep dive

Oracle Performance On Linux X86 systems

Deployment Strategy

Loadays MySQL

ScyllaDB: NoSQL at Ludicrous Speed

Deployment Strategies

Recently uploaded

From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software

2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong

08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls

Histor y of HAM Radio presentation slidevu2urc

Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal

Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo

Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1

GenCyber Cyber Security Day PresentationMichael W. Hawkins

How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes

Boost PC performance: How more available memory can improve productivityPrincipled Technologies

How to convert PDF to text with Nanonetsnaman860154

Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun

Scaling API-first – The story of a global engineering organizationRadu Cotescu

Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science

What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation

2024: Domino Containers - The Next Step. News from the Domino Container commu...

08448380779 Call Girls In Civil Lines Women Seeking Men

Histor y of HAM Radio presentation slide

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf

Exploring the Future Potential of AI-Enabled Smartphone Processors

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Boost Fertility New Invention Ups Success Rates.pdf

GenCyber Cyber Security Day Presentation

How to Troubleshoot Apps for the Modern Connected Worker

Boost PC performance: How more available memory can improve productivity

How to convert PDF to text with Nanonets

Powerful Google developer tools for immediate impact! (2023-24 C)

Scaling API-first – The story of a global engineering organization

Axa Assurance Maroc - Insurer Innovation Award 2024

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx

What Are The Drone Anti-jamming Systems Technology?

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024

Elasticsearch 101 - Cluster setup and tuning

1. ElasticSearch 101 Setting up, configuring and tuning your ElasticSearch cluster

2. Our ElasticSearch setup Client node Client node Data node Data node Data node Apps ● 8 cores, 30GB RAM, 2TB EBS ● Running in Docker ● Apache Mesos / Marathon ● Dedicated DN machines

3. https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-0 https://bugs.launchpad.net/ubuntu/+source/linux-lts-raring/+bug/1195474 It’s not a database ● You don’t get the same guarantees as from databases (ACID) ● Writes acknowledged before flushed to persistent storage ● Network partitions can lead to data loss: - Long GC pauses - Kernel bugs (!) ● Deletes take longer

4. Rolling your own ES Cluster (1) ● Name your cluster ● Disable multicast for discovery ● Set minimum master nodes (N/2+1) => Split-Brain!

5. ● Check open file descriptors limit ● Disable swap (or mlockall) ● Configure gateway settings ○ recover_after_time ○ expected_nodes ● Avoid tribe nodes Rolling your own ES Cluster (2)

6. Exhausting available JVM heap mem Nodes will become unresponsive!

7. Memory requirements ● Bottom peaks of the used JVM heap after the GC run mark the required memory (add safety buffer) ● At least 4GB per node ● 50% for JVM, 50% for FS cache / Lucene

8. JVM settings ● Define heap memory (ES_HEAP_SIZE) ● Don’t tune JVM settings ● Don’t tune thread pool ■ In some case you might have to ■ Increasing will introduce memory pressure ● Don’t use G1 garbage collector

9. Indexing data ● Define data schemas and types ≠ Schemaless ○ Default: string mapping = analyzed = memory costly ○ Understand tokenizers and analyzers ● Prefer bulk indexing ● Refresh interval ● Time based indexes for log data

10. Querying for data ● Use filters as much as possible ● `Scan & scroll` for dumping large data, e.g. when reindexing ● Transform data during indexing if possible ● ORMs make debugging a pain. https://www.found.no/foundation/optimizing-elasticsearch-searches/ https://abhishek376.wordpress.com/2014/11/24/how-we-optimized-100-sec-elasticsearch-queries-to-be-under-a-sub-second/

11. Avoid high cardinality fields ● Aggregation => field data ● Often major consumer of heap memory ● Use doc values (on disk field data) ● Avoid aggregation on analyzed fields

12. More things to watch out ● Cluster health (duh!) ● Field data cache size ● Filter cache eviction ● Slow queries ● GC pauses ● Security settings ○ no authentication by default ● Backup

13. Tooling ● Use official SDKs ● For Go we use ElastiGo (not so great) ● Elastic HQ ● Inquisitor ● Sense

Elasticsearch 101 - Cluster setup and tuning

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Viewers also liked

Viewers also liked (20)

Similar to Elasticsearch 101 - Cluster setup and tuning

Similar to Elasticsearch 101 - Cluster setup and tuning (20)

Recently uploaded

Recently uploaded (20)

Elasticsearch 101 - Cluster setup and tuning