ElasticSearch 7
Presented By
Anurag
ES 1.1 - Introduction to ELK Stack
Elasticsearch
● Elasticsearch is a search engine based on the Apache Lucene library.
● Open code business model
● REST-based API
● Distributed
● One of the most popular enterprise search engines
● Used by Netflix, LinkedIn, Amazon, Oracle and many other large companies
Elastic (ELK) Stack
(ELK = Elasticsearch, Logstash, Kibana)
The Beats are lightweight data shippers, written in Go, that run on your servers to capture all sorts of operational data (logs, metrics, or network packet data). Beats send the operational data to Elasticsearch, either directly or via Logstash.
Logstash is a server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite "stash."
Kibana is a browser-based analytics and search dashboard for Elasticsearch.
Elasticsearch: the distributed RESTful search engine at the core of the stack.
How do Elasticsearch and Lucene differ?
They differ the way a car (ES) differs from its engine (Lucene): ES uses Lucene to manage its indices.
Lucene is a Java library. You include it in your project and call its functions directly.
Elasticsearch is a JSON-based, distributed web server built on top of Lucene. Although Lucene does the actual work underneath, Elasticsearch provides a convenient layer over it. Each shard created in Elasticsearch is a separate Lucene index. To summarize:
1. Elasticsearch is built on Lucene and exposes Lucene's features through a JSON-based REST API.
2. Elasticsearch provides a distributed system on top of Lucene. Lucene itself is neither aware of nor built for distribution; Elasticsearch provides this abstraction.
3. Elasticsearch provides other supporting features such as thread pools, queues, node/cluster monitoring APIs, data monitoring APIs, cluster management, etc.
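Point 1 above mentions the JSON-based REST API. The sketch below only builds the HTTP requests with Python's standard library, without sending them, to show their shape: indexing a document with PUT /<index>/_doc/<id> and running a match query with POST /<index>/_search (Elasticsearch 7 endpoints). The node address http://localhost:9200, the index name "articles" and the field "content" are assumptions for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local node


def index_document_request(index, doc_id, document):
    """Build (but do not send) a request that indexes `document` under `doc_id`."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, query_text):
    """Build a full-text match query against the hypothetical "content" field."""
    body = {"query": {"match": {"content": query_text}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = index_document_request("articles", "1", {"content": "I am learning the cool stuff"})
print(req.get_method(), req.full_url)
# Against a running node, send it with: urllib.request.urlopen(req)
```

The same JSON bodies work from curl or any HTTP client; the official language clients wrap these endpoints.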
ES 1.2 - Document Ranking
Indexing
● Elasticsearch achieves low response latency because it does not scan the text directly; it searches an index instead.
● Document: the basic unit of data in ES (a JSON object)
● Inverted index (like the index at the back of a book)
○ Built by tokenizing the text of each document into terms
○ A sorted list of all unique terms is created (terms are normalized, stemmed, etc.)
○ Each term is associated with the list of documents in which it appears
Doc1: I am learning the cool stuff
Doc2: I am learning to learn
Inverted Index:
am -> [Doc1, Doc2]
cool -> [Doc1]
i -> [Doc1, Doc2]
learn -> [Doc1, Doc2] // root form of "learning"
the -> [Doc1]
…
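A minimal sketch of building the inverted index for the two example documents. The analyze function is a hypothetical toy analyzer (lowercasing, whitespace splitting, and crudely stripping a trailing "ing"); it stands in for Lucene's real analyzers and stemmers, which are far more sophisticated.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer (assumption, not Lucene's): lowercase, split on whitespace,
    and strip a trailing "ing" from longer words as a crude stemmer."""
    terms = []
    for token in text.lower().split():
        if token.endswith("ing") and len(token) > 5:
            token = token[:-3]
        terms.append(token)
    return terms


def build_inverted_index(docs):
    """Map each unique term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    # Sort the term dictionary and every postings list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = {
    "Doc1": "I am learning the cool stuff",
    "Doc2": "I am learning to learn",
}
index = build_inverted_index(docs)
for term, doc_ids in index.items():
    print(term, "->", doc_ids)
```

Running it prints one line per term in sorted order, including the terms elided above, such as stuff and to.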
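Retrieving
● Term Frequency (TF)
○ Number of times the term occurs in a given document
● Document Frequency (DF)
○ Number of documents that contain the term
● IDF (Inverse Document Frequency)
○ IDF = 1 / DF (simplified; real scoring uses a logarithmic IDF)
● Relevance
○ Relevance = TF * IDF = TF / DF
○ This is a simplified TF-IDF model; since version 5.0, Elasticsearch scores with BM25 by default, which builds on the same TF and IDF ideas
Search Term: learn
TF1 = 1 (Doc1: "learning")
TF2 = 2 (Doc2: "learning", "learn")
DF = 2 (the term occurs in both documents)
IDF = ½
Rel1 = TF1 * IDF = ½
Rel2 = TF2 * IDF = 1
Rel2 > Rel1, so Doc2 ranks higher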
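The worked example can be checked with a few lines of Python. This implements the slide's simplified formula (IDF = 1 / DF), not the BM25 scoring Elasticsearch actually uses; the corpus dict holds the two documents already analyzed as in the indexing example, and Fraction keeps the arithmetic exact.

```python
from fractions import Fraction

# The two example documents after analysis (lowercased, "learning" -> "learn").
corpus = {
    "Doc1": ["i", "am", "learn", "the", "cool", "stuff"],
    "Doc2": ["i", "am", "learn", "to", "learn"],
}


def tf(term, doc_id):
    """Term frequency: occurrences of `term` in one document."""
    return corpus[doc_id].count(term)


def df(term):
    """Document frequency: number of documents that contain `term`."""
    return sum(1 for terms in corpus.values() if term in terms)


def relevance(term, doc_id):
    """Simplified score TF * IDF with IDF = 1 / DF (not BM25)."""
    d = df(term)
    return Fraction(0) if d == 0 else tf(term, doc_id) * Fraction(1, d)


ranking = sorted(corpus, key=lambda doc_id: relevance("learn", doc_id), reverse=True)
print(ranking)
```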
ES 1.3 - ES Cluster
Node Structure
● Index: a logical namespace for a collection of documents
● Shard: a horizontal partition of an index
○ E.g., with two shards, roughly half of the documents are stored in one shard and half in the other; documents are assigned by hashing, not by ID ranges (see Coordination Stage).
○ In Elasticsearch, each shard is a self-contained Lucene index.
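Cluster Structure
[Figure: cluster of 4 nodes. Node 1: P1, R4 | Node 2: P2, R1 | Node 3: P3, R2 | Node 4: P4, R3]
● The figure shows a cluster of 4 nodes
● Each node holds 2 shards: one primary (P) and one replica (R) of a different primary
● For robustness and fault tolerance, each primary shard is replicated to a different node
● If a node goes down and a primary shard is lost, a replica on another node is promoted to primary
● The number of primary shards is fixed when an index is created (changing it later requires reindexing or the split/shrink APIs); the number of replicas can be changed at any time
● Writes go to the primary shard and are then replicated to the replicas; reads can be served by either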
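To illustrate the layout in the figure, here is a toy allocator that places primary i on node i and its replica on the next node, plus a failure simulation that promotes a replica when a primary is lost. Both allocate and fail_node are hypothetical; Elasticsearch's real shard allocator weighs many more factors (disk usage, allocation awareness, balancing) and also rebuilds lost replicas.

```python
def allocate(num_primaries, num_nodes, num_replicas=1):
    """Toy round-robin allocator: primary i on node (i % n) + 1, its k-th
    replica on the k-th following node, so no node holds two copies of a shard."""
    if num_replicas >= num_nodes:
        raise ValueError("each copy of a shard needs its own node")
    layout = {node: [] for node in range(1, num_nodes + 1)}
    for shard in range(num_primaries):
        home = shard % num_nodes
        layout[home + 1].append(f"P{shard + 1}")
        for k in range(1, num_replicas + 1):
            layout[(home + k) % num_nodes + 1].append(f"R{shard + 1}")
    return layout


def fail_node(layout, dead_node):
    """Remove a node; for each primary it held, promote a surviving replica."""
    lost = layout.pop(dead_node)
    for copy in lost:
        if copy.startswith("P"):
            shard = copy[1:]
            for held in layout.values():
                if "R" + shard in held:
                    held[held.index("R" + shard)] = "P" + shard
                    break
    return layout


layout = allocate(num_primaries=4, num_nodes=4)
print(layout)  # node 1 holds P1 and R4, node 2 holds R1 and P2, ...
print(fail_node(allocate(4, 4), dead_node=1))  # R1 on node 2 becomes P1
```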
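Types of Nodes
● Master Node
○ Cluster-wide operations (creating and deleting indices, tracking which nodes belong to the cluster, allocating shards, health checks, etc.)
● Data Node
○ Holds shards and executes indexing and search operations on them
● Client (coordinating-only) Node
○ Acts as a smart load balancer: routes requests and merges results, but holds no data and is not master-eligible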
ES 1.4 - CRUD: Write Operations
Breaking a Shard into Segments
● For ES, the basic unit of storage is a shard
● For Lucene, the basic unit of storage is a segment
● Each segment is an inverted index
● New documents are written to new segments
● New segments are first held in memory and are persisted to disk later
● Segments are immutable: once written, they are never modified in place
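Coordination Stage
● shard_number = hash(routing) % num_of_primary_shards, where routing defaults to the document ID
● Every node knows, from the cluster state, which node holds each shard
● The node that receives the request (the coordinating node) forwards the document to the node holding the primary copy of that shard
● Because the formula depends on the number of primary shards, that number cannot simply be changed after index creation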
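A sketch of the routing formula, showing that it is deterministic and why changing the number of primary shards would move existing documents. zlib.crc32 is a stand-in assumption; Elasticsearch actually hashes the routing value with Murmur3, and the document IDs here are hypothetical.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """shard = hash(routing) % num_primary_shards.
    zlib.crc32 stands in for Elasticsearch's Murmur3 hash (assumption)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(100)]  # hypothetical document IDs
placement = {doc_id: shard_for(doc_id, 4) for doc_id in doc_ids}

# Re-routing the same IDs with 5 primaries: many land on a different shard,
# which is why the primary shard count is fixed at index creation.
moved = sum(1 for doc_id in doc_ids if shard_for(doc_id, 5) != placement[doc_id])
print(f"{moved} of {len(doc_ids)} documents would change shards")
```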
Translog
Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-translog.html
Translog and Memory Buffer
● The operation is recorded in the translog (transaction log)
● The document is added to the in-memory buffer, which holds all newly indexed documents
● If the operation succeeds on the primary shard, it is forwarded to the replica shards in parallel
● Only in-sync copies take part: shard copies that the master considers up to date with the primary
● The client receives acknowledgement only after the translog has been fsync'ed on the primary and all in-sync replicas (with the default index.translog.durability = request)
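A simplified model of the replicated write described above, with each shard copy as a hypothetical ShardCopy object and a thread pool standing in for parallel network calls. Appending to a list stands in for the translog fsync. Following Elasticsearch's data replication model, a replica that fails the operation is dropped from the in-sync set rather than failing the whole write; in the real system the primary asks the master node to do this.

```python
from concurrent.futures import ThreadPoolExecutor


class ShardCopy:
    """Toy shard copy (hypothetical): a translog plus an in-memory buffer."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.translog = []  # operations recorded for durability
        self.buffer = []    # newly indexed docs, not yet searchable

    def apply(self, doc):
        """Record the operation; a real node would also fsync the translog."""
        if not self.healthy:
            return False
        self.translog.append(doc)
        self.buffer.append(doc)
        return True


def replicate_write(primary, in_sync, doc):
    """Apply on the primary, then forward to all in-sync replicas in parallel.

    Returns (acknowledged, remaining_in_sync). Failed replicas are removed from
    the in-sync set; the client is acknowledged once all remaining copies have
    the operation."""
    if not primary.apply(doc):
        return False, in_sync
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda replica: replica.apply(doc), in_sync))
    remaining = [replica for replica, ok in zip(in_sync, results) if ok]
    return True, remaining


primary = ShardCopy("P1")
replicas = [ShardCopy("R1a"), ShardCopy("R1b", healthy=False)]
acked, replicas = replicate_write(primary, replicas, {"id": "1"})
print(acked, [r.name for r in replicas])
```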
Refresh Operation
● By default, Elasticsearch refreshes each index every second (index.refresh_interval = 1s).
● During a refresh, the contents of the in-memory buffer are written to a newly created segment in memory, without an fsync to disk.
● As a result, the new data becomes available for search. This is why Elasticsearch is described as near real-time.
Flush Operation
● A flush writes all documents in the in-memory buffer to new Lucene segments.
● These, along with all existing in-memory segments, are fsync'ed to disk in a Lucene commit. Because the data is now durable, the translog can be cleared.
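The buffer, refresh and flush steps can be modeled with a toy shard (hypothetical class ToyShard; plain lists stand in for the translog, the in-memory buffer and segments). It shows that a document is invisible to search until a refresh, and that a flush commits segments and clears the translog.

```python
class ToyShard:
    """Hypothetical write-path model: buffer -> refresh -> searchable in-memory
    segment -> flush -> committed on disk, translog cleared."""

    def __init__(self):
        self.translog = []         # operations not yet covered by a commit
        self.buffer = []           # indexed but not yet searchable
        self.memory_segments = []  # searchable, not yet durable
        self.disk_segments = []    # committed (durable)

    def index(self, doc):
        self.translog.append(doc)
        self.buffer.append(doc)

    def refresh(self):
        """Turn the buffer into a new immutable segment; makes docs searchable."""
        if self.buffer:
            self.memory_segments.append(tuple(self.buffer))
            self.buffer = []

    def flush(self):
        """Commit all segments to 'disk'; the translog is no longer needed."""
        self.refresh()
        self.disk_segments.extend(self.memory_segments)
        self.memory_segments = []
        self.translog = []

    def search(self, predicate):
        """Only segments are searched; the buffer stays invisible until refresh."""
        segments = self.disk_segments + self.memory_segments
        return [doc for seg in segments for doc in seg if predicate(doc)]


shard = ToyShard()
shard.index("doc1")
print(shard.search(lambda d: True))  # empty: not refreshed yet
shard.refresh()
print(shard.search(lambda d: True))  # now searchable, but translog still holds it
```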
ES 1.5 - CRUD: Update & Delete
Elasticsearch Delete
● Segments in Elasticsearch are immutable, so documents stored in them cannot be deleted or modified in place.
● Every segment on disk has a .del file associated with it.
● When a delete request is sent, the document is not really deleted, but marked as deleted
in the .del file.
● This document may still match a search query but is filtered out of the results.
● When segments are merged, the documents marked as deleted in the .del file are not
included in the new merged segment.
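A toy model of delete-by-marking and merging. Segments are immutable tuples of (doc_id, text) pairs, a per-segment set of deleted IDs stands in for the .del file, and the function names are hypothetical, not Lucene APIs.

```python
def mark_deleted(deleted, seg_idx, doc_id):
    """Record a delete without touching the immutable segment."""
    deleted[seg_idx].add(doc_id)


def search(segments, deleted, predicate):
    """Deleted docs may still match, but they are filtered out of the results."""
    return [
        doc_id
        for idx, seg in enumerate(segments)
        for doc_id, text in seg
        if predicate(text) and doc_id not in deleted[idx]
    ]


def merge(segments, deleted):
    """Write one new segment containing only live docs; deletions are dropped."""
    merged = tuple(
        (doc_id, text)
        for idx, seg in enumerate(segments)
        for doc_id, text in seg
        if doc_id not in deleted[idx]
    )
    return [merged], [set()]


segments = [(("1", "cool stuff"), ("2", "learn")), (("3", "cool things"),)]
deleted = [set(), set()]
mark_deleted(deleted, 0, "1")
print(search(segments, deleted, lambda t: "cool" in t))  # "1" matches but is filtered
segments, deleted = merge(segments, deleted)
print(segments)
```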
Elasticsearch Update
● When a new document is created, Elasticsearch assigns it a version number (_version, starting at 1).
● Every change to the document results in a new version number.
● When an update is performed, the old version is marked as deleted in the .del file and the new version is indexed in a new segment.
● The older version may still match a search query, but it is filtered out of the results.
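Extending the same idea to updates, this hypothetical VersionedStore never changes an existing segment: each update marks the old copy deleted, writes the new one to a fresh segment and bumps the version. Storing one document per segment is a simplification for clarity.

```python
class VersionedStore:
    """Toy model (hypothetical): updates mark the old copy deleted and index the
    new version into a fresh, immutable segment."""

    def __init__(self):
        self.segments = []  # each segment: tuple of (doc_id, version, source)
        self.deleted = set()  # (segment_index, doc_id) pairs, like .del entries
        self.latest = {}  # doc_id -> (segment_index, version)

    def index(self, doc_id, source):
        """Create or update a document; returns its new version number."""
        if doc_id in self.latest:
            old_seg, old_version = self.latest[doc_id]
            self.deleted.add((old_seg, doc_id))  # old copy stays on "disk"
            version = old_version + 1
        else:
            version = 1
        self.segments.append(((doc_id, version, source),))
        self.latest[doc_id] = (len(self.segments) - 1, version)
        return version

    def search(self, predicate):
        """Old versions may match the predicate but are filtered out."""
        return [
            (doc_id, version, source)
            for idx, seg in enumerate(self.segments)
            for doc_id, version, source in seg
            if predicate(source) and (idx, doc_id) not in self.deleted
        ]


store = VersionedStore()
store.index("1", "learning")
store.index("1", "learning to learn")
print(store.search(lambda s: "learn" in s))  # only version 2 is returned
```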
ES 1.6 - CRUD: Read Operations
Elasticsearch Read
● In the query phase, the coordinating node forwards the search request to one copy (primary or replica) of every shard in the index.
● Each shard searches independently and builds a set of results sorted by relevance score.
● Each shard returns only the document IDs and relevance scores of its matches to the coordinating node.
● By default (size = 10), each shard sends its top 10 results to the coordinating node.
● The coordinating node sorts the results globally and creates a list of the top 10 hits.
● In the fetch phase, the coordinating node requests the full documents from the shards that hold those hits.
● Those shards load (enrich) the documents and return them to the coordinating node.
● The coordinating node assembles the results and sends the response to the client.
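The query and fetch phases can be sketched as a scatter/gather over in-memory "shards". Scoring by raw term count is a stand-in for real relevance scoring, and the shard contents and function names are hypothetical; the point is that the query phase moves only (score, ID) pairs and the fetch phase loads full documents only for the global top hits.

```python
import heapq


def query_phase(shard, term, size):
    """One shard scores its own docs and returns only its top (score, doc_id) pairs."""
    scored = [(doc["text"].split().count(term), doc_id) for doc_id, doc in shard.items()]
    return heapq.nlargest(size, [(score, doc_id) for score, doc_id in scored if score > 0])


def search(shards, term, size=10):
    # Query phase: scatter to every shard, gather lightweight (score, id) pairs.
    candidates = [
        (score, doc_id, shard_no)
        for shard_no, shard in enumerate(shards)
        for score, doc_id in query_phase(shard, term, size)
    ]
    top = heapq.nlargest(size, candidates)  # global sort on the coordinating node
    # Fetch phase: load full documents only from the shards holding the top hits.
    return [(doc_id, score, shards[shard_no][doc_id]["text"]) for score, doc_id, shard_no in top]


shards = [
    {"1": {"text": "learn to learn"}, "2": {"text": "cool stuff"}},
    {"3": {"text": "I learn"}},
]
print(search(shards, "learn", size=2))
```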
That’s all folks!
