GRAYLOG ENGINEERING
DESIGN YOUR ARCHITECTURE.
README.
This is not a guide for the squeamish.
This is a peek for those who like to go off the beaten path, sometimes alone.
For those who aren’t afraid of pulling open the hood and getting their hands dirty.
This is the culmination of five years of engineering design, in the hope of bringing you
the fastest machine data processing engine on the planet.
Don’t call your sales rep; they won’t know the answers.
-GRAYLOG ENGINEERING
[Architecture diagram: components numbered 1 to 9 (plus 4½), keyed in the legend below.]
LEGEND:
1 & 2. LOG MESSAGES & LOAD BALANCER.
3. TRANSPORT LAYER.
4. PROCESSING CHAIN.
4½. REST API.
5. MONGODB REPLICA SET.
6. ELASTICSEARCH CLUSTER.
7. ANATOMY OF A SINGLE INDEX.
8. INDEX MODEL.
9. DEFLECTOR QUEUE.
1 & 2, LOG MESSAGES & LOAD BALANCER.
tl;dr
We’re not going to spend any time here. Basically, send us any machine data
(structured or not) and use whatever load balancer you like.
The number of messages, their peak rates, their average size, and the extractions performed
all affect performance, but we’ll cover that later.
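A quick back-of-the-envelope calculation shows how message rate and size translate into volume. The rates and message size below are hypothetical example values, not Graylog benchmarks. The result covers raw message bytes only; it leaves out indexing overhead, replicas and extraction cost.

```python
# Back-of-the-envelope ingest sizing. All input values are hypothetical examples.
SECONDS_PER_DAY = 24 * 60 * 60


def daily_raw_bytes(avg_msgs_per_sec: float, avg_msg_bytes: float) -> float:
    """Raw message volume per day, before indexing overhead or replicas."""
    return avg_msgs_per_sec * avg_msg_bytes * SECONDS_PER_DAY


def peak_bytes_per_sec(peak_msgs_per_sec: float, avg_msg_bytes: float) -> float:
    """Throughput the transport layer has to absorb during a spike."""
    return peak_msgs_per_sec * avg_msg_bytes


if __name__ == "__main__":
    avg_rate, peak_rate, size = 2_000, 10_000, 500  # msgs/s, msgs/s, bytes
    print(f"raw per day: {daily_raw_bytes(avg_rate, size) / 1e9:.1f} GB")        # 86.4 GB
    print(f"peak ingest: {peak_bytes_per_sec(peak_rate, size) / 1e6:.1f} MB/s")  # 5.0 MB/s
```

The gap between the average and peak figures is what the journal and buffers described next have to smooth out.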
3, TRANSPORT LAYER.
This layer is the inputs and the journal on top of the Graylog server. Inputs receive
data from the message cloud (our syslog stream, as well as other inputs) and
pre-process it, without any user configuration, into the parts of a message.
Although the journal lives on disk (I/O), it is an *append-only* journal, so writes are
sequential and there is almost no seek time. (Internally we reuse Apache Kafka code to do
this - thanks, LinkedIn.) The write “needle” always stays close to the same point on the
disk instead of constantly jumping around. This makes it blazing fast. You can turn the
journal off, but we do not recommend it.
Why we did this: systems without a journal like this lose incoming messages during
spikes, because the network layer starts to reject them or local memory runs out.
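A toy append-only log in Python can illustrate the journal idea. It is a simplified sketch, not Graylog’s Kafka-based journal. The `AppendOnlyJournal` class, its 4-byte length-prefixed record format and the in-memory offset index are all assumptions made for illustration. The sketch shows the two properties described above:

- Writes only ever go to the end of the file.
- Each record gets a sequential offset, so a reader can resume where it left off.

```python
import os
import struct
import tempfile


class AppendOnlyJournal:
    """Toy append-only journal (hypothetical; not Graylog's implementation).

    Records are stored as a 4-byte big-endian length followed by the payload,
    and are addressed by a sequential offset starting at 0.
    """

    _HEADER = struct.Struct(">I")

    def __init__(self, path: str) -> None:
        if os.path.exists(path) and os.path.getsize(path) > 0:
            raise ValueError("this sketch expects a new, empty journal file")
        self._path = path
        self._positions: list[int] = []  # offset -> byte position in the file
        self._end = 0                    # current end of the log, in bytes
        self._file = open(path, "ab")    # append mode: writes always land at the end

    def append(self, payload: bytes) -> int:
        """Append one record and return its offset."""
        record = self._HEADER.pack(len(payload)) + payload
        self._file.write(record)
        self._file.flush()  # hand the bytes to the OS (no fsync in this sketch)
        self._positions.append(self._end)
        self._end += len(record)
        return len(self._positions) - 1

    def read_from(self, offset: int) -> list[bytes]:
        """Return every record at or after `offset`, e.g. for a consumer resuming
        from its last processed offset."""
        records = []
        with open(self._path, "rb") as reader:
            for position in self._positions[offset:]:
                reader.seek(position)
                (length,) = self._HEADER.unpack(reader.read(self._HEADER.size))
                records.append(reader.read(length))
        return records

    def close(self) -> None:
        self._file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    journal = AppendOnlyJournal(path)
    for message in (b"msg-1", b"msg-2", b"msg-3"):
        journal.append(message)
    print(journal.read_from(1))  # [b'msg-2', b'msg-3']
    journal.close()
```

Because nothing is ever rewritten in place, a spike only makes the file longer. Processing can then catch up by reading forward from its last offset.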
4, PROCESSING CHAIN.
These messages are then taken and written into the process buffer, which is a ring
buffer. We use the Disruptor library from LMAX, a trading company whose business
depends on high throughput and low latency.
Messages are then handled by the process buffer processors, where stream
routing and field extraction happen. This part can get CPU-intensive! The filtered
message then goes into the output buffer (another ring buffer), then to the output buffer
processors, and onwards to Elasticsearch (ES) or a user-defined output.
ProTip: Tuning the number of processors per buffer is important, and the number should
never exceed the number of CPU cores available to graylog-server.
If throughput is too low, increase the number of processors, focusing on the
process buffer processors; the output buffer usually does not need many. Full buffers
are a symptom of too few processors.
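The link between too few processors and full buffers can be shown with a bounded ring buffer. This is a deliberately simple, single-threaded Python model, not the LMAX Disruptor. The real Disruptor coordinates preallocated slots across threads with sequence numbers; this model keeps only the fixed-size, wrap-around structure. The `RingBuffer` class, the `simulate` helper and the "one message per processor per tick" rate are hypothetical.

```python
class RingBuffer:
    """Fixed-size circular buffer. Slots are reused by wrapping around."""

    def __init__(self, size: int) -> None:
        self._slots = [None] * size
        self._head = 0   # index of the oldest unread item
        self._count = 0  # number of occupied slots

    def is_full(self) -> bool:
        return self._count == len(self._slots)

    def publish(self, item) -> bool:
        """Store an item; return False instead of overwriting when full."""
        if self.is_full():
            return False
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = item
        self._count += 1
        return True

    def consume(self):
        """Remove and return the oldest item, or None if the buffer is empty."""
        if self._count == 0:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return item


def simulate(ticks: int, arrivals_per_tick: int, processors: int, size: int) -> int:
    """Count publish attempts that hit a full buffer.

    Hypothetical model: each processor handles one message per tick. It only
    counts these stalls; it does not model what the real system does with the
    messages that had to wait.
    """
    buffer, stalls = RingBuffer(size), 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            if not buffer.publish("message"):
                stalls += 1
        for _ in range(processors):
            buffer.consume()
    return stalls


if __name__ == "__main__":
    print(simulate(ticks=100, arrivals_per_tick=4, processors=4, size=8))  # 0
    print(simulate(ticks=100, arrivals_per_tick=4, processors=2, size=8))  # 194
```

With enough processors the buffer drains every tick. With half as many, it fills within a few ticks and stays full. In the Graylog server configuration, the processor counts correspond (in the versions we know of) to settings such as `processbuffer_processors` and `outputbuffer_processors`; check the `server.conf` for your version.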
4½, REST API.
Why is this different from any other REST API?
This is the same API our web front end uses, so you can make any
read/write call we make from your own UI. Yup, you can build your own front end.
It also has to be high quality, because we use it ourselves every
day. It is not like other APIs that exist only for external users to
integrate with, built once and patched with duct tape every release. Not that we don’t
like duct tape…
5, MONGO.
Then there is Mongo, which stores only metadata: users, settings, and the
configuration of everything you configure (streams, dashboards, extractors, etc.).
If Mongo goes down, Graylog will continue to run, so it is your choice
whether to include it in a high-availability design.
For HA, Mongo recommends a replica set of three members. If one member goes
down, the remaining members have to elect a new primary, and an election needs the votes of a
majority of the members. With three members, the two survivors still form a majority;
with only two, the lone survivor cannot win an election. See the MongoDB replica set
documentation for instructions.
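The majority rule behind the three-member recommendation is simple arithmetic. This sketch models only the voting rule. Real MongoDB elections also involve priorities, arbiters and non-voting members, and all of these are ignored here.

```python
def majority(members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while the rest can still elect a primary."""
    return members - majority(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} members: majority {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Two members tolerate no failures at all, and a fourth member adds no tolerance over three. That is why three members is the usual minimum for HA.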
6, ELASTICSEARCH CLUSTER.
We connect to the ES cluster as an embedded ES node that does not store data. So
we look and act like an ES node, and we know the configuration data (indexes,
shards, etc.) for each ES server.
A client that is not a node has to encode each write as JSON, transmit it over the
wire via HTTP, and have it decoded on the other side. As a node, we can send it in
ES’s native transport format, which is faster.
For HA, we recommend having at least one replica configured.
7, ANATOMY OF AN INDEX.
A single index (in this example, Graylog index #25) is broken into shards. This
means the index is split up and the parts live on different ES nodes. This
makes searches faster, because the query can be computed on multiple ES
nodes in parallel.
An index can also have replicas configured. This means that each shard is mirrored
to other nodes, which is great for HA.
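A toy placement routine can show how shards and replicas spread across nodes, and why a replica protects against losing a node. The round-robin scheme is an assumption for illustration, not Elasticsearch’s allocator, which weighs many more factors. It keeps only the rule that copies of the same shard never share a node. The node names are hypothetical.

```python
def allocate(shards: int, replicas: int, nodes: list[str]) -> dict[str, list[str]]:
    """Place each shard's primary and replicas on distinct nodes, round-robin.

    Returns node -> list of shard copies, labelled 'p<n>' for the primary of
    shard n and 'r<n>' for a replica of shard n.
    """
    if replicas + 1 > len(nodes):
        raise ValueError("each copy of a shard needs its own node")
    placement: dict[str, list[str]] = {node: [] for node in nodes}
    for shard in range(shards):
        for copy in range(replicas + 1):
            # Consecutive nodes for consecutive copies, so copies never collide.
            node = nodes[(shard + copy) % len(nodes)]
            placement[node].append(("p" if copy == 0 else "r") + str(shard))
    return placement


def shards_available(placement: dict[str, list[str]], shards: int, lost_node: str) -> bool:
    """True if every shard still has at least one copy after `lost_node` fails."""
    surviving = {
        int(copy[1:])
        for node, copies in placement.items()
        if node != lost_node
        for copy in copies
    }
    return surviving == set(range(shards))


if __name__ == "__main__":
    layout = allocate(shards=4, replicas=1, nodes=["es1", "es2", "es3"])
    print(layout)
    print(shards_available(layout, 4, "es1"))  # True
    print(shards_available(allocate(4, 0, ["es1", "es2", "es3"]), 4, "es1"))  # False
```

Without a replica, losing a single node takes some shards, and therefore part of the index, offline. With one replica, every shard survives the loss of any single node.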
8, INDEX MODEL.
Indexes are numbered sequentially, starting with 0. In a time-series database, all
data is stored with a timestamp, and once it is stored it is never rewritten. Only the
newest index accepts writes (WRITE_ACTIVE); older ones are marked READ_ONLY, which
helps performance. Because storage is time-based, every query must also be
time-bounded (e.g. “in the last hour”).
Pro Tip: The size of these indexes matters when tuning performance. You don’t
want indexes that are too big, because each search then takes much longer,
and you don’t want them too small either, because a search then has to span many
indexes. Size indexes based on the amount of data you have and how far back you
normally search. Some people also run longer historical or strategic searches, so it is
important to know your search patterns and size indexes correctly.
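Each index covers a contiguous time range, so a time-bounded search only needs the indexes that overlap its window. This is also why sizing matters. Many tiny indexes mean each search fans out over many of them, while a few huge ones make every searched index expensive. This sketch selects indexes by time overlap. The `IndexRange` class, the `graylog_<n>` names and the six-hour rotation are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class IndexRange:
    name: str
    start: datetime          # timestamp of the oldest message in the index
    end: Optional[datetime]  # None while the index is still WRITE_ACTIVE


def indices_for_search(indices: list[IndexRange], since: datetime, until: datetime) -> list[str]:
    """Names of the indexes whose time range overlaps the search window [since, until]."""
    hits = []
    for index in indices:
        end = until if index.end is None else index.end  # the active index is open-ended
        if index.start <= until and end >= since:
            hits.append(index.name)
    return hits


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    six_hours = timedelta(hours=6)
    indices = [
        IndexRange("graylog_0", t0, t0 + six_hours),                  # READ_ONLY
        IndexRange("graylog_1", t0 + six_hours, t0 + 2 * six_hours),  # READ_ONLY
        IndexRange("graylog_2", t0 + 2 * six_hours, None),            # WRITE_ACTIVE
    ]
    now = t0 + timedelta(hours=14)
    print(indices_for_search(indices, now - timedelta(hours=1), now))   # ['graylog_2']
    print(indices_for_search(indices, now - timedelta(hours=10), now))  # ['graylog_0', 'graylog_1', 'graylog_2']
```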
9, DEFLECTOR.
We write to an index alias called ‘deflector’ that can be atomically switched to a new
index. This means we never have to stop message processing when a new index is
created, which would be error-prone to manage (oh, index #25 is now
closed, ahh wait, okay the next one is #26, go ahead and write).
Why are we telling you this? Because, well, it’s these kinds of things that make us
different. We are proud of thinking through all the small things that give you great
performance and stability, and we hope you have enjoyed reading this as much as we
enjoyed writing it.
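In Elasticsearch terms, the switch can be a single `POST /_aliases` request. Its actions (remove the alias from the old index, add it to the new one) are applied atomically, so writers addressing the alias never see a moment when it points nowhere. The sketch below only builds that request body; it does not talk to a cluster. The alias name `graylog_deflector` and the `graylog_<n>` index naming are assumptions for illustration. A real rotation would also create the new index first and mark the old one read-only.

```python
import json
from typing import Optional


def next_index_name(current: str) -> str:
    """graylog_25 -> graylog_26 (assumes hypothetical '<prefix>_<number>' names)."""
    prefix, number = current.rsplit("_", 1)
    return f"{prefix}_{int(number) + 1}"


def deflector_swap_actions(alias: str, new_index: str, old_index: Optional[str]) -> dict:
    """Body for Elasticsearch's POST /_aliases; all actions in it apply atomically."""
    actions = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    old = "graylog_25"
    body = deflector_swap_actions("graylog_deflector", next_index_name(old), old)
    print(json.dumps(body, indent=2))
```

Putting the remove and the add in one request is the key design choice. As two separate requests, the alias could briefly point at no index, or at both.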