Tuning Kafka for Fun and Profit

This document discusses tuning Kafka for performance. It covers Zookeeper ensemble sizing and the use of SSDs for transaction logs; RAID versus JBOD layouts for Kafka broker disks, with testing that shows a clear benefit for XFS; scaling Kafka clusters based on disk capacity, network capacity, and partition counts; configuring topic retention and partition counts; and tuning Mirror Maker for network locality and for its consumer and producer settings.

ORGANIZATION NAME©2013 LinkedIn Corporation. All Rights Reserved.
Tuning Kafka for Fun and Profit

Zookeeper
• 5-node vs. 3-node Ensembles
• Solid State Disks
– Use good SSDs
– Transaction logs only
– Significant improvement in latency and outstanding requests

Kafka Broker Disks
• Disk Layout
• JBOD vs. RAID
– JBOD and RAID-0 are similar
– RAID-5/6 has significant performance overhead
– RAID-10 still offers the best balance of performance and protection
• Filesystem
– New testing shows XFS has a clear benefit
– No tuning required
– Will be continuing testing with more production traffic

Scaling Kafka Clusters
• Disk Capacity
• Network Capacity
• Partition Counts
– Per-Cluster
– Per-Broker
• Limitations
– Topic list length

Topic Configuration
• Retention Settings
• Partition Counts
– Balance over consumers
– Balance over brokers
– Partition size on disk
– Application-specific requirements

Mirror Maker
• Network Locality
• Consumer Tuning
– Number of streams
– Partition assignment strategy
• Producer Tuning
– Number of streams
– In flight requests
– Linger time

Editor's Notes

We start talking about tuning from the ground up, and Kafka is underpinned by Zookeeper. This tends to be an application that we forget about unless we have problems, because it just runs, but it needs love too.

One thing we’ve learned recently is about ensemble sizing in Zookeeper. A lot of work has been done on performance at different ensemble sizes, and the results are largely driven by the ZAB protocol and the network traffic involved. We run either 3-node or 5-node ensembles, with most of the 3-node ensembles being in our staging environments, but we are moving to all 5-node for a very important reason. In order to add a new server to the ensemble, you need to take down each node in turn, add the new server to the config, and bring it back up. If you don’t want to take Zookeeper down, you have to maintain quorum while you do this. If one node in a 3-node ensemble is already down due to hardware problems, there is no way to change the server list without an outage, because you cannot take a second server offline and still maintain quorum.

The other important change we have made to Zookeeper is to run it on solid state disks. There’s some information out there that suggests this is a bad thing, but our experience has been the opposite. The first thing to note is that we use really good SSDs, not the consumer-grade ones you can buy from Best Buy. The Virident cards we use have garbage collection and are very robust. We only put the transaction logs on SSD, keeping the snapshots on spinning disk. By doing this, we have dropped min, max, and average latency to 0 ms (from an average of 20 ms), with no outstanding requests during normal operations, even at peak load.

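The quorum argument for 5-node ensembles can be checked with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes the usual majority rule (more than half of the configured servers must be up).

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the configured ensemble."""
    return ensemble_size // 2 + 1


def can_take_node_offline(ensemble_size: int, nodes_already_down: int) -> bool:
    """True if one more node can be stopped without losing quorum."""
    nodes_left = ensemble_size - nodes_already_down - 1
    return nodes_left >= quorum_size(ensemble_size)


# Compare a healthy ensemble with one that already has a failed node.
for size in (3, 5):
    for down in (0, 1):
        ok = can_take_node_offline(size, down)
        print(f"{size}-node ensemble, {down} already down: "
              f"can restart another node = {ok}")
```

With one node already lost, the 3-node case is the only combination here that cannot survive taking another node offline, which is the situation described above.
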
Moving on from Zookeeper to the Kafka brokers, mostly what we look at here is disk. Our systems are fairly standard, with 12 CPUs (with hyperthreading) and 64 GB of memory, and we do not colocate any other application with Kafka (which is running on physical hardware, not a virtual environment). Having a lot of memory is helpful because Kafka depends on the pagecache to get the best performance for consumers.

With disk, the more spindles you have, the better off you will be. Produce times are dependent on disk IO (assuming you are not using an acknowledgement setting of 0, where you are producing in a “fire and forget” mode), so the more you can spread that out the better.

We have recently done a lot of testing of RAID layouts to validate that our configuration of RAID-10 on 14 disks was the optimal layout. What we found is that JBOD and RAID-0 perform the best, but offer no protection of the data (if you lose one disk, you lose everything on that broker). RAID-5 and RAID-6 give you a nice balance of protection and disk capacity, but we ran into significant performance problems (99th percentile produce times shot up to over 20 seconds). RAID-10 gave us the best balance of performance and protection, and is where we are staying for now. It is notable that we are running software RAID, and have not done any testing with hardware RAID. All of our testing was done with a variety of RAID stripe settings, and we found that, at least for RAID-10, the default 512 KB stripe is the best choice. Larger stripes did not offer a significant improvement.

We have also been retesting the filesystem lately. Currently, Kafka log segments are stored on an ext4 filesystem, configured with a 120-second commit interval and writeback mode. These settings are obviously unsafe, and we justified them by knowing that we were also replicating data within Kafka and could survive a system failure. A datacenter power outage changed this view, and we were left with a large amount of disk corruption, both at the file level and the block level. We found that XFS is a better choice of filesystem, offering significant performance benefits without needing to resort to unsafe tuning. We’ll be continuing this testing in some of our staging environments soon.

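To put the capacity side of the RAID trade-off in numbers, here is a small sketch using the standard usable-capacity rules for each layout. The 14-disk count comes from the notes; the 1 TB disk size and the function name are assumptions made purely for illustration, and nothing here models the performance differences that the testing measured.

```python
# Disk failures each layout is guaranteed to survive (standard RAID properties).
GUARANTEED_FAILURES = {"jbod": 0, "raid0": 0, "raid5": 1, "raid6": 2, "raid10": 1}


def usable_disks(layout: str, disk_count: int) -> int:
    """Number of disks' worth of capacity left for data in a given layout."""
    if layout in ("jbod", "raid0"):
        return disk_count            # no redundancy at all
    if layout == "raid5":
        return disk_count - 1        # one disk's worth of parity
    if layout == "raid6":
        return disk_count - 2        # two disks' worth of parity
    if layout == "raid10":
        return disk_count // 2       # every disk is mirrored
    raise ValueError(f"unknown layout: {layout}")


DISK_COUNT = 14      # from the notes
DISK_SIZE_TB = 1.0   # hypothetical disk size, for illustration only

for layout in ("jbod", "raid0", "raid5", "raid6", "raid10"):
    capacity = usable_disks(layout, DISK_COUNT) * DISK_SIZE_TB
    print(f"{layout:>6}: {capacity:4.1f} TB usable, "
          f"survives {GUARANTEED_FAILURES[layout]} disk failure(s)")
```

RAID-10 gives up half of the raw capacity, which is the price paid for the balance of performance and protection described above.
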
Once we have an optimal configuration for a single broker, we look at how many brokers we need to have in a cluster. The driving factor for us right now is disk capacity. We use a default retention of 4 days for almost all topics, and having enough disk space to handle this is the primary driver behind increasing the size of a cluster. We set our disk alerts at 60% utilization, and increase the cluster size when we hit this limit. This gives us enough headroom to move partitions around (which resets the retention clock) and to wait for new hardware to arrive if needed.

Another concern with sizing is network capacity. While Kafka can definitely operate at line speed on a 1 Gigabit NIC, you want to have some headroom reserved for intra-cluster replication and communication. For this reason, we set our network alerts at 75%. If we go above that at peak load, we need to spread out the traffic over more systems. This is another good reason to make sure you balance partitions across your brokers as evenly as possible.

The number of partitions you have in your cluster is a lesser, but still important, concern. Here we are mostly concerned with the number of partitions on a single broker. We have noticed performance problems above 4000 partitions per broker, though we are not sure exactly where that problem is (whether it is with open file handles, data structures in the broker, or problems in the controller). We are about to start testing on much larger Kafka broker hardware, however, and will be digging into this limitation.

As a side note, you should keep an eye on the number of topics you have, for a reason that is not immediately obvious. Zookeeper has a limit of 1 MB on the size of the data in a node. This also applies to the combined length of all the names of the child nodes. Because all of the topics exist as child nodes under /brokers/topics, there is a limitation here. If your topic names are all 50 characters long, and you have more than about 20,900 topics, you will hit this limitation. This could cause Zookeeper to fail entirely, or it could cause problems in Kafka. The only guarantee is that it will cause problems.

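The sizing rules in this note reduce to a few lines of arithmetic. The sketch below is a rough illustration under stated assumptions: the function names are hypothetical, 1 MB is taken as 2**20 bytes, topic names are assumed to use one byte per character, and any per-name serialization overhead in Zookeeper is ignored (so the real ceiling would be somewhat lower).

```python
ZNODE_LIMIT_BYTES = 1024 * 1024  # the 1 MB limit from the notes, taken as 2**20 bytes


def max_topics(name_length: int, limit_bytes: int = ZNODE_LIMIT_BYTES) -> int:
    """Rough ceiling on topic count before the child-name list exceeds the limit."""
    return limit_bytes // name_length


def needs_more_brokers(disk_used: float, peak_network: float,
                       disk_threshold: float = 0.60,
                       network_threshold: float = 0.75) -> bool:
    """Apply the alert thresholds from the notes (fractions between 0 and 1)."""
    return disk_used >= disk_threshold or peak_network > network_threshold


print(max_topics(50))                  # 20971, in line with "about 20,900"
print(needs_more_brokers(0.62, 0.40))  # True: disk is past the 60% threshold
print(needs_more_brokers(0.45, 0.50))  # False: both are under their thresholds
```
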
Now that Kafka is running well, we can turn our attention to the topics. In general, there are two things to configure when it comes to topics: the retention, and the number of partitions. There are other things you can look at, such as the segment size, or how long until the segments are rolled, which may have application-specific concerns. But in large part, all we really care about is how long we keep the data, and how much we spread it out.

Topics can be configured for retention by time, by size, or by key. There is a default broker-level setting for this, and it can be overridden per topic. How you retain data is mostly application-dependent. We use a default retention of 4 days, and the reason for this is that in the normal state of affairs, consumers are caught up and reading from the end of the stream. We want enough retention so that if a problem happens with an individual application on the weekend, there is enough time to identify it, figure out what the problem is, resolve it, and catch back up before the consumers fall off the end of their topic. We have certain types of data, such as some of the monitoring, which use a shorter retention time because the data size is much larger and any problem gets fixed very quickly. We also have topics that are retained for much longer, up to a month, when there is a reason to because of how the application uses the data. The rule of thumb is to never hang on to more data than you really need. There are systems (such as HDFS) which are better designed for long-term storage of data.

Partition counts are the tricky calculation. General guidance is to have fewer partitions, not more. This is because more partitions means more log segments, which means more open file handles and more overhead in the brokers. At the same time, you need to make sure you have enough. There are several ways to look at this, all of which should be taken into account.

Balancing over consumers: You must have at least as many partitions as you have consumers in the largest group for a topic. If a topic has 8 partitions, and you have 16 consumer instances, 8 of those consumers will be idle all the time.

Balancing over brokers: If the number of partitions in a topic is not a multiple of the number of brokers in your cluster, the topic cannot be evenly balanced over the brokers. In a cluster with a large number of topics, this is less of a concern because over all the topics you should have a good balance regardless. In cases where you get a dump of messages (a high number of messages in a short period of time), balancing over the brokers is very important so you don’t swamp the network.

Partition size on disk: This is one of our primary drivers in how we expand topics, as it is a good indication of how busy the topic is. We’ve picked a somewhat arbitrary threshold of 50 GB as the size of a single partition on disk on the brokers. Once a topic exceeds that, we increase the number of partitions (in general). This keeps the log segments at a reasonable size, which is good for recovering a crashed broker, and it also allows us to balance busy topics over more of the cluster.

Through all of this, you also need to keep in mind application-specific requirements. You may have an application which is very concerned about message ordering, and only wants a single partition. You may have an application that is using keyed partitioning, and wants a high number of partitions so that they do not need to be expanded at any point (which would change the hashing of keys to partitions). This will often override other concerns. In a multi-tenant environment, the important thing is to have communication with the users, and a way of keeping track of these requirements so they are not forgotten.

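The partition-count considerations above can be written down as simple checks. This is a sketch, not the procedure described in the notes: the function names are hypothetical, and the rule in `suggested_partitions` (keep the average partition under the 50 GB threshold, then round up to a multiple of the broker count) is one possible way to combine the guidelines. Application requirements such as strict ordering or keyed partitioning would still override it.

```python
import math


def idle_consumers(partitions: int, consumers: int) -> int:
    """Consumers in a group that can never be assigned a partition."""
    return max(0, consumers - partitions)


def evenly_balanced(partitions: int, brokers: int) -> bool:
    """True if the topic's partitions can be spread evenly over the brokers."""
    return partitions % brokers == 0


def suggested_partitions(topic_size_gb: float, brokers: int,
                         max_partition_gb: float = 50.0) -> int:
    """Hypothetical heuristic: cap average partition size, then balance over brokers."""
    needed = max(1, math.ceil(topic_size_gb / max_partition_gb))
    return math.ceil(needed / brokers) * brokers


print(idle_consumers(partitions=8, consumers=16))              # 8, as in the notes
print(evenly_balanced(partitions=8, brokers=3))                # False
print(suggested_partitions(topic_size_gb=900.0, brokers=8))    # 24
```
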
In an environment with multiple Kafka clusters, you are often using the mirror maker application to replicate data between them. In addition, because mirror maker has both a consumer and a producer, it’s a useful case to look at when tuning both. If you want more information about using mirror maker for running Kafka clusters in tiers, I encourage you to look at one of my other presentations on multi-tier architectures, which goes into more depth on the design and concerns around setting this up.

With any consumer or producer, network locality is a big factor in performance. If your client is not in the same network as your Kafka cluster, you will have latency, bandwidth concerns, network partitions, and any number of other problems that you get when you have a lot of network hops in the way. With mirror maker, we need to choose whether we are going to locate it close to the cluster we are consuming from or the cluster we are producing to (as we use it most often for inter-datacenter replication). Our choice is always to locate it with the produce cluster. The reason for this is that if there is a problem with the produce side of the mirror maker, it will lose messages, because the consumer will continue to consume messages and commit offsets. If there is a problem with the consumer, it will just stop. So we choose to put the higher risk of network problems on the consume side, rather than the produce side.

With tuning the mirror maker consumer, you will mostly consider how much data you need to consume, and the number of streams. You need to have enough copies of mirror maker in a given pipeline to handle the peak traffic, and mirror maker will not operate at line speed because it needs to decompress and recompress every message batch. This is also why you should run more than one consumer stream in a single mirror maker copy, to take advantage of parallelism to get around some of this inefficiency. You will also want to look at the partition assignment strategy that is used when balancing consumers. There is a strategy available for wildcard consumers called “roundrobin” which provides a much more even balance of partitions than the standard assignment strategy. There are also improvements in the most recent mirror maker code to the speed with which the consumer rebalance is performed.

On the producer side, you should also be running multiple streams. Where the consumer is responsible for decompressing message batches, the producer is responsible for compressing them again before sending to Kafka. You will also want to consider the number of in flight requests that are allowed between the producer and the Kafka cluster. A higher number will allow for greater throughput, but it will also introduce a higher risk of loss. When the leadership changes on a partition in the produce cluster, message batches that are in flight will be lost. It is also possible to improve this by changing the acknowledgement configuration on the producer, but this will have other performance concerns.

Another parameter to look at is the linger time. The mirror maker producer will send a batch to the Kafka cluster when either the batch reaches the byte size limit for a single batch, or the linger time expires. For busy topics, you will be subject to the size limit. For slow topics, you will be subject to the time limit. A higher linger time will allow the producer to assemble more efficient batches, with better compression (and the Kafka broker itself does not decompress and break up batches, so this affects your disk utilization on the brokers). It will also increase the amount of time it takes for messages to get from one cluster to the next. You will need to determine how important these factors are and strike a balance.
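The interaction between the batch size limit and the linger time can be illustrated with a small calculation. This is a simplified model, not producer code: the function names are hypothetical, the traffic rates and limits are made-up values, and traffic is assumed to arrive at a steady rate. If you are on the Java producer client, the corresponding settings are named `batch.size` and `linger.ms`.

```python
def time_to_fill_batch_ms(bytes_per_second: float, batch_size_bytes: int) -> float:
    """How long a steady stream takes to fill one batch, in milliseconds."""
    return batch_size_bytes / bytes_per_second * 1000.0


def flush_trigger(bytes_per_second: float, batch_size_bytes: int,
                  linger_ms: float) -> str:
    """Which limit sends the batch first: 'size' or 'time'."""
    if bytes_per_second <= 0:
        return "time"  # nothing arriving, so only the linger timer can fire
    fill_ms = time_to_fill_batch_ms(bytes_per_second, batch_size_bytes)
    return "size" if fill_ms <= linger_ms else "time"


BATCH_SIZE_BYTES = 16384  # illustrative value
LINGER_MS = 100           # illustrative value

# A busy stream fills the batch long before the linger timer expires.
print(flush_trigger(10_000_000, BATCH_SIZE_BYTES, LINGER_MS))  # size
# A slow stream waits out the full linger time with a mostly empty batch.
print(flush_trigger(1_000, BATCH_SIZE_BYTES, LINGER_MS))       # time
```

Raising the linger time in this model only changes the outcome for slow streams, which is where the extra latency buys larger, better compressed batches.
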

More from Todd Palino
