2. Growing Pains for A Kafka Cluster
● A few brokers, handful topics, tens of partitions
○ Wonderful!
● Tens of brokers, tens of topics, hundreds of
partitions
○ Life is good!
3. ● A hundred brokers, a hundred topics, thousands of
partitions
○ … OK
● Hundreds of brokers, hundreds of topics, one
hundred thousand partitions
○ ???
4. Why Huge Kafka Cluster Does Not Work
● Significant time increase on operations
○ Rolling binary update
■ Three minutes per broker, 500 brokers = 1 whole day
○ Rolling AMI (image) update with data copying
■ One hour per broker, 500 brokers = 20 days
5. ● Increased latency due to number of partitions
○ https://www.confluent.io/blog/how-to-choose-the-number
-of-topicspartitions-in-a-kafka-cluster/
● Vulnerability to ZK/Controller failures
6. Scaling and Data Balancing Challenge
● The problem with partition reassignment
○ Time consuming
○ Replication traffic taking bandwidth
○ Complexity of bin packing for data balancing
11. The Idea
● Create many small and mostly “immutable”
clusters
● Organize them in a topology with routing service
connecting the clusters
12. Multi-Cluster Kafka Service At Netflix
Router
(w/ simple ETL)
Fronting
Kafka
Event
Producer
Consumer
Kafka
Management
HTTP
PROXY
Consumers
13. Multi-cluster Producers
● Support producing to multiple clusters at the same
time
● High level producer API implemented by multiple
embedded Kafka producers
public interface KsProducer<V> {
// ...
<T extends V> CompletableFuture<SendResult> send(T obj)
}
15. @Stream("foo") // send to topic “foo”
public class Foo {
// ...
}
@Stream("bar") // send to topic “bar”
public class Bar {
// ...
}
KsProducer<Object> producer = // …
producer.send(new Foo()); // Send to Kafka cluster which has “foo” topic
producer.send(new Bar()); // Send to Kafka cluster which has “bar” topic
16. Fronting Kafka
● For data collection and buffering
● Optimized for producers
○ Only consumers are routers
17. Scaling of Fronting Kafka
● Creating / destroying Kafka clusters
○ E.g., create new topic on new clusters and update topic to
cluster mapping
● No partition reassignment
18. Data Balancing
● Assign the same number of partitions of any topic
to every brokers
○ E.g., for clusters of 12 brokers, create topics with partitions
of 12, 24, 36
○ Guaranteed even distribution of data (aside from
occasional leader imbalance)
● Balance data among clusters by moving topics
○ Must dynamically update topic to cluster mapping
20. Consumer Kafka
● Scaling
○ Add brokers and partitions for small cluster for non-keyed
topics
○ Create same topics on a new cluster and move consumers
21. Future Plan
● Cross-cluster topic
○ load sharing beyond single cluster
○ Auto-scale
○ Consumer/producer support needed
22. Multi-Cluster Consumer (Ongoing work)
● Same Kafka consumer interface
● Consume from multiple clusters with dynamic
topic to cluster mapping
○ Keep subscription state
○ Receive mapping updates
○ Create and delegate to underlying Kafka consumer for each
associated cluster on the fly
23. Multi-Cluster Consumer Topic to Cluster Mapping and
Code Example
{
"foo": [
{"vip": "cluster1"},
{"vip": "cluster2"}
],
“bar”: [
{“vip”: “cluster2”}
]
}
// Create a multi-cluster consumer
Consumer<String, String> multiClusterConsumer = ...
// subscribe as usual and keep subscription state
consumer.subscribe(new ArrayList<String>(“foo”));
while (...) {
// fetch from both clusters for topic “foo” and
// return the aggregated records
ConsumerRecords<String, String> records =
multiClusterConsumer.poll(2000);
process(records);
}
26. What About Keyed Messages
● Few topics requiring keyed messages in Netflix
● A word of caution for keyed messages
○ Inflexible/skewed load balancing
○ Difficult to scale
● Handling of keyed messages
○ Currently only produced by routers to consumer Kafka
○ Hard to guarantee message ordering in multi-cluster setting
○ Key-consumer affinity is guaranteed
27. Think Differently on Scaling Kafka
The “broker” way The “cluster” way
Scale up Add brokers Add clusters
Data balance Move partitions to
different brokers
Move/expand topics to
different clusters
Producer Produce to different
brokers at the same time
Produce to different clusters at
the same time
Consumer Consume from different
brokers at the same time
Consume from different
clusters at the same time