2. Agenda
● Kafka Overview
● Kafka 101
● Best Practices for Writing to Kafka: A tour of the Producer
● Best Practices for Reading from Kafka: The Consumer
● General Considerations
6. ETL/Data Integration and Messaging
ETL/Data Integration (Stored records)
● Strengths: High Throughput, Durable, Persistent, Maintains Order
● Weaknesses: Batch, Expensive, Time Consuming
Messaging (Transient Messages)
● Strengths: Fast (Low Latency)
● Weaknesses: Difficult to Scale, No Persistence, Data Loss, No Replay
Both of these are a complete mismatch to how your business works.
7. Event Streaming Paradigm
ETL/Data Integration (Stored records) and Messaging (Transient Messages) brought together:
● High Throughput
● Durable
● Persistent
● Maintains Order
● Fast (Low Latency)
8. Event Streaming Paradigm
To rethink data as not stored records or transient messages, but instead as a continually updating stream of events
10. Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Confluent: Central Nervous System For Enterprise
[Figure: a Universal Event Pipeline connecting Device Logs, Data Stores, Logs, 3rd Party Apps, Custom Apps / Microservices, Amazon S3, and SaaS apps with Event-Streaming Applications: Real-time Customer 360, Financial Fraud Detection, Real-time Risk Analytics, Real-time Payments, Machine Learning Models.]
11. Confluent uniquely enables Event Streaming success
Hall of Innovation CTO Innovation Award Winner 2019; Enterprise Technology Innovation AWARDS
● Confluent founders are original creators of Kafka
● Confluent team wrote 80% of Kafka commits and has over 1M hours technical experience with Kafka
● Confluent helps enterprises successfully deploy event streaming at scale and accelerate time to market
● Confluent Platform extends Apache Kafka to be a secure, enterprise-ready platform
17. Kafka Topics
my-topic
my-topic-partition-0
my-topic-partition-1
my-topic-partition-2
broker-1
broker-2
broker-3
18. Creating a topic
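$ kafka-topics --zookeeper zk:2181 \
    --create \
    --topic my-topic \
    --replication-factor 3 \
    --partitions 3
(On Kafka 2.2 and later, pass --bootstrap-server <host:port> instead; the --zookeeper option was removed from kafka-topics in Kafka 3.0.)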
Or use the AdminClient API!
19. Producing to Kafka
[Figure: a log drawn along a Time axis.]
20. Producing to Kafka
[Figure: the same log along a Time axis, with three consumers (C) marked on it.]
21. Kafka’s distributed nature
[Figure: Topic-1 has four partitions (partition-1 to partition-4). Each partition has three replicas spread over Broker 1, Broker 2, Broker 3, and Broker 4; for each partition one replica is the Leader and the others are Followers.]
24. Clients - Producer Design
[Figure: the Producer takes a Producer Record (Topic, [Partition], [Key], Value) through Send(), the Serializer, and the Partitioner, which places it in a batch for its topic partition (Topic A Partition 0 and Topic B Partition 1, each with Batch 0, Batch 1, Batch 2). Batches are sent to the Kafka Broker. On success, metadata is returned. On failure, the producer retries if it can; if it can’t retry, it throws an exception.]
25. The Serializer
Kafka doesn’t care about what you send to it as long as
it’s been converted to a byte stream beforehand.
[Figure: JSON, CSV, Avro, Protobufs, and XML (if you must) pass through SERIALIZERS and come out as bytes, e.g. 01001010 01010011 01001111 01001110.]
Reference
https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
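To make "converted to a byte stream beforehand" concrete, here is a minimal sketch using only the Java standard library. It encodes a string as UTF-8 bytes, which is roughly what a string serializer does, and prints the bits in the style of the slide. The names `ByteStreamSketch`, `serialize`, and `toBits` are made up for this illustration and are not part of the Kafka API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Kafka class: shows a value becoming bytes.
final class ByteStreamSketch {

    // Encode a string as UTF-8 bytes, the form in which a broker would receive it.
    static byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated 8-bit groups, like the bit strings on the slide.
    static String toBits(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            String bits = Integer.toBinaryString(b & 0xFF);
            // Left-pad to 8 digits with zeros.
            sb.append(String.format("%8s", bits).replace(' ', '0'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: 01001010 01010011 01001111 01001110
        System.out.println(toBits(serialize("JSON")));
    }
}
```

The broker stores and forwards these bytes without interpreting them, so producers and consumers have to agree on the format themselves.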
26. The Serializer
// Keys are plain strings; values are Avro records checked against Schema Registry.
Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");
kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
kafkaProps.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
kafkaProps.put("schema.registry.url", "https://schema-registry:8083");
Producer<String, SpecificRecord> producer = new KafkaProducer<>(kafkaProps);
Reference
https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
27. Record Keys and why they’re important - Ordering
Producer Record: Topic, [Partition], [Key], Value
Record keys determine the partition with the default Kafka partitioner.
If a key isn’t provided, messages will be produced in a round-robin fashion. (Since Kafka 2.4 the default partitioner instead keeps keyless records on one partition until the current batch is sent, then moves to another.)
28. Record Keys and why they’re important - Ordering
Keys are used in the default partitioning algorithm:
partition = hash(key) % numPartitions
Producer Record: Topic, [Partition], Key = AAA, Value
Record keys determine the partition with the default Kafka partitioner.
29. Record Keys and why they’re important - Ordering
Producer Record: Topic, [Partition], Key = BBB, Value
30. Record Keys and why they’re important - Ordering
Producer Record: Topic, [Partition], Key = CCC, Value
31. Record Keys and why they’re important - Ordering
Producer Record: Topic, [Partition], Key = DDD, Value
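The rule from these slides, partition = hash(key) % numPartitions, can be sketched in plain Java as follows. This is only an illustration: Kafka's default partitioner hashes the serialized key bytes with murmur2, while the sketch substitutes `Arrays.hashCode` as a stand-in, so the partition numbers it produces will not match a real cluster. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of key-based partitioning; not Kafka's actual partitioner.
final class KeyPartitionSketch {

    // Map a key to a partition in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Stand-in hash (assumption): Kafka itself uses murmur2 here.
        int hash = Arrays.hashCode(keyBytes);
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"AAA", "BBB", "CCC", "DDD", "AAA"}) {
            System.out.println(key + " goes to partition " + partitionFor(key, 3));
        }
    }
}
```

Because the mapping depends only on the key and the partition count, every record with key AAA lands on the same partition and is read back in the order it was written. Changing the number of partitions changes the mapping.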
32. Record Keys and why they’re important - Key Cardinality
Key cardinality affects the amount of work done by the individual consumers in a group. Poor key choice can lead to uneven workloads.
Keys in Kafka don’t have to be primitives, like strings or ints. Like values, they can be anything: JSON, Avro, etc. So create a key that will evenly distribute groups of records around the partitions.
Car·di·nal·i·ty /ˌkärdəˈnalədē/ (noun): the number of elements in a set or other grouping, as a property of that grouping.
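A rough way to see the effect of key cardinality is to count how many records each partition would receive for a given set of keys. The sketch below does that with `String.hashCode` as a stand-in for Kafka's real hash (an assumption made for simplicity); the class name, method name, and sample keys are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: how key choice spreads records over partitions.
final class KeyCardinalitySketch {

    // Count records per partition, using String.hashCode as a stand-in hash.
    static int[] recordsPerPartition(List<String> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[Math.floorMod(key.hashCode(), numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Low cardinality: every record carries the same key.
        List<String> sameKey = Collections.nCopies(100, "US");
        // Higher cardinality: one key per (hypothetical) customer.
        List<String> manyKeys = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            manyKeys.add("customer-" + i);
        }
        System.out.println(Arrays.toString(recordsPerPartition(sameKey, 4)));
        System.out.println(Arrays.toString(recordsPerPartition(manyKeys, 4)));
    }
}
```

With a single key, one partition (and therefore one consumer in the group) receives all 100 records while the others sit idle.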
33. You don’t have to but... use a Schema!
The Data Producer Service sends JSON to the Data Consumer Service.
Record without City and State:
{
"Name": "John Smith",
"Address": "123 Apple St.",
"Zip": "19101"
}
Record with City and State:
{
"Name": "John Smith",
"Address": "123 Apple St.",
"City": "Philadelphia",
"State": "PA",
"Zip": "19101"
}
“Where’s record.City?”
Reference
https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one/
34. Schema Registry: Make Data Backwards Compatible and Future-Proof
● Define the expected fields for each Kafka topic
● Automatically handle schema changes (e.g. new fields)
● Prevent backwards incompatible changes
● Support multi-data center environments
[Figure: App 1 and App 2, each with a Serializer, write to a Kafka Topic with the Schema Registry alongside; example consumers are Elastic, Cassandra, and HDFS.]
Open Source Feature
35. Avro allows for evolution of schemas
The Data Producer Service sends an AvroRecord to the Data Consumer Service; the Schema Registry holds Version 1 and Version 2 of the schema.
Record with "NA" filled in for City and State:
{
"Name": "John Smith",
"Address": "123 Apple St.",
"Zip": "19101",
"City": "NA",
"State": "NA"
}
Record with City and State set:
{
"Name": "John Smith",
"Address": "123 Apple St.",
"City": "Philadelphia",
"State": "PA",
"Zip": "19101"
}
Reference
https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one/
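The idea behind the "NA" values on this slide is that a newer schema can declare defaults for fields an older record does not carry. The sketch below imitates that with plain Java maps. It does not use the Avro library or the Schema Registry client; the class name, the `readWithV2` method, and the assumption that Version 2 adds City and State with default "NA" are all illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical imitation of default-filling during schema evolution (no Avro involved).
final class SchemaDefaultsSketch {

    // Fields assumed to be added in "Version 2", with their default values.
    static final Map<String, String> V2_DEFAULTS = new LinkedHashMap<>();
    static {
        V2_DEFAULTS.put("City", "NA");
        V2_DEFAULTS.put("State", "NA");
    }

    // Read a record as Version 2: keep what is present, fill defaults for what is missing.
    static Map<String, String> readWithV2(Map<String, String> record) {
        Map<String, String> out = new LinkedHashMap<>(record);
        for (Map.Entry<String, String> field : V2_DEFAULTS.entrySet()) {
            out.putIfAbsent(field.getKey(), field.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> v1Record = new LinkedHashMap<>();
        v1Record.put("Name", "John Smith");
        v1Record.put("Address", "123 Apple St.");
        v1Record.put("Zip", "19101");
        System.out.println(readWithV2(v1Record));
    }
}
```

A consumer built this way never has to ask "Where's record.City?": the field is always present, either with its real value or with the default.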
36. Developing with Confluent Schema Registry
We provide several Maven plugins for developing with the Confluent Schema Registry
● download - download a subject’s schema to your project
● register - register a new schema to the schema registry from your development env
● test-compatibility - test changes made to a schema against compatibility rules set by the schema registry
Reference
https://docs.confluent.io/current/schema-registry/docs/maven-plugin.html
<plugin>
<groupId>io.confluent</groupId>
<artifactId>kafka-schema-registry-maven-plug
<version>5.0.0</version>
<configuration>
<schemaRegistryUrls>
<param>http://192.168.99.100:808
</schemaRegistryUrls>
<outputDirectory>src/main/avro</outp
<subjectPatterns>
<param>^TestSubject000-(key|valu
</subjectPatterns>
</configuration>
</plugin>
37. Use Kafka’s Headers
Reference
https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
Producer Record: Topic, [Partition], [Timestamp], [Key], Value, [Headers]
A Kafka Header is simply an interface that requires a key of type String and a value of type byte[]. The headers are stored as an iterable collection in the ProducerRecord.
Example Use Cases
● Data lineage: reference previous topic partition/offsets
● Producing host/application/owner
● Message routing
● Encryption metadata (which key pair was this message payload encrypted with?)
38. Producer Guarantees
[Figure: producer P writes to Topic1 partition1, which has a Leader and two Followers across Broker 1, Broker 2, and Broker 3. No ack is shown.]
Producer Properties
acks=0
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
39. Producer Guarantees
[Figure: producer P writes to Topic1 partition1, which has a Leader and two Followers across Broker 1, Broker 2, and Broker 3. An ack is returned to the producer.]
Producer Properties
acks=1
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
40. Producer Guarantees
[Figure: producer P writes to Topic1 partition1, which has a Leader and two Followers across Broker 1, Broker 2, and Broker 3. An ack is returned to the producer.]
Producer Properties
acks=all
Topic/Broker Properties
min.insync.replicas=2
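As a minimal sketch of where these two settings live, the code below builds them as `java.util.Properties`. `acks` is a producer setting, while `min.insync.replicas` is configured on the topic or broker. The class and method names are hypothetical, and the Kafka client itself is deliberately left out so the example runs with the standard library alone.

```java
import java.util.Properties;

// Hypothetical sketch: the durability settings from the slides, split by where they apply.
final class DurableWriteConfigSketch {

    // Producer-side settings: wait for all in-sync replicas before a write counts as acknowledged.
    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("acks", "all");
        return props;
    }

    // Topic-side setting: with acks=all, writes are refused unless at least 2 replicas are in sync.
    static Properties topicProps() {
        Properties props = new Properties();
        props.setProperty("min.insync.replicas", "2");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps("broker1:9092,broker2:9092"));
        System.out.println(topicProps());
    }
}
```

With a replication factor of 3, this combination tolerates the loss of one broker without losing acknowledged writes or blocking producers.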
42. Producer Guarantees - without exactly once guarantees
[Figure: producer P writes to Topic1 partition1, which has a Leader and two Followers across Broker 1, Broker 2, and Broker 3.]
Producer Properties
acks=all
Topic/Broker Properties
min.insync.replicas=2
{key: 1234, data: abcd} - offset 3345
Successful write
Failed ack
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
43. Producer Guarantees - without exactly once guarantees
[Figure: producer P writes to Topic1 partition1, which has a Leader and two Followers across Broker 1, Broker 2, and Broker 3.]
Producer Properties
acks=all
Topic/Broker Properties
min.insync.replicas=2
{key: 1234, data: abcd} - offset 3345
{key: 1234, data: abcd} - offset 3346
The producer retries after the failed ack, the retry is written and acked, and the record is now in the log twice: dupe!
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
44. Producer Guarantees - with exactly once guarantees
[Figure: producer P writes to Topic1 partition1, which has a Leader and two Followers across Broker 1, Broker 2, and Broker 3.]
Producer Properties
enable.idempotence=true
max.in.flight.requests.per.connection=5
acks=all
retries > 0 (preferably Integer.MAX_VALUE)
(pid, seq) [payload]
(100, 1) {key: 1234, data: abcd} - offset 3345
(100, 1) {key: 1234, data: abcd} - rejected, ack re-sent
(100, 2) {key: 5678, data: efgh} - offset 3346
The producer retries, the broker recognises the repeated (pid, seq), and the ack is re-sent: no dupe!
Reference
https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
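The (pid, seq) bookkeeping on this slide can be imitated in a few lines of plain Java. This is a toy model under simplifying assumptions: a real broker tracks sequence numbers per producer id and per partition, works on batches, and treats out-of-order sequences as errors. The class `IdempotentLogSketch` and its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of broker-side de-duplication by (producer id, sequence number).
final class IdempotentLogSketch {

    private final List<String> log = new ArrayList<>();
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();

    // Returns true if the record was appended, false if it was a retry of one already written.
    boolean append(long pid, int seq, String payload) {
        Integer lastSeq = lastSeqByPid.get(pid);
        if (lastSeq != null && seq <= lastSeq) {
            // Duplicate: nothing is appended; a real broker would simply re-send the ack.
            return false;
        }
        lastSeqByPid.put(pid, seq);
        log.add(payload);
        return true;
    }

    // Number of records actually stored.
    int size() {
        return log.size();
    }

    public static void main(String[] args) {
        IdempotentLogSketch partition = new IdempotentLogSketch();
        System.out.println(partition.append(100L, 1, "{key: 1234, data: abcd}")); // first write
        System.out.println(partition.append(100L, 1, "{key: 1234, data: abcd}")); // retry
        System.out.println(partition.append(100L, 2, "{key: 5678, data: efgh}")); // next record
        System.out.println("records stored: " + partition.size());
    }
}
```

The retry is acknowledged but not stored again, so the log ends up with two records instead of three.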
47. A basic Java Consumer
final Consumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList(topic));
try {
    while (true) {
        // poll(long) is deprecated since Kafka 2.0; pass a java.time.Duration instead.
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            // -- Do Some Work --
        }
    }
} finally {
    // Always close the consumer so the group can rebalance promptly.
    consumer.close();
}
48. Consuming From Kafka - Single Consumer
[Figure: a single consumer (C).]
49. Consuming From Kafka - Grouped Consumers
[Figure: two consumer groups, C1 and C2, each containing two consumers (C).]
50. Consuming From Kafka - Grouped Consumers
[Figure: a group of four consumers (C).]
51. Consuming From Kafka - Grouped Consumers
[Figure: the four consumers labelled with partitions 0, 1, 2, and 3, one partition each.]
53. Consuming From Kafka - Grouped Consumers
[Figure: the same group with labels "0, 3", 1, 2, and 3; the first consumer now holds partitions 0 and 3.]
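The partition labels on these slides can be reproduced with a simple round-robin assignment, sketched below in plain Java. This is an assumption about what the figures show, not Kafka's implementation: real consumer groups use pluggable assignors (range, round robin, sticky, cooperative sticky) coordinated by the broker. The class name and consumer ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of spreading partitions over the consumers in one group.
final class GroupAssignmentSketch {

    // Deal partitions 0..numPartitions-1 to consumers in turn, like dealing cards.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Four consumers, four partitions: one partition each.
        System.out.println(assign(Arrays.asList("c0", "c1", "c2", "c3"), 4));
        // One consumer leaves: prints {c0=[0, 3], c1=[1], c2=[2]}
        System.out.println(assign(Arrays.asList("c0", "c1", "c2"), 4));
    }
}
```

Each partition is owned by exactly one consumer in the group, so adding consumers beyond the partition count leaves the extras idle.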
54. Kafka’s Interceptors
ProducerInterceptor
onSend(ProducerRecord<K, V> record)
Returns ProducerRecord<K, V>. Called from send() before key and value get serialized and partition is assigned. This method is allowed to modify the record.
onAcknowledgement(RecordMetadata metadata, java.lang.Exception exception)
This method is called when the record sent to the server has been acknowledged, or when sending the record fails before it gets sent to the server. Used for observability and reporting.
Reference
https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/producer/ProducerInterceptor.html
ConsumerInterceptor
onConsume(ConsumerRecords<K,V> records)
Called just before the records are returned by KafkaConsumer.poll(). This method is allowed to modify consumer records, in which case the new records will be returned.
onCommit(Map<TopicPartition,OffsetAndMetadata> offsets)
This is called when offsets get committed. Used for observability and reporting.
Reference
https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/consumer/ConsumerInterceptor.html
55. Should I pool connections?
NO!
Since Kafka connections are long-lived, there is no reason to pool connections. It’s common to keep one client per thread: a KafkaConsumer is not thread-safe, while a single KafkaProducer can be shared across threads.
56. Use a good client!
Clients
● Java/Scala - default clients, come with Kafka
● C/C++ - https://github.com/edenhill/librdkafka
● C#/.Net - https://github.com/confluentinc/confluent-kafka-dotnet
● Python - https://github.com/confluentinc/confluent-kafka-python
● Golang - https://github.com/confluentinc/confluent-kafka-go
● Node/JavaScript - https://github.com/Blizzard/node-rdkafka (not supported by Confluent!)
New Kafka features will only be available to modern, updated clients!
57. Resources
Free E-Books from Confluent!
I Heart Logs: https://www.confluent.io/ebook/i-heart-logs-event-data-stream-processing-and-data-integration/
Kafka: The Definitive Guide: https://www.confluent.io/resources/kafka-the-definitive-guide/
Designing Event Driven Systems: https://www.confluent.io/designing-event-driven-systems/
Confluent Blog: https://www.confluent.io/blog
Thank You!