SlideShare a Scribd company logo
1 of 25
Download to read offline
Apache Kafka:
New Features That You
Might Not Know About
Yaroslav Tkachenko
Software Architect at Activision
Apache Kafka Versions
0.10.0.0 0.11.0.0 1.0.0 2.0.0
0.10.0.1
0.10.2.2
0.11.0.1
0.11.0.3
1.0.1
1.1.1
2.0.1
2.1.0
May 2016 June 2017 November 2017 July 2018
0.11: New Message Format
Record Batch:
...
magic: 2
…
attributes:
…
bit 4: isTransactional
…
producerId: int64
producerEpoch: int16
records: [Record]
Message Format v2
Record:
...
key: byte[]
value: byte[]
headers: [Header]
Header:
...
headerKey: String
value: byte[]
0.11: Headers
Message Headers
public interface Header {
String key();
byte[] value();
}
List<Header> headers = Arrays.asList(
new RecordHeader("hkey1", "hvalue1".getBytes()),
new RecordHeader("hkey2", "hvalue2".getBytes())
);
new ProducerRecord<>("topic", 0, "key", "value", headers);
Pros
• No need to deserialize the whole
message payload for routing /
filtering use-cases
Cons
• Harder to save the headers
together with the payload when
archiving, persisting to data
stores or integrating with 3rd
party systems
Message Headers
Message Headers
0.11: Transactions
Transactions
• Atomic writes to multiple Kafka topics and partitions
• Offset commits happen in the same transaction
• transactional.id + epoch for every producer
• Consumers must use “read_committed” isolation level for consuming
only committed transactional data
Transactions
KafkaProducer producer = ...
producer.initTransactions();
KafkaConsumer consumer = ...
consumer.subscribe("inputTopic"));
ConsumerRecords records = consumer.poll(Long.MAX_VALUE);
try {
producer.beginTransaction();
for (ConsumerRecord record : records) {
producer.send(processAndProduceRecord("outputTopic", record));
}
producer.sendOffsetsToTransaction(currentOffsets(consumer), groupId);
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
}
Transactions
Transactions
In practice, for a producer producing 1KB records at maximum
throughput, committing messages every 100ms results in only a
3% degradation in throughput.
https://www.confluent.io/blog/transactions-apache-kafka/
0.11: Exactly-Once
Delivery
Exactly-Once: Why
is it so Hard?
At most once
• May or may not be
received
• No duplicates
• Probably missing
data
Delivery Guarantees
Exactly once
• Delivery guaranteed
• No duplicates
• No missing data
At least once
• Delivery guaranteed
• Possible duplicates
• No missing data
Idempotent producer writesTransactions API
Atomic writes and reads
Transactions Idempotence
Idempotence
• Unique producer ID is assigned to each producer
• Monotonically increasing sequence number is generated for every
topic/partition write
• Broker persists and validates sequence numbers:
• lower number → duplicate, reject
• higher number → out-of-sequence error, reject
• exactly one greater than the last → allow
Enabling Exactly-Once in
Kafka Streams?
Just set “processing.guarantee” to “exactly_once”. That’s it!
Don’t need to think about checkpointing and related challenges (like in some
other frameworks...)
1.1: Controller
Improvements
Controller Improvements
• One Controller per cluster
• Responsible for state management of partitions and replicas
• Communicates with Zookeeper
Updating partition leaders in batches
during the controlled shutdown
Zookeeper Asynchronous API is used
during the controlled shutdown and
controller failover
Controlled shutdown time: 3 seconds
Updating partition leaders one by
one, sequentially during the
controlled shutdown
Zookeeper Synchronous API is
used during the controlled
shutdown and controller failover
Controlled shutdown time: 6.5
minutes
Before 1.1.0 After 1.1.0
2.0: Kafka Streams
Improvements
Kafka Streams Improvements
• Message header support in the Processor API
• TopicNameExtractor for dynamic routing
• kafka-streams-testutil helper for unit-testing
• Scala wrapper for the Streams DSL
Thanks!
@sap1ens

More Related Content

What's hot

A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache KafkaPaul Brebner
 
Dockers and kubernetes
Dockers and kubernetesDockers and kubernetes
Dockers and kubernetesDr Ganesh Iyer
 
Jenkins vs GitLab CI
Jenkins vs GitLab CIJenkins vs GitLab CI
Jenkins vs GitLab CICEE-SEC(R)
 
Getting up to speed with Kafka Connect: from the basics to the latest feature...
Getting up to speed with Kafka Connect: from the basics to the latest feature...Getting up to speed with Kafka Connect: from the basics to the latest feature...
Getting up to speed with Kafka Connect: from the basics to the latest feature...HostedbyConfluent
 
Introduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache KafkaIntroduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache Kafkaconfluent
 
[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOS[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOSAkihiro Suda
 
Connecting the Dots: Kong for GraphQL Endpoints
Connecting the Dots: Kong for GraphQL EndpointsConnecting the Dots: Kong for GraphQL Endpoints
Connecting the Dots: Kong for GraphQL EndpointsJulien Bataillé
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
 
VMware Tanzu Kubernetes Connect
VMware Tanzu Kubernetes ConnectVMware Tanzu Kubernetes Connect
VMware Tanzu Kubernetes ConnectVMware Tanzu
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafkaJiangjie Qin
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaAngelo Cesaro
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...HostedbyConfluent
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 
Kubernetes Webinar - Using ConfigMaps & Secrets
Kubernetes Webinar - Using ConfigMaps & Secrets Kubernetes Webinar - Using ConfigMaps & Secrets
Kubernetes Webinar - Using ConfigMaps & Secrets Janakiram MSV
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistentconfluent
 
Introduction to EKS and eksctl
Introduction to EKS and eksctlIntroduction to EKS and eksctl
Introduction to EKS and eksctlWeaveworks
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 

What's hot (20)

Apache Kafka - Overview
Apache Kafka - OverviewApache Kafka - Overview
Apache Kafka - Overview
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
Dockers and kubernetes
Dockers and kubernetesDockers and kubernetes
Dockers and kubernetes
 
Jenkins vs GitLab CI
Jenkins vs GitLab CIJenkins vs GitLab CI
Jenkins vs GitLab CI
 
Getting up to speed with Kafka Connect: from the basics to the latest feature...
Getting up to speed with Kafka Connect: from the basics to the latest feature...Getting up to speed with Kafka Connect: from the basics to the latest feature...
Getting up to speed with Kafka Connect: from the basics to the latest feature...
 
Introduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache KafkaIntroduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache Kafka
 
[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOS[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOS
 
Connecting the Dots: Kong for GraphQL Endpoints
Connecting the Dots: Kong for GraphQL EndpointsConnecting the Dots: Kong for GraphQL Endpoints
Connecting the Dots: Kong for GraphQL Endpoints
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
VMware Tanzu Kubernetes Connect
VMware Tanzu Kubernetes ConnectVMware Tanzu Kubernetes Connect
VMware Tanzu Kubernetes Connect
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafka
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Kubernetes Webinar - Using ConfigMaps & Secrets
Kubernetes Webinar - Using ConfigMaps & Secrets Kubernetes Webinar - Using ConfigMaps & Secrets
Kubernetes Webinar - Using ConfigMaps & Secrets
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistent
 
Introduction to EKS and eksctl
Introduction to EKS and eksctlIntroduction to EKS and eksctl
Introduction to EKS and eksctl
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 

Similar to Apache Kafka: New Features That You Might Not Know About

Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Evan Chan
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container EraSadayuki Furuhashi
 
Treasure Data Summer Internship 2016
Treasure Data Summer Internship 2016Treasure Data Summer Internship 2016
Treasure Data Summer Internship 2016Yuta Iwama
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupSnehal Nagmote
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to heroAvi Levi
 
Apache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka TLV
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...LINE Corporation
 
AWS Lambda and the Serverless Cloud -Pop-up Loft
AWS Lambda and the Serverless Cloud -Pop-up LoftAWS Lambda and the Serverless Cloud -Pop-up Loft
AWS Lambda and the Serverless Cloud -Pop-up LoftAmazon Web Services
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producerconfluent
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Guido Schmutz
 
ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用LINE Corporation
 
Exactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache KafkaExactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache Kafkaconfluent
 
Singer, Pinterest's Logging Infrastructure
Singer, Pinterest's Logging InfrastructureSinger, Pinterest's Logging Infrastructure
Singer, Pinterest's Logging InfrastructureDiscover Pinterest
 
AWS Lambda and the Serverless Cloud
AWS Lambda and the Serverless CloudAWS Lambda and the Serverless Cloud
AWS Lambda and the Serverless CloudAmazon Web Services
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonLivePerson
 
Past, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormPast, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormP. Taylor Goetz
 
From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandRan Silberman
 
Scaling big with Apache Kafka
Scaling big with Apache KafkaScaling big with Apache Kafka
Scaling big with Apache KafkaNikolay Stoitsev
 

Similar to Apache Kafka: New Features That You Might Not Know About (20)

Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
 
Kafka overview v0.1
Kafka overview v0.1Kafka overview v0.1
Kafka overview v0.1
 
Treasure Data Summer Internship 2016
Treasure Data Summer Internship 2016Treasure Data Summer Internship 2016
Treasure Data Summer Internship 2016
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
 
Kafka zero to hero
Kafka zero to heroKafka zero to hero
Kafka zero to hero
 
Apache Kafka - From zero to hero
Apache Kafka - From zero to heroApache Kafka - From zero to hero
Apache Kafka - From zero to hero
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
 
AWS Lambda and the Serverless Cloud -Pop-up Loft
AWS Lambda and the Serverless Cloud -Pop-up LoftAWS Lambda and the Serverless Cloud -Pop-up Loft
AWS Lambda and the Serverless Cloud -Pop-up Loft
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 
ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用
 
Exactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache KafkaExactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache Kafka
 
Singer, Pinterest's Logging Infrastructure
Singer, Pinterest's Logging InfrastructureSinger, Pinterest's Logging Infrastructure
Singer, Pinterest's Logging Infrastructure
 
AWS Lambda and the Serverless Cloud
AWS Lambda and the Serverless CloudAWS Lambda and the Serverless Cloud
AWS Lambda and the Serverless Cloud
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
 
Past, Present, and Future of Apache Storm
Past, Present, and Future of Apache StormPast, Present, and Future of Apache Storm
Past, Present, and Future of Apache Storm
 
From a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised LandFrom a kafkaesque story to The Promised Land
From a kafkaesque story to The Promised Land
 
Scaling big with Apache Kafka
Scaling big with Apache KafkaScaling big with Apache Kafka
Scaling big with Apache Kafka
 

More from Yaroslav Tkachenko

Dynamic Change Data Capture with Flink CDC and Consistent Hashing
Dynamic Change Data Capture with Flink CDC and Consistent HashingDynamic Change Data Capture with Flink CDC and Consistent Hashing
Dynamic Change Data Capture with Flink CDC and Consistent HashingYaroslav Tkachenko
 
Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Yaroslav Tkachenko
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyYaroslav Tkachenko
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsYaroslav Tkachenko
 
It's Time To Stop Using Lambda Architecture
It's Time To Stop Using Lambda ArchitectureIt's Time To Stop Using Lambda Architecture
It's Time To Stop Using Lambda ArchitectureYaroslav Tkachenko
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingYaroslav Tkachenko
 
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...Yaroslav Tkachenko
 
Designing Scalable and Extendable Data Pipeline for Call Of Duty Games
Designing Scalable and Extendable Data Pipeline for Call Of Duty GamesDesigning Scalable and Extendable Data Pipeline for Call Of Duty Games
Designing Scalable and Extendable Data Pipeline for Call Of Duty GamesYaroslav Tkachenko
 
10 tips for making Bash a sane programming language
10 tips for making Bash a sane programming language10 tips for making Bash a sane programming language
10 tips for making Bash a sane programming languageYaroslav Tkachenko
 
Actors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesActors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesYaroslav Tkachenko
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingYaroslav Tkachenko
 
Building Stateful Microservices With Akka
Building Stateful Microservices With AkkaBuilding Stateful Microservices With Akka
Building Stateful Microservices With AkkaYaroslav Tkachenko
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaYaroslav Tkachenko
 
Akka Microservices Architecture And Design
Akka Microservices Architecture And DesignAkka Microservices Architecture And Design
Akka Microservices Architecture And DesignYaroslav Tkachenko
 
Why Actor-Based Systems Are The Best For Microservices
Why Actor-Based Systems Are The Best For MicroservicesWhy Actor-Based Systems Are The Best For Microservices
Why Actor-Based Systems Are The Best For MicroservicesYaroslav Tkachenko
 
Why actor-based systems are the best for microservices
Why actor-based systems are the best for microservicesWhy actor-based systems are the best for microservices
Why actor-based systems are the best for microservicesYaroslav Tkachenko
 
Building Eventing Systems for Microservice Architecture
Building Eventing Systems for Microservice Architecture  Building Eventing Systems for Microservice Architecture
Building Eventing Systems for Microservice Architecture Yaroslav Tkachenko
 
Быстрая и безболезненная разработка клиентской части веб-приложений
Быстрая и безболезненная разработка клиентской части веб-приложенийБыстрая и безболезненная разработка клиентской части веб-приложений
Быстрая и безболезненная разработка клиентской части веб-приложенийYaroslav Tkachenko
 

More from Yaroslav Tkachenko (18)

Dynamic Change Data Capture with Flink CDC and Consistent Hashing
Dynamic Change Data Capture with Flink CDC and Consistent HashingDynamic Change Data Capture with Flink CDC and Consistent Hashing
Dynamic Change Data Capture with Flink CDC and Consistent Hashing
 
Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at Shopify
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your Analytics
 
It's Time To Stop Using Lambda Architecture
It's Time To Stop Using Lambda ArchitectureIt's Time To Stop Using Lambda Architecture
It's Time To Stop Using Lambda Architecture
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
 
Designing Scalable and Extendable Data Pipeline for Call Of Duty Games
Designing Scalable and Extendable Data Pipeline for Call Of Duty GamesDesigning Scalable and Extendable Data Pipeline for Call Of Duty Games
Designing Scalable and Extendable Data Pipeline for Call Of Duty Games
 
10 tips for making Bash a sane programming language
10 tips for making Bash a sane programming language10 tips for making Bash a sane programming language
10 tips for making Bash a sane programming language
 
Actors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesActors or Not: Async Event Architectures
Actors or Not: Async Event Architectures
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
 
Building Stateful Microservices With Akka
Building Stateful Microservices With AkkaBuilding Stateful Microservices With Akka
Building Stateful Microservices With Akka
 
Querying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS AthenaQuerying Data Pipeline with AWS Athena
Querying Data Pipeline with AWS Athena
 
Akka Microservices Architecture And Design
Akka Microservices Architecture And DesignAkka Microservices Architecture And Design
Akka Microservices Architecture And Design
 
Why Actor-Based Systems Are The Best For Microservices
Why Actor-Based Systems Are The Best For MicroservicesWhy Actor-Based Systems Are The Best For Microservices
Why Actor-Based Systems Are The Best For Microservices
 
Why actor-based systems are the best for microservices
Why actor-based systems are the best for microservicesWhy actor-based systems are the best for microservices
Why actor-based systems are the best for microservices
 
Building Eventing Systems for Microservice Architecture
Building Eventing Systems for Microservice Architecture  Building Eventing Systems for Microservice Architecture
Building Eventing Systems for Microservice Architecture
 
Быстрая и безболезненная разработка клиентской части веб-приложений
Быстрая и безболезненная разработка клиентской части веб-приложенийБыстрая и безболезненная разработка клиентской части веб-приложений
Быстрая и безболезненная разработка клиентской части веб-приложений
 

Recently uploaded

What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxAS Design & AST.
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdfSteve Caron
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxSasikiranMarri
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapIshara Amarasekera
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfkalichargn70th171
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxRTS corp
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdfAndrey Devyatkin
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 

Recently uploaded (20)

What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptx
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery Roadmap
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptx
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 

Apache Kafka: New Features That You Might Not Know About

  • 1. Apache Kafka: New Features That You Might Not Know About Yaroslav Tkachenko Software Architect at Activision
  • 2. Apache Kafka Versions 0.10.0.0 0.11.0.0 1.0.0 2.0.0 0.10.0.1 0.10.2.2 0.11.0.1 0.11.0.3 1.0.1 1.1.1 2.0.1 2.1.0 May 2016 June 2017 November 2017 July 2018
  • 4. Record Batch: ... magic: 2 … attributes: … bit 4: isTransactional … producerId: int64 producerEpoch: int16 records: [Record] Message Format v2 Record: ... key: byte[] value: byte[] headers: [Header] Header: ... headerKey: String value: byte[]
  • 6. Message Headers public interface Header { String key(); byte[] value(); } List<Header> headers = Arrays.asList( new RecordHeader("hkey1", "hvalue1".getBytes()), new RecordHeader("hkey2", "hvalue2".getBytes()) ); new ProducerRecord<>("topic", 0, "key", "value", headers);
  • 7. Pros • No need to deserialize the whole message payload for routing / filtering use-cases Cons • Harder to save the headers together with the payload when archiving, persisting to data stores or integrating with 3rd party systems Message Headers
  • 10. Transactions • Atomic writes to multiple Kafka topics and partitions • Offset commits happen in the same transaction • transactional.id + epoch for every producer • Consumers must use “read_committed” isolation level for consuming only committed transactional data
  • 11. Transactions KafkaProducer producer = ... producer.initTransactions(); KafkaConsumer consumer = ... consumer.subscribe("inputTopic")); ConsumerRecords records = consumer.poll(Long.MAX_VALUE); try { producer.beginTransaction(); for (ConsumerRecord record : records) { producer.send(processAndProduceRecord("outputTopic", record)); } producer.sendOffsetsToTransaction(currentOffsets(consumer), groupId); producer.commitTransaction(); } catch (Exception e) { producer.abortTransaction(); }
  • 13. Transactions In practice, for a producer producing 1KB records at maximum throughput, committing messages every 100ms results in only a 3% degradation in throughput. https://www.confluent.io/blog/transactions-apache-kafka/
  • 16. At most once • May or may not be received • No duplicates • Probably missing data Delivery Guarantees Exactly once • Delivery guaranteed • No duplicates • No missing data At least once • Delivery guaranteed • Possible duplicates • No missing data
  • 17. Idempotent producer writesTransactions API Atomic writes and reads Transactions Idempotence
  • 18. Idempotence • Unique producer ID is assigned to each producer • Monotonically increasing sequence number is generated for every topic/partition write • Broker persists and validates sequence numbers: • lower number → duplicate, reject • higher number → out-of-sequence error, reject • exactly one greater than the last → allow
  • 19. Enabling Exactly-Once in Kafka Streams? Just set “processing.guarantee” to “exactly_once”. That’s it! Don’t need to think about checkpointing and related challenges (like in some other frameworks...)
  • 21. Controller Improvements • One Controller per cluster • Responsible for state management of partitions and replicas • Communicates with Zookeeper
  • 22. Updating partition leaders in batches during the controlled shutdown Zookeeper Asynchronous API is used during the controlled shutdown and controller failover Controlled shutdown time: 3 seconds Updating partition leaders one by one, sequentially during the controlled shutdown Zookeeper Synchronous API is used during the controlled shutdown and controller failover Controlled shutdown time: 6.5 minutes Before 1.1.0 After 1.1.0
  • 24. Kafka Streams Improvements • Message header support in the Processor API • TopicNameExtractor for dynamic routing • kafka-streams-testutil helper for unit-testing • Scala wrapper for the Streams DSL