Can and should Apache Kafka replace a database? How long can and should I store data in Kafka? How can I query and process data in Kafka? These questions come up more and more often. This session explains the idea behind databases and their core features (storage, queries, transactions, and processing) to evaluate when Kafka is a good fit and when it is not.
The discussion covers Kafka-native add-ons such as Tiered Storage for long-term, cost-efficient storage and ksqlDB as an event streaming database. It explores the relationship and trade-offs between Kafka and other databases, treating them as complementary rather than as replacements for each other. This includes different options for pull-based and push-based bi-directional integration.
Key takeaways:
- Kafka can store data forever in a durable and highly available manner
- Kafka has different options to query historical data
- Kafka-native add-ons like ksqlDB or Tiered Storage make Kafka more powerful than ever before to store and process data
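- Kafka does not provide database-style transactions with two-phase commit, but it offers exactly-once semantics through its idempotent producer and Transactions API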
- Kafka is not a replacement for existing databases like MySQL, MongoDB or Elasticsearch
- Kafka and other databases complement each other; the right solution has to be selected for a problem
- Different options are available for bi-directional pull and push-based integration between Kafka and databases to complement each other
Video Recording:
https://youtu.be/7KEkWbwefqQ
Blog post:
https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/
1. @KaiWaehner - www.kai-waehner.de – Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
Kafka’s Capabilities and Trade-Offs for Storage, Queries, Processing, Transactions, Connectivity
Kai Waehner
Field CTO
contact@kai-waehner.de
@KaiWaehner
www.confluent.io
www.kai-waehner.de
linkedin.com/in/kaiwaehner
This was answered [with ‘yes’] a long time ago…
https://www.confluent.io/kafka-summit-SF18/is-kafka-a-database/
… and many things changed [= improved] since then!
Agenda
1. What is a Database?
2. What is Apache Kafka?
3. Storage in Kafka
4. Queries and Processing in Kafka
5. Transactions in Kafka
6. Connectivity
What is a Database?
Database Concepts
1960s: Navigational DBMS
1970s: Relational DBMS
Late 1970s: SQL DBMS
1980s: On the desktop
1990s: Object-oriented
2000s: NoSQL / NewSQL
2010s: DBaaS
Database Features
Storage
Queries (CRUD)
Processing
Transactions
Backup
Replication
…
Database Examples
I thought Kafka is for data in motion?
Apache Kafka is a Platform for Data in Motion
[Diagram: producers (MES, ERP, sensors, mobile) write to Kafka, which provides streams and storage of real-time events; consumers (Customer 360, real-time alerting system, data warehouse) read from it. Connectors and stream processing apps attach on both sides. Example event types: supplier, alert, forecast, inventory, customer, order.]
The Rise of Data in Motion
2010: Apache Kafka created at LinkedIn by Confluent founders
2014
2020: 80% of Fortune 100 companies trust and use Apache Kafka
ETL/Data Integration: Batch, Expensive, Time Consuming
Messaging: Real-time, Difficult to Scale, No Persistence After Consumption, No Replay
Event Streaming: Highly Scalable, Durable, Persistent, Ordered, Real-time
Business Value
- Increase Revenue (make money) $↑
- Decrease Costs (save money) $↓
- Mitigate Risk (protect money) $↔
Key Drivers / Strategic Objectives (sample)
- Improve Customer Experience (CX)
- Core Business Platform
- Increase Operational Efficiency
- Migrate to Cloud
- Regulatory
- Digital Transformation
Example Use Cases
- Fraud Detection
- IoT sensor ingestion
- Digital replatforming / Mainframe Offload
- Customer 360
- Faster transactional processing / analysis incl. Machine Learning / AI
- Microservices Architecture
- Online Fraud Detection
- Online Security (syslog, log aggregation, Splunk replacement)
- Middleware replacement
- Website / Core Operations (Central Nervous System)
- Real-time app updates
Example Case Studies (of many)
- Connected Car: Navigation & improved in-car experience: Audi
- Simplifying Omni-channel Retail at Scale: Target
- Mainframe Offload: RBC
- Application Modernization: Multiple Examples
- The [Silicon Valley] Digital Natives: LinkedIn, Netflix, Uber, Yelp...
- Predictive Maintenance: Audi
- Streaming Platform in a regulated environment (e.g. Electronic Medical Records): Celmatix
- Real Time Streaming Platform for Communications and Beyond: Capital One
- Developer Velocity - Building Stateful Financial Applications with Kafka Streams: Funding Circle
- Detect Fraud & Prevent Fraud in Real Time: PayPal
- Kafka as a Service - A Tale of Security and Multi-Tenancy: Apple
Kafka’s Distributed Commit Log is the Storage
(and enables real decoupling and domain-driven design)
https://www.confluent.io/blog/microservices-apache-kafka-domain-driven-design/
Kafka Stores Your Data Durably.
New York Times: Kafka is the source of truth. Powers NYTimes.com, and stores all articles ever published since 1851 (September 30, 1851, Page 1).
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
Twitter: Kafka is the leading system. Account Activity Replay API to recover events that weren’t delivered for various reasons.
https://blog.twitter.com/engineering/en_us/topics/infrastructure/2020/kafka-as-a-storage-system.html
Confluent Tiered Storage for Kafka
(Only available in Confluent Platform)
Store data forever
Hot and cold storage
Cheap object store
Easy scale up/down
No changes in clients
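The "hot and cold storage" and "no changes in clients" points can be illustrated with a toy model. This is only a sketch under stated assumptions, not how Confluent Tiered Storage is implemented: the class `TieredLog`, its `hot_capacity` parameter, and the two dictionaries standing in for local broker disk and an object store are all hypothetical, and a real broker moves whole log segments rather than single records.

```python
from typing import Dict


class TieredLog:
    """Toy model of a log with a small hot tier and an unbounded cold tier."""

    def __init__(self, hot_capacity: int) -> None:
        self.hot_capacity = hot_capacity
        self.hot: Dict[int, str] = {}   # offset -> record, stands in for local disk
        self.cold: Dict[int, str] = {}  # offset -> record, stands in for an object store
        self.next_offset = 0

    def append(self, value: str) -> int:
        """Append a record and offload the oldest records once the hot tier is full."""
        offset = self.next_offset
        self.hot[offset] = value
        self.next_offset += 1
        while len(self.hot) > self.hot_capacity:
            oldest = min(self.hot)
            self.cold[oldest] = self.hot.pop(oldest)
        return offset

    def read(self, offset: int) -> str:
        """Readers ask for an offset; which tier serves it is invisible to them."""
        if offset in self.hot:
            return self.hot[offset]
        return self.cold[offset]


tiered = TieredLog(hot_capacity=2)
for value in ["e0", "e1", "e2", "e3"]:
    tiered.append(value)

print(sorted(tiered.hot), sorted(tiered.cold))
print(tiered.read(0), tiered.read(3))
```

In this model the reader calls `read(offset)` the same way for old and new data, which is the property the slide summarizes as "no changes in clients".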
Tiered Storage for Apache Kafka
KIP-405: Add Tiered Storage Support to Kafka
Confluent is actively working on this with the open source community; Uber is leading this initiative
Confluent Tiered Storage is available today in Confluent Platform and used under the hood in Confluent Cloud
https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage
Log Compaction with Compacted Topics
Retain last known value for each message key
No retention time
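The effect of log compaction can be sketched in a few lines. This is a simplified model, not broker code: the function `compact` and the sample keys and values are made up for illustration, and tombstones (records with a `None` value) are dropped immediately here, whereas a real broker keeps them for a configurable period before removing them.

```python
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (key, value); a value of None marks a tombstone


def compact(log: Iterable[Record]) -> List[Record]:
    """Keep only the most recent record per key, preserving log order.

    In this simplified model a tombstone removes its key entirely.
    """
    records = list(log)
    latest_index = {}
    for index, (key, _value) in enumerate(records):
        latest_index[key] = index  # later records overwrite earlier positions
    return [
        (key, value)
        for index, (key, value) in enumerate(records)
        if latest_index[key] == index and value is not None
    ]


log = [
    ("customer-1", "Berlin"),
    ("customer-2", "Munich"),
    ("customer-1", "Hamburg"),  # newer value for customer-1
    ("customer-3", "Cologne"),
    ("customer-3", None),       # tombstone: delete customer-3
]
compacted = compact(log)
print(compacted)
```

After compaction only the last known value per key remains, so a new consumer reading the topic from the beginning can rebuild the current state without replaying every historical update.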
Stateful Client Applications
Kafka Streams and ksqlDB embed RocksDB
Do I really need another database for my microservice?
Kafka as Single Source of Truth
The Leading System is Real-Time and Scalable
Real Decoupling
Handling Slow Consumers
Query and Event Processing in Kafka
PUSH → Continuously process and forward events
PULL → Client requests events (as you know it from your favourite database)
ksqlDB - The Event Streaming Database
-- Continuously look up data in a table; query keeps running
SELECT * FROM myTable WHERE ... EMIT CHANGES
-- Continuously look up data in a stream; query keeps running
SELECT * FROM myStream WHERE ... EMIT CHANGES
-- Look up data in a table once; query then terminates
SELECT * FROM myTable WHERE ...
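The difference between the two query styles can be modeled without a ksqlDB server. The following is only a sketch of the semantics, assuming an in-memory event list; `push_query`, `Table`, and `pull_query` are hypothetical names and not part of any ksqlDB client API.

```python
from typing import Dict, Iterator, Optional, Tuple

Event = Tuple[str, int]  # (key, amount)


def push_query(events: Iterator[Event], min_amount: int) -> Iterator[Event]:
    """Push: emit every matching event as it arrives, for as long as events keep coming."""
    for key, amount in events:
        if amount >= min_amount:
            yield key, amount


class Table:
    """Latest value per key, kept up to date from the stream."""

    def __init__(self) -> None:
        self._rows: Dict[str, int] = {}

    def apply(self, event: Event) -> None:
        key, amount = event
        self._rows[key] = amount

    def pull_query(self, key: str) -> Optional[int]:
        """Pull: answer once from the current state, then return."""
        return self._rows.get(key)


events = [("alice", 50), ("bob", 120), ("alice", 200)]

# Push: the subscriber receives each match as the stream advances.
pushed = list(push_query(iter(events), min_amount=100))

# Pull: the table is maintained continuously, the lookup happens on demand.
table = Table()
for event in events:
    table.apply(event)

print(pushed)
print(table.pull_query("alice"), table.pull_query("carol"))
```

The push side corresponds to the `EMIT CHANGES` queries above, which keep running; the pull side corresponds to the table lookup that terminates after returning the current value.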
ksqlDB - The Event Streaming Database
• Project created by Confluent, source-available license: https://ksqldb.io/
• A ksqlDB cluster runs in a distributed manner across many server nodes
• Tightly integrates with Apache Kafka® as its persistent storage layer
• Has projections, transformations, aggregations, windowing, joins, etc.
• Distinguishes between event-time and processing-time
• Handles out-of-order and late data
• Streaming import-export for external data systems
• DDL and DML via SQL-like statements
• Security features like role-based access control
• Run it yourself or use the SaaS offering in Confluent Cloud
Queries through the Kafka Consumer
• Continuous consumption of the latest events (in real time or batch)
• Just specific time frames or partitions
• All data from the beginning
[Diagram labels: Connect, Cluster Linking, REST Proxy]
Queries for Reprocessing Historical Events
Give me all events from time A to time B
• New consumer application
• Error-handling
• Compliance / regulatory processing
• Query and analyze existing events
• Schema changes in analytics platform
• Model training
[Diagram: along a time axis, a real-time producer appends events; a real-time consumer reads the latest events while a consumer of historical data reads older ones.]
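A query of the form "give me all events from time A to time B" boils down to translating timestamps into offsets and then reading the range in between. The sketch below models that idea on an in-memory partition; it assumes timestamps never decrease within the partition and that offsets start at 0 and match list positions. The names `StoredEvent`, `offset_for_time`, and `events_between` are hypothetical. With a real consumer the timestamp-to-offset lookup would be done by the client library's offsets-for-times call, followed by a seek.

```python
from bisect import bisect_left
from typing import List, NamedTuple


class StoredEvent(NamedTuple):
    offset: int
    timestamp_ms: int
    value: str


def offset_for_time(partition: List[StoredEvent], timestamp_ms: int) -> int:
    """Return the earliest offset whose timestamp is >= timestamp_ms.

    Returns len(partition) if every event is older than timestamp_ms.
    """
    timestamps = [event.timestamp_ms for event in partition]
    return bisect_left(timestamps, timestamp_ms)


def events_between(
    partition: List[StoredEvent], start_ms: int, end_ms: int
) -> List[StoredEvent]:
    """Return all events with start_ms <= timestamp < end_ms."""
    start = offset_for_time(partition, start_ms)
    end = offset_for_time(partition, end_ms)
    return partition[start:end]


partition = [
    StoredEvent(0, 1000, "a"),
    StoredEvent(1, 2000, "b"),
    StoredEvent(2, 3000, "c"),
    StoredEvent(3, 4000, "d"),
]
replayed = events_between(partition, start_ms=2000, end_ms=4000)
print([event.value for event in replayed])
```

Because the historical read only moves the position of one consumer, the real-time consumers of the same topic are not affected by the replay.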
Interactive Queries
Query values from the client applications’ state store
Optional Proxy (e.g. HTTP or WebSockets)
Limitation: Only Key/Value, no complex queries
ANSI SQL Queries against the Kafka Log
3rd Party Add-Ons help
Integration with any Business Intelligence Tool
https://www.confluent.io/blog/analytics-with-apache-kafka-and-rockset/
Exactly-Once Semantics (EOS) in Kafka
No Two-Phase-Commit (because that does not scale)
Idempotent Producer and Transactions API
Supported by the whole Kafka Ecosystem (not just Messaging)
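The idempotent producer avoids duplicates without two-phase commit by letting the broker recognize retries. A minimal sketch of that idea, assuming one partition and in-memory state: each producer has an id and numbers its records, and the log ignores a sequence number it has already written. `PartitionLog` and its method are hypothetical names for illustration and leave out batching, epochs, and the Transactions API.

```python
from typing import Dict, List


class PartitionLog:
    """Toy partition that deduplicates retried writes per producer."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self._last_sequence: Dict[int, int] = {}  # producer id -> last written sequence

    def append(self, producer_id: int, sequence: int, value: str) -> bool:
        """Append a record; return False if it is a duplicate of an earlier write."""
        last = self._last_sequence.get(producer_id, -1)
        if sequence <= last:
            return False  # retry of a record that was already written
        if sequence != last + 1:
            raise ValueError("out-of-order sequence: a record in between is missing")
        self.records.append(value)
        self._last_sequence[producer_id] = sequence
        return True


partition_log = PartitionLog()
first = partition_log.append(producer_id=7, sequence=0, value="debit account A")
second = partition_log.append(producer_id=7, sequence=1, value="credit account B")
# The acknowledgement for sequence 1 was lost, so the producer sends it again.
retry = partition_log.append(producer_id=7, sequence=1, value="credit account B")

print(first, second, retry)
print(partition_log.records)
```

The retry is acknowledged but not written a second time, so the log contains each record exactly once even though the producer sent one of them twice.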
Transaction API in Apache Kafka
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
https://www.confluent.io/kafka-summit-london18/dont-repeat-yourself-introducing-exactly-once-semantics-in-apache-kafka/
From the Mainframe to ksqlDB in the Cloud
Bi-Directional End-to-End Referential Integrity
• Bi-Directional Integration
• Secured Referential Integrity
• End-to-End “Transactions”
• Low Latency
[Diagram: on the mainframe side, CICS transactions, an IMS DB, and a Cobol app use Kafka exactly-once semantics via librdkafka; on the other side, a ksqlDB app uses Kafka exactly-once semantics via ksqlDB. In between, Kafka carries streams of real-time events: database changes, microservices events, SaaS data, customer experiences.]
Kafka Connect
Integration between Databases, Applications, APIs, SaaS
Kafka-native (no other middleware required)
Sources and Sinks
Legacy and Modern
Real-Time and Batch
Turn the Database Inside Out!
Materialized Views
Integration with any Database
https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/
Global Event Streaming
Streaming Replication between Kafka Clusters
Bridge to Databases, Data Lakes, Apps, APIs, SaaS
• Aggregate Small Footprint Edge Deployments with Replication (Aggregation)
• Simplify Disaster Recovery Operations with Multi-Region Clusters with RPO=0 and RTO=0
• Stream Data Globally with Replication and Cluster Linking
Can Kafka replace a Database?
Yes. But it does not replace other databases!
TL;DR
• Kafka can store data forever in a durable and highly available manner providing ACID guarantees
• Different options to query historical data are available in Kafka
• Kafka-native add-ons like ksqlDB or Tiered Storage make Kafka more powerful than ever before for processing data in motion and event-based long-term storage
• Stateful applications can be built leveraging Kafka clients (microservices, business applications) without the need for another external database
• Kafka is not a replacement for existing databases like MySQL, MongoDB, Elasticsearch or Hadoop
• Other databases and Kafka complement each other; the right solution has to be selected for a problem; often purpose-built materialized views are created and updated in real time from the central event-based infrastructure
• Different options are available for bi-directional pull and push-based integration between Kafka and databases to complement each other
Kai Waehner
Field CTO
contact@kai-waehner.de
@KaiWaehner
www.kai-waehner.de
www.confluent.io
linkedin.com/in/kaiwaehner
Questions? Feedback?
Let’s connect!