Spoilt for Choice – Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka:
Apache Kafka is the de facto standard platform for streaming data processing and is widely deployed as an event streaming platform. Part of Kafka is its stream processing API, Kafka Streams. In addition, the Kafka ecosystem now offers KSQL, a declarative, SQL-like stream processing language that lets you define powerful stream-processing applications easily. What once took some moderately sophisticated Java code can now be done at the command line with a familiar and eminently approachable syntax.
This session discusses and demonstrates the pros and cons of Kafka Streams and KSQL, to help you understand when to use which stream processing alternative for continuous stream processing natively on Apache Kafka infrastructures. The end of the session compares the trade-offs of Kafka Streams and KSQL with separate stream processing frameworks such as Apache Flink or Spark Streaming.
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
CONFIDENTIAL
Stream Processing with Confluent
Kafka Streams and KSQL
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
A Streaming Platform is the Underpinning of an Event-driven Architecture

● Ubiquitous connectivity: globally scalable platform for all event producers and consumers
● Immediate data access: data accessible to all consumers in real time
● Single system of record: persistent storage to enable reprocessing of past events
● Continuous queries: stream processing capabilities for in-line data transformation

[Diagram: producers (microservices, DBs, SaaS apps, mobile) publish streams of real-time events (database changes, microservices events, SaaS data, customer experiences) into the platform; stream processing apps transform them for consumers (customer 360, real-time fraud detection, data warehouse).]
The beginning of a new Era
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
The first use case. This is why Kafka was created!
Apache Kafka: The De-facto Standard for Real-Time Event Streaming

● Global-scale
● Real-time
● Persistent Storage
● Stream Processing

[Diagram: Apache Kafka connects the edge, cloud, and datacenter: IoT, mobile, SaaS apps, databases, data lakes, microservices, and machine learning.]
Apache Kafka at Scale at Tech Giants
> 4.5 trillion messages / day > 6 Petabytes / day
“You name it”
* Kafka is not just used by tech giants
** Kafka is not just used for big data
Confluent's Business Value per Use Case

Business value comes from three levers: increase revenue (make money), decrease costs (save money), and mitigate risk (protect money), all resting on Kafka as a core business platform.

Key drivers / strategic objectives (sample):
● Improve Customer Experience (CX)
● Increase Operational Efficiency
● Migrate to Cloud
● Regulatory
● Digital Transformation

Example use cases:
● Fraud Detection
● IoT sensor ingestion
● Digital replatforming / Mainframe Offload
● Customer 360
● Faster transactional processing / analysis incl. Machine Learning / AI
● Microservices Architecture
● Online Fraud Detection
● Online Security (syslog, log aggregation, Splunk replacement)
● Middleware replacement
● Website / Core Operations (Central Nervous System)
● Real-time app updates

Example case studies (of many):
● Connected Car: Navigation & improved in-car experience (Audi)
● Simplifying Omni-channel Retail at Scale (Target)
● Mainframe Offload (RBC)
● Application Modernization (multiple examples)
● The [Silicon Valley] digital natives: LinkedIn, Netflix, Uber, Yelp...
● Predictive Maintenance (Audi)
● Streaming Platform in a regulated environment, e.g. Electronic Medical Records (Celmatix)
● Real Time Streaming Platform for Communications and Beyond (Capital One)
● Developer Velocity: Building Stateful Financial Applications with Kafka Streams (Funding Circle)
● Detect & Prevent Fraud in Real Time (PayPal)
● Kafka as a Service: A Tale of Security and Multi-Tenancy (Apple)
A Modern, Distributed Platform for
Data Streams.
Messaging + Storage + Processing!
Stream Processing

[Diagram: events from users alice, bob, and dave illustrate the key concepts of event-time vs. processing-time and windowing.]
Confluent Delivers a Mission-Critical Event Streaming Platform

Community features:
● Apache Kafka®: Core | Connect API | Streams API
● Data Compatibility: Schema Registry
● Development & Connectivity: Clients | Connectors | REST Proxy | KSQL

Commercial features:
● Enterprise Operations: Replicator | Auto Data Balancer | Connectors | MQTT Proxy | Kubernetes Operator
● Management & Monitoring: Control Center | Security

Data in (data integration): database changes, log events, IoT data, web events, other events. Data out (real-time applications): Hadoop, database, data warehouse, CRM, custom apps, analytics, monitoring, other.

Deployment: customer self-managed in the datacenter or public cloud (Confluent Platform), or fully managed by Confluent (Confluent Cloud).
What We Cover Today:

● Kafka Streams: Apache Kafka® library to write real-time applications and microservices in Java, Scala
● KSQL: the streaming SQL engine for Apache Kafka® to write real-time applications in SQL
Lower the bar to enter the world of streaming

[Chart: user population vs. coding sophistication. Kafka Streams targets core developers who use Java/Scala; KSQL opens streaming up to core developers who don't use Java/Scala, data engineers, architects, DevOps/SRE, and BI analysts.]
Lower the bar to enter the world of streaming. In KSQL:

CREATE STREAM fraudulent_payments AS
SELECT * FROM payments
WHERE fraudProbability > 0.8

vs. the equivalent Kafka Streams application in Java/Scala.
Confluent

● Kafka Streams (Java/Scala) and KSQL (streaming SQL) for stream processing
● Lower-level Kafka Producer and Kafka Consumer clients for multiple languages: Java, C/C++, Go, Python, .NET, JMS
Example use cases

● Microservices
● Data enrichment
● Streaming ETL
● Filter, cleanse, mask
● Real-time monitoring
● Anomaly detection
Similarities and Differences
of KSQL and Kafka Streams
Similarities

1. Enterprise support
2. All you need is Kafka
3. Run everywhere
4. Elastic, scalable, fault-tolerant
5. Kafka security integration
6. Powerful processing
7. Supports streams & tables
8. Exactly-once processing
9. Event-time processing
... and more!
Similarities

Both a KSQL server (processing) and a JVM application with Kafka Streams (processing) read from and write to Kafka (data). Neither runs on the Kafka brokers!
Differences

The three APIs trade flexibility against ease of use:
● Kafka clients (Consumer, Producer): subscribe(), poll(), send(), flush(), beginTransaction(), ... (most flexible)
● Kafka Streams: KStream, KTable, filter(), map(), flatMap(), join(), aggregate(), ...
● KSQL: CREATE STREAM ..., CREATE TABLE ..., SELECT, JOIN, COUNT, ... (easiest to use)
Differences

                                KSQL                          Kafka Streams
You write...                    KSQL statements               JVM applications
UI included                     Yes (Enterprise)              No
CLI included                    Yes                           No
Data formats                    Avro, JSON, CSV (today)       Any data format, including Avro,
                                                              JSON, CSV, Protobuf, XML
Interactive queries             Not yet                       Yes
Flexibility, use case coverage  Limited to KSQL syntax, UDFs  Full power of Java, Scala
REST API included               Yes                           No, but you can DIY
Runtime included                Yes, the KSQL server          Not needed, applications run as
                                                              standard JVM processes
Guidance

When to use KSQL:
● New to streaming and Kafka
● Prefer SQL to writing code in Java, Scala
● Prefer interactive experience with UI, CLI
● Use cases include: filtering, transforming, masking data; enriching data; joining data sources
● Use case is naturally expressible through SQL, with optional help from User Defined Functions as a "get out of jail free" card
● Provides the KSQL REST API for use from Python, Go, JavaScript, shell, etc.

When to use Kafka Streams:
● At least basic Kafka experience
● Prefer writing and deploying JVM apps
● Writing microservices
● Use cases cover KSQL's and more
● To integrate with external services or 3rd party libraries (but see KSQL UDFs)
● To customize or fine-tune a use case, e.g. custom joins, probabilistic counting
● Need for queryable state, which is not yet supported by KSQL
KSQL and Kafka Streams
A closer look
Next: KSQL in more detail
KSQL

● You write only SQL. No Java, Python, or other boilerplate to wrap around it!
● Create KSQL user-defined functions in Java when needed.

CREATE STREAM fraudulent_payments AS
SELECT * FROM payments
WHERE fraudProbability > 0.8
New user experience: interactive stream processing
KSQL can be used interactively + programmatically:
1. UI
2. CLI (ksql>)
3. REST (POST /query)
4. Headless
KSQL REST API

Work with KSQL programmatically from other languages or the terminal. Here: run a continuous query and stream back the results.

POST /query HTTP/1.1
{
  "ksql": "SELECT * FROM users WHERE name LIKE 'a%';",
  "streamsProperties": {
    "your.custom.setting": "value"
  }
}
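The request above can be assembled from any language. As a minimal Python sketch, here is one way to build (and optionally send) that /query call; the server URL (8088 is the default KSQL server listener) and the content type header are assumptions, not part of the slide:

```python
import json

def build_query_request(ksql_statement, streams_properties=None):
    """Build the URL, headers, and JSON body of a KSQL REST /query call.

    Assumes the default KSQL server listener on localhost:8088.
    """
    body = {
        "ksql": ksql_statement,
        "streamsProperties": streams_properties or {},
    }
    url = "http://localhost:8088/query"
    headers = {"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"}
    return url, headers, json.dumps(body)

url, headers, payload = build_query_request(
    "SELECT * FROM users WHERE name LIKE 'a%';",
    {"your.custom.setting": "value"},
)
# To actually run the continuous query, stream the chunked response, e.g.:
# import requests
# with requests.post(url, headers=headers, data=payload, stream=True) as r:
#     for line in r.iter_lines():
#         print(line)
```

Because the query is continuous, the response is an open, chunked stream of rows rather than a single result set.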
All you need is Kafka & KSQL

Without KSQL: 1. build & package, 2. submit job; a separate processing cluster and storage system are required for fault-tolerant processing.

With KSQL: ksql> SELECT * FROM myStream
KSQL is a stream processing technology
As such it is not yet a great fit for:
Ad-hoc queries
● No indexes yet in KSQL
● Kafka often configured to retain
data for only a limited span of
time
BI reports (Tableau etc.)
● No indexes yet in KSQL
● No official JDBC
● Most BI tools don’t understand
continuous, streaming results
KSQL example use cases

● Data exploration
● Data enrichment
● Streaming ETL
● Filter, cleanse, mask
● Real-time monitoring
● Anomaly detection
Example: CDC from DB via Kafka to Elastic

Kafka Connect streams data in; KSQL processes table changes in real time to continuously maintain aggregates of metrics and KPIs; Kafka Connect streams the data out.
Example: Retail

A stream of purchases from online and physical stores and a stream of shipments that arrive; KSQL joins the two streams in real time.
Example: IoT, Automotive, Connected Cars

Cars send telemetry data via the Kafka API; Kafka Connect streams additional data in; KSQL joins the stream and table in real time and spots vehicle failures; a Kafka Streams application notifies customers.
KSQL for Streaming ETL

● Joining, filtering, and aggregating streams of event data
CREATE STREAM vip_actions AS
SELECT user_id, page, action
FROM clickstream c
LEFT JOIN users u
ON c.user_id = u.user_id
WHERE u.level = 'Platinum'
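The join above is, per key, a lookup against a continuously updated table. A minimal Python sketch of that semantics (the in-memory users "table" and the event shapes are made up for illustration):

```python
def vip_actions(clickstream, users_table):
    """Sketch of CREATE STREAM vip_actions: for each click event, look up
    the user in the users table and keep only 'Platinum' users."""
    out = []
    for event in clickstream:
        user = users_table.get(event["user_id"])  # LEFT JOIN: may be None
        if user is not None and user.get("level") == "Platinum":
            out.append({"user_id": event["user_id"],
                        "page": event["page"],
                        "action": event["action"]})
    return out

users = {"u1": {"level": "Platinum"}, "u2": {"level": "Basic"}}
clicks = [{"user_id": "u1", "page": "/home", "action": "view"},
          {"user_id": "u2", "page": "/home", "action": "view"},
          {"user_id": "u3", "page": "/pricing", "action": "view"}]
```

In KSQL the same logic runs continuously: each arriving click event is joined against the current state of the users table.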
KSQL for Data Transformation

● Easily make derivations of existing topics
● Change data format
● Change number of partitions or partitioning scheme
CREATE STREAM pageviews_avro
WITH (PARTITIONS=6,
VALUE_FORMAT='AVRO') AS
SELECT * FROM pageviews_json
PARTITION BY user_id
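PARTITION BY changes which partition each record lands in, because a record's partition is derived from its key. A toy Python illustration of that idea (crc32 is an illustrative stand-in; Kafka's Java producer actually defaults to murmur2 over the serialized key):

```python
import zlib

def partition_for(key, num_partitions=6):
    """Toy stand-in for Kafka's default partitioner: hash the key and
    take it modulo the partition count (PARTITIONS=6 above)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# After PARTITION BY user_id, all events for one user land in the same
# partition, which is what makes per-user aggregations and joins possible:
p1 = partition_for("user-42")
p2 = partition_for("user-42")
```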
KSQL for Real-Time Monitoring

● Filtering, tracking, and alerting
● Log data monitoring, syslog data
● Sensor / IoT data
● Application metrics
CREATE STREAM syslog_invalid_users AS
SELECT host, message
FROM syslog
WHERE message LIKE '%Invalid user%'
http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
KSQL for Anomaly Detection

● Identify patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
SELECT card_number, COUNT(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING COUNT(*) > 3
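A tumbling window simply buckets events by truncating their timestamp to the window size. A minimal Python sketch of the 5-second window and the HAVING COUNT(*) > 3 filter above (the event shapes and timestamps are made up):

```python
from collections import Counter

def possible_fraud(attempts, window_size_sec=5, threshold=3):
    """Count authorization attempts per (card_number, 5s tumbling window)
    and keep the groups with more than `threshold` attempts."""
    counts = Counter()
    for ts, card_number in attempts:
        # Tumbling window: truncate the event-time to the window start.
        window_start = (ts // window_size_sec) * window_size_sec
        counts[(card_number, window_start)] += 1
    return {k: n for k, n in counts.items() if n > threshold}

attempts = [(0, "c1"), (1, "c1"), (2, "c1"), (3, "c1"),  # 4 hits in window [0, 5)
            (7, "c1"),                                   # window [5, 10)
            (1, "c2")]
```

KSQL does this incrementally over the live stream, emitting updated counts as events arrive instead of batching them up front.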
Streams and Tables
Important because most use cases need both
Do you think that's a table you are querying?
The Stream-Table Duality

Aggregating a stream (like SUM, COUNT) yields a table, a "materialized view" of the stream; conversely, the changelog of a table (CDC) is a stream.
43. 43C O N F I D E N T I A L
The Stream-Table Duality
CREATE TABLE num_visited_locations_per_user AS
SELECT username, COUNT(*)
FROM location_updates
GROUP BY username
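The duality behind the statement above can be sketched in a few lines of Python: folding the stream of location updates yields the table, and the sequence of table updates is itself a stream (the changelog). The event shapes are made up for illustration:

```python
def aggregate(location_updates):
    """Fold a stream of usernames into a table (COUNT(*) GROUP BY username)
    while recording every table update as a changelog stream."""
    table = {}      # the "materialized view" of the stream
    changelog = []  # the table's change stream (CDC)
    for username in location_updates:
        table[username] = table.get(username, 0) + 1
        changelog.append((username, table[username]))
    return table, changelog

stream = ["alice", "bob", "alice", "alice"]
table, changelog = aggregate(stream)
# Replaying the changelog reconstructs the table: that is the duality.
rebuilt = dict(changelog)
```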
Scalability, Elasticity,
Fault-Tolerance
Fault-tolerance, powered by Kafka

A key challenge of distributed stream processing is fault-tolerant state. State is automatically migrated in case of server failure:
● Server A: "I do stateful stream processing, like tables, joins, aggregations." Its local state is continuously "streaming-backed-up" to a changelog topic in Kafka.
● Server B: "I restore the state and continue processing where server A stopped." The changelog provides a "streaming restore" of A's local state to B.
Fault-tolerance, powered by Kafka

Processing fails over automatically, without data loss or miscomputation.

#3 died, so #1 and #2 take over:
1. Kafka consumer group rebalance is triggered.
2. Processing and state of #3 is migrated via Kafka to the remaining servers #1 + #2.

#3 is back, so work is split again:
1. Kafka consumer group rebalance is triggered.
2. Part of the processing, incl. state, is migrated via Kafka from #1 + #2 to server #3.
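The restore mechanism boils down to: every local state mutation is appended to a changelog topic in Kafka, and a new server rebuilds the state by replaying that topic. A compact Python sketch, where the server and the changelog "topic" are simplified stand-ins:

```python
class StatefulProcessor:
    """Simplified stand-in for a KSQL server / Kafka Streams instance:
    local state is mirrored to a shared changelog list ("topic")."""
    def __init__(self, changelog):
        self.changelog = changelog
        # "Streaming restore": rebuild local state by replaying the changelog.
        self.state = {}
        for key, value in changelog:
            self.state[key] = value

    def update(self, key, value):
        self.state[key] = value              # local state store
        self.changelog.append((key, value))  # "streaming backup" to Kafka

changelog = []
server_a = StatefulProcessor(changelog)
server_a.update("vehicle-1", "Berlin")
server_a.update("vehicle-1", "Munich")
# Server A dies; server B restores from the changelog and takes over.
server_b = StatefulProcessor(changelog)
```

In the real system, compacted Kafka topics keep the changelog bounded, so restore time stays proportional to the state size rather than the full history.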
Elasticity and Scalability, powered by Kafka

You can add, remove, and restart servers during live operations. "We need more processing power!" "Ok, we can scale down again."
Deploying KSQL

The KSQL server is a JVM process, available as DEB, RPM, ZIP, and TAR downloads (http://confluent.io/ksql), as Docker images (confluentinc/cp-ksql-server, confluentinc/cp-ksql-cli), and many more.
Deploying KSQL

#1 Interactive KSQL, for development & testing: you talk to the KSQL servers (processing) via CLI (ksql>), UI, or REST (POST /query); the servers read from and write to Kafka (data).
Deploying KSQL

#2 Headless KSQL, for production: servers are started with the same .sql file; interaction via UI, CLI, and REST is disabled. The servers (processing) read from and write to Kafka (data).
Deploying KSQL

Separate teams (e.g. bookings team, fraud team, mobile team) can each run their own KSQL clusters (processing) against the same Kafka cluster (data), reading and writing independently.
Monitoring KSQL

Via Confluent Control Center or JMX. See https://www.confluent.io/blog/troubleshooting-ksql-part-2
Next: Kafka Streams in more detail
Kafka Streams

● You write standard Java or Scala applications to process your data
● The Kafka Streams library makes these applications elastic, scalable, fault-tolerant, and more
● All you need is Kafka
All you need is Kafka & Kafka Streams

Without Kafka Streams: 1. build & package, 2. submit job; a separate processing cluster and storage system are required for fault-tolerant processing.

With Kafka Streams: just a JVM application.
DB (key-value store) included, and queryable

Location-tracking application: "I continuously track the latest geolocation of every customer vehicle in a table." The app reads from and writes results to Kafka, and has its own local DB. You can also expose it to other apps, e.g. via REST: other applications (Java, Go, Python, etc.) can directly query this table as an alternative query path.
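Queryable state means the app's local store can be read directly by other services, e.g. behind a small REST handler. A Python sketch of the location-tracking example (the event shape and the query function are illustrative, not the Kafka Streams API):

```python
class LocationTracker:
    """Sketch of the location-tracking app: consume geolocation events and
    keep the latest position per vehicle in a local, queryable store."""
    def __init__(self):
        self.store = {}  # local key-value store (RocksDB in Kafka Streams)

    def process(self, event):
        # The "table" semantics: later events overwrite earlier ones per key.
        self.store[event["vehicle_id"]] = event["position"]

    def query(self, vehicle_id):
        """What a REST endpoint on the app instance would serve."""
        return self.store.get(vehicle_id)

app = LocationTracker()
app.process({"vehicle_id": "v1", "position": (52.52, 13.40)})
app.process({"vehicle_id": "v1", "position": (48.14, 11.58)})
```

In Kafka Streams this corresponds to the Interactive Queries feature: each app instance serves reads from its own state store.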
Kafka Streams example use cases

● Microservices
● Data enrichment
● Streaming ETL
● Filter, cleanse, mask
● Real-time monitoring
● Anomaly detection
Writing Kafka Streams applications
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>2.1.0</version>
</dependency>
Add as dependency to your Java/Scala application
Writing Kafka Streams applications: DSL vs. Processor API

                     DSL                                 Processor API
API style            Functional programming              Imperative programming
Typically used when  Starting point for most developers  To customize, tune, or add functionality
                     and most use cases                  beyond what's in the DSL today
You work with        KStream and KTable                  Processors and state stores
Example operations   KStream#map(), KTable#filter(),     Processor#init(), Processor#close(),
                     KStream#join(), aggregate()         Processor#process(msg)
Suitable for         S / M / L / XL use cases            S / M / L / XL use cases

And of course, you can combine the DSL and the Processor API!
Deploying Kafka Streams applications

Develop your application, build and package it (jar, container, ...), and deploy and run one or multiple app instances; the JVM application with Kafka Streams does the processing.
Elasticity and Scalability, powered by Kafka

You can add, remove, and restart instances of your application during live operations. "We need more processing power!" "Ok, we can scale down again."
Deploying Kafka Streams applications

Separate teams (e.g. bookings team, fraud team, mobile team) can each run their own apps (processing) against the same Kafka cluster (data), reading and writing independently.
Confluent Platform as
Central Nervous System
Confluent's Streaming Maturity Model: where are you?

Value grows with maturity (investment & time):
1. Developer Interest (pre-streaming)
2. Streaming Pilot / Early Production (pub + sub, store, process)
3. SLA-Ready, Integrated Streaming Projects
4. Global Streaming Platform
5. Central Nervous System (Enterprise)
Domain-Driven Design for your Event Streaming Platform

The event streaming platform (Kafka cluster + Schema Registry) decouples domains: a CRM domain integrated via Kafka Connect, a legacy domain integrated via an ESB connector, and a payment domain with custom applications built with Java / KSQL / Kafka Streams.

=> Independent and loosely coupled, but scalable, highly available and reliable!
Confluent Schema Registry for Message Validation

Producing apps (App 1 ... App X) validate their input data against the Schema Registry before it reaches Kafka.
● "Kafka benefits under the hood"
● Schema definition + evolution
● Forward and backward compatibility
● Multi data center deployment
Resources and Next Steps
confluentinc/kafka-streams-examples
https://docs.confluent.io/current/streams/
http://cnfl.io/slack
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
@KaiWaehner
www.confluent.io
www.kai-waehner.de
LinkedIn
Questions? Feedback?
Please contact me!