SlideShare a Scribd company logo
1 of 55
Download to read offline
1
Kafka 102:
Streams and Tables All the Way Down
Kafka Summit San Francisco, Sep 2019
Michael G. Noll
Technologist, Office of the CTO, Confluent
@miguno
22
Streams and Tables
A First Look
33
@miguno
Streams Tables
Event Streaming
44
@miguno
Streams Tables
Event Streaming
5
An Event Streaming Platform
gives you three key functionalities
5
Publish & Subscribe
to Events
Store
Events
Process & Analyze
Events
@miguno
6
An Event
records the fact that something happened
6
A good
was sold
An invoice
was issued
A payment
was made
A new customer
registered
@miguno
7
A Stream
records history as a sequence of Events
7
@miguno
88
@miguno
Event Streaming Paradigm
Highly Scalable
Durable
Persistent
Maintains Order
Fast (Low Latency)
Kafka = Source of Truth,
stores every article since 1851
Denormalized into
“Content View”
Normalized assets
(images, articles,
bylines, etc.)
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
Streams record history, even hundreds of years
99
“The ledger of sales.” “The sales totals.”
Streams
record history
Tables
represent state
@miguno
1010
1. e4 e5
2. Nf3 Nc6
3. Bc4 Bc5
4. d3 Nf6
5. Nbd2
“The sequence of moves.” “The state of the board.”
Streams
record history
Tables
represent state
@miguno
11
Streams = INSERT only
Immutable, append-only
Tables = INSERT, UPDATE, DELETE
Mutable, row key (event.key) identifies which row
11
@miguno
12
The key to mutability is … the event.key!
12
@miguno
Stream Table
Has unique key constraint? No Yes
First event with key ‘alice’ arrives INSERT INSERT
Another event with key ‘alice’ arrives INSERT UPDATE
Event with key ‘alice’ and value == null arrives INSERT DELETE
Event with key == null arrives INSERT <ignored>
RDBMS analogy: A Stream is ~ a Table that has no unique key and is append-only.
13
Creating a table from a stream or topic
streams
1414
@miguno
Stream
(facts)
Table
(dims)
alice Berlin
bob Lima
alice Berlin
alice Rome
bob Lima
alice Paris
bob Sydney
alice Berlin
alice Rome
bob Lima
alice Paris
bob Sydney
90°
1515
aggregation
(like SUM, COUNT)
table changes
*See Streams and Tables: Two Sides of the Same Coin, M. Sax et al., BIRTE ’18
Streams
record history
Tables
represent state
Duality
@miguno
16
Aggregating a stream (COUNT example)
streams
@miguno
17
Aggregating a stream (COUNT example)
streams
@miguno
1818
Kafka Topics
The storage foundation of Streams and Tables
19
Data storage of a Kafka topic is partitioned
which impacts data processing as we see later
19
...
...
...
...
P1
P2
P3
P4
storage
Topic
@miguno
20
Producers determine target partition of an event
through a partitioning function ƒ(event.key)
20
...
...
...
...
P1
P2
P3
P4
storage
Topic
Producer client 1
Producer client 2
event sent and appended
to partition 1
@miguno
21
Events with same key should be in same partition
to ensure proper ordering of related events
to ensure processing by Consumers returns expected results
21
...
...
...
...
P1
P2
P3
P4
Producer client 1
Producer client 2
Yellow events should always
be stored in partition 3
@miguno
22
Top causes for same key in different partitions
1. You increased/decreased number of partitions
2. A producer uses a custom partitioner
→ Be careful in this situation!
22
@miguno
23
Processing Layer
(KStreams, KSQL, etc.)
Storage Layer
(Brokers)
Partitions play a central role in Kafka
23
@miguno
Topics are partitioned. Partitions enable scalability, elasticity, fault-tolerance.
joined based on
stored in
replicated based on
log-compacted based on
read from and written to
processed based on
partitionsData is
2424
From Topics to Streams and Tables
25
Topics
live in the Kafka storage layer, ‘filesystem’ (Brokers)
Streams and Tables
live in the Kafka processing layer (KStreams, KSQL)
25
@miguno
26
Processing Layer
(KSQL, KStreams)
26
00100 11101 11000 00011 00100 00110Topic
alice Paris bob Sydney alice RomeStream
plus schema (serdes)
alice Rome
bob Sydney
Table
plus aggregation
Storage Layer
(Brokers)
Topics vs. Streams and Tables
@miguno
27
Kafka Processing
Data is processed per-partition
27
...
...
...
...
P1
P2
P3
P4
storage processing state
Stream Task 1
Stream Task 2
Stream Task 3
Stream Task 4
read via
network
Application Instance 1Topic Application Instance 1
Application Instance 2
@miguno
28
Streams and Tables are partitioned, too
And so is their processing!
28
...
...
...
...
P1
P2
P3
P4
Stream Task 1
Stream Task 2
Stream Task 3
Stream Task 4
KTable / TABLE
2 GB
3 GB
5 GB
2 GB
@miguno
29
Global Tables give complete data to every task
Great for joins without re-keying the input or to broadcast info
29
...
...
...
...
P1
P2
P3
P4
Stream Task 1
Stream Task 2
Stream Task 3
Stream Task 4
GlobalKTable
2 + 3 + 5 + 2 = 12 GB
12 GB
12 GB
12 GB
@miguno
30
Tables are cached in ‘state stores’ on local disk
Tables and other state don’t need to fit into RAM.
Enables large-scale state. Saves $$$ on cloud instances.
30
But: The ‘source of truth’ of a table is its underlying topic!
@miguno
Stream Task 1 Cached on local disk under /var
(default storage engine: RocksDB)
Global
Table
Table
31
streaming restore
via network
Stream Task 1
On machine B
Tables and other state are always fault-tolerant
because backed by Kafka topics (cf. event sourcing)
31
streaming backup
via network
table’s changelog topic
(log-compacted)
Kafka
Storage
KSQL,
Kafka Streams
On machine A
Stream Task 1
@miguno
32
Standby Replicas speed up application recovery
App instances can optionally maintain copies of another
instance’s local state stores to minimize failover times
32
@miguno
num.standby.replicas = 1
Stream Task 1
Stream Task 2
App Instance 1
App Instance 2
num.standby.replicas = 0 (default)
Stream Task 1
Stream Task 2
Stream Task 2
failover
33
Elastic scaling
Stream tasks are migrated, including their state (via Kafka)
33
@miguno
App Instance 1
App Instance 2
Stream Task 1
Stream Task 3
Stream Task 2
Stream Task 4
App Instance 3
App Instance 4
Stream Task 2
Stream Task 4
34
bob Zurich
alice Rome
bob Zurich
bob Sydney alice Romealice Bern alice Paris
Tables <-> topic log-compaction
A table’s underlying topic is compacted by default
to save Kafka storage space
to speed up failover and recovery for processing
34
Note: Compaction intentionally removes part of a table’s history. If you need the full
history and don’t have the historic data elsewhere, consider disabling compaction.
@miguno
3535
@miguno
TL;DR for Log Compaction
Have a Stream?
→ Disable log compaction for its topic (= default)
Have a Table?
→ Enable log compaction for its topic (= default)
Disable only when needed, see previous slide
3636
Concept Schema Partitioned Unbounded Ordering Mutable Unique key constraint Fault-tolerant
Storage Layer
Topic No (raw bytes) Yes Yes Yes No No Yes
Processing Layer
Stream Yes Yes Yes Yes No No Yes
Table Yes Yes No* No Yes Yes Yes
Global Table Yes No No* No Yes Yes Yes
*Generally speaking the answer is Yes but, in practice, tables are almost always bounded due to finite key space.
Topics vs. Streams and Tables
@miguno
3737
Working with Streams and Tables
38
Max processing parallelism = #input partitions
38
...
...
...
...
P1
P2
P3
P4
Topic Application Instance 1
Application Instance 2
Application Instance 3
Application Instance 4
Application Instance 5 *** idle ***
Application Instance 6 *** idle ***
→ Need higher parallelism? Increase the original topic’s partition count.
→ Higher parallelism for just one use case? Derive a new topic from the
original with higher partition count. Lower its retention to save storage.
@miguno
39
How to increase number of partitions when needed
KSQL example: statement below creates a new stream with
the desired number of partitions.
39
CREATE STREAM products_repartitioned
WITH (PARTITIONS=30) AS
SELECT * FROM products
@miguno
40
‘Hot’ partitions can be problematic, often caused by
1. Events not evenly distributed across partitions
2. Events evenly distributed but certain events take longer to process
40
Strategies to address hot partitions include
1a. Ingress: Find better partitioning function ƒ(event.key) for producers
1b. Storage: Re-partition data into new topic if you can’t change the original
2. Scale processing vertically, e.g. more powerful CPU instances
...
...
...
...
P1
P2
P3
P4
@miguno
41
Joining Streams and Tables
Data must be ‘co-partitioned’
41
TableStream
Join Output
(Stream)
@miguno
42
Joining Streams and Tables
Data must be ‘co-partitioned’
42
bob male
alice female
alex male
alice Paris
Table
P1
P2
P3
zoie female
andrew male
mina female
natalie female
blake male
alice Paris
Stream
P2
(alice, Paris) from stream’s
P2 has a matching entry
for alice in the table’s P2.
female
@miguno
43
Joining Streams and Tables
Data is looked up in same partition number
43
alice Paris alice male
alice female
alice Paris
Stream Table
P2 P1
P2
P3
Here, key ‘alice’ exists in
multiple partitions.
But entry in P2 (female) is
used because the
stream-side event is from
stream’s partition P2.
female
Scenario 2
@miguno
44
Joining Streams and Tables
Data is looked up in same partition number
44
alice Paris alice male
alice Paris
Stream Table
P2 P1
P2
P3
Here, key ‘alice’ exists only
in the table’s P1 != P2.
null
no
match!
Scenario 3
@miguno
45
Data co-partitioning requirements in detail
1. Same keying scheme for both input sides
2. Same number of partitions
3. Same partitioning function ƒ(event.key)
45
Further Reading on Joining Streams and Tables:
https://www.confluent.io/kafka-summit-sf18/zen-and-the-art-of-streaming-joins
https://docs.confluent.io/current/ksql/docs/developer-guide/partition-data.html
@miguno
46
Why is that so?
Because of how input data is mapped to stream tasks
46
...
...
...
P1
P2
P3
storage
processing state
Stream Task 2
read via
network
Stream
Topic
...
...
...
P1
P2
P3
Table
Topic
from stream’s P2
from table’s P2
@miguno
47
How to re-partition your data when needed
KSQL example: statement below creates a new stream with
changed number of partitions and a new field as event.key
(so that its data is now correctly co-partitioned for joining)
47
CREATE STREAM products_repartitioned
WITH (PARTITIONS=42) AS
SELECT * FROM products
PARTITION BY product_id;
@miguno
48
Joining Streams and Global Tables
No need to worry about co-partitioning!
48
Global TableStream
Join Output
(Stream)
@miguno
Stream Task 1 2 + 3 + 5 + 2 = 12 GB
That’s because each stream task has the full data from
all the table’s partitions:
49
Capacity Planning and Sizing
Sorry, not covered here! I recommend:
KSQL Performance Tuning for Fun and Profit, by Nick Dearden
October 1, 2019 @ 2:50-3:30 pm, Stream Processing track
https://kafka-summit.org/sessions/ksql-performance-tuning-fun-profit/
49
@miguno
50
Streams and Tables in KSQL
Stream 01
Stream 02
Stream 03
Table
Process event streams to create new, continuously updated streams or tables
QueryQuery
Push
Query
CREATE TABLE OrderTotals AS SELECT * FROM ... EMIT CHANGES
@miguno
51
Streams and Tables in KSQL
Query tables similar to a relational database
Table
QueryQuery
Pull
Query
SELECT * FROM OrderTotals WHERE region = ‘Europe’
Result
New feature
@miguno
52
Query tables from other apps with push or pull queries
Other Applications
(Java, Go, Python, etc.)
can directly query tables
Result
via network
(KSQL REST API)
Table
SELECT * FROM OrderTotals WHERE region = ‘Europe’
Streams and Tables in KSQL
@miguno
53
Streaming import/export of Tables
KSQL integrates with Kafka Connect
CREATE SOURCE CONNECTOR my-postgres-jdbc WITH (
connector.class = "io.confluent.connect.jdbc.jdbcSourceConnector",
connection.url = "jdbc:postgresql://dbserver:5432/my-db", ...);
New feature
controls controls
@miguno
54
KSQL example use case
Creating an event-driven dashboard from a customer database
@miguno
customers
table
Table
Kafka Connect is
streaming change events
Stream
Aggregations are
computed in real-time
Table
Results are
continuously updating
Elasticsearch
Table (Index)Stream
55
THANK YOU
@miguno
michael@confluent.io
cnfl.io/meetups cnfl.io/blog cnfl.io/slack

More Related Content

What's hot

Delta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDelta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the Hood
Databricks
 
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
confluent
 

What's hot (20)

What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
 
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
 
Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...
Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...
Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
 
HPBigData2015 PSTL kafka spark vertica
HPBigData2015 PSTL kafka spark verticaHPBigData2015 PSTL kafka spark vertica
HPBigData2015 PSTL kafka spark vertica
 
KSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for KafkaKSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for Kafka
 
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
 
Delta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDelta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the Hood
 
The Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache FlinkThe Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache Flink
 
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
Kafka Connect: Operational Lessons Learned from the Trenches (Elizabeth Benne...
 
Apache Flink Internals: Stream & Batch Processing in One System – Apache Flin...
Apache Flink Internals: Stream & Batch Processing in One System – Apache Flin...Apache Flink Internals: Stream & Batch Processing in One System – Apache Flin...
Apache Flink Internals: Stream & Batch Processing in One System – Apache Flin...
 
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/TridentQuerying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident
 
Building a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache KafkaBuilding a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache Kafka
 
PSUG #52 Dataflow and simplified reactive programming with Akka-streams
PSUG #52 Dataflow and simplified reactive programming with Akka-streamsPSUG #52 Dataflow and simplified reactive programming with Akka-streams
PSUG #52 Dataflow and simplified reactive programming with Akka-streams
 
Spark streaming: Best Practices
Spark streaming: Best PracticesSpark streaming: Best Practices
Spark streaming: Best Practices
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 

Similar to Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019

Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)
Aljoscha Krettek
 
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
HostedbyConfluent
 

Similar to Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019 (20)

What every software engineer should know about streams and tables in kafka ...
What every software engineer should know about streams and tables in kafka   ...What every software engineer should know about streams and tables in kafka   ...
What every software engineer should know about streams and tables in kafka ...
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
spark stream - kafka - the right way
spark stream - kafka - the right way spark stream - kafka - the right way
spark stream - kafka - the right way
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDB
 
Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)Unified stateful big data processing in Apache Beam (incubating)
Unified stateful big data processing in Apache Beam (incubating)
 
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache BeamAljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
 
Simplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta LakeSimplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta Lake
 
Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDB
 
Handout3o
Handout3oHandout3o
Handout3o
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
 
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data ArtisansApache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
Apache Flink: Better, Faster & Uncut - Piotr Nowojski, data Artisans
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Parallel computing in Python: Current state and recent advances
Parallel computing in Python: Current state and recent advancesParallel computing in Python: Current state and recent advances
Parallel computing in Python: Current state and recent advances
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Crosstalk
CrosstalkCrosstalk
Crosstalk
 
Streaming Distributed Data Processing with Silk #deim2014
Streaming Distributed Data Processing with Silk #deim2014Streaming Distributed Data Processing with Silk #deim2014
Streaming Distributed Data Processing with Silk #deim2014
 

More from Michael Noll

Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...
Michael Noll
 
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Michael Noll
 
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Michael Noll
 
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Michael Noll
 
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Michael Noll
 

More from Michael Noll (9)

Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...
 
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
 
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
 
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
 
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
Rethinking Stream Processing with Apache Kafka: Applications vs. Clusters, St...
 
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 

Recently uploaded

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 

Recently uploaded (20)

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 

Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019

  • 1. 1 Kafka 102: Streams and Tables All the Way Down Kafka Summit San Francisco, Sep 2019 Michael G. Noll Technologist, Office of the CTO, Confluent @miguno
  • 5. 5 An Event Streaming Platform gives you three key functionalities 5 Publish & Subscribe to Events Store Events Process & Analyze Events @miguno
  • 6. 6 An Event records the fact that something happened 6 A good was sold An invoice was issued A payment was made A new customer registered @miguno
  • 7. 7 A Stream records history as a sequence of Events 7 @miguno
  • 8. 88 @miguno Event Streaming Paradigm Highly Scalable Durable Persistent Maintains Order Fast (Low Latency) Kafka = Source of Truth, stores every article since 1851 Denormalized into “Content View” Normalized assets (images, articles, bylines, etc.) https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/ Streams record history, even hundreds of years
  • 9. 99 “The ledger of sales.” “The sales totals.” Streams record history Tables represent state @miguno
  • 10. 1010 1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5 4. d3 Nf6 5. Nbd2 “The sequence of moves.” “The state of the board.” Streams record history Tables represent state @miguno
  • 11. 11 Streams = INSERT only Immutable, append-only Tables = INSERT, UPDATE, DELETE Mutable, row key (event.key) identifies which row 11 @miguno
  • 12. 12 The key to mutability is … the event.key! 12 @miguno Stream Table Has unique key constraint? No Yes First event with key ‘alice’ arrives INSERT INSERT Another event with key ‘alice’ arrives INSERT UPDATE Event with key ‘alice’ and value == null arrives INSERT DELETE Event with key == null arrives INSERT <ignored> RDBMS analogy: A Stream is ~ a Table that has no unique key and is append-only.
  • 13. 13 Creating a table from a stream or topic streams
  • 14. 1414 @miguno Stream (facts) Table (dims) alice Berlin bob Lima alice Berlin alice Rome bob Lima alice Paris bob Sydney alice Berlin alice Rome bob Lima alice Paris bob Sydney 90°
  • 15. 1515 aggregation (like SUM, COUNT) table changes *See Streams and Tables: Two Sides of the Same Coin, M. Sax et al., BIRTE ’18 Streams record history Tables represent state Duality @miguno
  • 16. 16 Aggregating a stream (COUNT example) streams @miguno
  • 17. 17 Aggregating a stream (COUNT example) streams @miguno
  • 18. 1818 Kafka Topics The storage foundation of Streams and Tables
  • 19. 19 Data storage of a Kafka topic is partitioned which impacts data processing as we see later 19 ... ... ... ... P1 P2 P3 P4 storage Topic @miguno
  • 20. 20 Producers determine target partition of an event through a partitioning function ƒ(event.key) 20 ... ... ... ... P1 P2 P3 P4 storage Topic Producer client 1 Producer client 2 event sent and appended to partition 1 @miguno
  • 21. 21 Events with same key should be in same partition to ensure proper ordering of related events to ensure processing by Consumers returns expected results 21 ... ... ... ... P1 P2 P3 P4 Producer client 1 Producer client 2 Yellow events should always be stored in partition 3 @miguno
  • 22. 22 Top causes for same key in different partitions 1. You increased/decreased number of partitions 2. A producer uses a custom partitioner → Be careful in this situation! 22 @miguno
  • 23. 23 Processing Layer (KStreams, KSQL, etc.) Storage Layer (Brokers) Partitions play a central role in Kafka 23 @miguno Topics are partitioned. Partitions enable scalability, elasticity, fault-tolerance. joined based on stored in replicated based on log-compacted based on read from and written to processed based on partitionsData is
  • 24. 2424 From Topics to Streams and Tables
  • 25. 25 Topics live in the Kafka storage layer, ‘filesystem’ (Brokers) Streams and Tables live in the Kafka processing layer (KStreams, KSQL) 25 @miguno
  • 26. 26 Processing Layer (KSQL, KStreams) 26 00100 11101 11000 00011 00100 00110Topic alice Paris bob Sydney alice RomeStream plus schema (serdes) alice Rome bob Sydney Table plus aggregation Storage Layer (Brokers) Topics vs. Streams and Tables @miguno
  • 27. 27 Kafka Processing Data is processed per-partition 27 ... ... ... ... P1 P2 P3 P4 storage processing state Stream Task 1 Stream Task 2 Stream Task 3 Stream Task 4 read via network Application Instance 1Topic Application Instance 1 Application Instance 2 @miguno
  • 28. 28 Streams and Tables are partitioned, too And so is their processing! 28 ... ... ... ... P1 P2 P3 P4 Stream Task 1 Stream Task 2 Stream Task 3 Stream Task 4 KTable / TABLE 2 GB 3 GB 5 GB 2 GB @miguno
  • 29. 29 Global Tables give complete data to every task Great for joins without re-keying the input or to broadcast info 29 ... ... ... ... P1 P2 P3 P4 Stream Task 1 Stream Task 2 Stream Task 3 Stream Task 4 GlobalKTable 2 + 3 + 5 + 2 = 12 GB 12 GB 12 GB 12 GB @miguno
  • 30. 30 Tables are cached in ‘state stores’ on local disk Tables and other state don’t need to fit into RAM. Enables large-scale state. Saves $$$ on cloud instances. 30 But: The ‘source of truth’ of a table is its underlying topic! @miguno Stream Task 1 Cached on local disk under /var (default storage engine: RocksDB) Global Table Table
  • 31. 31 streaming restore via network Stream Task 1 On machine B Tables and other state are always fault-tolerant because backed by Kafka topics (cf. event sourcing) 31 streaming backup via network table’s changelog topic (log-compacted) Kafka Storage KSQL, Kafka Streams On machine A Stream Task 1 @miguno
  • 32. 32 Standby Replicas speed up application recovery App instances can optionally maintain copies of another instance’s local state stores to minimize failover times 32 @miguno num.standby.replicas = 1 Stream Task 1 Stream Task 2 App Instance 1 App Instance 2 num.standby.replicas = 0 (default) Stream Task 1 Stream Task 2 Stream Task 2 failover
  • 33. 33 Elastic scaling Stream tasks are migrated, including their state (via Kafka) 33 @miguno App Instance 1 App Instance 2 Stream Task 1 Stream Task 3 Stream Task 2 Stream Task 4 App Instance 3 App Instance 4 Stream Task 2 Stream Task 4
  • 34. 34 bob Zurich alice Rome bob Zurich bob Sydney alice Romealice Bern alice Paris Tables <-> topic log-compaction A table’s underlying topic is compacted by default to save Kafka storage space to speed up failover and recovery for processing 34 Note: Compaction intentionally removes part of a table’s history. If you need the full history and don’t have the historic data elsewhere, consider disabling compaction. @miguno
  • 35. 3535 @miguno TL;DR for Log Compaction Have a Stream? → Disable log compaction for its topic (= default) Have a Table? → Enable log compaction for its topic (= default) Disable only when needed, see previous slide
  • 36. 3636 Concept Schema Partitioned Unbounded Ordering Mutable Unique key constraint Fault-tolerant Storage Layer Topic No (raw bytes) Yes Yes Yes No No Yes Processing Layer Stream Yes Yes Yes Yes No No Yes Table Yes Yes No* No Yes Yes Yes Global Table Yes No No* No Yes Yes Yes *Generally speaking the answer is Yes but, in practice, tables are almost always bounded due to finite key space. Topics vs. Streams and Tables @miguno
  • 38. 38 Max processing parallelism = #input partitions 38 ... ... ... ... P1 P2 P3 P4 Topic Application Instance 1 Application Instance 2 Application Instance 3 Application Instance 4 Application Instance 5 *** idle *** Application Instance 6 *** idle *** → Need higher parallelism? Increase the original topic’s partition count. → Higher parallelism for just one use case? Derive a new topic from the original with higher partition count. Lower its retention to save storage. @miguno
  • 39. 39 How to increase number of partitions when needed KSQL example: statement below creates a new stream with the desired number of partitions. 39 CREATE STREAM products_repartitioned WITH (PARTITIONS=30) AS SELECT * FROM products @miguno
  • 40. 40 ‘Hot’ partitions can be problematic, often caused by 1. Events not evenly distributed across partitions 2. Events evenly distributed but certain events take longer to process 40 Strategies to address hot partitions include 1a. Ingress: Find better partitioning function ƒ(event.key) for producers 1b. Storage: Re-partition data into new topic if you can’t change the original 2. Scale processing vertically, e.g. more powerful CPU instances ... ... ... ... P1 P2 P3 P4 @miguno
  • 41. 41 Joining Streams and Tables Data must be ‘co-partitioned’ 41 TableStream Join Output (Stream) @miguno
  • 42. 42 Joining Streams and Tables Data must be ‘co-partitioned’ 42 bob male alice female alex male alice Paris Table P1 P2 P3 zoie female andrew male mina female natalie female blake male alice Paris Stream P2 (alice, Paris) from stream’s P2 has a matching entry for alice in the table’s P2. female @miguno
  • 43. 43 Joining Streams and Tables Data is looked up in same partition number 43 alice Paris alice male alice female alice Paris Stream Table P2 P1 P2 P3 Here, key ‘alice’ exists in multiple partitions. But entry in P2 (female) is used because the stream-side event is from stream’s partition P2. female Scenario 2 @miguno
  • 44. 44 Joining Streams and Tables Data is looked up in same partition number 44 alice Paris alice male alice Paris Stream Table P2 P1 P2 P3 Here, key ‘alice’ exists only in the table’s P1 != P2. null no match! Scenario 3 @miguno
  • 45. 45 Data co-partitioning requirements in detail 1. Same keying scheme for both input sides 2. Same number of partitions 3. Same partitioning function ƒ(event.key) 45 Further Reading on Joining Streams and Tables: https://www.confluent.io/kafka-summit-sf18/zen-and-the-art-of-streaming-joins https://docs.confluent.io/current/ksql/docs/developer-guide/partition-data.html @miguno
  • 46. 46 Why is that so? Because of how input data is mapped to stream tasks 46 ... ... ... P1 P2 P3 storage processing state Stream Task 2 read via network Stream Topic ... ... ... P1 P2 P3 Table Topic from stream’s P2 from table’s P2 @miguno
  • 47. 47 How to re-partition your data when needed KSQL example: statement below creates a new stream with changed number of partitions and a new field as event.key (so that its data is now correctly co-partitioned for joining) 47 CREATE STREAM products_repartitioned WITH (PARTITIONS=42) AS SELECT * FROM products PARTITION BY product_id; @miguno
  • 48. 48 Joining Streams and Global Tables No need to worry about co-partitioning! 48 Global TableStream Join Output (Stream) @miguno Stream Task 1 2 + 3 + 5 + 2 = 12 GB That’s because each stream task has the full data from all the table’s partitions:
  • 49. 49 Capacity Planning and Sizing Sorry, not covered here! I recommend: KSQL Performance Tuning for Fun and Profit, by Nick Dearden October 1, 2019 @ 2:50-3:30 pm, Stream Processing track https://kafka-summit.org/sessions/ksql-performance-tuning-fun-profit/ 49 @miguno
  • 50. 50 Streams and Tables in KSQL Stream 01 Stream 02 Stream 03 Table Process event streams to create new, continuously updated streams or tables QueryQuery Push Query CREATE TABLE OrderTotals AS SELECT * FROM ... EMIT CHANGES @miguno
  • 51. 51 Streams and Tables in KSQL Query tables similar to a relational database Table QueryQuery Pull Query SELECT * FROM OrderTotals WHERE region = ‘Europe’ Result New feature @miguno
  • 52. 52 Query tables from other apps with push or pull queries Other Applications (Java, Go, Python, etc.) can directly query tables Result via network (KSQL REST API) Table SELECT * FROM OrderTotals WHERE region = ‘Europe’ Streams and Tables in KSQL @miguno
  • 53. 53 Streaming import/export of Tables KSQL integrates with Kafka Connect CREATE SOURCE CONNECTOR my-postgres-jdbc WITH ( connector.class = "io.confluent.connect.jdbc.jdbcSourceConnector", connection.url = "jdbc:postgresql://dbserver:5432/my-db", ...); New feature controls controls @miguno
  • 54. 54 KSQL example use case Creating an event-driven dashboard from a customer database @miguno customers table Table Kafka Connect is streaming change events Stream Aggregations are computed in real-time Table Results are continuously updating Elasticsearch Table (Index)Stream