SlideShare a Scribd company logo
1 of 37
ksqlDB: A Stream-Relational
Database System
Matthias J. Sax | Software Engineer
@MatthiasJSax
The Vision of a Streaming Database
2
SQL abstraction over mutable TABLEs and immutable append-only STREAMs
• OLTP?
• transaction, concurrent updates
• OLAP?
• complex ad-hoc queries
• Online Stream Processing:
• Continuous queries over STREAMs and TABLEs
• Simple state lookups into TABLEs
• Subscribe to data streams
• Streaming ETL
• Materialized View maintenance
Apache Kafka and ksqlDB
3
Broker Cluster
“Storage Layer”
ksqlDB Cluster
“Compute Layer”
Network
Replicated & partitioned, immutable append-only log of messages
Apache Kafka: Data Model
4
Messages:
• plain bytes: key (for partitioning), value
• long timestamp
Topic cleanup policies:
• retention: purge old messages (time or size based)
• compaction: keep the latest message per key
Apache Kafka: Data Model (cont.)
5
The Stream-Table Duality
6
ksqlDB: Data Model
Streams and tables of structured records
7
CREATE STREAM clickstream (
time BIGINT,
url VARCHAR,
status INTEGER,
bytes INTEGER,
user_id VARCHAR,
agent VARCHAR)
WITH (
kafka_topic = ‘cs_topic',
value_format = 'JSON'
);
CREATE TABLE users (
user_id INTEGER PRIMARY KEY,
registered_at LONG,
username VARCHAR,
name VARCHAR,
city VARCHAR,
level VARCHAR)
WITH (
kafka_topic = ‘users_topic',
value_format = 'AVRO’
);
ksqlDB: Queries
Persistent continuous queries (CQ)
• Take one or more input STREAMs/TABLEs and compute a result STREAM or TABLE
• Deployed in the ksqlDB servers
• Executed in a data-parallel manner
• Scalable
• Fault-tolerant
8
CREATE STREAM resultStream
AS SELECT...
CREATE TABLE resultTable
AS SELECT...
ksqlDB: Queries (cont.)
Transient client queries
• CLI / Java client / etc.
• Pull queries: simple state lookups against TABLEs
• “classic” queries
• Limited in ksqlDB (no aggregations or joins)
• Push queries: subscription to a result STREAM
9
ksqlDB: Overview
10
Kafka Cluster
(stores STREAMs/TABLEs as partitioned logs)
ksqlDB cluster
Upstream
Producers
CQ
Downstream
Consumers
App App
CQ
Push (Subscription) Pull (Query)
Querying
Queries Semantics
12
Streaming queries have temporal semantics based on event-time
TABLE queries:
• Like regular SQL
• However: TABLEs are treated as “versioned” tables that evolve over time
How to query a STREAM?
Simple Stream Queries
13
Filter, projection etc (stateless)
CREATE STREAM user_clicks AS
SELECT user_id, status, ucase(agent)
FROM clickstream
WHERE user_id = ‘mjsax’;
Stream Aggregation
14
Windowed / non-windowed: returns a continuously updating TABLE
CREATE TABLE clicks AS
SELECT user_id, COUNT(url)
FROM clickstream
WHERE bytes > 1024
WINDOW TUMBLIND
(size 30 seconds)
GROUP BY user_id
HAVING COUNT(url) > 20;
Stream-Stream Join
15
Sliding-window join:
14:041 14:162 14:083
14:01A 14:11B 14:23C
14:041⨝A 14:162⨝B 14:113⨝B
max(l.ts; r.ts)
CREATE STREAM joinedStream AS
SELECT *
FROM leftStream AS l JOIN rightStream AS r
WITHIN 5 minutes ON l.id = r.id;
Chaining stream-stream joins is not associative!
• Order matters: ⨝(s1,s2,s3) != (s1 ⨝ s2) ⨝ s3 != (s1 ⨝ s3) ⨝ s2 != (s2 ⨝ s3) ⨝ s1
N-way Stream-Stream Join
16
Chaining stream-stream joins is not associative!
• Order matters: ⨝(s1,s2,s3) != (s1 ⨝ s2) ⨝ s3 != (s1 ⨝ s3) ⨝ s2 != (s2 ⨝ s3) ⨝ s1
N-way Stream-Stream Join
17
14:06X 14:21Y
14:212⨝Y⨝b
14:16b14:11a
14:011 14:26314:162
* window size=5min
Chaining stream-stream joins is not associative!
• Order matters: ⨝(s1,s2,s3) != (s1 ⨝ s2) ⨝ s3 != (s1 ⨝ s3) ⨝ s2 != (s2 ⨝ s3) ⨝ s1
N-way Stream-Stream Join
18
14:06X 14:21Y
14:011 14:26314:162
14:212⨝Y14:061⨝X 14:263⨝Y
* window size=5min
Chaining stream-stream joins is not associative!
• Order matters: ⨝(s1,s2,s3) != (s1 ⨝ s2) ⨝ s3 != (s1 ⨝ s3) ⨝ s2 != (s2 ⨝ s3) ⨝ s1
N-way Stream-Stream Join
19
14:16b14:11a
14:212⨝Y⨝b14:111⨝Y⨝a
14:212⨝Y14:061⨝X 14:263⨝Y
* window size=5min
Table Queries
20
As in regular SQL, however…
2121
Do you think that’s a table you are querying ?
Table Queries
22
Filters, projections, and aggregations are executed on changelog streams
Query
Stream-Table Join
23
Data enrichment via table lookups
Stream-Table Join (cont.)
24
Stream-Table join is a temporal join
14:01a 14:03b 14:05c 14:08b 14:11a
14:02… 14:04… 14:07…14:06… 14:10…
14:01a
14:03b
14:05c
14:05
14:01a
14:08b
14:05c
14:08
14:11a
14:08b
14:05c
14:11
14:01a
14:03b
14:03
14:01a
14:01
14:06 14:07 14:1014:0414:02
Table-Table Join
25
Table-Table join is a temporal join
users
details
profile
v1 v5 v6
v2 v6
v2 v5 v6
Runtime: Kafka Streams
Persistent Queries
Executed as Kafka Streams programs
Kafka Streams:
• Java client library (part of the Apache Kafka project)
• High level DSL to process Kafka topics as KStreams and KTables
• Executes a dataflow program, represented as a directed graph of operators
• filter()/map()
• groupBy()/windowedBy()
• aggregate()
• join()
• Etc.
• Scalable
• Fault-Tolerant
27
Persistent Queries
Kafka Stream topologies:
28
Parallel execution:
Kafka Streams Internals
Consumers, Producers, and RocksDB
29
Kafka Streams Internals (cont.)
Data repartitioning and fault-tolerance
30
Fault-Tolerance and Hot Standbys
31
Recovery: replying the
changelog topic
Hot standbys: eagerly
replaying the changelog
topic
Hot standbys allow for
instant fail-over.
Hot standbys allow for HA
pull queries.
Elasticity
32
Dynamic scaling as special case of store recovery
Challenges
Work in Progress
Streaming SQL
• Concise and more powerful language
• Improved time/operator semantics
Consistency guarantees
• ksqlDB is an async system: vector clock approach for improved time tracking
Applications and transient queries
• More efficient and more scalable pull/push queries
• Improved INSERT/UPDATE/DELETE support?
34
Future Work
Query optimization
• Currently state: rule based
• filter push down
• merging of repartition topic
• merging of input/output/changelog topics
• Streaming cost model and cost-based optimization
• Adaptive re-optimization at runtime
• Query merging/splitting
Runtime improvements:
• Internal and optimized (binary) data format to avoid expensive (de)serialization costs
• Task assignment: load balancing vs stickyness
Richer SQL:
• Sub-query support
35
References
• KSQL: Streaming SQL Engine for Apache Kafka
Hojjat Jafarpour, Rohan Desai, Damian Guy
EDBT '19: Proceedings of the 22nd International Conference on Extending Database Technology, 2019
• Streams and Tables: Two Sides of the Same Coin
Matthias J. Sax, Guozhang Wang, Matthias Weidlich, Johann-Christoph Freytag
BIRTE '18: Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics, 2018
• https://kafka.apache.org/books-and-papers
• https://ksqldb.io/
36
Thanks! We are hiring!
@MatthiasJSax
matthias@confluent.io | mjsax@apache.org

More Related Content

What's hot

Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETconfluent
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®confluent
 
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...confluent
 
KSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for KafkaKSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for Kafkaconfluent
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Kai Wähner
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaKai Wähner
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafkaemreakis
 
Introduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache KafkaIntroduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache Kafkaconfluent
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 
Kafka Tutorial: Kafka Security
Kafka Tutorial: Kafka SecurityKafka Tutorial: Kafka Security
Kafka Tutorial: Kafka SecurityJean-Paul Azar
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsGuozhang Wang
 
ksqlDB: Building Consciousness on Real Time Events
ksqlDB: Building Consciousness on Real Time EventsksqlDB: Building Consciousness on Real Time Events
ksqlDB: Building Consciousness on Real Time Eventsconfluent
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Kai Wähner
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connectKnoldus Inc.
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka IntroductionAmita Mirajkar
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드confluent
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache KafkaPaul Brebner
 

What's hot (20)

Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®
 
kafka
kafkakafka
kafka
 
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
 
KSQL Intro
KSQL IntroKSQL Intro
KSQL Intro
 
KSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for KafkaKSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for Kafka
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Introduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache KafkaIntroduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache Kafka
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Kafka Tutorial: Kafka Security
Kafka Tutorial: Kafka SecurityKafka Tutorial: Kafka Security
Kafka Tutorial: Kafka Security
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka Streams
 
ksqlDB: Building Consciousness on Real Time Events
ksqlDB: Building Consciousness on Real Time EventsksqlDB: Building Consciousness on Real Time Events
ksqlDB: Building Consciousness on Real Time Events
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connect
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 

Similar to ksqlDB: A Stream-Relational Database System

KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafkaconfluent
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka MeetupCliff Gilmore
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...Reactivesummit
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & KafkaBack-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & KafkaAkara Sucharitakul
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLconfluent
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLNick Dearden
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lightbend
 
Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLFlorent Ramiere
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kai Wähner
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterPaolo Castagna
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshopconfluent
 
Stream Analytics with SQL on Apache Flink
 Stream Analytics with SQL on Apache Flink Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache FlinkFabian Hueske
 
Reactive Stream Processing with Akka Streams
Reactive Stream Processing with Akka StreamsReactive Stream Processing with Akka Streams
Reactive Stream Processing with Akka StreamsKonrad Malawski
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkVerverica
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Lviv Startup Club
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingGuozhang Wang
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Paolo Castagna
 
Real-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache KafkaReal-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache Kafkaconfluent
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksLegacy Typesafe (now Lightbend)
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBScyllaDB
 

Similar to ksqlDB: A Stream-Relational Database System (20)

KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafka
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & KafkaBack-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQL
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
 
Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQL
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshop
 
Stream Analytics with SQL on Apache Flink
 Stream Analytics with SQL on Apache Flink Stream Analytics with SQL on Apache Flink
Stream Analytics with SQL on Apache Flink
 
Reactive Stream Processing with Akka Streams
Reactive Stream Processing with Akka StreamsReactive Stream Processing with Akka Streams
Reactive Stream Processing with Akka Streams
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
 
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with K...
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!
 
Real-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache KafkaReal-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache Kafka
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDB
 

More from confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

More from confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 

Recently uploaded (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

ksqlDB: A Stream-Relational Database System

  • 1. ksqlDB: A Stream-Relational Database System Matthias J. Sax | Software Engineer @MatthiasJSax
  • 2. The Vision of a Streaming Database 2 SQL abstraction over mutable TABLEs and immutable append-only STREAMs • OLTP? • transaction, concurrent updates • OLAP? • complex ad-hoc queries • Online Stream Processing: • Continuous queries over STREAMs and TABLEs • Simple state lookups into TABLEs • Subscribe to data streams • Streaming ETL • Materialized View maintenance
  • 3. Apache Kafka and ksqlDB 3 Broker Cluster “Storage Layer” ksqlDB Cluster “Compute Layer” Network
  • 4. Replicated & partitioned, immutable append-only log of messages Apache Kafka: Data Model 4
  • 5. Messages: • plain bytes: key (for partitioning), value • long timestamp Topic cleanup policies: • retention: purge old messages (time or size based) • compaction: keep the latest message per key Apache Kafka: Data Model (cont.) 5
  • 7. ksqlDB: Data Model Streams and tables of structured records 7 CREATE STREAM clickstream ( time BIGINT, url VARCHAR, status INTEGER, bytes INTEGER, user_id VARCHAR, agent VARCHAR) WITH ( kafka_topic = ‘cs_topic', value_format = 'JSON' ); CREATE TABLE users ( user_id INTEGER PRIMARY KEY, registered_at LONG, username VARCHAR, name VARCHAR, city VARCHAR, level VARCHAR) WITH ( kafka_topic = ‘users_topic', value_format = 'AVRO’ );
  • 8. ksqlDB: Queries Persistent continuous queries (CQ) • Take one or more input STREAMs/TABLEs and compute a result STREAM or TABLE • Deployed in the ksqlDB servers • Executed in a data-parallel manner • Scalable • Fault-tolerant 8 CREATE STREAM resultStream AS SELECT... CREATE TABLE resultTable AS SELECT...
  • 9. ksqlDB: Queries (cont.) Transient client queries • CLI / Java client / etc. • Pull queries: simple state lookups against TABLEs • “classic” queries • Limited in ksqlDB (no aggregations or joins) • Push queries: subscription to a result STREAM 9
  • 10. ksqlDB: Overview 10 Kafka Cluster (stores STREAMs/TABLEs as partitioned logs) ksqlDB cluster Upstream Producers CQ Downstream Consumers App App CQ Push (Subscription) Pull (Query)
  • 12. Queries Semantics 12 Streaming queries have temporal semantics based on event-time TABLE queries: • Like regular SQL • However: TABLEs are treated as “versioned” tables that evolve over time How to query a STREAM?
  • 13. Simple Stream Queries 13 Filter, projection etc (stateless) CREATE STREAM user_clicks AS SELECT user_id, status, ucase(agent) FROM clickstream WHERE user_id = ‘mjsax’;
  • 14. Stream Aggregation 14 Windowed / non-windowed: returns a continuously updating TABLE CREATE TABLE clicks AS SELECT user_id, COUNT(url) FROM clickstream WHERE bytes > 1024 WINDOW TUMBLIND (size 30 seconds) GROUP BY user_id HAVING COUNT(url) > 20;
  • 15. Stream-Stream Join 15 Sliding-window join: 14:041 14:162 14:083 14:01A 14:11B 14:23C 14:041⨝A 14:162⨝B 14:113⨝B max(l.ts; r.ts) CREATE STREAM joinedStream AS SELECT * FROM leftStream AS l JOIN rightStream AS r WITHIN 5 minutes ON l.id = r.id;
  • 16. Chaining stream-stream joins is not associative! • Order matters: ⨝(s1,s2,s3) != (s1 ⨝ s2) ⨝ s3 != (s1 ⨝ s3) ⨝ s2 != (s2 ⨝ s3) ⨝ s1 N-way Stream-Stream Join 16
  • 17. Chaining stream-stream joins is not associative! • Order matters: ⨝(s1,s2,s3) != (s1 ⨝ s2) ⨝ s3 != (s1 ⨝ s3) ⨝ s2 != (s2 ⨝ s3) ⨝ s1 N-way Stream-Stream Join 17 14:06X 14:21Y 14:212⨝Y⨝b 14:16b14:11a 14:011 14:26314:162 * window size=5min
  • 18. Chaining stream-stream joins is not associative! • Order matters: ⨝(s1,s2,s3) != (s1 ⨝ s2) ⨝ s3 != (s1 ⨝ s3) ⨝ s2 != (s2 ⨝ s3) ⨝ s1 N-way Stream-Stream Join 18 14:06X 14:21Y 14:011 14:26314:162 14:212⨝Y14:061⨝X 14:263⨝Y * window size=5min
  • 19. Chaining stream-stream joins is not associative! • Order matters: ⨝(s1,s2,s3) != (s1 ⨝ s2) ⨝ s3 != (s1 ⨝ s3) ⨝ s2 != (s2 ⨝ s3) ⨝ s1 N-way Stream-Stream Join 19 14:16b14:11a 14:212⨝Y⨝b14:111⨝Y⨝a 14:212⨝Y14:061⨝X 14:263⨝Y * window size=5min
  • 20. Table Queries 20 As in regular SQL, however…
  • 21. 2121 Do you think that’s a table you are querying ?
  • 22. Table Queries 22 Filters, projections, and aggregations are executed on changelog streams Query
  • 24. Stream-Table Join (cont.) 24 Stream-Table join is a temporal join 14:01a 14:03b 14:05c 14:08b 14:11a 14:02… 14:04… 14:07…14:06… 14:10… 14:01a 14:03b 14:05c 14:05 14:01a 14:08b 14:05c 14:08 14:11a 14:08b 14:05c 14:11 14:01a 14:03b 14:03 14:01a 14:01 14:06 14:07 14:1014:0414:02
  • 25. Table-Table Join 25 Table-Table join is a temporal join users details profile v1 v5 v6 v2 v6 v2 v5 v6
  • 27. Persistent Queries Executed as Kafka Streams programs Kafka Streams: • Java client library (part of the Apache Kafka project) • High level DSL to process Kafka topics as KStreams and KTables • Executes a dataflow program, represented as a directed graph of operators • filter()/map() • groupBy()/windowedBy() • aggregate() • join() • Etc. • Scalable • Fault-Tolerant 27
  • 28. Persistent Queries Kafka Stream topologies: 28 Parallel execution:
  • 29. Kafka Streams Internals Consumers, Producers, and RocksDB 29
  • 30. Kafka Streams Internals (cont.) Data repartitioning and fault-tolerance 30
  • 31. Fault-Tolerance and Hot Standbys 31 Recovery: replying the changelog topic Hot standbys: eagerly replaying the changelog topic Hot standbys allow for instant fail-over. Hot standbys allow for HA pull queries.
  • 32. Elasticity 32 Dynamic scaling as special case of store recovery
  • 34. Work in Progress Streaming SQL • Concise and more powerful language • Improved time/operator semantics Consistency guarantees • ksqlDB is an async system: vector clock approach for improved time tracking Applications and transient queries • More efficient and more scalable pull/push queries • Improved INSERT/UPDATE/DELETE support? 34
  • 35. Future Work Query optimization • Currently state: rule based • filter push down • merging of repartition topic • merging of input/output/changelog topics • Streaming cost model and cost-based optimization • Adaptive re-optimization at runtime • Query merging/splitting Runtime improvements: • Internal and optimized (binary) data format to avoid expensive (de)serialization costs • Task assignment: load balancing vs stickyness Richer SQL: • Sub-query support 35
  • 36. References • KSQL: Streaming SQL Engine for Apache Kafka Hojjat Jafarpour, Rohan Desai, Damian Guy EDBT '19: Proceedings of the 22nd International Conference on Extending Database Technology, 2019 • Streams and Tables: Two Sides of the Same Coin Matthias J. Sax, Guozhang Wang, Matthias Weidlich, Johann-Christoph Freytag BIRTE '18: Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics, 2018 • https://kafka.apache.org/books-and-papers • https://ksqldb.io/ 36
  • 37. Thanks! We are hiring! @MatthiasJSax matthias@confluent.io | mjsax@apache.org