SlideShare a Scribd company logo
1 of 60
1Confidential
KSQL
An Open Source Streaming SQL Engine for Apache Kafka
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
2KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
3KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
4KSQL- Streaming SQL for Apache Kafka
Apache Kafka - A Distributed, Scalable Commit Log
5KSQL- Streaming SQL for Apache Kafka
The Log ConnectorsConnectors
Producer Consumer
Streaming Engine
Apache Kafka – The Rise of a Streaming Platform
6KSQL- Streaming SQL for Apache Kafka
Apache Kafka – The Rise of a Streaming Platform
7KSQL- Streaming SQL for Apache Kafka
KSQL – A Streaming SQL Engine for Apache Kafka
8KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
9KSQL- Streaming SQL for Apache Kafka
Why KSQL?
Population
CodingSophistication
Realm of Stream Processing
New, Expanded Realm
BI
Analysts
Core
Developers
Data
Engineers
Core Developers
who don’t like
Java
10KSQL- Streaming SQL for Apache Kafka
Trade-Offs
• subscribe()
• poll()
• send()
• flush()
• mapValues()
• filter()
• punctuate()
• Select…from…
• Join…where…
• Group by..
Flexibility Simplicity
Kafka Streams KSQL
Consumer
Producer
11KSQL- Streaming SQL for Apache Kafka
What is it for ?
Streaming ETL
• Kafka is popular for data pipelines
• KSQL enables easy transformations of data within the pipe
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
12KSQL- Streaming SQL for Apache Kafka
What is it for ?
Simple Derivations of Existing Topics
• One-liner to re-partition and / or re-key a topic for new uses
CREATE STREAM views_by_userid
WITH (PARTITIONS=6,
VALUE_FORMAT=‘JSON’,
TIMESTAMP=‘view_time’) AS
SELECT *
FROM clickstream
PARTITION BY user_id;
13KSQL- Streaming SQL for Apache Kafka
What is it for ?
Analytics, e.g. Anomaly Detection
• Identifying patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY card_number
HAVING count(*) > 3;
14KSQL- Streaming SQL for Apache Kafka
What is it for ?
Real Time Monitoring
• Log data monitoring, tracking and alerting
• Sensor / IoT data
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
15KSQL- Streaming SQL for Apache Kafka
Where is KSQL not such a great fit (at least today)?
Powerful ad-hoc query
○ Limited span of time usually
retained in Kafka
○ No indexes
BI reports (Tableau etc.)
○ No indexes
○ No JDBC (most Bi tools are not
good with continuous results!)
16KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
17KSQL- Streaming SQL for Apache Kafka
KSQL – A Streaming SQL Engine for Apache Kafka
18KSQL- Streaming SQL for Apache Kafka
KSQL Concepts
● No need for source code
• Zero, none at all, not even one line.
• No SerDes, no generics, no lambdas, ...
● All the Kafka Streams “magic” out-of-the-box
• Exactly Once Semantics
• Windowing
• Event-time aggregation
• Late-arriving data
• Distributed, fault-tolerant, scalable, ...
19KSQL- Streaming SQL for Apache Kafka
STREAM and TABLE as first-class citizens
20KSQL- Streaming SQL for Apache Kafka
CREATE STREAM AS syntax
CREATE STREAM `stream_name`
[WITH (`property = expression` [, …] ) ]
AS SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WHERE `condition` ]
[ PARTITION BY `column_name` ]
● where property can be any of the following:
KAFKA_TOPIC = name - what to call the sink topic
FORMAT = DELIMITED | JSON | AVRO - defaults to format of input stream
AVROSCHEMAFILE = path/to/file - if FORMAT=AVRO, where the output schema file will be written to
PARTITIONS = # - number of partitions in sink topic
TIMESTAMP = column - The name of the column to use as the timestamp. This can be used to define the
event time.
21KSQL- Streaming SQL for Apache Kafka
CREATE TABLE AS syntax
CREATE TABLE `stream_name`
[WITH ( `property_name = expression` [, ...] )]
AS SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping expression` ]
[ HAVING `having_expression` ]
● where property values are same as for ‚Create Streams as Select‘
22KSQL- Streaming SQL for Apache Kafka
SELECT statement syntax
SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping expression` ]
[ HAVING `having_expression` ]
[ LIMIT n ]
where from_item is one of the following:
stream_or_table_name [ [ AS ] alias]
from_item LEFT JOIN from_item ON join_condition
23KSQL- Streaming SQL for Apache Kafka
WINDOWing
● Not ANSI SQL ! à Continuous Queries
● Three types supported (same as Kafka Streams):
• TUMBLING (= SLIDING)
• SELECT appname, ip, COUNT(appname) AS problem_count FROM
logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR'
GROUP BY appname, ip;
• HOPPING
• SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING (
size 20 second, advance by 5 second) GROUP BY itemid;
• SESSION
• SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION (20
second) GROUP BY itemid;
24KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
25KSQL- Streaming SQL for Apache Kafka
Create a STREAM and a TABLE from Kafka Topics
ksql> CREATE STREAM pageviews_original (viewtime bigint, userid varchar, pageid varchar) WITH (kafka_topic='pageviews',
value_format='DELIMITED');
ksql> CREATE TABLE users_original (registertime bigint, gender varchar, regionid varchar, userid varchar) WITH
(kafka_topic='users', value_format='JSON');
ksql> SELECT pageid FROM pageviews_original LIMIT 3;
ksql> CREATE STREAM pageviews_female AS SELECT users_original.userid AS userid, pageid, regionid, gender FROM
pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid WHERE gender = 'FEMALE';
26KSQL- Streaming SQL for Apache Kafka
Live Demo – KSQL Hello World
27KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
28KSQL- Streaming SQL for Apache Kafka
KSQL - Components
KSQL has 3 main components:
1. The CLI, designed to be familiar to users of MySQL, Postgres etc.
2. The Engine which actually runs the Kafka Streams topologies
3. The REST server interface enables an Engine to receive instructions from the CLI
(Note that you also need a Kafka Cluster… KSQL is deployed independently)
29KSQL- Streaming SQL for Apache Kafka
Kafka Cluster
JVM
KSQL EngineRESTKSQL>
#1 STAND-ALONE AKA ‘LOCAL MODE’
30KSQL- Streaming SQL for Apache Kafka
#1 STAND-ALONE AKA ‘LOCAL MODE’
Starts a CLI, an Engine,
and a REST server all
in the same JVM
Ideal for laptop development
• Start with default settings:
• > bin/ksql-cli local
Or with customized settings:
• > bin/ksql-cli local –-properties-file foo/bar/ksql.properties
31KSQL- Streaming SQL for Apache Kafka
#2 CLIENT-SERVER
Kafka Cluster
JVM
KSQL Engine
REST
KSQL>
JVM
KSQL Engine
REST
JVM
KSQL Engine
REST
32KSQL- Streaming SQL for Apache Kafka
#2 CLIENT-SERVER
Start any number
of Server nodes
• > bin/ksql-server-start
Start any number of CLIs and
specify ‘remote’ server address
• >bin/ksql-cli remote http://myserver:8090
All running Engines share the processing load
• Technically, instances of the same Kafka Streams Applications
• Scale up / down without restart
33KSQL- Streaming SQL for Apache Kafka
#3 AS PRE-DEFINED APP
Kafka Cluster
JVM
KSQL Engine
JVM
KSQL Engine
JVM
KSQL Engine
34KSQL- Streaming SQL for Apache Kafka
#3 AS PRE-DEFINED APP
Running the KSQL server
with a pre-defined set of
instructions/queries
• Version control your queries and
transformations as code
Start any number of Engine
instances
• Pass a file of KSQL statements to execute
• > bin/ksql-node query-file=foo/bar.sql
All running Engines share the processing load
• Technically, instances of the same Kafka Streams Applications
• Scale up/down without restart
35KSQL- Streaming SQL for Apache Kafka
Dedicating resources
36KSQL- Streaming SQL for Apache Kafka
How do you deploy applications?
37KSQL- Streaming SQL for Apache Kafka
Where to develop and operate your applications?
38KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
39KSQL- Streaming SQL for Apache Kafka
Demo: Clickstream Analysis
Kafka
Producer
Elastic
search
Grafana
Kafka
Cluster
Kafka
Connect
KSQL
Stream of
Log Events
40KSQL- Streaming SQL for Apache Kafka
Demo: Clickstream Analysis
• https://github.com/confluentinc/ksql/tree/0.1.x/ksql-clickstream-demo#clickstream-analysis
• Leverages Apache Kafka, Kafka Connect, KSQL, Elasticsearch and Grafana
• 5min screencast: https://www.youtube.com/watch?v=A45uRzJiv7I
• Setup in 5 minutes (with or without Docker)
SELECT STREAM
CEIL(timestamp TO HOUR) AS timeWindow, productId,
COUNT(*) AS hourlyOrders, SUM(units) AS units
FROM Orders GROUP BY CEIL(timestamp TO HOUR),
productId;
timeWindow | productId | hourlyOrders | units
------------+-----------+--------------+-------
08:00:00 | 10 | 2 | 5
08:00:00 | 20 | 1 | 8
09:00:00 | 10 | 4 | 22
09:00:00 | 40 | 1 | 45
... | ... | ... | ...
41KSQL- Streaming SQL for Apache Kafka
Live Demo – KSQL Clickstream Analysis
42KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
43KSQL- Streaming SQL for Apache Kafka
KSQL Quick Start
github.com/confluentinc/ksql
Local runtime
or
Docker container
44KSQL- Streaming SQL for Apache Kafka
Remember: Developer Preview!
Caveats of Developer Preview
• No ORDER BY yet
• No Stream-stream joins yet
• Limited function library
• Avro support only via workaround
• Breaking API / Syntax changes still possible
BE EXCITED, BUT BE ADVISED
45KSQL- Streaming SQL for Apache Kafka
Resources and Next Steps
Get Involved
• Try the Quickstart on GitHub
• Check out the code
• Play with the examples
The point of a developer preview is to improve things—together!
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql
46KSQL- Streaming SQL for Apache Kafka
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
@KaiWaehner
www.confluent.io
www.kai-waehner.de
LinkedIn
Questions? Feedback?
Please contact me…
Come to our booth…
Come to Kafka Summit London in April 2018…
47KSQL- Streaming SQL for Apache Kafka
Appendix
48KSQL- Streaming SQL for Apache Kafka
KSQL Concepts
● STREAM and TABLE as first-class citizens
● Interpretations of topic content
● STREAM - data in motion
● TABLE - collected state of a stream
• One record per key (per window)
• Current values (compacted topic) not yet
• Changelog
● STREAM – TABLE Joins
49KSQL- Streaming SQL for Apache Kafka
Schema & Format
● A Kafka broker knows how to move bytes
• Technically a key-value message (byte[], byte[])
● To enable declarative SQL-like queries and transformations we have to define
a richer structure
● Structural metadata maintained in an in-memory catalog
• DDL is recorded in a special topic
50KSQL- Streaming SQL for Apache Kafka
Schema & Format
Start with message (value) format
● JSON - the simplest choice
● DELIMITED - in this preview, the implicit delimiter is a comma and the escaping rules are built-in.
Will be expanded.
● AVRO - requires that you also supply a schema-file (.avsc)
Pseudo-columns are automatically provided
• ROWKEY, ROWTIME - for querying the message key and timestamp
• (PARTITION, OFFSET coming soon)
• CREATE STREAM pageview (viewtime bigint, userid varchar, pageid varchar) WITH
(value_format = 'delimited', kafka_topic='my_pageview_topic');
51KSQL- Streaming SQL for Apache Kafka
Schema & Datatypes
● varchar / string
● boolean / bool
● integer / int
● bigint / long
● double
● array(of_type) - of-type must be primitive (no nested Array or Map yet)
● map(key_type, value_type) - key-type must be string, value-type must be
primitive
52KSQL- Streaming SQL for Apache Kafka
Interactive Querying
● Great for iterative development
● LIST (or SHOW) STREAMS / TABLES
● DESCRIBE STREAM / TABLE
● SELECT
• Selects rows from a KSQL stream or table.
• The result of this statement will be printed out in the console.
• To stop the continuous query in the CLI press Ctrl+C.
53KSQL- Streaming SQL for Apache Kafka
SELECT statement syntax
SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping expression` ]
[ HAVING `having_expression` ]
[ LIMIT n ]
where from_item is one of the following:
stream_or_table_name [ [ AS ] alias]
from_item LEFT JOIN from_item ON join_condition
54KSQL- Streaming SQL for Apache Kafka
WINDOWing
● Not ANSI SQL ! à Continuous Queries :-)
● Three types supported (same as KStreams):
• TUMBLING (= SLIDING)
• SELECT appname, ip, COUNT(appname) AS problem_count FROM
logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR'
GROUP BY appname, ip;
• HOPPING
• SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING (
size 20 second, advance by 5 second) GROUP BY itemid;
• SESSION
• SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION (20
second) GROUP BY itemid;
55KSQL- Streaming SQL for Apache Kafka
CREATE STREAM AS SELECT
● Once your query is ready and you want to run your query non-interactively
• CREATE STREAM AS SELECT ...;
● Creates a new KSQL Stream along with the corresponding Kafka topic and
streams the result of the SELECT query into the topic
● To find what streams are already running:
• SHOW QUERIES;
● If you need to stop one:
• TERMINATE query_id;
56KSQL- Streaming SQL for Apache Kafka
CREATE STREAM AS syntax
CREATE STREAM `stream_name`
[WITH (`property = expression` [, …] ) ]
AS SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WHERE `condition` ]
[ PARTITION BY `column_name` ]
● where property can be any of the following:
KAFKA_TOPIC = name - what to call the sink topic
FORMAT = DELIMITED | JSON | AVRO - defaults to format of input stream
AVROSCHEMAFILE = path/to/file - if FORMAT=AVRO, where the output schema file will be written to
PARTITIONS = # - number of partitions in sink topic
TIMESTAMP = column - The name of the column to use as the timestamp. This can be used to define the
event time.
57KSQL- Streaming SQL for Apache Kafka
CREATE TABLE AS SELECT
● Once your query is ready and you want to run it non-interactively
● CREATE TABLE AS SELECT ...;
● Just like ‚CREATE STREAM AS SELECT‘ but for aggregations
58KSQL- Streaming SQL for Apache Kafka
CREATE TABLE AS syntax
CREATE TABLE `stream_name`
[WITH ( `property_name = expression` [, ...] )]
AS SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping expression` ]
[ HAVING `having_expression` ]
● where property values are same as for ‚Create Streams as Select‘
59KSQL- Streaming SQL for Apache Kafka
Functions
● Scalar Functions:
• CONCAT, IFNULL, LCASE, LEN, SUBSTRING,TRIM, UCASE
• ABS, CEIL, FLOOR, RANDOM, ROUND
• StringToTimestamp, TimestampToString
• GetStringFromJSON
• CAST
● Aggregate Functions:
• SUM, COUNT, MIN, MAX
● User- defined Functions:
• Java Interface
60KSQL- Streaming SQL for Apache Kafka
Session Variables
● Just as in MySQL, ORCL etc. there are settings to control how your CLI
behaves
● Set any property the KStreams consumers/producers will understand
● Defaults can be set in the ksql.properties file
● To see a list of currently set or default variable values:
• ksql> show properties;
● Useful examples:
• num.stream.threads=4
• commit.interval.ms=1000
• cache.max.bytes.buffering=2000000
● TIP! - Your new best friend for testing or building a demo is:
• ksql> set ‘auto.offset.reset’ = ‘earliest’;

More Related Content

What's hot

Terraform Modules and Continuous Deployment
Terraform Modules and Continuous DeploymentTerraform Modules and Continuous Deployment
Terraform Modules and Continuous DeploymentZane Williamson
 
Amazon EKS를 통한 빠르고 편리한 컨테이너 플랫폼 활용 – 이일구 AWS 솔루션즈 아키텍트:: AWS Cloud Week - Ind...
Amazon EKS를 통한 빠르고 편리한 컨테이너 플랫폼 활용 – 이일구 AWS 솔루션즈 아키텍트:: AWS Cloud Week - Ind...Amazon EKS를 통한 빠르고 편리한 컨테이너 플랫폼 활용 – 이일구 AWS 솔루션즈 아키텍트:: AWS Cloud Week - Ind...
Amazon EKS를 통한 빠르고 편리한 컨테이너 플랫폼 활용 – 이일구 AWS 솔루션즈 아키텍트:: AWS Cloud Week - Ind...Amazon Web Services Korea
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
Streams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQLStreams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQLconfluent
 
Real-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache KafkaReal-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache Kafkaconfluent
 
CRX2Oak - all the secrets of repository migration
CRX2Oak - all the secrets of repository migrationCRX2Oak - all the secrets of repository migration
CRX2Oak - all the secrets of repository migrationTomasz Rękawek
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database Systemconfluent
 
Oracle E-Business Suite on Kubernetes Cluster
Oracle E-Business Suite on Kubernetes ClusterOracle E-Business Suite on Kubernetes Cluster
Oracle E-Business Suite on Kubernetes Clustervasuballa
 
Unlocking the Power of Apache Flink: An Introduction in 4 Acts
Unlocking the Power of Apache Flink: An Introduction in 4 ActsUnlocking the Power of Apache Flink: An Introduction in 4 Acts
Unlocking the Power of Apache Flink: An Introduction in 4 ActsHostedbyConfluent
 
Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka confluent
 
Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overviewconfluent
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overviewJulian Hyde
 
Terraform Modules Restructured
Terraform Modules RestructuredTerraform Modules Restructured
Terraform Modules RestructuredDoiT International
 
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentTemporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentHostedbyConfluent
 

What's hot (20)

Terraform Modules and Continuous Deployment
Terraform Modules and Continuous DeploymentTerraform Modules and Continuous Deployment
Terraform Modules and Continuous Deployment
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Amazon EKS를 통한 빠르고 편리한 컨테이너 플랫폼 활용 – 이일구 AWS 솔루션즈 아키텍트:: AWS Cloud Week - Ind...
Amazon EKS를 통한 빠르고 편리한 컨테이너 플랫폼 활용 – 이일구 AWS 솔루션즈 아키텍트:: AWS Cloud Week - Ind...Amazon EKS를 통한 빠르고 편리한 컨테이너 플랫폼 활용 – 이일구 AWS 솔루션즈 아키텍트:: AWS Cloud Week - Ind...
Amazon EKS를 통한 빠르고 편리한 컨테이너 플랫폼 활용 – 이일구 AWS 솔루션즈 아키텍트:: AWS Cloud Week - Ind...
 
Apache Kafka Security
Apache Kafka Security Apache Kafka Security
Apache Kafka Security
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Streams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQLStreams, Tables, and Time in KSQL
Streams, Tables, and Time in KSQL
 
kafka
kafkakafka
kafka
 
Real-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache KafkaReal-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache Kafka
 
CRX2Oak - all the secrets of repository migration
CRX2Oak - all the secrets of repository migrationCRX2Oak - all the secrets of repository migration
CRX2Oak - all the secrets of repository migration
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
Oracle E-Business Suite on Kubernetes Cluster
Oracle E-Business Suite on Kubernetes ClusterOracle E-Business Suite on Kubernetes Cluster
Oracle E-Business Suite on Kubernetes Cluster
 
Unlocking the Power of Apache Flink: An Introduction in 4 Acts
Unlocking the Power of Apache Flink: An Introduction in 4 ActsUnlocking the Power of Apache Flink: An Introduction in 4 Acts
Unlocking the Power of Apache Flink: An Introduction in 4 Acts
 
Advanced Terraform
Advanced TerraformAdvanced Terraform
Advanced Terraform
 
Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka
 
Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overview
 
Apache Calcite overview
Apache Calcite overviewApache Calcite overview
Apache Calcite overview
 
Terraform Modules Restructured
Terraform Modules RestructuredTerraform Modules Restructured
Terraform Modules Restructured
 
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentTemporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
 
KSQL Intro
KSQL IntroKSQL Intro
KSQL Intro
 

Similar to Streaming SQL for Apache Kafka

KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKai Wähner
 
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...Kai Wähner
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Codemotion
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Codemotion
 
Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLFlorent Ramiere
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Paolo Castagna
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLconfluent
 
Real Time Stream Processing with KSQL and Kafka
Real Time Stream Processing with KSQL and KafkaReal Time Stream Processing with KSQL and Kafka
Real Time Stream Processing with KSQL and KafkaDavid Peterson
 
KSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for KafkaKSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for Kafkaconfluent
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!Guido Schmutz
 
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019confluent
 
Event streaming webinar feb 2020
Event streaming webinar feb 2020Event streaming webinar feb 2020
Event streaming webinar feb 2020Maheedhar Gunturu
 
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLBuilding a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLScyllaDB
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka MeetupCliff Gilmore
 
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...Matt Stubbs
 
KSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache KafkaKSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache KafkaMatthias J. Sax
 
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQLSteps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQLconfluent
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterPaolo Castagna
 
Streaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsStreaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsLightbend
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLNick Dearden
 

Similar to Streaming SQL for Apache Kafka (20)

KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
 
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQL
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
 
Real Time Stream Processing with KSQL and Kafka
Real Time Stream Processing with KSQL and KafkaReal Time Stream Processing with KSQL and Kafka
Real Time Stream Processing with KSQL and Kafka
 
KSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for KafkaKSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for Kafka
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
 
Event streaming webinar feb 2020
Event streaming webinar feb 2020Event streaming webinar feb 2020
Event streaming webinar feb 2020
 
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLBuilding a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
 
KSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache KafkaKSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache Kafka
 
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQLSteps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 
Streaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsStreaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka Streams
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQL
 

More from Kai Wähner

Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Kai Wähner
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?Kai Wähner
 
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKai Wähner
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareApache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareKai Wähner
 
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Kai Wähner
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureKai Wähner
 
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Kai Wähner
 
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity IndustryData Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity IndustryKai Wähner
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryKai Wähner
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryKai Wähner
 
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Apache Kafka for Real-time Supply Chainin the Food and Retail IndustryApache Kafka for Real-time Supply Chainin the Food and Retail Industry
Apache Kafka for Real-time Supply Chain in the Food and Retail IndustryKai Wähner
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKai Wähner
 
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Kai Wähner
 
Apache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and ManufacturingApache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and ManufacturingKai Wähner
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022Kai Wähner
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesKai Wähner
 
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Kai Wähner
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Kai Wähner
 

More from Kai Wähner (20)

Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
 
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareApache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
 
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
 
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
 
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity IndustryData Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
 
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Apache Kafka for Real-time Supply Chainin the Food and Retail IndustryApache Kafka for Real-time Supply Chainin the Food and Retail Industry
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid Cloud
 
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
 
Apache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and ManufacturingApache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and Manufacturing
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
 
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
 

Recently uploaded

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Recently uploaded (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Streaming SQL for Apache Kafka

  • 1. 1Confidential KSQL An Open Source Streaming SQL Engine for Apache Kafka Kai Waehner Technology Evangelist kontakt@kai-waehner.de LinkedIn @KaiWaehner www.confluent.io www.kai-waehner.de
  • 2. 2KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 3. 3KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 4. 4KSQL- Streaming SQL for Apache Kafka Apache Kafka - A Distributed, Scalable Commit Log
  • 5. 5KSQL- Streaming SQL for Apache Kafka The Log ConnectorsConnectors Producer Consumer Streaming Engine Apache Kafka – The Rise of a Streaming Platform
  • 6. 6KSQL- Streaming SQL for Apache Kafka Apache Kafka – The Rise of a Streaming Platform
  • 7. 7KSQL- Streaming SQL for Apache Kafka KSQL – A Streaming SQL Engine for Apache Kafka
  • 8. 8KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 9. 9KSQL- Streaming SQL for Apache Kafka Why KSQL? Population CodingSophistication Realm of Stream Processing New, Expanded Realm BI Analysts Core Developers Data Engineers Core Developers who don’t like Java
  • 10. 10KSQL- Streaming SQL for Apache Kafka Trade-Offs • subscribe() • poll() • send() • flush() • mapValues() • filter() • punctuate() • Select…from… • Join…where… • Group by.. Flexibility Simplicity Kafka Streams KSQL Consumer Producer
  • 11. 11KSQL- Streaming SQL for Apache Kafka What is it for ? Streaming ETL • Kafka is popular for data pipelines • KSQL enables easy transformations of data within the pipe CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum';
  • 12. 12KSQL- Streaming SQL for Apache Kafka What is it for ? Simple Derivations of Existing Topics • One-liner to re-partition and / or re-key a topic for new uses CREATE STREAM views_by_userid WITH (PARTITIONS=6, VALUE_FORMAT=‘JSON’, TIMESTAMP=‘view_time’) AS SELECT * FROM clickstream PARTITION BY user_id;
  • 13. 13KSQL- Streaming SQL for Apache Kafka What is it for ? Analytics, e.g. Anomaly Detection • Identifying patterns or anomalies in real-time data, surfaced in milliseconds CREATE TABLE possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTES) GROUP BY card_number HAVING count(*) > 3;
  • 14. 14KSQL- Streaming SQL for Apache Kafka What is it for ? Real Time Monitoring • Log data monitoring, tracking and alerting • Sensor / IoT data CREATE TABLE error_counts AS SELECT error_code, count(*) FROM monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE type = 'ERROR' GROUP BY error_code;
  • 15. 15KSQL- Streaming SQL for Apache Kafka Where is KSQL not such a great fit (at least today)? Powerful ad-hoc query ○ Limited span of time usually retained in Kafka ○ No indexes BI reports (Tableau etc.) ○ No indexes ○ No JDBC (most Bi tools are not good with continuous results!)
  • 16. 16KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 17. 17KSQL- Streaming SQL for Apache Kafka KSQL – A Streaming SQL Engine for Apache Kafka
  • 18. 18KSQL- Streaming SQL for Apache Kafka KSQL Concepts ● No need for source code • Zero, none at all, not even one line. • No SerDes, no generics, no lambdas, ... ● All the Kafka Streams “magic” out-of-the-box • Exactly Once Semantics • Windowing • Event-time aggregation • Late-arriving data • Distributed, fault-tolerant, scalable, ...
  • 19. 19KSQL- Streaming SQL for Apache Kafka STREAM and TABLE as first-class citizens
  • 20. 20KSQL- Streaming SQL for Apache Kafka CREATE STREAM AS syntax CREATE STREAM `stream_name` [WITH (`property = expression` [, …] ) ] AS SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WHERE `condition` ] [ PARTITION BY `column_name` ] ● where property can be any of the following: KAFKA_TOPIC = name - what to call the sink topic FORMAT = DELIMITED | JSON | AVRO - defaults to format of input stream AVROSCHEMAFILE = path/to/file - if FORMAT=AVRO, where the output schema file will be written to PARTITIONS = # - number of partitions in sink topic TIMESTAMP = column - The name of the column to use as the timestamp. This can be used to define the event time.
  • 21. 21KSQL- Streaming SQL for Apache Kafka CREATE TABLE AS syntax CREATE TABLE `stream_name` [WITH ( `property_name = expression` [, ...] )] AS SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WINDOW `window_expression` ] [ WHERE `condition` ] [ GROUP BY `grouping expression` ] [ HAVING `having_expression` ] ● where property values are same as for ‚Create Streams as Select‘
  • 22. 22KSQL- Streaming SQL for Apache Kafka SELECT statement syntax SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WINDOW `window_expression` ] [ WHERE `condition` ] [ GROUP BY `grouping expression` ] [ HAVING `having_expression` ] [ LIMIT n ] where from_item is one of the following: stream_or_table_name [ [ AS ] alias] from_item LEFT JOIN from_item ON join_condition
  • 23. 23KSQL- Streaming SQL for Apache Kafka WINDOWing ● Not ANSI SQL ! à Continuous Queries ● Three types supported (same as Kafka Streams): • TUMBLING (= SLIDING) • SELECT appname, ip, COUNT(appname) AS problem_count FROM logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR' GROUP BY appname, ip; • HOPPING • SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING ( size 20 second, advance by 5 second) GROUP BY itemid; • SESSION • SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION (20 second) GROUP BY itemid;
  • 24. 24KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 25. 25KSQL- Streaming SQL for Apache Kafka Create a STREAM and a TABLE from Kafka Topics ksql> CREATE STREAM pageviews_original (viewtime bigint, userid varchar, pageid varchar) WITH (kafka_topic='pageviews', value_format='DELIMITED'); ksql> CREATE TABLE users_original (registertime bigint, gender varchar, regionid varchar, userid varchar) WITH (kafka_topic='users', value_format='JSON'); ksql> SELECT pageid FROM pageviews_original LIMIT 3; ksql> CREATE STREAM pageviews_female AS SELECT users_original.userid AS userid, pageid, regionid, gender FROM pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid WHERE gender = 'FEMALE';
  • 26. 26KSQL- Streaming SQL for Apache Kafka Live Demo – KSQL Hello World
  • 27. 27KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 28. 28KSQL- Streaming SQL for Apache Kafka KSQL - Components KSQL has 3 main components: 1. The CLI, designed to be familiar to users of MySQL, Postgres etc. 2. The Engine which actually runs the Kafka Streams topologies 3. The REST server interface enables an Engine to receive instructions from the CLI (Note that you also need a Kafka Cluster… KSQL is deployed independently)
  • 29. 29KSQL- Streaming SQL for Apache Kafka Kafka Cluster JVM KSQL EngineRESTKSQL> #1 STAND-ALONE AKA ‘LOCAL MODE’
  • 30. 30KSQL- Streaming SQL for Apache Kafka #1 STAND-ALONE AKA ‘LOCAL MODE’ Starts a CLI, an Engine, and a REST server all in the same JVM Ideal for laptop development • Start with default settings: • > bin/ksql-cli local Or with customized settings: • > bin/ksql-cli local –-properties-file foo/bar/ksql.properties
  • 31. 31KSQL- Streaming SQL for Apache Kafka #2 CLIENT-SERVER Kafka Cluster JVM KSQL Engine REST KSQL> JVM KSQL Engine REST JVM KSQL Engine REST
  • 32. 32KSQL- Streaming SQL for Apache Kafka #2 CLIENT-SERVER Start any number of Server nodes • > bin/ksql-server-start Start any number of CLIs and specify ‘remote’ server address • >bin/ksql-cli remote http://myserver:8090 All running Engines share the processing load • Technically, instances of the same Kafka Streams Applications • Scale up / down without restart
  • 33. 33KSQL- Streaming SQL for Apache Kafka #3 AS PRE-DEFINED APP Kafka Cluster JVM KSQL Engine JVM KSQL Engine JVM KSQL Engine
  • 34. 34KSQL- Streaming SQL for Apache Kafka #3 AS PRE-DEFINED APP Running the KSQL server with a pre-defined set of instructions/queries • Version control your queries and transformations as code Start any number of Engine instances • Pass a file of KSQL statements to execute • > bin/ksql-node query-file=foo/bar.sql All running Engines share the processing load • Technically, instances of the same Kafka Streams Applications • Scale up/down without restart
  • 35. 35KSQL- Streaming SQL for Apache Kafka Dedicating resources
  • 36. 36KSQL- Streaming SQL for Apache Kafka How do you deploy applications?
  • 37. 37KSQL- Streaming SQL for Apache Kafka Where to develop and operate your applications?
  • 38. 38KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 39. 39KSQL- Streaming SQL for Apache Kafka Demo: Clickstream Analysis Kafka Producer Elastic search Grafana Kafka Cluster Kafka Connect KSQL Stream of Log Events
  • 40. 40KSQL- Streaming SQL for Apache Kafka Demo: Clickstream Analysis • https://github.com/confluentinc/ksql/tree/0.1.x/ksql-clickstream-demo#clickstream-analysis • Leverages Apache Kafka, Kafka Connect, KSQL, Elasticsearch and Grafana • 5min screencast: https://www.youtube.com/watch?v=A45uRzJiv7I • Setup in 5 minutes (with or without Docker) SELECT STREAM CEIL(timestamp TO HOUR) AS timeWindow, productId, COUNT(*) AS hourlyOrders, SUM(units) AS units FROM Orders GROUP BY CEIL(timestamp TO HOUR), productId; timeWindow | productId | hourlyOrders | units ------------+-----------+--------------+------- 08:00:00 | 10 | 2 | 5 08:00:00 | 20 | 1 | 8 09:00:00 | 10 | 4 | 22 09:00:00 | 40 | 1 | 45 ... | ... | ... | ...
  • 41. 41KSQL- Streaming SQL for Apache Kafka Live Demo – KSQL Clickstream Analysis
  • 42. 42KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 43. 43KSQL- Streaming SQL for Apache Kafka KSQL Quick Start github.com/confluentinc/ksql Local runtime or Docker container
  • 44. 44KSQL- Streaming SQL for Apache Kafka Remember: Developer Preview! Caveats of Developer Preview • No ORDER BY yet • No Stream-stream joins yet • Limited function library • Avro support only via workaround • Breaking API / Syntax changes still possible BE EXCITED, BUT BE ADVISED
  • 45. 45KSQL- Streaming SQL for Apache Kafka Resources and Next Steps Get Involved • Try the Quickstart on GitHub • Check out the code • Play with the examples The point of a developer preview is to improve things—together! https://github.com/confluentinc/ksql http://confluent.io/ksql https://slackpass.io/confluentcommunity #ksql
  • 46. 46KSQL- Streaming SQL for Apache Kafka Kai Waehner Technology Evangelist kontakt@kai-waehner.de @KaiWaehner www.confluent.io www.kai-waehner.de LinkedIn Questions? Feedback? Please contact me… Come to our booth… Come to Kafka Summit London in April 2018…
  • 47. 47KSQL- Streaming SQL for Apache Kafka Appendix
  • 48. 48KSQL- Streaming SQL for Apache Kafka KSQL Concepts ● STREAM and TABLE as first-class citizens ● Interpretations of topic content ● STREAM - data in motion ● TABLE - collected state of a stream • One record per key (per window) • Current values (compacted topic) not yet • Changelog ● STREAM – TABLE Joins
  • 49. 49KSQL- Streaming SQL for Apache Kafka Schema & Format ● A Kafka broker knows how to move bytes • Technically a key-value message (byte[], byte[]) ● To enable declarative SQL-like queries and transformations we have to define a richer structure ● Structural metadata maintained in an in-memory catalog • DDL is recorded in a special topic
  • 50. 50KSQL- Streaming SQL for Apache Kafka Schema & Format Start with message (value) format ● JSON - the simplest choice ● DELIMITED - in this preview, the implicit delimiter is a comma and the escaping rules are built-in. Will be expanded. ● AVRO - requires that you also supply a schema-file (.avsc) Pseudo-columns are automatically provided • ROWKEY, ROWTIME - for querying the message key and timestamp • (PARTITION, OFFSET coming soon) • CREATE STREAM pageview (viewtime bigint, userid varchar, pageid varchar) WITH (value_format = 'delimited', kafka_topic='my_pageview_topic');
  • 51. 51KSQL- Streaming SQL for Apache Kafka Schema & Datatypes ● varchar / string ● boolean / bool ● integer / int ● bigint / long ● double ● array(of_type) - of-type must be primitive (no nested Array or Map yet) ● map(key_type, value_type) - key-type must be string, value-type must be primitive
  • 52. 52KSQL- Streaming SQL for Apache Kafka Interactive Querying ● Great for iterative development ● LIST (or SHOW) STREAMS / TABLES ● DESCRIBE STREAM / TABLE ● SELECT • Selects rows from a KSQL stream or table. • The result of this statement will be printed out in the console. • To stop the continuous query in the CLI press Ctrl+C.
  • 53. 53KSQL- Streaming SQL for Apache Kafka SELECT statement syntax SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WINDOW `window_expression` ] [ WHERE `condition` ] [ GROUP BY `grouping expression` ] [ HAVING `having_expression` ] [ LIMIT n ] where from_item is one of the following: stream_or_table_name [ [ AS ] alias] from_item LEFT JOIN from_item ON join_condition
  • 54. 54KSQL- Streaming SQL for Apache Kafka WINDOWing ● Not ANSI SQL ! à Continuous Queries :-) ● Three types supported (same as KStreams): • TUMBLING (= SLIDING) • SELECT appname, ip, COUNT(appname) AS problem_count FROM logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR' GROUP BY appname, ip; • HOPPING • SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING ( size 20 second, advance by 5 second) GROUP BY itemid; • SESSION • SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION (20 second) GROUP BY itemid;
  • 55. 55KSQL- Streaming SQL for Apache Kafka CREATE STREAM AS SELECT ● Once your query is ready and you want to run your query non-interactively • CREATE STREAM AS SELECT ...; ● Creates a new KSQL Stream along with the corresponding Kafka topic and streams the result of the SELECT query into the topic ● To find what streams are already running: • SHOW QUERIES; ● If you need to stop one: • TERMINATE query_id;
  • 56. 56KSQL- Streaming SQL for Apache Kafka CREATE STREAM AS syntax CREATE STREAM `stream_name` [WITH (`property = expression` [, …] ) ] AS SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WHERE `condition` ] [ PARTITION BY `column_name` ] ● where property can be any of the following: KAFKA_TOPIC = name - what to call the sink topic FORMAT = DELIMITED | JSON | AVRO - defaults to format of input stream AVROSCHEMAFILE = path/to/file - if FORMAT=AVRO, where the output schema file will be written to PARTITIONS = # - number of partitions in sink topic TIMESTAMP = column - The name of the column to use as the timestamp. This can be used to define the event time.
  • 57. 57KSQL- Streaming SQL for Apache Kafka CREATE TABLE AS SELECT ● Once your query is ready and you want to run it non-interactively ● CREATE TABLE AS SELECT ...; ● Just like ‚CREATE STREAM AS SELECT‘ but for aggregations
  • 58. 58KSQL- Streaming SQL for Apache Kafka CREATE TABLE AS syntax CREATE TABLE `stream_name` [WITH ( `property_name = expression` [, ...] )] AS SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WINDOW `window_expression` ] [ WHERE `condition` ] [ GROUP BY `grouping expression` ] [ HAVING `having_expression` ] ● where property values are same as for ‚Create Streams as Select‘
  • 59. 59KSQL- Streaming SQL for Apache Kafka Functions ● Scalar Functions: • CONCAT, IFNULL, LCASE, LEN, SUBSTRING,TRIM, UCASE • ABS, CEIL, FLOOR, RANDOM, ROUND • StringToTimestamp, TimestampToString • GetStringFromJSON • CAST ● Aggregate Functions: • SUM, COUNT, MIN, MAX ● User- defined Functions: • Java Interface
  • 60. 60KSQL- Streaming SQL for Apache Kafka Session Variables ● Just as in MySQL, ORCL etc. there are settings to control how your CLI behaves ● Set any property the KStreams consumers/producers will understand ● Defaults can be set in the ksql.properties file ● To see a list of currently set or default variable values: • ksql> show properties; ● Useful examples: • num.stream.threads=4 • commit.interval.ms=1000 • cache.max.bytes.buffering=2000000 ● TIP! - Your new best friend for testing or building a demo is: • ksql> set ‘auto.offset.reset’ = ‘earliest’;