This document discusses KSQL, a streaming SQL engine for Apache Kafka. It begins by explaining key KSQL concepts such as streams, tables, queries, and windowing. It then provides examples of creating streams and tables from Kafka topics, exploring their contents, and joining streams to tables. Finally, it outlines common KSQL usage patterns such as streaming ETL, anomaly detection, real-time monitoring, and data transformation, with code examples in KSQL's declarative language throughout.
2. Neil is a senior engineer and technologist at Confluent, the
company founded by the creators of Apache Kafka®. He has over
20 years of experience working on distributed computing,
messaging and stream processing. He has built or redesigned
commercial messaging platforms and distributed caching products,
and developed large-scale bespoke systems for tier-1 banks.
After a period at ThoughtWorks, he went on to build some of the
first distributed risk engines in financial services. In 2008 he
launched a startup that specialised in distributed data analytics and
visualization. Prior to joining Confluent he was the CTO at a
fintech consultancy.
Neil Avery
Senior Engineer and Technologist, Confluent
7. KSQL Concepts
• Streams are first-class citizens
• Tables are first-class citizens
• Two types of query: persistent and transient
• Persistent queries populate streams and tables
• Transient queries serve interactive user sessions
• All queries run until terminated
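The distinction between the two query types can be sketched with the clickstream example used later in the deck (illustrative only; assumes a clickstream stream with a status column, as on the following slides, and an errors stream name chosen here for illustration):

```sql
-- Transient query: results stream back to the CLI session
-- and stop when the session terminates the query
SELECT status, bytes FROM clickstream;

-- Persistent query: continuously populates a new stream backed by
-- a Kafka topic, and keeps running until explicitly terminated
CREATE STREAM errors AS
  SELECT * FROM clickstream WHERE status >= 400;
```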
8. CREATE STREAM clickstream
WITH (
  value_format='JSON',
  kafka_topic='my_clickstream_topic'
);
Creating a Stream
• “Data in motion”
• Let’s say we have a topic called my_clickstream_topic
• The topic contains JSON data
• KSQL now knows about that topic
9. Exploring that Stream
SELECT status, bytes
FROM clickstream
WHERE user_agent =
'Mozilla/5.0 (compatible; MSIE 6.0)';
• Now that the stream exists, we can examine its contents
• Simple, declarative filtering
• A non-persistent query
10. CREATE TABLE users
(user_id int, registered_At long …)
WITH (
  key='user_id',
  kafka_topic='clickstream_users',
  value_format='JSON'
);
Creating a Table
• “A stateful view of data in motion”
• Can be built from a Kafka topic OR another KSQL stream
• We have a topic called clickstream_users
• The topic contains JSON-formatted changelog data
11. CREATE TABLE events_per_min AS
SELECT user_id, count(*) AS events
FROM clickstream
WINDOW TUMBLING (SIZE 10 SECONDS)
GROUP BY user_id;
Creating a Table
• Derived from a stream
• Windowed aggregate
12. Inspecting that Table
SELECT user_id, username
FROM users
WHERE level = 'Platinum';
• Now that the table exists, we can examine its contents
• Simple, declarative filtering
• A non-persistent query
13. Joining a Stream to a Table
• Now that we have clickstream and users, we can join them
• This allows us to do filtering of clicks on a user attribute
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
15. KSQL for Streaming ETL
• Kafka is popular for data pipelines.
• KSQL enables easy transformations of data within the pipe.
• Transforming data while moving from Kafka to another system.
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
16. KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
Identifying patterns or anomalies in real-time data,
surfaced in milliseconds
17. KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• Sensor / IoT data
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
18. KSQL for Data Transformation
CREATE STREAM views_by_userid
WITH (PARTITIONS=6,
  VALUE_FORMAT='JSON',
  TIMESTAMP='view_time') AS
SELECT * FROM clickstream PARTITION BY user_id;
Make simple derivations of existing topics from the command line
19. Stream Patterns
• Stream
• Stream with partitioning (scaling out)
• Stream (fork)
• Table view (windowed, derived from a stream)
• Table (scaled out across nodes)
• Stream-table join (left join)
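As a concrete sketch of the fork pattern: two persistent queries can consume the same source stream independently, each materializing its own Kafka-backed output (the stream names and predicates here are illustrative, not from the deck; assumes the clickstream stream with a status column from the earlier slides):

```sql
-- Fork: both queries read the full clickstream independently;
-- each writes its filtered results to its own backing topic
CREATE STREAM click_errors AS
  SELECT * FROM clickstream WHERE status >= 400;

CREATE STREAM click_success AS
  SELECT * FROM clickstream WHERE status < 400;
```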
22. • Streams, Tables and Queries
• Many streaming patterns
• The clickstream demo is a good place to 'grok' the concepts
Recap
23. Resources and Next Steps
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql
Sign up for:
- Part 2: Development
- Part 3: Deployment
https://www.confluent.io/empowering-streams-through-ksql