After a quick overview and introduction of Apache Kafka, this session covers two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connect's role is to access data from the outside world and make it available inside Kafka by publishing it to a Kafka topic. Conversely, Kafka Connect is also responsible for transporting information from inside Kafka to the outside world, which could be a database or a file system. Many connectors for different source and target systems are available out of the box, provided by the community, by Confluent or by other vendors. You simply configure these connectors and off you go.
Kafka Streams is a lightweight component which extends Kafka with stream processing functionality. With it, Kafka can not only transport events and messages reliably and scalably through the Kafka broker but also analyse and process these events in real time. Interestingly, Kafka Streams does not provide its own cluster infrastructure, and it is not meant to run on a Kafka cluster either. The idea is to run Kafka Streams where it makes sense, which can be inside a “normal” Java application, inside a web container or on a more modern containerized (cloud) infrastructure, such as Mesos, Kubernetes or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
2. Guido Schmutz
Working at Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Apache Kafka – Scalable Stream Processing and more!
3. Agenda
1. What is Apache Kafka?
2. Kafka Connect
3. Kafka Integration with other components
4. Kafka Streams
5. KSQL
4. What is Apache Kafka?
5. Apache Kafka History
Timeline 2012 to 2018:
• 0.7: Cluster mirroring, data compression
• 0.8: Intra-cluster replication
• 0.9: Data Integration (Connect API)
• 0.10: Data Processing (Streams API)
• 0.11: Exactly Once Semantics, Performance Improvements
• KSQL Developer Preview
• 1.0: JBOD Support, Support Java 9
6. Apache Kafka – A Streaming Platform
High-Level Architecture
Distributed Log at the Core
Scale-Out Architecture
Logs do not (necessarily) forget
7. Strong Ordering Guarantees
Most business systems need strong ordering guarantees
Messages that require relative ordering need to be sent to the same partition
Supply the same key for all messages that require a relative order
To maintain global ordering, use a single-partition topic
[Diagram: Producer 1 sends messages with keys Key-1 to Key-6 (Key-1 and Key-3 appear twice) to Broker 1, Broker 2 and Broker 3, which are read by Consumer 1, Consumer 2 and Consumer 3]
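The rule "same key, same partition" can be illustrated with a small sketch. It is a conceptual stand-in and not Kafka's code: the class name KeyPartitionSketch and the use of Arrays.hashCode are assumptions made for illustration (Kafka's own default partitioner hashes the key bytes with murmur2), but the principle of hashing the key and taking it modulo the partition count is the same.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Conceptual sketch: map a message key to a partition so that equal keys
// always land in the same partition. Arrays.hashCode is a stand-in hash
// chosen for illustration only; it is NOT the hash Kafka uses.
class KeyPartitionSketch {

    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result in [0, numPartitions) even for negative hashes
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 3;
        // Repeated keys are mapped to the same partition every time,
        // which is what preserves their relative order.
        for (String key : new String[] {"Key-1", "Key-3", "Key-1", "Key-3"}) {
            System.out.println(key + " -> partition " + partitionFor(key, partitions));
        }
    }
}
```

With a single partition every key maps to partition 0, which is why a single-partition topic yields global ordering.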
13. Demo – Run Producer and Kafka-Console-Consumer
14. Demo – Java Producer to "truck_position"
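Constructing a Kafka Producer
// connection and serializer settings for the producer
private Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker-1:9092");
kafkaProps.put("key.serializer", "...StringSerializer");
kafkaProps.put("value.serializer", "...StringSerializer");
producer = new KafkaProducer<String, String>(kafkaProps);
// driverId is the record key, so all events of one driver go to the same partition
ProducerRecord<String, String> record =
    new ProducerRecord<>("truck_position", driverId, eventData);
try {
    // get() blocks until the broker has acknowledged the record
    metadata = producer.send(record).get();
} catch (Exception e) {
    e.printStackTrace(); // do not swallow send failures silently
}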
15. Demo - MQTT instead of Kafka
[Diagram: Truck-1, Truck-2 and Truck-3 publish to the MQTT topic truck/nn/position]
Sample message: 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631
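The sample message is a pipe-delimited record with nine fields. A minimal parsing sketch follows; the class name TruckPosition and all field names are assumptions, since the slide shows only raw values. In particular, which of the second and third values is the truck id and which the driver id is a guess.

```java
// Sketch of parsing the pipe-delimited sample message shown on the slide.
// All field names are hypothetical; the slide does not label the values.
class TruckPosition {
    final String timestamp, truckId, driverId, routeId, routeName, eventType, correlationId;
    final double latitude, longitude;

    private TruckPosition(String[] f) {
        timestamp = f[0];
        truckId = f[1];      // assumed meaning of the second value
        driverId = f[2];     // assumed meaning of the third value
        routeId = f[3];
        routeName = f[4];
        eventType = f[5];
        latitude = Double.parseDouble(f[6]);
        longitude = Double.parseDouble(f[7]);
        correlationId = f[8];
    }

    static TruckPosition parse(String line) {
        // "|" is a regex metacharacter and must be escaped; -1 keeps empty trailing fields
        String[] fields = line.split("\\|", -1);
        if (fields.length != 9) {
            throw new IllegalArgumentException("expected 9 fields but got " + fields.length);
        }
        return new TruckPosition(fields);
    }

    public static void main(String[] args) {
        TruckPosition p = parse("2016-06-02 14:39:56.605|98|27|803014426|"
                + "Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631");
        System.out.println(p.routeName + " / " + p.eventType + " @ " + p.latitude + "," + p.longitude);
    }
}
```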
16. Demo – MQTT instead of Kafka
17. Demo MQTT instead of Kafka – how to get the data into Kafka?
[Diagram: Truck-1, Truck-2 and Truck-3 publish to the MQTT topic truck/nn/position; a question mark marks the missing link to the Kafka topic "truck position raw"]
Sample message: 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631
18. Apache Kafka – wait there is more!
[Diagram labels: Source Connector, trucking_driver, Kafka Broker, Sink Connector, Stream Processing]
21. Kafka Connect – Single Message Transforms (SMT)
Simple transformations for a single message
Defined as part of Kafka Connect
• Some useful transforms provided out of the box
• Easily implement your own
Optionally deploy 1+ transforms with each connector
• Modify messages produced by source connectors
• Modify messages sent to sink connectors
Makes it much easier to mix and match connectors
Some of the currently available transforms:
• InsertField
• ReplaceField
• MaskField
• ValueToKey
• ExtractField
• TimestampRouter
• RegexRouter
• SetSchemaMetadata
• Flatten
• TimestampConverter
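Transforms are attached to a connector purely through configuration. The sketch below builds only the transform-related part of such a configuration as java.util.Properties. The aliases addSource and maskDriver, the field names source and driverId and the value mqtt are hypothetical. The sketch also assumes that the record values are structured (for example a Struct or Map), which InsertField and MaskField need; the raw pipe-delimited strings from the demo would have to be converted first.

```java
import java.util.Properties;
import java.util.TreeSet;

// Sketch of the transform-related part of a connector configuration.
// Aliases, field names and values are made up for illustration.
class SmtConfigSketch {

    static Properties transformConfig() {
        Properties p = new Properties();
        // Comma-separated aliases; the transforms are applied in this order
        p.setProperty("transforms", "addSource,maskDriver");

        // InsertField$Value adds a static field to the record value
        p.setProperty("transforms.addSource.type",
                "org.apache.kafka.connect.transforms.InsertField$Value");
        p.setProperty("transforms.addSource.static.field", "source");
        p.setProperty("transforms.addSource.static.value", "mqtt");

        // MaskField$Value blanks out the listed fields of the record value
        p.setProperty("transforms.maskDriver.type",
                "org.apache.kafka.connect.transforms.MaskField$Value");
        p.setProperty("transforms.maskDriver.fields", "driverId");
        return p;
    }

    public static void main(String[] args) {
        Properties p = transformConfig();
        for (String key : new TreeSet<>(p.stringPropertyNames())) {
            System.out.println(key + "=" + p.getProperty(key));
        }
    }
}
```

The printed key=value lines are in the form a connector properties file would use; the remaining connector settings (name, connector class, topics) are deliberately left out here.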
22. Kafka Connect – Many Connectors
60+ since first release (0.9+)
20+ from Confluent and Partners
Source: http://www.confluent.io/product/connectors
Connector categories: Confluent supported Connectors, Certified Connectors, Community Connectors
23. Demo – Kafka Connect
[Diagram: Truck-1, Truck-2 and Truck-3 publish to the MQTT topic truck/nn/position; "mqtt to kafka" writes to the Kafka topic truck_position, which is read by a console consumer]
Sample message: 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631
25. Demo – Call REST API and Kafka Console Consumer
26. Kafka Integration with other components
27. Kafka and the Big Data / Fast Data ecosystem
Kafka integrates with many popular products / frameworks
• Apache Spark Streaming
• Apache Flink
• Apache Storm
• Apache Apex
• Apache NiFi
• StreamSets
• Oracle Stream Analytics
• Oracle Service Bus
• Oracle GoldenGate
• Oracle Event Hub Cloud Service
• Debezium CDC
• …
Additional Info: https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
28. StreamSets Data Collector
• Founded by ex-Cloudera, Informatica employees
• Continuous open source, intent-driven, big data ingest
• Visible, record-oriented approach fixes combinatorial explosion
• Batch or stream processing
• Standalone, Spark cluster, MapReduce cluster
• IDE for pipeline development by ‘civilians’
• Relatively new - first public release September 2015
• So far, vast majority of commits are from StreamSets staff
29. Demo StreamSets Data Collector
[Diagram: Truck-3, Truck-4 and Truck-5 publish to the MQTT topic truck/nn/position (MQTT-2, Edge, Port: 1883); "MQTT-2 to Kafka" feeds the Kafka topic "truck position raw"; "Kafka to Cassandra" writes to trucking]
Sample message: {"truckid":"57","driverid":"15","routeid":"1927624662","eventtype":"Normal","latitude":"38.65","longitude":"-90.21","correlationId":"4412891759760421296"}
30. Demo StreamSets Data Collector
31. Demo StreamSets Data Collector
32. Demo StreamSets Data Collector
33. Demo StreamSets Data Collector
34. Demo StreamSets Data Collector
[Diagram: Truck-3, Truck-4 and Truck-5 publish to the MQTT topic truck/nn/position (MQTT-2, Edge, Port: 1883); "MQTT-2 to Kafka" feeds the Kafka topic "truck position raw"; "Kafka to Cassandra" writes to trucking]
Sample message: {"truckid":"57","driverid":"15","routeid":"1927624662","eventtype":"Normal","latitude":"38.65","longitude":"-90.21","correlationId":"4412891759760421296"}
What about some analytics?
36. Kafka Streams - Overview
• Designed as a simple and lightweight library in Apache Kafka
• No external dependencies on systems other than Apache Kafka
• Part of open source Apache Kafka, introduced in 0.10+
• Leverages Kafka as its internal messaging layer
• Supports fault-tolerant local state
• Event-at-a-time processing (not microbatch) with millisecond latency
• Windowing with out-of-order data using a Google DataFlow-like model
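Windowing by event time is what makes out-of-order data manageable: an event is assigned to a window by the timestamp it carries, not by when it arrives. The following is a conceptual sketch in plain Java and does not use the Kafka Streams API; the class name TumblingWindowSketch and the one-minute window size are assumptions, and details such as how long late events are still accepted are left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch of event-time tumbling windows (plain Java, no Kafka Streams).
class TumblingWindowSketch {

    // Start of the fixed-size window that contains the given event time
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - Math.floorMod(eventTimeMs, windowSizeMs);
    }

    // Counts events per window, processing one event at a time in arrival order
    static Map<Long, Integer> countPerWindow(long[] eventTimesMs, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : eventTimesMs) {
            counts.merge(windowStart(ts, windowSizeMs), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The third event arrives last but carries an earlier timestamp,
        // so it is still counted in the first window.
        long[] arrivals = {1_000L, 61_000L, 2_000L};
        System.out.println(countPerWindow(arrivals, 60_000L));
    }
}
```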
37. Kafka Stream DSL and Processor Topology
KStream<Integer, String> stream1 = builder.stream("in-1");
KStream<Integer, String> stream2 = builder.stream("in-2");
KStream<Integer, String> joined = stream1.leftJoin(stream2, …);
KTable<> aggregated = joined.groupBy(…).count("store");
aggregated.to("out-1");
[Diagram: processor topology with source nodes 1 and 2, a left join (lj), an aggregation (a) backed by State, and a sink (t)]
39. Kafka Streams Cluster
[Diagram: Processor Topology (nodes 1, 2, lj, a, t and State) connected to a Kafka Cluster holding the topics input-1, input-2, store (changelog) and output]
44. KSQL: a Streaming SQL Engine for Apache Kafka
• Enables stream processing with zero coding required
• The simplest way to process streams of data in real time
• Powered by Kafka and Kafka Streams: scalable, distributed, mature
• All you need is Kafka – no complex deployments
• Available as developer preview!
• STREAM and TABLE as first-class citizens
• STREAM = data in motion
• TABLE = collected state of a stream
• join STREAM and TABLE
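The relationship between STREAM and TABLE can be sketched without KSQL: replaying a stream of keyed events and keeping only the latest value per key yields the table. The class name StreamTableSketch, the truck keys and the event type "Overspeed" are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch: a TABLE is the collected state of a STREAM,
// here the latest value seen for each key.
class StreamTableSketch {

    static Map<String, String> toTable(List<String[]> stream) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String[] event : stream) {
            // event[0] is the key, event[1] the value; later events overwrite earlier ones
            table.put(event[0], event[1]);
        }
        return table;
    }

    public static void main(String[] args) {
        List<String[]> stream = new ArrayList<>();
        stream.add(new String[] {"truck-1", "Normal"});
        stream.add(new String[] {"truck-2", "Normal"});
        stream.add(new String[] {"truck-1", "Overspeed"}); // update for an existing key
        System.out.println(toTable(stream));
    }
}
```

The stream keeps all three events, while the table holds one row per truck. Joining a stream with a table then amounts to looking up the current row for each incoming event's key.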
45. Demo – KSQL
[Diagram: Truck-1, Truck-2 and Truck-3 publish to the MQTT topic truck/nn/position; "mqtt to kafka" writes to the Kafka topic truck_position; detect_dangerous_driving reads it and writes to dangerous_driving, which is read by a console consumer; "Kafka to Cassandra" writes to trucking]
Sample message: 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631
46. Demo (V) - Start Kafka KSQL
$ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
======================================
=      _  __ _____  ____  _          =
=     | |/ // ____|/ __ \| |         =
=     | ' /| (___ | |  | | |         =
=     |  <  \___ \| |  | | |         =
=     | . \ ____) | |__| | |____     =
=     |_|\_\_____/ \___\_\______|    =
=                                    =
=   Streaming SQL Engine for Kafka   =
Copyright 2017 Confluent Inc.
CLI v0.1, Server v0.1 located at http://localhost:9098
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>