Kafka streams - From pub/sub to a complete stream processing platform

Kafka Streams
From pub/sub to a complete
stream processing platform
Kafka Meetup Utrecht
Thursday, 8th June 2017
< paolo @ confluent.io >

https://www.confluent.io/blog/stream-data-platform-1/
Industry shift from Big Data
to Fast Data and Stream Processing

$ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
Apache Kafka APIs and UNIX analogy

Connect APIs

Producer/Consumer APIs

Streams APIs
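
To make the analogy concrete, here is a minimal sketch of the same pipeline in plain Java (java.util.stream, not Kafka code). One plausible reading of the slide is that reading in.txt and writing out.txt play the role of the Connect APIs, the pipe plays the role of the Producer/Consumer APIs, and the grep and tr steps play the role of the Streams APIs. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class PipelineAnalogy {

    // Equivalent of: grep "apache" | tr a-z A-Z
    static List<String> process(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("apache"))       // grep "apache"
                .map(line -> line.toUpperCase(Locale.ROOT))    // tr a-z A-Z
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stands in for the contents of in.txt.
        List<String> in = Arrays.asList("apache kafka", "something else", "apache flink");
        System.out.println(process(in)); // prints [APACHE KAFKA, APACHE FLINK]
    }
}
```

The difference from a real stream processor is that this list is finite, while a Kafka topic is unbounded and the processing never finishes.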

Streams APIs
part of Apache Kafka
http://kafka.apache.org/documentation/streams
http://docs.confluent.io/current/streams

Build applications, not clusters
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>0.10.2.1</version>
</dependency>

How do I run in production?
Like any other Java application...

Uncool Cool

http://docs.confluent.io/current/streams/introduction.html

Elastic and scalable
http://docs.confluent.io/current/streams/developer-guide.html#elastic-scaling-of-your-application
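
As a rough illustration of what elastic scaling means here: the unit of parallelism is the input topic partition, and partitions are redistributed when application instances join or leave. The sketch below uses a simple round-robin assignment, which is an assumption made for illustration and not the assignment strategy Kafka Streams actually uses. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ElasticScalingSketch {

    // Distributes partitions 0..partitions-1 over the running instances (round-robin).
    static Map<String, List<Integer>> assign(List<String> instances, int partitions) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String instance : instances) {
            assignment.put(instance, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            String owner = instances.get(p % instances.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // One instance owns all four partitions.
        System.out.println(assign(Arrays.asList("app-1"), 4));
        // Starting a second instance splits the work.
        System.out.println(assign(Arrays.asList("app-1", "app-2"), 4));
    }
}
```

Scaling out then amounts to starting another copy of the same application, up to the number of input partitions.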

Typical high level architecture

[Figure: layered architecture, built up across slides: Real-time Data Ingestion; Stream Processing; Storage; Data Publishing / Visualization]

How many clusters do you count?
NoSQL (Cassandra, HBase, Couchbase, MongoDB, …) or Elasticsearch, Solr, …
Storm, Flink, Spark Streaming, Ignite, Akka Streams, Apex, …
HDFS, NFS, Ceph, GlusterFS, Lustre, ...
Apache Kafka

Simplicity is the ultimate sophistication
Apache Kafka and Kafka Streams APIs
Stream Processing Platform
Publish & Subscribe to streams of data like a messaging system
Store streams of data safely in a distributed replicated cluster
Process streams of data efficiently and in real-time
Node.js

Duality of Streams and Tables
http://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
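
The duality can be sketched in plain Java without any Kafka dependency: replaying a stream of updates (a changelog) yields a table holding the latest value per key, and emitting every row of a table yields a stream again. This is a simplified model with hypothetical names, not the KStream/KTable implementation.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StreamTableDuality {

    // Stream -> table: replay the changelog; the latest value per key wins.
    static Map<String, Long> toTable(List<Map.Entry<String, Long>> changelog) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Map.Entry<String, Long> update : changelog) {
            table.put(update.getKey(), update.getValue());
        }
        return table;
    }

    // Table -> stream: emit each row as one record.
    static List<Map.Entry<String, Long>> toStream(Map<String, Long> table) {
        List<Map.Entry<String, Long>> stream = new ArrayList<>();
        for (Map.Entry<String, Long> row : table.entrySet()) {
            stream.add(new SimpleImmutableEntry<>(row.getKey(), row.getValue()));
        }
        return stream;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> changelog = Arrays.asList(
                new SimpleImmutableEntry<>("alice", 1L),
                new SimpleImmutableEntry<>("bob", 1L),
                new SimpleImmutableEntry<>("alice", 2L));
        Map<String, Long> table = toTable(changelog);
        System.out.println(table); // prints {alice=2, bob=1}
        // Replaying the stream derived from the table rebuilds the same table.
        System.out.println(toTable(toStream(table)).equals(table)); // prints true
    }
}
```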

Interactive Queries
http://docs.confluent.io/current/streams/developer-guide.html#streams-developer-guide-interactive-queries

Kafka Streams DSL
http://docs.confluent.io/current/streams/developer-guide.html#kafka-streams-dsl

WordCount (and Java 8)
WordCountLambdaExample.java
// Basic configuration: application id and Kafka bootstrap servers.
final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-lambda-example");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
...
final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();
final KStreamBuilder builder = new KStreamBuilder();
// Read the input topic as a stream of text lines.
final KStream<String, String> textLines = builder.stream(stringSerde, stringSerde, "TextLinesTopic");
// Split lines on runs of non-word characters.
final Pattern pattern = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);
final KTable<String, Long> wordCounts = textLines
    .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
    .groupBy((key, word) -> word)
    .count("Counts");
// Write the continuously updated counts to the output topic.
wordCounts.to(stringSerde, longSerde, "WordsWithCountsTopic");
final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
streams.cleanUp();
streams.start();
// Close the application cleanly on JVM shutdown.
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
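
To show what the topology computes, here is a minimal plain-Java sketch of the same logic over a fixed list of lines: lower-case each line, split it on runs of non-word characters with the `\\W+` pattern, and count each word. It has no Kafka dependency, the class name is hypothetical, and unlike the slide code it skips empty tokens, which is an added assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class WordCountSketch {

    private static final Pattern NON_WORD =
            Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

    static Map<String, Long> count(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : NON_WORD.split(line.toLowerCase())) {
                // split() can yield a leading empty token when a line starts with punctuation.
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("Hello Kafka", "hello, streams!")));
        // prints {hello=2, kafka=1, streams=1}
    }
}
```

In the streaming version the input never ends, so the counts form a table that is updated with every new record instead of being returned once.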

Easy to Develop, Easy to Test
WordCountLambdaIntegrationTest.java
EmbeddedSingleNodeKafkaCluster CLUSTER =
    new EmbeddedSingleNodeKafkaCluster();
…
CLUSTER.createTopic(inputTopic);
…
Properties producerConfig = new Properties();
producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
    CLUSTER.bootstrapServers());

Apache Kafka and Streams APIs benefits
• Build applications, not clusters
• Native integration with Apache Kafka
• Elastic, fast, distributed, fault-tolerant, secure
• Scalable: S, M, L, XL, XXL
• Run everywhere: from containers to cloud
• Streams (with KStream) and tables (with KTable)
• Local state replicated to Kafka for fault-tolerance
• Windowing and event time semantics out of the box
• Supports late-arriving and out-of-order events
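
The last two points can be illustrated with a small plain-Java sketch of tumbling windows keyed by event time: each record is placed in a window according to the timestamp it carries, not its arrival order, so an out-of-order record still lands in the correct window. This is a simplified model with hypothetical names. It assumes non-negative timestamps and does not model how long a window keeps accepting late records.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TumblingWindowSketch {

    // Counts events per tumbling window; the map key is the window start in milliseconds.
    static Map<Long, Long> countPerWindow(List<Long> eventTimestampsMs, long windowSizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long timestamp : eventTimestampsMs) {
            long windowStart = timestamp - (timestamp % windowSizeMs);
            counts.merge(windowStart, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The third event arrives last but belongs to the first one-minute window.
        List<Long> arrivals = Arrays.asList(1_000L, 61_000L, 2_000L);
        System.out.println(countPerWindow(arrivals, 60_000L)); // prints {0=2, 60000=1}
    }
}
```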

References
• http://kafka.apache.org/
• http://kafka.apache.org/documentation/streams/
• http://docs.confluent.io/
• http://docs.confluent.io/current/streams/
• http://docs.confluent.io/current/streams/javadocs/
• http://blog.confluent.io/
• http://github.com/confluentinc/examples/
• http://github.com/apache/kafka/tree/trunk/streams/

The easiest way to get you started
https://www.confluent.io/download/

Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off
www.kafka-summit.org
Kafka Summit San Francisco: August 28
