Kafka Streams lets developers build stream processing applications on top of Apache Kafka. It provides APIs for processing streams of data in real time and for scaling applications elastically. A typical architecture ingests real-time data, processes it as a stream, and stores or publishes the results.
19. Typical high-level architecture
[Diagram: Real-time Data Ingestion -> Stream Processing -> Storage]
20. Typical high-level architecture
[Diagram: Real-time Data Ingestion -> Stream Processing -> Storage -> Data Publishing / Visualization]
21. How many clusters do you count?
• NoSQL (Cassandra, HBase, Couchbase, MongoDB, …) or Elasticsearch, Solr, …
• Storm, Flink, Spark Streaming, Ignite, Akka Streams, Apex, …
• HDFS, NFS, Ceph, GlusterFS, Lustre, ...
• Apache Kafka
22. Simplicity is the ultimate sophistication
Apache Kafka and Kafka Streams APIs: Stream Processing Platform
• Publish & Subscribe to streams of data like a messaging system
• Store streams of data safely in a distributed replicated cluster
• Process streams of data efficiently and in real time
23. Duality of Streams and Tables
http://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
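The duality can be sketched without any Kafka classes: replaying a changelog stream yields a table, and walking a table yields a stream of updates. The following is only an illustration in plain Java; the `Update` type and both method names are hypothetical and are not part of the Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of stream-table duality; no Kafka classes are used.
class StreamTableDuality {

    // Hypothetical record type standing in for one changelog entry.
    static final class Update {
        final String key;
        final long value;

        Update(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // Stream -> table: replay the changelog, keeping the latest value per key.
    static Map<String, Long> toTable(List<Update> changelog) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Update u : changelog) {
            table.put(u.key, u.value);
        }
        return table;
    }

    // Table -> stream: emit every current row as one update record.
    static List<Update> toChangelog(Map<String, Long> table) {
        List<Update> changelog = new ArrayList<>();
        for (Map.Entry<String, Long> row : table.entrySet()) {
            changelog.add(new Update(row.getKey(), row.getValue()));
        }
        return changelog;
    }

    public static void main(String[] args) {
        List<Update> changelog = new ArrayList<>();
        changelog.add(new Update("alice", 1L));
        changelog.add(new Update("bob", 1L));
        changelog.add(new Update("alice", 2L)); // replaces alice's earlier value

        Map<String, Long> table = toTable(changelog);
        System.out.println(table); // {alice=2, bob=1}

        for (Update u : toChangelog(table)) {
            System.out.println(u.key + " -> " + u.value);
        }
    }
}
```

In Kafka Streams the two views correspond to KStream and KTable; the sketch only mirrors the idea, not how the library stores or replicates state.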
28. WordCount (and Java 8)
WordCountLambdaExample.java
// Application id and Kafka bootstrap servers for this Streams application.
final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-lambda-example");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
...
final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();

// Read each line of text from the input topic.
final KStreamBuilder builder = new KStreamBuilder();
final KStream<String, String> textLines = builder.stream(stringSerde, stringSerde, "TextLinesTopic");

// Split on runs of non-word characters.
final Pattern pattern = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

// Lowercase each line, split it into words, group by word, and count occurrences.
final KTable<String, Long> wordCounts = textLines
    .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
    .groupBy((key, word) -> word)
    .count("Counts");

// Write the running counts to the output topic.
wordCounts.to(stringSerde, longSerde, "WordsWithCountsTopic");

// Start the application and close it cleanly on JVM shutdown.
final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
streams.cleanUp();
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
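To see what the topology above computes, here is a rough plain-Java equivalent that runs without a Kafka cluster. It assumes the pattern is meant to be `\\W+` (split on runs of non-word characters). The class name `WordCountSketch` and the sample lines are made up for illustration. Unlike the slide's code, the sketch skips empty tokens.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Batch-style illustration of the word count logic; not a Kafka Streams application.
class WordCountSketch {

    // Same regular expression idea as in the example: runs of non-word characters.
    private static final Pattern NON_WORD =
        Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

    // Lowercase each line, split it into words, and count every word.
    static Map<String, Long> count(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : NON_WORD.split(line.toLowerCase())) {
                if (word.isEmpty()) {
                    continue; // split can yield a leading empty token
                }
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello Kafka Streams", "hello, streams!");
        System.out.println(count(lines)); // {hello=2, kafka=1, streams=2}
    }
}
```

The difference in Kafka Streams is that the input never ends: the KTable keeps the counts as state and emits an updated count whenever a word arrives.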
29. Easy to Develop, Easy to Test
WordCountLambdaIntegrationTest.java
EmbeddedSingleNodeKafkaCluster CLUSTER = new EmbeddedSingleNodeKafkaCluster();
…
CLUSTER.createTopic(inputTopic);
…
Properties producerConfig = new Properties();
producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, CLUSTER.bootstrapServers());
30. Apache Kafka and Streams APIs benefits
• Build applications, not clusters
• Native integration with Apache Kafka
• Elastic, fast, distributed, fault-tolerant, secure
• Scalable: S, M, L, XL, XXL
• Run everywhere: from containers to cloud
• Streams (with KStream) and tables (with KTable)
• Local state replicated to Kafka for fault-tolerance
• Windowing and event time semantics out of the box
• Supports late-arriving and out-of-order events
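The windowing and event-time bullets can be illustrated with a small plain-Java sketch. It assigns each event to a tumbling window using the timestamp carried by the event rather than its arrival order, so an out-of-order event is still counted in the window it belongs to. The class and method names are hypothetical, timestamps are assumed to be non-negative milliseconds, and the sketch does not model how long Kafka Streams retains a window for late events.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration of event-time tumbling windows; no Kafka classes are used.
class EventTimeWindows {

    // Start of the tumbling window that contains the given event timestamp.
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - (eventTimeMs % windowSizeMs);
    }

    // Count events per window, keyed by window start, in arrival order.
    static Map<Long, Long> countPerWindow(long[] eventTimesMs, long windowSizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long t : eventTimesMs) {
            counts.merge(windowStart(t, windowSizeMs), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The event stamped 59_000 arrives after the one stamped 61_000.
        long[] arrivals = {1_000L, 61_000L, 59_000L, 125_000L};
        System.out.println(countPerWindow(arrivals, 60_000L)); // {0=2, 60000=1, 120000=1}
    }
}
```

Because the window is derived from the event's own timestamp, the late event updates the first window's count instead of being attributed to the window that was current when it arrived.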
36. Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off
www.kafka-summit.org
Kafka Summit San Francisco: August 28