Things were easier when all our data used to be offline, analyzed overnight in batches. Now our data is online, in motion, and generated constantly. For architects, developers and their businesses, this means that there is an urgent need for tools and applications that can deliver real-time (or near real-time) streaming ETL capabilities.
In this session by Konrad Malawski, author, speaker and Senior Akka Engineer at Lightbend, you will learn how to build these streaming ETL pipelines with Akka Streams, Alpakka and Apache Kafka, and why they matter to enterprises that are increasingly turning to streaming Fast Data applications.
4. Make building powerful concurrent & distributed applications simple.
Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
5. What’s in the toolkit?
Actors – simple & high performance concurrency
Cluster / Remoting – location transparency, resilience
Cluster tools – and more prepackaged patterns
Streams – back-pressured stream processing
Persistence – Event Sourcing
HTTP – complete, fully async and reactive HTTP Server
Official Kafka, Cassandra, DynamoDB integrations, tons more in the community
Complete Java & Scala APIs for all features
29. Reactive Streams
Reactive Streams is an initiative to provide a standard for
asynchronous stream processing with non-blocking back
pressure. This encompasses efforts aimed at runtime
environments as well as network protocols
http://www.reactive-streams.org
32. Part of JDK 9 (!)
java.util.concurrent.Flow
http://openjdk.java.net/projects/jdk9/
java.util.concurrent.Flow.* is exactly Reactive Streams.
It follows the RS specification 1:1,
and implementations must be verified using the RS TCK.
35. JEP-266 – NOW RELEASED - IN JDK9!
public final class Flow {
    private Flow() {} // uninstantiable

    @FunctionalInterface
    public static interface Publisher<T> {
        public void subscribe(Subscriber<? super T> subscriber);
    }

    public static interface Subscriber<T> {
        public void onSubscribe(Subscription subscription);
        public void onNext(T item);
        public void onError(Throwable throwable);
        public void onComplete();
    }

    public static interface Subscription {
        public void request(long n);
        public void cancel();
    }

    public static interface Processor<T,R> extends Subscriber<T>, Publisher<R> {
    }
}
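To make the role of `Subscription.request(long n)` concrete, here is a minimal sketch that uses only the JDK. It assumes the JDK's own `SubmissionPublisher` as the publisher; `FlowDemo` and `OneAtATimeSubscriber` are hypothetical names made up for this illustration and are not part of the talk. The subscriber never receives more elements than it has asked for, which is the non-blocking back pressure the specification describes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

public class FlowDemo {

    /** Hypothetical subscriber that signals demand for one element at a time. */
    static final class OneAtATimeSubscriber<T> implements Flow.Subscriber<T> {
        final List<T> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);              // ask for the first element
        }

        @Override public void onNext(T item) {
            received.add(item);
            subscription.request(1);   // ask for the next one only after handling this one
        }

        @Override public void onError(Throwable t) { done.countDown(); }

        @Override public void onComplete() { done.countDown(); }
    }

    /** Publishes the input through a SubmissionPublisher and returns what the subscriber saw. */
    public static List<Integer> run(List<Integer> input) throws InterruptedException {
        OneAtATimeSubscriber<Integer> subscriber = new OneAtATimeSubscriber<>();
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            input.forEach(publisher::submit);
        } // close() signals onComplete once buffered elements have been delivered
        subscriber.done.await(5, TimeUnit.SECONDS);
        return subscriber.received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of(1, 2, 3))); // prints [1, 2, 3]
    }
}
```

Requesting one element at a time keeps the sketch easy to follow; a real subscriber would usually request in larger batches to reduce signalling overhead.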
36. Reactive Streams / j.u.c.Flow
[Diagram: RS library A and RS library B connected across an async boundary]
“Make building powerful concurrent &
distributed applications simple.”
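The interop idea on the Reactive Streams / j.u.c.Flow slide can be sketched with the JDK alone: two components that know nothing about each other are connected purely through the `Flow` interfaces. In this sketch `MapProcessor` stands in for "library A" and `Collector` for "library B"; both names, and the `item-` prefix, are hypothetical. The processor follows the pattern shown in the `SubmissionPublisher` Javadoc, and each hop is delivered asynchronously by the publisher's executor.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class FlowBridgeDemo {

    /** Stand-in for "library A": a processor that maps each element and republishes it. */
    static final class MapProcessor<T, R> extends SubmissionPublisher<R>
            implements Flow.Processor<T, R> {
        private final Function<? super T, ? extends R> fn;
        private Flow.Subscription upstream;

        MapProcessor(Function<? super T, ? extends R> fn) { this.fn = fn; }

        @Override public void onSubscribe(Flow.Subscription s) {
            upstream = s;
            s.request(1);
        }

        @Override public void onNext(T item) {
            submit(fn.apply(item));   // hand the mapped element to downstream subscribers
            upstream.request(1);      // then signal demand for the next upstream element
        }

        @Override public void onError(Throwable t) { closeExceptionally(t); }

        @Override public void onComplete() { close(); }
    }

    /** Stand-in for "library B": a subscriber that collects everything it is sent. */
    static final class Collector<T> implements Flow.Subscriber<T> {
        final List<T> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);

        @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
        @Override public void onNext(T item) { received.add(item); }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    /** Wires source -> MapProcessor -> Collector and returns the collected elements. */
    public static List<String> run(List<Integer> input) throws InterruptedException {
        Collector<String> sink = new Collector<>();
        MapProcessor<Integer, String> stage = new MapProcessor<Integer, String>(n -> "item-" + n);
        SubmissionPublisher<Integer> source = new SubmissionPublisher<>();
        stage.subscribe(sink);      // connect downstream first
        source.subscribe(stage);    // then connect upstream
        input.forEach(source::submit);
        source.close();             // completion propagates through the processor to the sink
        sink.done.await(5, TimeUnit.SECONDS);
        return sink.received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of(1, 2, 3))); // prints [item-1, item-2, item-3]
    }
}
```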
38. Akka Streams native JDK9 support (Akka 2.5.5)
java.util.concurrent.Flow support is merged,
and about to be released this week
(before JavaOne next week).
https://github.com/akka/akka/pull/23650
50. Alpakka – a community for Stream connectors
http://developer.lightbend.com/docs/alpakka/current/
A few months ago we only had these…
51. Alpakka – a community for Stream connectors
http://developer.lightbend.com/docs/alpakka/current/
Now we have these! And still growing!
53. Getting things DONE, with Alpakka
http://doc.akka.io/docs/akka-stream-kafka/current/consumer.html#example-code
End-to-end Streaming from Kafka to Streaming HTTP endpoint in 5 minutes
57. Getting things DONE, with Alpakka
http://doc.akka.io/docs/akka-stream-kafka/current/consumer.html#example-code
1. Skim Alpakka docs, find the dependency (dependencies) you need
2. “Pick your Source[T, …]” (from docs, or APIs)
   [Slide image: example Kafka sources (different semantics)]
3. Connect that Source to some Sink
4. Profit.
58. Getting things DONE, with Alpakka
http://doc.akka.io/docs/akka-stream-kafka/current/consumer.html#example-code
1. Skim Alpakka docs, find the dependency (dependencies) you need
2. “Pick your Source[T, …]” (“recursively walk FTP and import files”)
3. Connect that Source to some Sink
4. Profit.
.flatMapMerge(3, file => Ftp.fromPath(path, settings)…)
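The `3` in the `flatMapMerge(3, …)` fragment caps how many substreams are active at once while their outputs are merged into one stream. The sketch below is a plain-JDK analogue of that idea only; it is not Akka Streams or Alpakka code and has no back pressure. `BoundedMergeDemo`, `mergeWithBreadth`, and the fake directory listing are hypothetical names and data made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class BoundedMergeDemo {

    /**
     * Expands every input into a list of outputs and merges all outputs,
     * running at most `breadth` expansions at the same time.
     * The merged order depends on which expansions finish first.
     */
    public static <T, R> List<R> mergeWithBreadth(
            int breadth, List<T> inputs, Function<T, List<R>> expand)
            throws InterruptedException {
        // A fixed pool of `breadth` threads caps the number of concurrent expansions.
        ExecutorService pool = Executors.newFixedThreadPool(breadth);
        ConcurrentLinkedQueue<R> merged = new ConcurrentLinkedQueue<>();
        for (T input : inputs) {
            pool.execute(() -> merged.addAll(expand.apply(input)));
        }
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("expansions did not finish in time");
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical directories, each "listing" two files.
        List<String> dirs = List.of("a", "b", "c", "d");
        List<String> files =
                mergeWithBreadth(3, dirs, d -> List.of(d + "/1.csv", d + "/2.csv"));
        System.out.println(files.size()); // prints 8
    }
}
```

A thread pool is used here only because it makes the concurrency cap visible; a streaming implementation achieves the same bound without dedicating a thread to each substream.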
61. Akka, a toolbox – when to use which tool?
http://developer.lightbend.com/docs/alpakka/current/
[Chart: modeling power plotted against complexity. Tools shown: threads, locks; (completable) futures; streams; actors; typed actors (preview available). Also labelled: undefined complex concurrency models; blocking non-reactive tech.]
62. Actors ❤ Streams ==True
Actors – perfect for managing state and things with lifecycles, restarting, and running many separate instances in parallel; best for gathering data from multiple places and for resiliency
(Reactive) Streams – best suited for less-dynamic layouts; have a simple lifecycle (running, completed, failed); best for moving data from A to B
Excellent post explaining this (look up the author, Colin Breck):
blog.colinbreck.com/integrating-akka-streams-and-akka-actors-part-i/
63. But, distributed systems?
Akka Streams is a local abstraction.
Combine with Akka Cluster for distributed superpowers!
[Diagram: Stream-in-Actor entities A-1 (Shard A), B-1 and B-2 (Shard B), C-1 (Shard C)]
65. Next steps for Akka
Release stable new Akka Remoting (over 700,000 msg/s (!)), built using Akka Streams and Aeron.
Even more integrations for Akka Streams stages: project Alpakka.
Collaborating with IBM to deliver integrations with new tech (S3, JDBC).
Continued polish of Reactive Kafka, an important part of Alpakka.
Plans to expand beyond Kafka too!
Akka Typed Cluster, Persistence, Streams integration preview: now.
Preview of working Akka HTTP 2.0.
Akka HTTP powering Play Framework by default.
66. Multi Data Center support – coming soon
More details soon…
- Active + Active “Cluster Sharding”
- Proximity aware routers?
- Talk to us about your use cases :)
- …?
67. Akka Monitoring & Tracing
developer.lightbend.com/docs/monitoring/latest/home.html
+
DataDog || StatsD || Graphite || …anything!
Working Zipkin support
Working Jaeger support
69. Totally, go for it.
“If the JDK adopting Reactive Streams isn’t the best
long-term endorsement of stability and commitment…
…then, I don’t know what is.”
70. Read more tutorials and deep-dives:
http://blog.akka.io/
http://developer.lightbend.com/guides
71. Further reading:
Akka docs: akka.io/docs
Alpakka docs:
Reactive Streams: reactive-streams.org
Free O’Reilly report – bit.ly/why-reactive
Example Sources:
ktoso/akka-streams-alpakka-talk-demos
Konrad Malawski, ktoso@lightbend.com
http://kto.so / @ktosopl