Announcing the next-generation dA Platform 2, which includes open source Apache Flink and the new Application Manager. dA Platform 2 makes it easier than ever to operationalize your Flink-powered stream processing applications in production.
3. Stateful Stream Processing with Flink
• As of today, Flink is the most advanced stateful stream processor available
• Stateful streaming is a hot topic, and it’s here to stay
Features:
• Unified Batch & Streaming SQL
• Complex Event Processing Library
• Rich Windowing API
• Event-Time semantics
• Versatile APIs
• Exactly-once fault tolerance
• Queryable State
• Fully scalable and distributed processing
Integrations:
• Apache Kafka (with exactly-once)
• Apache Hadoop YARN
• Apache Mesos (and DC/OS)
• AWS Kinesis
• Docker & Kubernetes
• Elasticsearch, Cassandra & HBase
• Legacy message queues
• Hadoop-supported file-systems
• Apache Beam Runner
Operational Features:
• Incremental Checkpointing
• Pluggable, fully asynchronous state backends
• RocksDB file-based state backend
• High-Availability
• Savepoints
• Kerberos Authentication
• SSL data encryption
• Backwards-compatibility for state and APIs
• Metrics
4. Architectures are changing …
From centralized architectures …
[Diagram: four applications, owned by app teams, all connected to the big central database, which is owned by the DB team]
5. … to Microservices …
[Diagram: five applications running on decentralized infrastructure]
DevOps: decentralized responsibilities
Still involves managing databases: everyone is a mini-DBA
6. … and Stateful Stream Processing
[Diagram: sensors and APIs feed a set of connected applications]
Very simple: state is just part of the application
Microservices on steroids!
Encourages building even more lightweight and specialized apps
7. … and Stateful Stream Processing
Problem: A complete toolset for managing these kinds of applications doesn't yet exist
8. The rise of streaming platforms
To solve these problems, companies started building internal streaming platforms
For example, Netflix presented its Flink-based SPaaS (Stream Processing as a Service) platform at Flink Forward San Francisco 2017
There is a need for self-service tools for stateful streaming applications
Netflix SPaaS: https://www.slideshare.net/mdaxini/flink-forward2017netflix-keystonespaas
9. Lessons learned
1. Apache Flink is here to stay
2. The stateful streaming architecture has been widely adopted
3. There's a gap to fill in tooling for this new architecture
11. dA Platform 2
Manage applications and state together
Instead of maintaining separate tools for applications (e.g. container environment) and state (e.g. databases), use one tool to manage your stateful streaming applications.
Reduce time to production
dA Platform 2 comes with all the infrastructure needed to reliably operate streaming applications
It provides a self-service platform to operate streaming apps
Easily adopt streaming within an organization
12. Instead of managing Flink streaming jobs manually …
• Requires users to manually call the APIs in Flink at the right time
• Handling any unexpected issues on the way
• Manual bookkeeping of savepoints, streaming job versions, configurations
[Diagram: timeline of manually managed Flink jobs, Fraud Detection@c8b3f followed by Fraud Detection@33de6, …]
13. … dA Platform manages Flink
• dA Platform operates on a new concept: Applications, abstracting away the low-level details of Flink
[Diagram: timeline of one application, Fraud Detection@c8b3f followed by Fraud Detection@33de6, on Apache Flink managed by dA Platform]
14. Application Manager Intro
• Management layer within dA Platform 2, taking care of application lifecycle and metadata
[Diagram: Application Manager above Fraud Detection@c8b3f and Fraud Detection@33de6, on Apache Flink managed by dA Platform]
15. Lifecycle Management
• Start, suspend (without state loss) or cancel an application
• Manually trigger a savepoint, restore to any savepoint
[Screenshot: savepoint list and restore]
16. Upgrading an application
• Deploy a newer application version
• Upgrade Flink
• Change configuration
• Upgrade modes:
  • Suspend and upgrade: take a savepoint of the old version and cancel it, then submit the new version with the savepoint
  • Cancel and upgrade: cancel the old version, then submit the new version
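The two upgrade modes differ only in whether state is carried over. A minimal sketch of both sequences in Python, assuming a hypothetical in-memory `FakeCluster` stand-in; none of these names are dA Platform or Flink APIs:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for a Flink cluster (not a real API)."""
    state: Dict[str, int] = field(default_factory=dict)  # state of the running job
    running_version: Optional[str] = None
    log: List[str] = field(default_factory=list)         # steps performed, in order

    def savepoint(self) -> Dict[str, int]:
        self.log.append("savepoint")
        return dict(self.state)  # snapshot copy of the job state

    def cancel(self) -> None:
        self.log.append("cancel")
        self.running_version = None
        self.state = {}  # the cancelled job's in-flight state is gone

    def submit(self, version: str,
               restore_from: Optional[Dict[str, int]] = None) -> None:
        restoring = restore_from is not None
        self.log.append("submit with savepoint" if restoring else "submit")
        self.running_version = version
        self.state = dict(restore_from) if restoring else {}


def suspend_and_upgrade(cluster: FakeCluster, new_version: str) -> None:
    """Savepoint, cancel, then submit the new version with the savepoint."""
    snapshot = cluster.savepoint()
    cluster.cancel()
    cluster.submit(new_version, restore_from=snapshot)


def cancel_and_upgrade(cluster: FakeCluster, new_version: str) -> None:
    """Cancel, then submit the new version with empty state."""
    cluster.cancel()
    cluster.submit(new_version)
```

In this sketch, `suspend_and_upgrade` leaves the new version with the old version's state, while `cancel_and_upgrade` starts it from scratch.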
17. Forking an application
• Stage changes in a pre-production environment
• Run experiments (a/b tests)
• Reprocess past events
[Diagram: over time, a fork of Fraud Detection@c8b3f is started alongside the original Fraud Detection@c8b3f]
Application keeps running …
18. Architecture
[Diagram: dA Platform 2 architecture]
Inputs: streams from Kafka, S3, HDFS, databases, ...
Components shown:
• dA Application Manager: application lifecycle management
• Apache Flink: stateful stream processing
• Kubernetes: container platform
• Logging
• Metrics
Use cases: real-time analytics, anomaly and fraud detection, real-time data integration, reactive microservices
20. Demo Components
dA Platform 2 Application Manager
One Application: Payments Dashboard (Europe)
Two Deployments: Staging (eu-west-1) and Production (eu-west-1)
21. Demo Components
GitLab to host the code repository and trigger builds in Jenkins
Jenkins to build and test the code and initiate upgrades via the Application Manager’s HTTP API
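A build step that initiates an upgrade over HTTP could look roughly like the sketch below. The slides do not show the Application Manager's API, so the host, URL path, HTTP method and JSON field names are all hypothetical placeholders. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_upgrade_request(base_url: str, deployment_id: str,
                          artifact_uri: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP request asking for an upgrade.

    HYPOTHETICAL: the path, the PATCH method and the payload fields are
    illustrative assumptions, not the documented Application Manager API.
    """
    url = f"{base_url}/deployments/{deployment_id}"
    payload = {
        "artifactUri": artifact_uri,            # hypothetical field name
        "upgradeMode": "suspend-and-upgrade",   # hypothetical field name/value
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


# Example: what a CI job might prepare after a successful build and test run.
request = build_upgrade_request(
    "https://appmanager.example.com/api",       # placeholder host
    "staging",
    "s3://builds/payments-dashboard-33de6.jar", # placeholder artifact location
)
```

A CI job would pass the prepared request to `urllib.request.urlopen` and fail the build on a non-success response.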
22. Demo Components
Elasticsearch and Kibana to store and visualize the dashboard’s data
Data is simulated payments coming in from around Europe
Upper pane visualizes the relative proportion of payments from each country
Lower pane plots the rate over time of the five highest-volume sources
24. Integrations
• Application Manager integrates with centralized logging and metrics services
• Access the log of an application for any point in time
Make debugging and monitoring as easy as possible from day one
25. Connectivity
• REST API as first-class citizen for custom integrations
• Web-based user interface and command line interface
[Diagram: your custom service talks to the dA Platform 2 Application Manager, which controls the managed Flink instances]
26. Configuration and Deployments
• Advanced configuration management
  • Default configs + deployment-specific configuration
  • Configuration history
• Support for deploying to multiple deployment targets
  • A deployment target is the abstraction for any resource manager supported by Flink
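One way to picture "default configs + deployment-specific configuration" with a history is a layered merge whose results are recorded as numbered versions. The sketch below is an illustration under that assumption, not the Application Manager's actual data model; the function and class names are made up, and the configuration keys are only examples.

```python
from typing import Any, Dict, List

Config = Dict[str, Any]


def effective_config(defaults: Config, overrides: Config) -> Config:
    """Deployment-specific values win over the shared defaults."""
    merged = dict(defaults)   # copy so the shared defaults stay untouched
    merged.update(overrides)
    return merged


class ConfigHistory:
    """Append-only list of configuration versions, numbered from 1."""

    def __init__(self) -> None:
        self._versions: List[Config] = []

    def record(self, config: Config) -> int:
        self._versions.append(dict(config))
        return len(self._versions)  # version number of the new entry

    def get(self, version: int) -> Config:
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"unknown configuration version {version}")
        return dict(self._versions[version - 1])


# Example keys only; the values are illustrative.
defaults = {"parallelism.default": 2, "state.backend": "rocksdb"}
staging = effective_config(defaults, {"parallelism.default": 1})
production = effective_config(defaults, {"parallelism.default": 8})
```

Keeping every recorded version immutable is what makes it possible to go back to an earlier configuration.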
27. dA Platform: Detailed Architecture
[Diagram: Application Manager connected to Resource Management (resource allocation, job control), Data Persistence (persistent storage), Logging and Metrics]
28. Architecture notes
• All components are chosen to be cloud-ready. dA Platform runs on public clouds and on-premises
• All components are pluggable, in particular the metrics and logging integrations
• We plan to support more deployment targets than just Kubernetes in the future
30. dA Platform 2
• Manage applications and state together
• Reduce time to production by relying on the best practices from the original creators of Apache Flink
• Manage streaming application lifecycle easily
• Make streaming technologies accessible as a self-service platform
31. dA Platform 2: Roadmap
• Sign up on the data Artisans website for the product newsletter and the Early Access Program.
• General Availability is planned for end of 2017 / early 2018
• Visit the data Artisans booth to learn more
• Reach out at platform@data-artisans.com
32. Q & A
dA Platform 2 with Application Manager and Apache Flink®
Reach out to us at platform@data-artisans.com
36. dA Platform Architecture
[Diagram: Data Sources (Kafka / message bus, file system) feed dA Platform 2 (Application Manager, Resource Management, Data Persistence, Logging, Metrics), which writes to Data Sinks (Kafka / message bus, database, document store)]
38. A social network built entirely on the streaming architecture
More: https://data-artisans.com/blog/drivetribe-cqrs-apache-flink
39. Building streaming applications is easy …
• … productionizing them is hard
• Integration with existing infrastructures and processes
  • build pipeline
  • resource / cluster management
  • monitoring
  • data sources and sinks, persistent state storage
• Figuring out which components to choose
• Feedback: More time spent on operations than on implementation
40. Self-service streaming platforms
• Companies are building their own Flink streaming platforms
• Integration with internal infrastructures
• Right now, Flink has limited integration capabilities
41. dA Platform 2: Making Flink easy
• dA Platform 2 solves the following problems:
  • Managing stateful Flink streaming jobs
  • Integrating Flink into infrastructures, and providing best practices for them
  • Providing a self-service Flink platform
Reduce time to production
• You get the best tools from day one, which means more developer productivity
42. Every team needs to solve…
• Consistent stateful upgrades: application evolution and bug fixes
• Migration of application state: cluster migration, A/B testing
• Re-processing and reinstatement: fix corrupt results, bootstrap new applications
• State evolution (schema evolution)
43. Rethinking data architectures
The infrastructure requirements are changing with this new architecture
Deployment, scaling, migrations, upgrades and debugging are easier -- because state and compute are in the same system.
However, this new architecture requires different tools and systems.
Feedback from users: Implementation of streaming applications is easier than deployment and operations
Editor's Notes
“In this session, we will introduce dA Platform 2, including AppMgr and Apache Flink. dA Platform makes deployment and management of Apache Flink easier.
We’ll first give a product overview, and then demo the platform”
“my name is RM, I’m a Co Founder and Eng Lead at dA. I’m presenting with Patrick Lucas Senior Data Engineer at dA”.
Worked with Uber, Netflix, Alibaba and many other companies of all sizes at the forefront of innovation in the streaming space
3 main lessons learned
Lesson learned: Maybe the most important lesson: Flink is a huge success
Many good features available! Too many to name them individually
Why is streaming such a hot topic? We believe it’s the architectural shift!
That brings us to the second lesson …
2. Lesson: About Architectures
Individual teams for the applications
Expensive central databases, managed by a dedicated team
performance: state access across boundaries
Consistency: distributed txns required
Upgrades: necessary to provision state and compute together.
No dependencies on central data store
Each team manages their application, and their database and just offers a service to the outside world.
But: every team needs tools to manage applications (for example a resource management tool like Mesos or Kubernetes)
And a DBA to manage the DB (including backups, migrations etc.)
We get rid of the external state store by moving state into the application.
State becomes part of the application.
Other benefits of new arch:
- You can immediately react to new incoming signals (general streaming benefit)
performance: state access doesn’t cross boundaries
Consistency: no distributed txns required
Scaling: compute and state are scaled together
Upgrades: provision state and compute together
Time, out-of-order handling naturally fits
There’s just one problem in practice: How do you manage these new applications?
There are no tools available for this right now!
Ideally, there’s one tool that manages stateful applications (not multiple tools)
… This concludes my lessons learned with number 3.
We are working with many companies adopting this new streaming architecture.
And we saw a pattern in how companies adopt streaming architectures: when they scale it internally, there’s a need for platformization.
Flink is the best stream processor available, as we’ll learn again today at Flink Forward
Stateful streaming architecture makes many things so much easier
There’s a need for tools to manage this new architecture.
Application Manager by example: a common problem for many users
For example if you have a “Fraud Detection” use case!
And the component in dA Platform managing Applications is of course called “Application Manager”
Declarative application control
Tell the system you want it running – it’ll make sure Flink reaches the desired state
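Declarative control can be pictured as comparing the current state with the desired state and deriving the steps in between. The sketch below is a simplified illustration of that idea: the state names follow the lifecycle slide (start, suspend, cancel), while `AppState`, `TRANSITIONS` and `plan_actions` are hypothetical names, not part of dA Platform.

```python
from enum import Enum
from typing import Dict, List, Tuple


class AppState(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"   # stopped, state kept in a savepoint
    CANCELLED = "cancelled"   # stopped, no state kept


# Steps needed to move from one state to another (illustrative assumption).
TRANSITIONS: Dict[Tuple[AppState, AppState], List[str]] = {
    (AppState.RUNNING, AppState.SUSPENDED): ["savepoint", "cancel"],
    (AppState.RUNNING, AppState.CANCELLED): ["cancel"],
    (AppState.SUSPENDED, AppState.RUNNING): ["submit with savepoint"],
    (AppState.CANCELLED, AppState.RUNNING): ["submit"],
}


def plan_actions(current: AppState, desired: AppState) -> List[str]:
    """Return the steps that bring `current` to `desired`."""
    if current == desired:
        return []  # nothing to do, already in the desired state
    try:
        return list(TRANSITIONS[(current, desired)])
    except KeyError:
        raise ValueError(
            f"no transition from {current.value} to {desired.value} in this sketch"
        ) from None
```

The user only states the desired state; a control loop would call `plan_actions` repeatedly and execute the steps until nothing is left to do.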
More upgrade modes will come in the future
TODO: Animate slide? Might be an info overload
For the demo, I’ll be flipping between a few different screens to show what’s going on.
First, I’ll show the application manager itself with our running application.
You’ve heard Stephan and Robert talk about this concept of an “Application” that abstracts over Flink jobs as they evolve over time with code and configuration changes.
In the Application Manager, an Application can have multiple Deployments, which represent a copy or instance of the application code running in different ways.
In this demo, there are two Deployments, the production instance and a staging instance used to verify code and configuration changes before going live, both in the same datacenter. You could also have deployments of the same application that run in different datacenters or with different configurations.
Additionally, I’ll be using GitLab to host the code and Jenkins to build, test, and deploy the code via the Application Manager’s API.
I’ve set up a webhook that will cause GitLab to kick off a build in Jenkins whenever I push to the repo.
Finally, that brings me to the demo application itself.
Like in Stephan’s keynote, we have a dashboard that shows simulated payments from throughout Europe.
The Flink application consumes a stream of payment events, groups them by country, and updates an Elasticsearch index with how many payments we saw in the last hour.
On the top there’s a map that visualizes the latest data computed by our Flink job. The bigger and redder a dot, the more payments we saw from that country in the last hour.
On the bottom there’s a plot of the five highest volume countries over the last few days.
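The core of the demo job, counting payments per country over the last hour, can be sketched in plain Python to show what "state is part of the application" means. This is not Flink code and not the demo's implementation; `PaymentsPerCountry` is a made-up name, and the sketch assumes timestamps in seconds that arrive in order per country.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class PaymentsPerCountry:
    """Keeps per-country payment timestamps as local application state."""

    def __init__(self, window_seconds: int = 3600) -> None:
        self.window_seconds = window_seconds
        # country code -> timestamps of payments still inside the window
        self._events: Dict[str, Deque[int]] = defaultdict(deque)

    def add(self, country: str, timestamp: int) -> None:
        # Assumption: timestamps arrive in order for each country.
        self._events[country].append(timestamp)

    def counts(self, now: int) -> Dict[str, int]:
        """Number of payments per country in the window ending at `now`."""
        cutoff = now - self.window_seconds
        result: Dict[str, int] = {}
        for country, stamps in self._events.items():
            while stamps and stamps[0] <= cutoff:
                stamps.popleft()          # drop payments older than the window
            if stamps:
                result[country] = len(stamps)
        return result


tracker = PaymentsPerCountry()
tracker.add("DE", 0)
tracker.add("DE", 100)
tracker.add("FR", 3000)
```

In the real job, Flink keeps this state fault-tolerant through checkpoints and scales it together with the computation; the result of `counts` corresponds to what the job writes to the Elasticsearch index.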
App Manager provides everything you need while developing streaming applications from the beginning
REST with Swagger Spec
Stateful streaming is disrupting the way we organize data and its processing
Companies of all sizes and shapes
With very few Flink jobs, but also those with many.
Flink has very limited integration capabilities: a REST interface and wrappers around the CLI frontend
This benefits companies of all sizes and shapes:
- You get the best tools from day one, which means more developer productivity
- You don’t need to learn all the tools yourself, so you can focus on your use case