http://flink-forward.org/kb_sessions/beaming-flink-to-the-cloud-netflix/
Netflix is a data driven company and we process over 700 billion streaming events per day with at-least once processing semantics in the cloud. To enable extracting intelligence from this unbounded stream easily we are building Stream Processing as a Service (SPaaS) infrastructure so that the user can focus on extracting value and not have to worry about boilerplate infrastructure and scale. We will share our experience in building a scalable SPaaS using Flink, Apache Beam and Kafka as the foundation layer to process over 1.3 PB of event data without service disruption.
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Monal Daxini - Beaming Flink to the Cloud @ Netflix
1. Monal Daxini
Engineering Manager
Stream Processing, Real Time Data Infrastructure
@monaldax @Netflix #keystone
Beaming Flink to the Cloud @ Netflix
Sep 12 2016
40. SPaaS Vision (plot)
● Multi-tenant support for stateful stream processing apps
● Autoscaling managed infrastructure
● Support for schemas
● Self Service Tooling
41. SPaaS Architecture (plot)
SPaaS Manager
Titus Container
Runtime
Framework Specific API
or
Common API (Beam)
[ Dockerized Job ]
1. Create
2. Submit 3. Launch
Runner
Flink / Spark /
Mantis
Running Job
1. Submit Job DSL (SQL)
Tooling/ Dashboard
42. Why Apache Beam?
○ Portable API layer to build sophisticated data processing apps
■ Support multiple execution engines
○ Unified model API over bounded and unbounded data sources
○ Millwheel, FlumeJava, Dataflow model lineage
SPaaS - “Beam Me Up, Scotty ! "
43. Iterative build out: then
● First - Flink on Titus in VPC, AWS
○ Titus is a cloud runtime platform for container based jobs
● Next - Apache Beam and Flink runner
SPaaS - Pilot
44. 2.
Keystone SPaaS-Flink Pilot Use Cases
Stream
Consumers
Router
EMR
Fronting
Kafka
Event
Producer
Consumer
Kafka
Demux
MergeControl Plane
Self Service UI
45. 2.
1.
Keystone SPaaS-Flink Pilot Use Cases
Stream
Consumers
Router
EMR
Fronting
Kafka
Event
Producer
Consumer
Kafka
3.
Demux
MergeControl Plane
Self Service UI
47. Titus Job
Task Manager
IP
Titus Host 4 Titus Host 5
Flink (1.2) Program Deployment (prod shadow)
Zookeeper
Job Manager
(standby)
Job Manager
(master)
Task Manager
Titus Host 1
IP
Titus Host 2
….
Task Manager
Titus Host 3
IP
Titus Job
IPIP
AWS
VPC
ENI
48. Titus High Level Architecture
Titus UITitus UI
Docker
Registry
Docker
Registry
Rhea
container
container
container
docker
Titus Agent
metrics agent
container
container
SPaaS-Flink
Titus executor
logging agent
zfsmesos agent
docker
RheaTitus API
Cassandra
Titus Master
Job Management
& Scheduler
S3
Zookeeper
Docker
Registry
EC2 Autoscaling
API
Mesos Master
Titus UI
(CI/CD)
Fenzo
58. Flink Router perf test (YMMV)
○ Note
■ The tests were performed on a specific use case,
■ running in a specific environment, and with
■ with one specific event stream, and setup.
60. Details..
○ Different runtimes for Flink & Samza routers
○ Massively parallel use-case
■ Per element processing
○ Focused on net outcomes
61. Titus Job
Task Manager
IP
Titus Host 4 Titus Host 5
Flink (1.2) Router
Zookeeper
Job Manager
(standby)
Job Manager
(master)
Task Manager
Titus Host 1
IP
Titus Host 2
….
Task Manager
Titus Host 3
IP
Titus Job
IPIP
AWS
VPC
ENI
Backed state
64. Titus Job
Task Manager
IP
Titus Host 4 Titus Host 5
Fine Grained Recovery FLIP-1
Zookeeper
Job Manager
(standby)
Job Manager
(master)
Task Manager
Titus Host 1
IP
Titus Host 2
….
Task Manager
Titus Host 3
IP
Titus Job
IPIP
X
Flink Job
Restarts
65. Awaiting Flink Features (now)
● Checkpoints and savepoints
○ FLINK-4484 - Unify, Persistent checkpoints, periodic savepoints
○ Compatible across Flink version upgrades
○ Inspection tool for debugging
66. Awaiting Flink Features (now)
● Atomic savepoint and stop (pause) the job
● Dynamic Scaling
○ Resume from savepoint with different parallelism
○ Job elasticity
● FLINK-4545 Adjust TaskManager network buffer when scaling
DONE
67. Awaiting Flink Features (now)
● Cluster Runtime and Elasticity (FLIP - 6)
● Extending Window Function Metadata (FLIP-2)
● Large State support
○ Incremental checkpointing
○ hot standby
68. Awaiting Flink Features
● Metric tags as key-value pairs - FLINK-4245
● FLIP-9 Window Trigger DSL
● FLIP-11 Table API
● Side Inputs / Side Outputs (handle late data)
DONE