© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
Netflix Keystone SPaaS
Stream Processing As a Service
ABD320
Monal Daxini @monaldax #reInvent #Netflix
Stream Processing Infrastructure
What Do I Get Out Of This Talk?
Different perspectives:
● Data Engineer: Why stream processing, and what to expect from a platform?
● Data Leader: Product / vision of a Stream Processing As a Service platform
● Platform Engineer: How to build and operate a Stream
● I will focus on stream processing platform for business
insights, which my team builds mostly based on Flink
● I won’t be addressing operational insights, for which we have different systems
Why Stream Processing?
Why Real Time Data?
● Low-latency insights and analytics
● Processing data as it arrives helps spread the workload over time and reduces processing redundancy
● The need to process unbounded data sets is becoming increasingly common
Why Build A Stream Processing Platform?
● Enable users to focus on data and business insights, and not worry about building stream processing infrastructure and tooling
What Does A Stream Processing Platform Offer?
The Platform Needs To Offer A Robust Way To Process Streams, Allowing Trade-offs Between Ease, Capability, & Flexibility
[Diagram: SPaaS]
Stream Processing as a Service platform offers
● Point & Click: Routing, Filtering, Projection
● Streaming Jobs
● Support Streaming SQL (Future)
● Interactive exploration of streams for quick prototyping (Future)
Point & Click: Routing, Filtering, Projection
Ingest Pipelines Are The Backbone Of A Real-time Data Infrastructure
[Diagram: Event Producers, Sinks; labels: Serverless, Turnkey, 100% in AWS]
Keystone Pipeline – Provision A Managed Data Stream 📽
Keystone Self-serve – Optional Filter & Message Parser
* We would eventually like to move away from XPath & our custom parser
Keystone Self-serve – Message Format
* We would eventually like to move away from XPath & our custom parser
Keystone Self-serve – Optional Projection 📽
Keystone Self-serve – Elasticsearch Sink Config
Keystone Self-serve – Kafka Sink Partition Key Support
Keystone - Configure 1 Data Stream, A Filter, & 3 Sinks
Create Kafka Topic, And Three Separate Jobs
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Elasticsearch, Keystone Management; annotations: 3 jobs, 1 topic, 1 topic]
Event Flow: Producer Uses Kafka Client Wrapper Or Proxy
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Elasticsearch, Keystone Management]
Event Flow: Events Queued In Kafka
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Elasticsearch, Keystone Management; annotation: 3 instances]
Event Flow: Each Router Reads From Source, Optionally Applies Filter & Projection
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Elasticsearch, Keystone Management; annotation: 3 instances]
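The optional filter and projection that each router applies can be pictured with a small sketch. This is only an illustration in plain Python, not the Flink router itself; the function name `route`, the event fields, and the use of a Python predicate in place of the self-serve filter expression are all assumptions.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Event = dict


def route(events: Iterable[Event],
          keep: Optional[Callable[[Event], bool]] = None,
          fields: Optional[List[str]] = None) -> Iterator[Event]:
    """Yield events that pass the optional filter, reduced to the optional projection."""
    for event in events:
        if keep is not None and not keep(event):
            continue  # event rejected by the filter
        if fields is not None:
            # projection: keep only the requested fields that are present
            event = {name: event[name] for name in fields if name in event}
        yield event


# Hypothetical events read from the source topic.
events = [
    {"type": "play", "country": "US", "title_id": 1, "device": "tv"},
    {"type": "search", "country": "US", "query": "drama"},
    {"type": "play", "country": "BR", "title_id": 2, "device": "phone"},
]

plays = list(route(events,
                   keep=lambda e: e["type"] == "play",
                   fields=["country", "title_id"]))
```

With neither option set, `route(events)` passes every event through unchanged, which matches the "optional" wording on the slide.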
Event Flow: Each Router Writes To Its Respective Sink
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Elasticsearch, Keystone Management; annotations: 3 instances; non-keyed and keyed messages supported]
Dashboard & Alert Config Generation For
Provisioned Streams
Searchable Router Job Logs
Flink Job Web UI
Keystone Admin Links
Data Stream Operations Are Managed
• Fully managed scaling
• Managed capacity planning
• 24 X 7 availability [Scale]
• Garbage collect unused streams
Keystone Pipeline - The Road Ahead
• Additional components – UDFs, Data Hygiene, Data Alerting, etc.
• Component chaining
• Schema Support
• Data Lineage
• Cost attribution
[Diagram: Point & Click: Routing, Filtering, Projection (prod); Streaming Jobs]
Why A Streaming Job?
• When we need more flexibility and power than the Point & Click pipeline offers, we use stream processing jobs.
Generate Streaming Job From Template
Generated Jenkins Build
Run And Debug Locally In The IDE
Create A New Streaming Job Config For Deployment
Deploying A Streaming Job In Test
Deploying A Streaming Job In Other Environments
Deployment Status Of A Sample Streaming Job
Streaming Job Actions & Links
Streaming Job Dashboard
Searchable Streaming-Job Logs
In Addition, Consulting & Documentation
● Use-case-specific consulting
● Recipes
● Examples and documentation
Types of Streaming Jobs
Broadly, Two Categories Of Streaming Jobs
• Stateless
  • No state maintained across events
• Stateful
  • State maintained across events
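The difference between the two categories can be shown in a few lines. This is a minimal sketch in plain Python rather than Flink code; the event fields and the names `normalize_title` and `PlayCounter` are made up for illustration.

```python
from collections import defaultdict


def normalize_title(event: dict) -> dict:
    """Stateless: the output depends only on the current event."""
    return {**event, "title": event["title"].strip().lower()}


class PlayCounter:
    """Stateful: the output depends on what was seen in earlier events."""

    def __init__(self) -> None:
        self.plays_by_user = defaultdict(int)  # state kept across events

    def process(self, event: dict) -> dict:
        self.plays_by_user[event["user"]] += 1
        return {**event, "plays_so_far": self.plays_by_user[event["user"]]}


stream = [
    {"user": "u1", "title": " Dark "},
    {"user": "u2", "title": "Narcos"},
    {"user": "u1", "title": "Ozark"},
]

normalized = [normalize_title(e) for e in stream]
counter = PlayCounter()
counted = [counter.process(e) for e in stream]
```

The stateless function can be restarted or scaled out freely; the stateful operator needs its `plays_by_user` map to survive restarts, which is what checkpoints and savepoints are for.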
Streaming Job In Context Of Keystone Pipeline
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Elasticsearch, Streaming Job, Keystone Management]
Stateless Stream Processor – No Internal State
Image adapted from:
Stateless Stream Processor – External State
Image adapted from: Stephan Ewen
Stateless Example: Generating Plays Feed For Personalization, And Discovery Of Shows
Stateless Streaming Job Use Case: High Level Architecture
Enriching And Identifying Certain Plays
[Diagram: Play Logs, Streaming Job, Playback History Service, Video Metadata, Live Service Lookup Data]
Stateful Stream Processing
Image adapted from: Stephan Ewen
Stateful Example: Creating Search Sessions
Search Personalization – Custom Windowing On Out-of-order Events
[Diagram: timeline in hours with events marked S and E, grouped into Session 1 and Session 2]
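Grouping out-of-order events into sessions can be sketched as follows. The slides do not describe the custom windowing logic, so this assumes a simple inactivity-gap rule and processes a finished batch; a streaming engine would instead buffer events and decide when a session is complete. The gap value and the `S`/`E` labels are used here only as opaque placeholders.

```python
from typing import Iterable, List, Tuple

Event = Tuple[int, str]  # (event time in seconds, label)


def sessionize(events: Iterable[Event], gap_seconds: int) -> List[List[Event]]:
    """Group events into sessions by event time, splitting on inactivity gaps."""
    sessions: List[List[Event]] = []
    for ts, label in sorted(events):  # order by event time, not arrival order
        if sessions and ts - sessions[-1][-1][0] <= gap_seconds:
            sessions[-1].append((ts, label))  # continues the current session
        else:
            sessions.append([(ts, label)])  # gap exceeded: start a new session
    return sessions


# Events listed in arrival order; note that 160 and 130 arrive late.
arrived = [(100, "S"), (5000, "S"), (160, "E"), (130, "S"), (5030, "E")]
sessions = sessionize(arrived, gap_seconds=1800)
```

Sorting by event time is what makes the result independent of arrival order; the late events still land in the first session.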
Stateful Streaming Application With Local State, Checkpoints, And Savepoints
[Diagram: Sources, Streaming Application (Flink Engine, Local State), Sinks; Checkpoints (Automatic), Savepoints (Explicitly Triggered)]
Streaming Job (Flink) Savepoint Tooling Support
• S3-based multi-tenant storage management
• Auto savepoint and resume from savepoint on redeploy
• Resume from an existing savepoint
Streaming Job (Flink) High Level Features
• Stateless jobs
• Event enrichment support by accessing services using platform thick clients
• Stateful jobs with 100s of GB of state, with larger state support in the works
• Reusable blocks (in progress)
• Job development, deployment, and monitoring tooling (alpha)
Streaming Jobs - The Road Ahead
• Easy resource provisioning estimates
• Flink support for reading from and writing to the data warehouse, and for backfill
• Continue to evolve tooling and support for large state
• Reusable components - sources, sinks, operators, schema support, data hygiene
• Tooling support for Spark Streaming
Scale?
Prod – Trending Events & Scale
With Events Flowing To Hive, Elasticsearch, Kafka
[Chart: trending events, ≅ 80B to 1.3T]
• 1.3T+ events processed per day
• 600B to 1T unique per day
• 2+ PB in, 4.5+ PB out per day
• Peak: 12M events in / sec & 36 GB / sec
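A little arithmetic puts the daily figures above next to the per-second peak. The inputs are the numbers on the slide, taken at their lower bounds where a "+" is given; the assumptions are decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and that the peak figure counts unique incoming events.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures from the slide.
unique_per_day_low = 600e9
unique_per_day_high = 1e12
peak_events_per_sec = 12e6
peak_bytes_per_sec = 36e9
bytes_in_per_day = 2e15
bytes_out_per_day = 4.5e15

# Average unique events per second across a day: roughly 6.9M to 11.6M.
avg_events_per_sec_low = unique_per_day_low / SECONDS_PER_DAY
avg_events_per_sec_high = unique_per_day_high / SECONDS_PER_DAY

# Average event size implied by the peak figures: 3,000 bytes.
bytes_per_event_at_peak = peak_bytes_per_sec / peak_events_per_sec

# Ratio of bytes written to sinks versus bytes ingested: 2.25.
fan_out_ratio = bytes_out_per_day / bytes_in_per_day
```

Under these assumptions the 12M events/sec peak sits just above the upper end of the daily average, and each ingested byte is written out a little more than twice.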
Keystone Router Stream Processing Jobs Scale
[Chart label: m4.4xl]
How Do We Do It?
RTDI Consists Of 4 Systems. Keystone Pipeline Runs 24 X 7, & Does Not Impact Members' Ability To Play Videos
[Diagram: Keystone Stream Processing (SPaaS), Keystone Management, Keystone Messaging; labels: 24 x 7; Dev, Test, Prod; Granular shadowing]
Components & Streaming Jobs
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Hive, Elasticsearch, Streaming Job, Keystone Management]
Event Producer Library
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Hive, Elasticsearch, Streaming Job, Keystone Management]
Producer Library - Kafka Client Wrapper
• Inject event metadata - GUID, timestamp, host, app
• Transparent and dynamic traffic routing for producers
• Chaski - Custom binary data wrapper within Keystone pipeline
• Multiple serialization support & additional metadata
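Injecting metadata around a payload can be sketched like this. The slides do not describe Chaski's binary layout, so the example uses a plain dictionary serialized as JSON; `wrap_event`, the field names, and the app name are hypothetical.

```python
import json
import socket
import time
import uuid


def wrap_event(payload: dict, app: str) -> dict:
    """Attach producer-side metadata (GUID, timestamp, host, app) to a payload."""
    return {
        "meta": {
            "guid": str(uuid.uuid4()),              # unique id per event
            "timestamp_ms": int(time.time() * 1000),
            "host": socket.gethostname(),
            "app": app,
        },
        "payload": payload,                          # left untouched
    }


first = wrap_event({"type": "play", "title_id": 1}, app="playback-service")
second = wrap_event({"type": "play", "title_id": 1}, app="playback-service")
encoded = json.dumps(first).encode("utf-8")  # one possible serialization
```

Keeping the metadata separate from the payload lets the pipeline route and deduplicate events without parsing application data.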
Boundary Of Custom Binary Data Wrapper
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Hive, Elasticsearch, Streaming Job, Keystone Management]
Producer Library - Kafka Client Wrapper
• Automated Kafka producer buffer (60s) tuning based on traffic
• Best-effort delivery, prioritizes host application availability
• acks=1, do not block to send events, unclean leader election
• Non-keyed messages, retry send to available
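The trade-offs listed above map onto a handful of standard Apache Kafka settings. The sketch below builds such a configuration as a plain dictionary. `acks`, `max.block.ms`, and `buffer.memory` are real producer settings; reading "(60s)" as "size the buffer to hold about 60 seconds of observed traffic" is an assumption, as are the function names and the traffic number.

```python
def buffer_bytes_for(observed_bytes_per_sec: float, seconds: int = 60) -> int:
    """Bytes needed to hold `seconds` worth of traffic at the observed rate."""
    return int(observed_bytes_per_sec * seconds)


def best_effort_producer_config(observed_bytes_per_sec: float) -> dict:
    """Producer settings that favor host application availability over delivery."""
    return {
        # Wait for the partition leader only, not for all replicas.
        "acks": "1",
        # Fail fast instead of blocking the caller when the buffer is full.
        "max.block.ms": 0,
        # Assumption: buffer sized from traffic, per the "(60s)" note.
        "buffer.memory": buffer_bytes_for(observed_bytes_per_sec),
    }


# Hypothetical producer sending about 1 MB/s.
config = best_effort_producer_config(observed_bytes_per_sec=1_000_000)
```

Unclean leader election is not a producer setting; in Kafka it is controlled on the broker or topic side through `unclean.leader.election.enable`.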
KSGateway - Event Proxy For Non-Java Clients, REST & gRPC
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Hive, Elasticsearch, Streaming Job, Keystone Management]
Kafka Clusters (0.10) on Amazon EC2
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Elasticsearch, Streaming Job, Keystone Management]
Why Kafka?
• Have message sizes > 1MB and up to 10MB
• Large-scale Keystone ingest pipelines result in large fan-out
• Lower latency – used for ad-hoc messaging as well
• Open source – enhance, patch, or extend
Scale for Large Fan-out and Isolation - Cascading Topology
[Diagram: Fronting Kafka, Consumer Kafka, Consumer]
Alternative: Logical Stream (Topic) Spread Across Multiple Topics Across Multiple Clusters (WIP)
[Diagram: Multi-Cluster Producer, Multi-Cluster Consumer]
Kafka Deployment Strategies – Version 0.10 (YMMV)
• Dedicated Zookeeper cluster per Kafka cluster
• Small clusters < 200 brokers, partitions <= 10K
• Partitions distributed evenly across brokers
• Rack-aware replica assignment, brokers spread in 3 zones
• 2 copies & unclean leader election on
• Non-transactional
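The idea behind rack-aware assignment with 2 copies across 3 zones can be illustrated with a toy placement function. This is not Kafka's own assignment algorithm; it is a simplified sketch that assumes the same number of brokers in every zone, and the zone names and broker ids are made up.

```python
from collections import Counter
from itertools import chain
from typing import Dict, List


def interleave_by_zone(brokers_by_zone: Dict[str, List[int]]) -> List[int]:
    """Order brokers so that neighbors in the list sit in different zones."""
    zones = sorted(brokers_by_zone)
    if len({len(brokers_by_zone[z]) for z in zones}) != 1:
        raise ValueError("this sketch assumes equal broker counts per zone")
    return list(chain.from_iterable(zip(*(brokers_by_zone[z] for z in zones))))


def assign_replicas(brokers_by_zone: Dict[str, List[int]],
                    num_partitions: int,
                    replication_factor: int = 2) -> Dict[int, List[int]]:
    """Place each partition's replicas on consecutive brokers of the interleaved ring."""
    if replication_factor > len(brokers_by_zone):
        raise ValueError("need at least one zone per replica")
    ring = interleave_by_zone(brokers_by_zone)
    return {
        p: [ring[(p + r) % len(ring)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


brokers = {"zone-a": [1, 2], "zone-b": [3, 4], "zone-c": [5, 6]}
zone_of = {b: z for z, ids in brokers.items() for b in ids}
assignment = assign_replicas(brokers, num_partitions=12)
replicas_per_broker = Counter(b for replicas in assignment.values() for b in replicas)
```

Because consecutive brokers in the ring belong to different zones, losing a whole zone leaves at least one copy of every partition, and the round-robin start position keeps replicas evenly spread.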
Kafka Clusters Scale
• 36+ Kafka & Zookeeper clusters
• 4000+ brokers (EC2), 700+ topics
• 3000+ d2.xl, 900+ i2.2xl
• Highly available 99.99%+
• Retention 2hr, 4hr, 8hr, 24hr
Stream Processing Platform
[Diagram: Event Producer, KSGateway, Fronting Kafka, Router, Consumer Kafka, KCW, Elasticsearch, Stream Consumers, Keystone Management]
High-level Stream Processing Platform Architecture - Routers
[Diagram: Keystone Management, Point & Click, Router Streaming Job, Container Runtime]
1. Create Streaming Job
2. Launch Job with Config, Source, Sink, Filters, Projections, etc.
3. Launch Containers
• Immutable Image
• Automated, system-driven config overrides
• Keystone pipeline is built on Flink Routers
• Each Flink Router is a stream processing job
• Router provisioning based on incoming traffic or
estimates
• Runs on containers atop EC2
• Island mode - single AWS Region
Streaming Jobs 1.3.2
High-level Stream Processing Platform Architecture - Streaming Jobs
[Diagram: Keystone Management, Point & Click or Streaming Job, Container Runtime]
1. Create Streaming Job
2. Launch Job with Config overrides
3. Launch Containers
• Immutable Image
• User-driven config overrides
Stream Processing Platform - Layered cake
Amazon EC2
Titus Container Runtime
Stream Processing Platform
(Flink Streaming Engine, Config Management)
Reusable Components
Source & Sink Connectors, Filtering, Projection, etc.
Routers
(Streaming Job)
Streaming Jobs
Flink Job Cluster In HA Mode
[Diagram: Zookeeper, Job Manager Leader (WebUI), Job Manager (WebUI), 3 Task Managers]
One dedicated Zookeeper cluster for all streaming jobs
Flink Task Slots & Automatic Operator Chaining
Image: Flink 1.2 documentation
Flink Job Cluster In HA Mode With Checkpoints
[Diagram: Zookeeper, Job Manager (Leader), Job Manager, 3 Task Managers; labels: State, Checkpoints, Metadata]
Flink Checkpoints Similar To 2 Phase Commit
Image: Flink 1.2 documentation
Checkpoints Are Taken Often
[Diagram: Titus jobs in an AWS VPC; Job Manager (master), Job Manager (standby), and Task Managers run on Titus hosts, each container with its own IP; Zookeeper; State: checkpoints and Kafka offsets are saved]
Checkpoints Are Taken Often. A Container Could Fail…
[Diagram: same layout, with one container marked as failed (X)]
Restored To Last Checkpoint, Partial Recovery Supported
[Diagram: same layout, with a replacement container; State: checkpoints and Kafka offsets are restored]
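The save-then-restore cycle for state and Kafka offsets can be imitated in a few lines. This toy job keeps its checkpoint in memory and reads from a Python list standing in for a Kafka partition; real Flink checkpoints go to durable storage and are coordinated across operators, which is not modeled here. The class name and the failure injection are illustrative only.

```python
import copy
from typing import Dict, List, Optional, Tuple


class CountingJob:
    """Counts keys from a log and checkpoints (offset, state) every N events."""

    def __init__(self, checkpoint_every: int) -> None:
        self.offset = 0
        self.counts: Dict[str, int] = {}
        self.checkpoint_every = checkpoint_every
        self.last_checkpoint: Tuple[int, Dict[str, int]] = (0, {})

    def checkpoint(self) -> None:
        # Offset and state are saved together so they stay consistent.
        self.last_checkpoint = (self.offset, copy.deepcopy(self.counts))

    def restore(self) -> None:
        offset, counts = self.last_checkpoint
        self.offset, self.counts = offset, copy.deepcopy(counts)

    def run(self, log: List[str], fail_at: Optional[int] = None) -> None:
        while self.offset < len(log):
            if fail_at is not None and self.offset == fail_at:
                raise RuntimeError("simulated container failure")
            key = log[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1
            if self.offset % self.checkpoint_every == 0:
                self.checkpoint()


log = ["a", "b", "a", "c", "a", "b", "c"]
job = CountingJob(checkpoint_every=3)
try:
    job.run(log, fail_at=5)   # fails after the checkpoint taken at offset 3
except RuntimeError:
    job.restore()             # roll back to offset 3 and the counts saved there
job.run(log)                  # replay from the restored offset to the end
```

Because the offset is rewound together with the state, the events processed after the last checkpoint are replayed rather than double counted.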
and Streaming Jobs Management
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Hive, Elasticsearch, Streaming Job, Keystone Management]
Keystone Management Current Architecture - Imperative
[Diagram: Composable Joblets]
Keystone Management New Architecture (WIP) – Declarative
Keystone Management New Architecture (WIP)
Keystone Management Unique Features
• The ability to pass data along the chain of Joblets within a Job
• Locks and semaphores on resources spanning across jobs
• Customization and integration into Netflix ecosystem – Eureka, etc.
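Two of the features above, passing data along a chain of Joblets and locking a resource across jobs, can be sketched as follows. Keystone Management's real interfaces are not shown in the slides, so every name here (`run_job`, `lock_for`, the joblet functions, the resource string) is hypothetical, and an in-process lock stands in for whatever cross-job locking the real system uses.

```python
import threading
from typing import Callable, Dict, List, Optional

Joblet = Callable[[dict], dict]

_resource_locks: Dict[str, threading.Lock] = {}
_registry_guard = threading.Lock()


def lock_for(resource: str) -> threading.Lock:
    """Return the single lock shared by all jobs that touch `resource`."""
    with _registry_guard:
        return _resource_locks.setdefault(resource, threading.Lock())


def run_job(joblets: List[Joblet], resource: str,
            context: Optional[dict] = None) -> dict:
    """Run joblets in order, feeding each one the previous joblet's output."""
    context = dict(context or {})
    with lock_for(resource):  # only one job at a time may change this resource
        for joblet in joblets:
            context = joblet(context)
    return context


def create_topic(ctx: dict) -> dict:
    return {**ctx, "topic": f"{ctx['stream']}-topic"}


def launch_router(ctx: dict) -> dict:
    # Uses the topic name produced by the previous joblet.
    return {**ctx, "router": f"router-for-{ctx['topic']}"}


result = run_job([create_topic, launch_router],
                 resource="stream:plays",
                 context={"stream": "plays"})
```

Passing a context dictionary down the chain keeps each joblet small and reusable, since a joblet only needs to know which keys it reads and which it adds.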
How Do We Operate It?
Scale Operations Using Systems Not Humans
We Run What We Build!
• No separate Ops team
• No separate QA team
• No separate Dev team
• It’s all done by developers of the Real Time Data Infrastructure
We Leverage Other Netflix Systems
• We rely on metrics, monitoring, alerting & paging, & automation
• Separate metrics system – Atlas
• Separate alert configuration and alert actions system
• Options for separate system to run cross-system automation tasks
Easy Alert Configuration And Status
Easy View Of Fired Alerts
Operating KSGateway - Event Proxy For Non-Java Clients
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, Hive, Elasticsearch, Streaming Job, Keystone Management]
• Stateless Service
• Scaled Using Elastic Load Balancing and Auto Scaling Group
• Pre-scaled for planned increase in traffic
Event Producer Related Monitoring And Alerts
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Elasticsearch, Streaming Job, Keystone Management]
Monitoring Producer, Alert On Drop Rate
Kafka Clusters
[Diagram: Event Producer, KSGateway, Fronting Kafka, SPaaS Router, Consumer Kafka, KCW, Hive, Elasticsearch, Streaming Job, Keystone Management]
Kafka Failover - Fronting Kafka Clusters
Fully Automated
Kafka Cluster Failover – As Fast As 5 Minutes
Kafka Cluster & Routers In Healthy State
[Diagram: Event Producer, Fronting Kafka, Flink Router]
Issue With Kafka Cluster
[Diagram: Event Producer, Fronting Kafka, Flink Router, with a failure marked (X)]
Launch Backup Kafka Cluster With Same Number Of Instances, But Smaller Instance Type
[Diagram: Event Producer, Fronting Kafka (X), Flink Router; annotations: Bring up failover Kafka cluster; Copy metadata from Zookeeper]
Change Producer Config To Produce To Failover Cluster, And Launch Routers For Failover Traffic
[Diagram: Event Producer, Fronting Kafka (X), Flink Router, Failover Flink Router]
Change Producer Config To Original Cluster, And Finish Draining Events From Backup Flink Router
[Diagram: Event Producer, Fronting Kafka, Flink Router, Failover Flink Router]
Decommission Backup Cluster And Router Once Original Cluster Is Fixed, Or A Replacement Cluster Is Live
[Diagram: Event Producer, Fronting Kafka, Flink Router, Failover Flink Router, with the backup cluster and router marked (X)]
Back To Steady State With Click Of A Button
[Diagram: Event Producer, Fronting Kafka, Flink Router]
Consumer Kafka Clusters
• Failover currently supported for Fronting Kafka clusters only
• We are working on a multi-consumer client with support for keyed messages to support failover of consumer Kafka clusters.
Planned & Regular Kafka Kong
This Automation Also Serves As Kafka Kong, A Tool That Follows Principles Of Chaos Engineering
Kafka Operation Strategies (YMMV)
• Over-provision for variations and traffic for failover
• Broker health & outlier detection and auto termination
• 99th percentile response time
• Broker TCP timeouts, errors, retransmissions
• Producer’s send latency
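Outlier detection on a per-broker signal such as 99th percentile response time can be sketched as a comparison against the fleet median. The slides do not say how outliers are detected, so the rule below (flag any broker above a multiple of the median) is an assumption, as are the threshold and the sample numbers.

```python
from statistics import median
from typing import Dict, List


def outlier_brokers(p99_ms_by_broker: Dict[str, float],
                    factor: float = 3.0) -> List[str]:
    """Return brokers whose p99 response time exceeds `factor` times the median."""
    typical = median(p99_ms_by_broker.values())
    return sorted(
        broker for broker, p99 in p99_ms_by_broker.items()
        if p99 > factor * typical
    )


# Hypothetical p99 response times in milliseconds.
p99_ms = {
    "broker-1": 12.0,
    "broker-2": 15.0,
    "broker-3": 11.0,
    "broker-4": 95.0,
    "broker-5": 14.0,
}
candidates_for_termination = outlier_brokers(p99_ms)
```

Using the median rather than the mean keeps one slow broker from raising the baseline it is compared against.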
Kafka Operation Strategies (YMMV)
• Scale up by
  • Adding partitions – to new brokers, requires no keyed messages
  • Partition reassignment – in small batches with custom tool
• Scale down by
  • Create new topics / new clusters
  • Create new clusters - use Kafka failover
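Why adding partitions "requires no keyed messages" follows from how keys map to partitions: the partition is a hash of the key modulo the partition count, so changing the count moves keys. The sketch below uses `zlib.crc32` as a stand-in hash (Kafka's default Java partitioner uses murmur2), and the key names and partition counts are made up.

```python
import zlib
from itertools import count


def partition_for(key: bytes, num_partitions: int) -> int:
    """Keyed message: the same key always maps to the same partition for a fixed count."""
    return zlib.crc32(key) % num_partitions


class RoundRobin:
    """Non-keyed message: any partition will do, so just rotate through them."""

    def __init__(self, num_partitions: int) -> None:
        self._n = num_partitions
        self._next = count()

    def partition(self) -> int:
        return next(self._next) % self._n


keys = [f"user-{i}".encode() for i in range(1000)]

# Grow the topic from 8 to 12 partitions and count keys whose partition changes.
moved = sum(partition_for(k, 8) != partition_for(k, 12) for k in keys)

rr = RoundRobin(num_partitions=3)
rr_sequence = [rr.partition() for _ in range(4)]
```

For a uniform hash, roughly two thirds of keys land on a different partition when going from 8 to 12, which would break per-key ordering; non-keyed traffic is indifferent to the change.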
Stream Processing Platform
[Diagram: Event Producer, KSGateway, Fronting Kafka, Router, Consumer Kafka, KCW, Elasticsearch, Flink Streaming Job, Keystone Management]
Routers & Streaming Job Fault Tolerance By Design
• Container replacement
• Checkpoints and Savepoints
• Keep retrying if event data format is valid
• Isolation – issue with one sink does not impact another
Router Deployment Automation
• Provision new or updated streams
• Bulk update, termination, and redeployment of routers
• Automatic partial recovery allows zero-touch migration of underlying container infrastructure
• Manual – KSRunbook
For Manual Intervention, We Have A Runbook. The Goal Is To Automate And Keep The Runbook Small
Router Capacity Planning And Provisioning
• Per-stream provisioning based on traffic from past weeks or bit rate estimate
• Provision buffer capacity
• Run 1 additional container for latency-sensitive consumers
• Manual, % increase, easy to compute and deploy
• Plan capacity to handle service failover, and
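The provisioning rules above reduce to a short calculation. The slides give no formula, so this is a sketch under assumed inputs: a per-container throughput figure, a fractional buffer, and one extra container for latency-sensitive consumers. The function name and all numbers are hypothetical.

```python
import math


def containers_needed(peak_bytes_per_sec: float,
                      per_container_bytes_per_sec: float,
                      buffer_fraction: float = 0.25,
                      latency_sensitive: bool = False) -> int:
    """Containers for a stream: observed peak plus buffer, plus one if latency sensitive."""
    target = peak_bytes_per_sec * (1 + buffer_fraction)
    needed = max(1, math.ceil(target / per_container_bytes_per_sec))
    return needed + 1 if latency_sensitive else needed


# Hypothetical stream: 90 MB/s peak over past weeks, 20 MB/s per container.
baseline = containers_needed(90e6, 20e6)
with_extra = containers_needed(90e6, 20e6, latency_sensitive=True)
```

Expressing the buffer as a percentage matches the "% increase, easy to compute and deploy" note: scaling a stream up manually is then a one-parameter change.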
Admin Tooling To Scale Up Manually, Or To Deploy A New Build
Application Metrics – Router Message Flow
Application Metrics – Router Filtering
Platform-level Metrics – Kafka Offset Metrics
System Metrics - Router JVM Metrics
Alerts – Hive Sink Router
Flink Streaming Job
● Split between application and infrastructure
● Metrics and monitoring
● Alerts
● Paging and on-call rotations
● Platform customers follow the same “We build it, we run it” model
Example Streaming Job Application Level Simulated Metrics
Example Streaming Job System Level Simulated Metrics
Operations – The road ahead
● True auto scaling
● Bootstrap capacity planning for stateful streaming jobs
● Automated Canary tooling & Data parity
● Point and Click components quick testing, and performance
profiling
● E.g., iterating over a Filter definition
I Want To Learn More
● http://bit.ly/mLOOP - Deep dive into Unbounded Data
Processing Systems
● http://bit.ly/m17FF - Keynote – Stream Processing with Flink
at Netflix
● http://bit.ly/2BoYAq0 - Multi-tenant Multi-cluster Kafka
THANK YOU!
Monal Daxini
@monaldax
Questions?

More Related Content

What's hot

Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in KafkaJoel Koshy
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...HostedbyConfluent
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...HostedbyConfluent
 
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, ConfluentIntroducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, ConfluentHostedbyConfluent
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetesconfluent
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Gwen (Chen) Shapira
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System OverviewFlink Forward
 
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
 Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit... Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...Databricks
 

What's hot (20)

Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in Kafka
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ...
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
 
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, ConfluentIntroducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
kafka
kafkakafka
kafka
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetes
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Global Netflix Platform
Global Netflix PlatformGlobal Netflix Platform
Global Netflix Platform
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
 Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit... Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
 

Similar to Netflix Keystone SPaaS: Real-time Stream Processing as a Service - ABD320 - re:Invent 2017

AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017Monal Daxini
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016Monal Daxini
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterPaolo Castagna
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?confluent
 
Patterns of Streaming Applications
Patterns of Streaming ApplicationsPatterns of Streaming Applications
Patterns of Streaming ApplicationsC4Media
 
Building Stream Processing as a Service
Building Stream Processing as a ServiceBuilding Stream Processing as a Service
Building Stream Processing as a ServiceSteven Wu
 
What's new in Confluent 3.2 and Apache Kafka 0.10.2
What's new in Confluent 3.2 and Apache Kafka 0.10.2 What's new in Confluent 3.2 and Apache Kafka 0.10.2
What's new in Confluent 3.2 and Apache Kafka 0.10.2 confluent
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineMonal Daxini
 
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...HostedbyConfluent
 
Patterns of-streaming-applications-qcon-2018-monal-daxini
Patterns of-streaming-applications-qcon-2018-monal-daxiniPatterns of-streaming-applications-qcon-2018-monal-daxini
Patterns of-streaming-applications-qcon-2018-monal-daxiniMonal Daxini
 
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the JobAkka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the JobLightbend
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache KafkaJoe Stein
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comCedric Vidal
 
Using the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductUsing the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductEvans Ye
 
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!confluent
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBScyllaDB
 
Flink at netflix paypal speaker series
Flink at netflix   paypal speaker seriesFlink at netflix   paypal speaker series
Flink at netflix paypal speaker seriesMonal Daxini
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLNick Dearden
 

Similar to Netflix Keystone SPaaS: Real-time Stream Processing as a Service - ABD320 - re:Invent 2017 (20)

AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
Netflix Keystone SPaaS: Real-time Stream Processing as a Service - ABD320 - re:Invent 2017