The document describes Pravega and how it provides a unified model for batch and stream processing. Pravega stores data streams and provides a common storage layer. It handles both large historic batch data and real-time streaming data using segments that can be scaled independently. Pravega supports exactly-once processing through transactions that write to temporary transaction segments before committing to the stream. Readers can access segments in parallel for unordered reads of the stream data.
An elastic batch- and stream-processing stack with Pravega and Apache Flink
1. Unified and Elastic
Batch and Stream Processing
with Pravega and Apache Flink
Stephan Ewen, data Artisans
Flavio Junqueira, Pravega
2. Batch and Stream Processing
DataWorks Summit Berlin - April, 2018
3. What changes faster? Data or Query?
Batch processing use case: data changes slowly compared to fast-changing queries (ad-hoc queries, data exploration, ML training and (hyper)parameter tuning).
Stream processing use case: data changes fast and the application logic is long-lived (continuous applications, data pipelines, standing queries, anomaly detection, ML evaluation, …).
4. Streams as a Unified View on Data
5. Stream Processing Unifies Data Use Cases
Stateful computations over data streams cover:
• Batch processing: process static and historic data
• Data stream processing: real-time results from data streams
• Event-driven applications: data-driven actions and services
6. The Quest for Unified
Batch- and Stream Processing
7. Querying the Past
SELECT
  campaign,
  TUMBLE_START(clickTime, INTERVAL '1' HOUR),
  COUNT(ip) AS clickCnt
FROM adClicks
WHERE clickTime BETWEEN '2015-01-01' AND '2017-12-31'
GROUP BY campaign, TUMBLE(clickTime, INTERVAL '1' HOUR)
The query ranges over the past of the stream. Use a batch processor (or a capable stream processor) connected to bulk storage (S3, HDFS, GFS, …).
8. Querying the Past
Recorded events (file system, object storage) feed a batch processor: a massively parallel, unordered scan, with algorithms and data structures to process finite data.
9. Querying the Future
SELECT
  campaign,
  TUMBLE_START(clickTime, INTERVAL '1' HOUR),
  COUNT(ip) AS clickCnt
FROM adClicks
WHERE clickTime > now()
GROUP BY campaign, TUMBLE(clickTime, INTERVAL '1' HOUR)
The query ranges over the future of the stream. Use a stream processor connected to a pub-sub system (Kafka, Kinesis, PubSub, …).
10. Querying the Future
Real-time events (message queue, event log) feed a stream processor, which serves real-time events in order and has the state and event-time support to process unbounded data.
11. Querying the Past and the Future
SELECT
  campaign,
  TUMBLE_START(clickTime, INTERVAL '1' HOUR),
  COUNT(ip) AS clickCnt
FROM adClicks
WHERE clickTime > '2017-01-01'
GROUP BY campaign, TUMBLE(clickTime, INTERVAL '1' HOUR)
The query spans both the past and the future of the stream. Use a stream processor? Connect it to both bulk storage and pub-sub?
12. Querying the Past and the Future
A unified batch/stream processor reads from two systems: recorded events (file system, object storage) via parallel scans of historic data, and real-time events (message queue, event log) with low-latency serving. It must switch from the batch scan to stream ingestion.
14. The Stack
• Unified model, semantics, APIs: the same model/API to treat historic and real-time data
• Unified storage: the same view on, and access to, historic and real-time data
• Unified runtime: handles both large historic data and low-latency real-time data
17. Pravega
• Stores data streams
• A young project, under active development
• Open source
http://pravega.io
http://github.com/pravega/pravega
19. Anatomy of a stream
(timeline: distant past → recent past → present)
20. Anatomy of a stream
Messaging and pub-sub systems serve the present; a bulk store holds the distant past.
21. Anatomy of a stream
Pravega covers the whole timeline, from the distant past to the present.
22. Anatomy of a stream
Pravega covers the whole timeline. A stream holds an unbounded amount of data, and the ingestion rate might vary.
23. Pravega and Streams
Writers ingest stream data by appending to Pravega; applications process stream data by reading from it. (diagram: Append → Pravega → Read)
26. Guarantees of the write path
• Order
• Writers append following application order
• Per-key order
• No duplicates
• Writer IDs map to the last appended data on the segment store
• Exactly-once on the write path via deduplication based on writer IDs
• Atomicity for groups of writes with transactions
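The deduplication guarantee can be sketched in a few lines (a toy model with hypothetical names, not Pravega's actual Java implementation): the segment store tracks, per writer ID, the highest sequence number it has appended, and drops any retried append at or below it.

```python
class SegmentStore:
    """Toy model of per-writer deduplication on the write path."""
    def __init__(self):
        self.data = []      # appended events, in arrival order
        self.last_seq = {}  # writer_id -> highest sequence number seen

    def append(self, writer_id, seq, event):
        # A retry resends an already-acknowledged sequence number;
        # anything at or below the last recorded one is a duplicate.
        if seq <= self.last_seq.get(writer_id, -1):
            return False    # duplicate: drop silently
        self.last_seq[writer_id] = seq
        self.data.append(event)
        return True

store = SegmentStore()
store.append("w1", 0, "a")
store.append("w1", 1, "b")
store.append("w1", 1, "b")  # retried append, deduplicated
assert store.data == ["a", "b"]
```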
28. The read path
The segment store serves stream data from its cache; on a cache miss, it fetches from long-term storage (Tier 2), which holds everything from the start of the stream through the recent past.
30. Segments in Pravega
A stream is a composition of segments. A segment is:
• The unit of a stream
• Append-only
• A sequence of bytes
32. Segments in Pravega
Event writers append to a stream's segments and event readers read from them.
• Segments are sequences of bytes
• An append carries a routing key, ⟨key, event⟩, which determines the target segment
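The mapping from routing key to segment can be sketched as a stable hash into the stream's key space (the hash function and the fixed segment count here are illustrative assumptions; Pravega maps keys to the key ranges owned by the current segments):

```python
import hashlib

def segment_for(routing_key, num_segments):
    """Map a routing key to one of the stream's current segments."""
    # Stable hash of the key into [0, 1), then pick the owning key range.
    h = int(hashlib.sha256(routing_key.encode()).hexdigest(), 16)
    position = (h % 10**6) / 10**6
    return int(position * num_segments)

# The same key always lands in the same segment, preserving per-key order.
assert segment_for("sensor-42", 4) == segment_for("sensor-42", 4)
```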
33. Segments can be sealed
34. Segments in Pravega
Once sealed, a segment can't be appended to any longer; readers can still read it (e.g., historical ad clicks).
35. How is sealing segments useful?
39. Some useful ways to compose segments
40. Scaling a stream
• Say the input load has increased and we need more parallelism (auto or manual scaling)
1. The stream has one segment
2. Seal the current segment and create new ones; subsequent appends go to the new segments
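The scale-up steps above can be sketched as splitting a sealed segment's key range between two successors (a simplified model; the names and the even split are assumptions):

```python
def scale_up(segments, seal_idx):
    """Seal one segment and replace it with two successors
    that split its key range (illustrative model of a scale-up)."""
    lo, hi = segments[seal_idx]["range"]
    segments[seal_idx]["sealed"] = True
    mid = (lo + hi) / 2
    return segments + [
        {"range": (lo, mid), "sealed": False},
        {"range": (mid, hi), "sealed": False},
    ]

stream = [{"range": (0.0, 1.0), "sealed": False}]  # stream has one segment
stream = scale_up(stream, 0)                       # seal it, create two successors
assert stream[0]["sealed"] and len(stream) == 3
```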
47. Daily Cycles
The peak rate is 10x higher than the lowest rate (e.g., a 4:00 AM trough vs. a 9:00 AM peak).
NYC Yellow Taxi Trip Records, March 2015
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
49. Transactions
• Transactional writes: all or nothing
• A transaction can write to any open segment of the stream; there is no limitation on the routing-key range
• Transactional writes are interleaved with regular writes
• Important for exactly-once semantics: either all writes become visible or none
• A transaction is aborted manually or via timeout
(diagram: regular writes go directly to stream segments s1 and s2, while transactional writes go to separate txn segments)
50. Transactions
1. The stream has two segments, s1 and s2
2. Begin txn: create txn segments, one per stream segment
3. Write to the txn
4. Write to the txn
5. Upon commit: seal the txn segments
6. Merge each txn segment into its stream segment
51. Transactions
1. The stream has two segments, s1 and s2
2. Begin txn: create txn segments
3. Write to the txn
4. Write to the txn
5. Upon abort: eliminate the txn segments; the stream segments are untouched
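The all-or-nothing behavior can be modeled in a few lines (a toy sketch with hypothetical names, not the Pravega client API): writes to a transaction accumulate in separate txn segments and only become readable when they are merged on commit.

```python
class Stream:
    """Toy model of transactional writes: data goes to txn segments
    first, and becomes visible only if the txn segments are merged."""
    def __init__(self, segment_names):
        self.segments = {s: [] for s in segment_names}
        self.txns = {}

    def begin_txn(self, txn_id):
        self.txns[txn_id] = {s: [] for s in self.segments}

    def write_txn(self, txn_id, segment, event):
        self.txns[txn_id][segment].append(event)

    def commit(self, txn_id):
        # Seal the txn segments and merge each into its stream segment.
        for seg, events in self.txns.pop(txn_id).items():
            self.segments[seg].extend(events)

    def abort(self, txn_id):
        self.txns.pop(txn_id)  # eliminate txn segments; nothing becomes visible

s = Stream(["s1", "s2"])
s.begin_txn("t1")
s.write_txn("t1", "s1", "x")
assert s.segments["s1"] == []     # not visible before commit
s.commit("t1")
assert s.segments["s1"] == ["x"]  # all-or-nothing visibility
```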
52. Unordered reads
• The stream started with a single segment, then scaled up from one to two segments; three segments are available in total
• One iterator per segment: readers can read in parallel from all segments
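When time order across segments is irrelevant, each segment can be scanned independently. A minimal sketch of one-iterator-per-segment parallel reads (the helper names and the click-count query are illustrative assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

def scan_segment(events):
    """One iterator per segment: count clicks in a single segment."""
    return sum(1 for e in events if e == "click")

# Three segments available after scaling; order across segments does not
# matter for this query, so all segments can be scanned in parallel.
segments = [["click", "view"], ["click"], ["click", "click"]]
with ThreadPoolExecutor(max_workers=3) as pool:
    total = sum(pool.map(scan_segment, segments))
assert total == 4
```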
53. Putting it all together
54. Writers, Readers, and Reader Groups
• Event writers make regular and transactional appends to the stream's segments
• Event readers read the stream as part of a reader group, which coordinates the assignment of segments to readers and checkpointing
55. Putting everything together
Two event writers append to stream segments 1 and 2; event readers 1 and 2 form a reader group.
Reader group state:
• Event Reader 1: {1}
• Event Reader 2: {2}
• Unassigned: {}
56. Putting everything together
• Start scaling: seal Segment 2, create Segments 3 and 4
Reader group state (not yet updated):
• Event Reader 1: {1}
• Event Reader 2: {2}
• Unassigned: {}
57. Putting everything together
• Get the successors of the sealed segment from the controller
• Add them to the reader group state
Reader group state:
• Event Reader 1: {1}
• Event Reader 2: {}
• Unassigned: {3, 4}
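The reassignment can be sketched as operations on the reader group state (hypothetical function names; the real coordination happens through Pravega's controller and the group's shared state):

```python
def release_and_fetch_successors(state, reader, segment, successors):
    """A reader finished a sealed segment: drop it and make the
    successors (obtained from the controller) available to the group."""
    state[reader].remove(segment)
    state["unassigned"].extend(successors)

def acquire(state, reader):
    """A reader picks up one unassigned segment, if any."""
    if state["unassigned"]:
        state[reader].append(state["unassigned"].pop(0))

state = {"reader1": [1], "reader2": [2], "unassigned": []}
release_and_fetch_successors(state, "reader2", 2, [3, 4])
assert state["unassigned"] == [3, 4]
acquire(state, "reader1")
acquire(state, "reader2")
assert state == {"reader1": [1, 3], "reader2": [4], "unassigned": []}
```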
58. Putting everything together
The readers pick up the unassigned segments.
Reader group state:
• Event Reader 1: {1, 3}
• Event Reader 2: {4}
• Unassigned: {}
61. Putting everything together
• Initiate a checkpoint
Reader group state:
• Event Reader 1: {1, 3}
• Event Reader 2: {4}
• Unassigned: {}
62. Putting everything together
Checkpoint events flow to each reader; the completed checkpoint records a position per segment:
Checkpoint:
• Segment 1: 2
• Segment 3: 1
• Segment 4: 1
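The resulting checkpoint is essentially a map from segment to read position, merged across the readers in the group. A minimal sketch (hypothetical names):

```python
def take_checkpoint(readers):
    """Collect each reader's current positions into one checkpoint:
    a map from segment to read offset."""
    checkpoint = {}
    for positions in readers.values():
        checkpoint.update(positions)
    return checkpoint

# Positions recorded once the checkpoint event reaches each reader
readers = {"reader1": {"segment1": 2, "segment3": 1},
           "reader2": {"segment4": 1}}
assert take_checkpoint(readers) == {"segment1": 2, "segment3": 1, "segment4": 1}
```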
63. Apache Flink: APIs and Execution
In the stack, Flink provides the model/semantics/APIs layer and the execution runtime on top of the storage layer.
64. Apache Flink in a Nutshell
Stateful computations over streams, real-time and historic: fast, scalable, fault-tolerant, in-memory, with event time, large state, and exactly-once guarantees.
Applications read streams and historic data (from databases, streams, file/object storage) and serve queries, applications, devices, etc.
66. Layered APIs
• Stream SQL / Tables (dynamic tables): high-level analytics API
• DataStream API (streams, windows): stream- & batch data processing
• Process Function (events, state, time): stateful event-driven applications

val stats = stream
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .reduce((a, b) => a.add(b))

def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {
  // work with event and state
  (event, state.value) match { … }
  out.collect(…)   // emit events
  state.update(…)  // modify state
  // schedule a timer callback
  ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
}

Navigate simple to complex use cases
67. DataStream API

// Source
val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer011(…))
// Transformation
val events: DataStream[Event] = lines.map((line) => parse(line))
// Windowed transformation
val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .aggregate(new MyAggregationFunction())
// Sink
stats.addSink(new RollingSink(path))

Streaming dataflow: Source → Transform → Window (state read/write) → Sink
68. SQL (ANSI) – Streaming and Batch
SELECT
  campaign,
  TUMBLE_START(clickTime, INTERVAL '1' HOUR),
  COUNT(ip) AS clickCnt
FROM adClicks
WHERE clickTime > '2017-01-01'
GROUP BY campaign, TUMBLE(clickTime, INTERVAL '1' HOUR)
The query spans both the past and the future of the stream.
69. Flink in Practice
• AthenaX streaming SQL platform service
• Streaming platform as a service
• Fraud detection
• Streaming analytics platform: 100s of jobs, 1000s of nodes, TBs of state; metrics, analytics, real-time ML
• Streaming SQL as a platform
70. Unified Batch- & Streaming APIs
71. Batch and Streaming in the APIs
Batch processing use case: data changes slowly compared to fast-changing queries (ad-hoc queries, data exploration, ML training and (hyper)parameter tuning).
Stream processing use case: data changes fast and the application logic is long-lived (continuous applications, data pipelines, standing queries, anomaly detection, ML evaluation, …).
72. Batch and Streaming in the APIs
For the same two use cases, Flink's DataSet API handles bounded streams (batch) and its DataStream API handles unbounded streams.
73. Latency vs. Completeness
• Streaming trades data completeness (waiting longer for delayed data) against latency (emitting results early)
• The tradeoff is captured by the watermark, which drives Flink's event-time clock
• A watermark captures full or heuristic completeness with respect to a certain event time
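The watermark's role can be sketched as a firing condition for event-time windows (a toy model, not Flink's API): a window's result is emitted once the watermark passes the window's end.

```python
def fire_windows(watermark, windows):
    """A window's result may be emitted once the watermark
    (the event-time clock) passes the window's end timestamp."""
    return [w for w in windows if w["end"] <= watermark]

windows = [{"end": 100}, {"end": 200}, {"end": 300}]
# A conservative watermark waits longer (completeness); an eager one
# emits earlier (latency) at the risk of missing late events.
assert fire_windows(250, windows) == [{"end": 100}, {"end": 200}]
```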
78. Connecting Flink and Pravega
FlinkPravegaReader
• Exactly-once reader
• Integrates Flink checkpoints with Pravega checkpoints
FlinkPravegaWriter
• Transactional exactly-once event producer
• Distributed 2-phase commit coordinated by asynchronous checkpoints
https://github.com/pravega/flink-connectors
79. Streaming and Batch Reads
• DataStream API: in-order reads of the segments; parallelism limited to the number of segments at a certain time
• DataSet API: out-of-order reads; fully parallel reads of all segments
81. Status of Batch and Streaming Unification
We have unified batch and streaming APIs:
• Apache Flink and Apache Beam (Dataflow Model style)
• Stream SQL (Apache Flink + Beam + Calcite)
• Batch makes some simplifying assumptions
Pravega is streaming storage with an end-to-end streaming abstraction:
• It also has optimizations for batch-style reads
82. Status of Batch and Streaming Unification
Batch and streaming runtimes are still different:
• Streaming needs some form of bounded out-of-orderness
• Batch does highly parallel bulk out-of-order processing
There is potential to use both modes in the same application:
• Use cases that process historic and real-time data (bootstrapping)
• Use batch-style execution on historic data
• Use streaming execution on live data
83. Outlook: Autoscaling
• Scaling policies (Flink 1.6.0+) enable applications to dynamically adjust their parallelism
• The Pravega source operator integrates with scaling policies
• Adjust the Flink source stage's parallelism together with Pravega stream scaling
84. Outlook: Batch and Streaming Runtime
A query over past and future could use batch execution for the past and streaming execution for the future of the stream.
85. Outlook: Batch and Streaming Runtime
Parallel batch reads scan segments S1–S4 over the past; streaming (ordered) reads take over toward the present.
86. Outlook: Batch and Streaming Runtime
The handover can fall inside a segment: batch reads cover S1–S3 and part of S4, and streaming reads continue with the rest of S4.
94. Powerful Abstractions
The same layered APIs as before (Stream SQL / Tables with dynamic tables, the DataStream API with streams and windows, and the Process Function with events, state, and time): layered abstractions to navigate simple to complex use cases.
95. DataStream API
The same DataStream API example as before: Source → Transformation → Windowed Transformation → Sink.
97. High Level: SQL (ANSI)
The same windowed ad-clicks aggregation query as before, spanning both the past and the future of the stream.
99. Latency vs. Completeness (in my words)
Star Wars illustrates the difference: in processing time (release years 1977, 1980, 1983, 1999, 2002, 2005, 2015, 2016, 2017), the films arrive as Episodes IV, V, VI, I, II, III, VII, Rogue One ("III.5"), and VIII, but in event time they belong at their episode positions. Out-of-order arrival is the norm.
102. The FlinkPravegaWriter
• A regular Flink SinkFunction
• No partitioner, but a routing key
• Remember: there are no partitions in Pravega, just dynamically created segments
• The same key always goes to the same segment
• Order of elements is guaranteed per key!
(diagram: a Flink application writing to Pravega segments 1–4)
103. Exactly-Once Writes via Transactions
• Similar to a distributed 2-phase commit
• Coordinated by asynchronous checkpoints, with no voting delays
• Basic algorithm:
• Between checkpoints: produce into a transaction
• On operator snapshot: flush the local transaction (vote-to-commit)
• On checkpoint complete: commit the transactions
• On recovery: check and commit any pending transactions
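The basic algorithm above can be sketched as a small state machine (a toy model with hypothetical names, not the actual FlinkPravegaWriter):

```python
class TwoPhaseSink:
    """Sketch of the checkpoint-coordinated two-phase commit:
    produce into a txn, flush on snapshot, commit on completion."""
    def __init__(self, stream):
        self.stream = stream  # committed, visible events
        self.txn = []         # current open transaction
        self.pending = {}     # checkpoint id -> flushed txn

    def produce(self, event):
        self.txn.append(event)          # between checkpoints: write to txn

    def snapshot(self, checkpoint_id):
        # On operator snapshot: flush the local txn (vote-to-commit)
        self.pending[checkpoint_id] = self.txn
        self.txn = []

    def checkpoint_complete(self, checkpoint_id):
        # On checkpoint complete: commit the transaction
        self.stream.extend(self.pending.pop(checkpoint_id))

    def recover(self):
        # On recovery: commit any transaction that was flushed but
        # whose commit notification never arrived
        for cid in sorted(self.pending):
            self.stream.extend(self.pending.pop(cid))

sink = TwoPhaseSink([])
sink.produce("a")
sink.snapshot(1)
sink.produce("b")
assert sink.stream == []          # nothing visible before commit
sink.checkpoint_complete(1)
assert sink.stream == ["a"]
sink.snapshot(2)
sink.recover()                    # failure happened before chk-2's commit
assert sink.stream == ["a", "b"]
```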
104. Exactly-Once Writes via Transactions
(diagram: transactions TXN-1, TXN-2, TXN-3 aligned with checkpoints chk-1 and chk-2; each checkpoint becomes globally complete (✔ global) once its transaction is committed to the Pravega stream)
105. Transaction fails after local snapshot
(diagram: TXN-2 fails (✘) after the local snapshot for chk-2; only chk-1 is globally complete)
106. Transaction fails before commit…
(diagram: chk-2 completed globally, but the failure (✘) strikes before TXN-2's commit reaches the Pravega stream)
107. … commit on recovery
(diagram: on recovery, the TXN-2 handle is recovered and the pending transaction is committed; processing resumes with TXN-3 and chk-3)
108. Use Cases for Unified Stream-Batch Processing
• More applications than "just" analytics
• Building a machine-learning model from the past (in batch mode), then applying and refining it on real-time data
• Running A/B tests for algorithms on historic and live data
• …
109. Abstract
• Stream processing is becoming more relevant as many applications require low-latency response times and new application domains emerge that naturally demand data to be processed in motion. One particularly attractive characteristic of the stream-processing paradigm is that it conceptually unifies batch processing (bounded/static historic data) and continuous near-real-time data processing (unbounded streaming event data).
• However, in practice, implementing a unified batch and streaming data architecture is not seamless: near-real-time event data and bulk historic data use different storage systems (message queues or logs versus filesystems or object stores). Consequently, running the same analysis now and at some arbitrary time in the future (e.g., months, possibly years ahead) means dealing with different data sources and APIs. Few systems are capable of handling both near-real-time streaming workloads and large batch workloads at the same time. And streaming workloads tend to be inherently dynamic, requiring both storage and compute to adjust continuously for maximum resource efficiency.
• Flavio Junqueira and Fabian Hueske detail an open source streaming data stack consisting of Pravega (stream storage) and Apache Flink (computation on streams) that offers an unprecedented way of handling "everything as a stream": it includes unbounded streaming storage and a unified batch and streaming abstraction, and it dynamically accommodates workload variations in a novel way.
• Pravega enables the ingestion capacity of a stream to grow and shrink according to workload and sends signals downstream to enable Flink to scale accordingly; it also offers permanent streaming storage, exposing an API that enables applications to access data in near real time or at any arbitrary time in the future in a uniform fashion. Apache Flink's SQL and streaming APIs provide a common interface for processing continuous near-real-time data, a set of historic data, or combinations of both. A deep integration between these two systems provides end-to-end exactly-once semantics for pipelines of streams and stream processing and lets both systems jointly scale and adjust automatically to changing data rates.
110. Notes by Flavio
• The talk will have three parts:
• Motivation for “everything as a stream”.
• Realizing our vision with a combination of a stream store + unified stream/batch processor
• Where we are with respect to our vision and where we want to go
• Motivation
• There are three cases mentioned that we can use to motivate:
• 1. Always process data as a stream: same API independent of when the application processes the data (reprocessing, historical
processing)
• 2. Catch-up: does not require starting from a bulk store like HDFS and then switching to something else
• 3. Processing stream data in parallel (batch processing)
• Realizing vision
• Pravega intro
• Flink connector
• Flink examples?
• How do we compare to other systems?
• Apache Pulsar: Pub-sub messaging
• Apache Kafka: inflexible in a number of ways
111. I've heard Batch is a Subset of Streaming…
-> Stream processing subsumes batch processing.

Batch vs. Stream:
• Input: batch has bounded, fixed-size input; stream has unbounded, infinite input
• Input ordering: batch requires no ordering (the full data set can be sorted); stream may require ordering to reason about completeness of the input
• Processing: batch algorithms can collect all input data before processing it; stream algorithms must process data as it arrives
• Termination & output: batch programs terminate and produce finite output; streaming programs do not terminate and produce continuous output
112. Scanning the Past in Order
• Many streaming queries have temporal operations
• Time-windowed aggregations
• Joins with temporal conditions
• The processor can leverage the (imperfect) time order
• No full sort or hash tables required -> smaller memory requirements
• Analogous to a clustered index scan in a relational DBMS
• No need to switch to ordered ingestion when reaching the tail of the stream
• BUT: scanning in order typically means scanning with lower parallelism
113. Ordered Scans are not Always Beneficial
• Get the total number of clicks per campaign
• The query does not have a temporal operation
• Events can be processed without respecting time order
• Massively parallel catch-up scan of the past

SELECT
  campaign,
  COUNT(*) AS clickCnt
FROM adClicks
GROUP BY campaign
114. Requirements for Unified Stream-Batch Processing
• Storage
• Single storage system for historic and real-time data with unified API
• Scanning historic data in time order
• Scanning historic data out of time order with high parallelism
• Ingestion of data in time order
• Processor
• Efficient processing of nearly time-sorted data
• Efficient processing of unordered, bounded data
115. A System for Unified Stream-Batch Processing
• Stream Storage: Pravega
• Long-term storage with support for ordered and unordered scans
• Real-time event log with ordered scans
• Dynamically scales writes and reads
• Unified Stream-Batch Processor: Apache Flink
• Stream processing with sophisticated state handling
• Event-time with watermark support for ingestion of ordered data
• Dedicated algorithms to efficiently handle bounded data
• Tight integration of storage and processor
• End-to-end exactly-once processing
• Dynamic scaling