SlideShare a Scribd company logo
1 of 44
Download to read offline
Taming Velocity
A Tale of Four Streams
Emanuele Della Valle
prof @ Politecnico di Milano
founder @ Quantia Consulting
Streams take many forms …
@manudellavalle - http://emanueledellavalle.org 3
Myriads of tiny flows
that you can collect
@manudellavalle - http://emanueledellavalle.org 4
Continuous massive flows
than you cannot stop
5
Con4nuous numerous flows
that can turn into a torrent
@manudellavalle - http://emanueledellavalle.org
6
Myriads of continuous flows
of any size and speed that
form an immense delta
@manudellavalle - http://emanueledellavalle.org
@manudellavalle - http://emanueledellavalle.org 7
Can we tame their velocity?
1st premises: continuity matters
9
@manudellavalle - http://emanueledellavalle.org
1st premises: continuity matters
10
@manudellavalle - http://emanueledellavalle.org
1st premises: continuity matters
11
@manudellavalle - http://emanueledellavalle.org
@manudellavalle - h=p://emanueledellavalle.org 12
2nd premises: time matters!
13
Causal Time Model
• The order can be exploited to perform queries
• Does Alice meet Bob before Carl?
• Who does Carl meet first?
S e1
isWith(Alice,Bob)
e2
isWith(Alice,Carl)
e3
isWith(Bob,Diana)
e4
isWith(Daina,Carl)
@manudellavalle - http://emanueledellavalle.org
14
Absolute Time Model
• We can ask the queries in the previous slide
• We can start to compose queries taking into account the time
• How many people has Alice met in the last 5m?
• Does Diana meet Bob and then Carl within 5m?
e1 e2 e3 e4
S
t
3 6 9
1
isWith(Alice,Bob)
isWith(Alice,Carl)
isWith(Bob,Diana)
isWith(Daina,Carl)
@manudellavalle - http://emanueledellavalle.org
15
Interval-base Time Model
• We can ask the queries in the previous two slides
• It is possible to write even more complex queries:
• Which are the meetings the last less than 5m?
• Which are the meetings with conflicts?
S
t
3 6 9
1
isWith(Alice,Bob)
isWith(Alice,Carl)
isWith(Bob,Diana)
isWith(Daina,Carl)
e1
e2
e3
e4
@manudellavalle - http://emanueledellavalle.org
@manudellavalle - http://emanueledellavalle.org 16
Taming
Con$nuous massive flows
than you cannot stop
with
Data Stream Mng. Systems
17
Data Stream Management Systems (DSMS)
• Data streams are (unbounded)
sequences of time-varying
data elements
• Represent:
• an (almost) “continuous” flow of information
• with the recent information being more relevant as it describes
the current state of a dynamic system
time
@manudellavalle - http://emanueledellavalle.org
18
Data Stream Management Systems (DSMS)
• The nature of streams requires a
paradigma=c change*
• from persistent data
• one <me seman<cs
• to transient data
• con<nuous seman<cs
* This paradigma-c change first arose in DB community in the late '90s
@manudellavalle - http://emanueledellavalle.org
19
Continuous Semantics
• Continuous queries registered over streams that
are observed trough windows
window
input streams streams of answer
Registered
Con<nuous
Query
Dynamic
System
E. Della Valle
@manudellavalle - http://emanueledellavalle.org
20
DSMS Semantics
Arasu, A., Babu, S. and Widom, J., 2006. The CQL continuous query language: semantic
foundations and query execution. The VLDB Journal, 15(2), pp.121-142.
@manudellavalle - http://emanueledellavalle.org
@manudellavalle - h=p://emanueledellavalle.org 21
SQL Extensions
Windowed aggregation
SELECT x, COUNT(*), SUM(y)
FROM s1 WINDOW
TUMBLING (SIZE 1 MINUTE)
GROUP BY x
EMIT CHANGES;
Join two streams together
CREATE STREAM s3 AS
SELECT s1.c1, s2.c2
FROM s1 JOIN s2
WITHIN 5 MINUTES
ON s1.c1 = s2.c1
EMIT CHANGES;
22
Typically queries are placed in a network
[src:
https://docs.confluent.io/platform/current/control-center/ksql.html
]
@manudellavalle - h=p://emanueledellavalle.org
@manudellavalle - http://emanueledellavalle.org 23
Image by Pexels from Pixabay
Taming
Myriads of tiny flows
that you can collect
with
Time-series DB
@manudellavalle - http://emanueledellavalle.org 24
Time-series
• Time series is just a sequence of data elements
• The typical use case is a set of measurements made over a time
interval
• Much of the data generated by sensors, in a machine to machine
communication, in Internet of Things area
@manudellavalle - http://emanueledellavalle.org 25
SQL & Time Series
• Extremely simple schema
CREATE TABLE TS AS (
ts_time TIMESTAMP NOT NULL PRIMARY KEY,
ts_id INT,
ts_value FLOAT )
• High-frequent INSERTs prevail
• UPDATEs are uncommon
• DELETEs are massive and time-based (time-to-leave, TTL)
@manudellavalle - http://emanueledellavalle.org 26
Special Indexes! (a.k.a. fractal tree index)
Bender, M. A.; Farach-Colton, M.; Fineman, J.; Fogel, Y.; Kuszmaul, B.; Nelson, J. (2007). "Cache-Oblivious streaming
B-trees". Proceedings of the 19th ACM Symp. on Parallelism in Algorithms and Architectures. CA: ACM Press: 81–92.
@manudellavalle - http://emanueledellavalle.org 27
High insertion throughput
@manudellavalle - http://emanueledellavalle.org 28
High concurrency
@manudellavalle - h=p://emanueledellavalle.org 29
Low disk usage
@manudellavalle - http://emanueledellavalle.org 30
SQL extentions
SELECT item,slice_time,
ts_first_value(price, 'const') price
FROM ts_test
WHERE price_time BETWEEN
timestamp '2015-04-14 09:00' AND
timestamp '2015-04-14 09:25’
TIMESERIES slice_time AS '1 minute' OVER
(PARTITION BY item ORDER BY price_time)
ORDER BY item, slice_time, price;
@manudellavalle - http://emanueledellavalle.org 31
Dedicated languages
from(bucket: "training")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "fill_level")
|> filter(fn: (r) => r._field == "value")
|> filter(fn: (r) => r.tank == "B2")
|> aggregateWindow(every: 1m, fn: first)
|> holtWinters(n: 10, seasonality: 8, interval: 1m)
@manudellavalle - http://emanueledellavalle.org 32
Taming
Continuous numerous flows that
can turn into a torrent
with
Event-based Systems
33
Event-based systems
• Components
• collaborate by exchanging events
• publish events they observe
• subscribe to the events they are
interested in
• Communication is:
• Purely message based
• Asynchronous
• Multicast
• Implicit
• Anonymous
topic=fire* &
place=*
topic=* &
place=1st floor
topic=fire alarm &
place=*
fire alarm at
1st floor
fire alarm at
1st floor
fire alarm at
1st floor
fire alarm at
1st floor
fire training at
1st floor
fire training at
1st floor
fire training at
1st floor
@manudellavalle - http://emanueledellavalle.org
Eugster, P. T., Felber, P. A., Guerraoui, R., & Kermarrec, A. M. (2003). The many
faces of publish/subscribe. ACM computing surveys (CSUR), 35(2), 114-131.
34
Complex Event Processing (CEP)
• CEP systems adds the ability to deploy rules that describe
how composite events can be generated from primitive (or
composite) ones
• Typical CEP rules search
for sequences of events
• Raise C if A®B
• Time is a key aspect
in CEP Rules
-----
-----
-----
-----
@manudellavalle - http://emanueledellavalle.org
Cugola, G., & Margara, A. (2012). Processing flows of information: From data stream to complex event
processing. ACM Computing Surveys (CSUR), 44(3), 1-62.
@manudellavalle - http://emanueledellavalle.org 35
Detec4ng Languages
• Event Processing Language (EPL)
insert into alertIBM
select *
from pattern [
every (
StockTick(name="IBM",price>100)
->
(StockTick(name="IBM",price<100)
where timer:within(60 seconds))
];
• Supported since 2006 Esper by EsperTech Inc. and Oracle CEP
Write a query which alerts on each
IBM stock tick with a price greater
then 100 followed by an IBM stock
tick lower than 100 within 60 seconds
36
More Expressive Detecting Languages
• Match
Regex
Complex Events: MatchRegex
Martin Hirzel, IBM Research AI
Partition and Compose: Parallel
Complex Event Processing [DEBS'12]
14
Composite events Simple events
Regular
expression
Aggregation
Key
! Operator only, no extensions to SPL syntax
@manudellavalle - http://emanueledellavalle.org
@manudellavalle - http://emanueledellavalle.org 37
Photo by Nikolaj Jepsen from Pexels
[src: https://www.nasa.gov/content/okavango-inland-delta-in-northern-botswana/ ]
Taming
Myriads of continuous
flows of any size
and speed that
form an
immense
delta
with
Event-Driven Architectures
38
Service-oriented architecture
• DeterminisNc
• Rigid
• Thigh coupling
@manudellavalle - http://emanueledellavalle.org
39
Event-driven Architecture
• Responsive
• Flessible
• Light coupling
@manudellavalle - http://emanueledellavalle.org
40
Event-based systems
Time-series DB
Data Stream Mng. Sys.
@manudellavalle - http://emanueledellavalle.org
41
Event-driven enterprise
@manudellavalle - http://emanueledellavalle.org
Thank you for your a=en>on!
Ques>ons?
Taming Velocity
A Tale of Four Streams
Emanuele Della Valle
prof @ Politecnico di Milano
founder @ Quantia Consulting
44
Credits
• These slides are partially based on
• "A modeling framework for DSMS and CEP" by G. Cugola and A. Margara presented in the PhD
course on "Stream and complex event processing" offered by Politecnico di Milano in 2015.
• http://www.streamreasoning.org/courses/scep2015
• "Tutorial: Stream Processing Languages" by Martin Hirzel, IBM Research AI. 1 November 2017
Dagstuhl Seminar on Big Stream Processing Systems
• https://materials.dagstuhl.de/files/17/17441/17441.MartinHirzel1.Slides.pdf
• "The Death and Rebirth of the Event Driven Architecture" by Jay Kreps, Confluent, Kafka Summit
2018
• https://www.confluent.io/kafka-summit-london18/keynote-the-death-and-rebirth-of-the-event-
driven-architecture
• “Time Series Databases” by Dmitry Namiot
• https://www.slideshare.net/coldbeans/on-timeseries-databases
• “Fractal Tree Indexes : From Theory to Practice” by Tim Callaghan
• https://www.slideshare.net/tmcallaghan/20131112-plukfractaltreestheorytopractice
@manudellavalle - http://emanueledellavalle.org

More Related Content

What's hot

Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
Guido Schmutz
 

What's hot (20)

How to Build an Apache Kafka® Connector
How to Build an Apache Kafka® ConnectorHow to Build an Apache Kafka® Connector
How to Build an Apache Kafka® Connector
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
ksqlDB: Building Consciousness on Real Time Events
ksqlDB: Building Consciousness on Real Time EventsksqlDB: Building Consciousness on Real Time Events
ksqlDB: Building Consciousness on Real Time Events
 
What's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talkWhat's new in confluent platform 5.4 online talk
What's new in confluent platform 5.4 online talk
 
Rediscovering the Value of Apache Kafka® in Modern Data Architecture
Rediscovering the Value of Apache Kafka® in Modern Data ArchitectureRediscovering the Value of Apache Kafka® in Modern Data Architecture
Rediscovering the Value of Apache Kafka® in Modern Data Architecture
 
Event Driven Architecture: Mistakes, I've made a few...
Event Driven Architecture: Mistakes, I've made a few...Event Driven Architecture: Mistakes, I've made a few...
Event Driven Architecture: Mistakes, I've made a few...
 
New Approaches for Fraud Detection on Apache Kafka and KSQL
New Approaches for Fraud Detection on Apache Kafka and KSQLNew Approaches for Fraud Detection on Apache Kafka and KSQL
New Approaches for Fraud Detection on Apache Kafka and KSQL
 
Confluent Cloud for Apache Kafka® | Google Cloud Next ’19
Confluent Cloud for Apache Kafka® | Google Cloud Next ’19Confluent Cloud for Apache Kafka® | Google Cloud Next ’19
Confluent Cloud for Apache Kafka® | Google Cloud Next ’19
 
What every software engineer should know about streams and tables in kafka ...
What every software engineer should know about streams and tables in kafka   ...What every software engineer should know about streams and tables in kafka   ...
What every software engineer should know about streams and tables in kafka ...
 
EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?
 
Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS...
Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS...Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS...
Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS...
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
 
Time Series Analysis Using an Event Streaming Platform
 Time Series Analysis Using an Event Streaming Platform Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming Platform
 
Confluent Steaming Webinar - Cape Town - Vitality
Confluent Steaming Webinar - Cape Town - VitalityConfluent Steaming Webinar - Cape Town - Vitality
Confluent Steaming Webinar - Cape Town - Vitality
 
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
Using Kafka to integrate DWH and Cloud Based big data systems
Using Kafka to integrate DWH and Cloud Based big data systemsUsing Kafka to integrate DWH and Cloud Based big data systems
Using Kafka to integrate DWH and Cloud Based big data systems
 
Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...
Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...
Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190...
 

Similar to Taming velocity - a tale of four streams

Lean&Mean Terminal Design benefits from advanced modelling (Terminal Operator...
Lean&Mean Terminal Design benefits from advanced modelling (Terminal Operator...Lean&Mean Terminal Design benefits from advanced modelling (Terminal Operator...
Lean&Mean Terminal Design benefits from advanced modelling (Terminal Operator...
Yvo Saanen
 
matlab basics and Simulink for beginners
matlab basics and Simulink for beginnersmatlab basics and Simulink for beginners
matlab basics and Simulink for beginners
A Rajendran Jps
 
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkCooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache Spark
Databricks
 
Finding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impact
Arun Kejariwal
 

Similar to Taming velocity - a tale of four streams (20)

Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
 
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
 
SCALE - Stream processing and Open Data, a match made in Heaven
SCALE - Stream processing and Open Data, a match made in HeavenSCALE - Stream processing and Open Data, a match made in Heaven
SCALE - Stream processing and Open Data, a match made in Heaven
 
Lean&Mean Terminal Design benefits from advanced modelling (Terminal Operator...
Lean&Mean Terminal Design benefits from advanced modelling (Terminal Operator...Lean&Mean Terminal Design benefits from advanced modelling (Terminal Operator...
Lean&Mean Terminal Design benefits from advanced modelling (Terminal Operator...
 
Declarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data modelsDeclarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data models
 
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
 
Telegraph Cq English
Telegraph Cq EnglishTelegraph Cq English
Telegraph Cq English
 
matlab basics and Simulink for beginners
matlab basics and Simulink for beginnersmatlab basics and Simulink for beginners
matlab basics and Simulink for beginners
 
AutomationML: A Model-Driven View
AutomationML: A Model-Driven ViewAutomationML: A Model-Driven View
AutomationML: A Model-Driven View
 
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkCooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache Spark
 
Finding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impact
 
Project A Data Modelling Best Practices Part II: How to Build a Data Warehouse?
Project A Data Modelling Best Practices Part II: How to Build a Data Warehouse?Project A Data Modelling Best Practices Part II: How to Build a Data Warehouse?
Project A Data Modelling Best Practices Part II: How to Build a Data Warehouse?
 
How Much Parallelism?
How Much Parallelism?How Much Parallelism?
How Much Parallelism?
 
JUG SF - Introduction to data streaming
JUG SF - Introduction to data streamingJUG SF - Introduction to data streaming
JUG SF - Introduction to data streaming
 
JUG Tirana - Introduction to data streaming
JUG Tirana - Introduction to data streamingJUG Tirana - Introduction to data streaming
JUG Tirana - Introduction to data streaming
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
vJUG - Introduction to data streaming
vJUG - Introduction to data streamingvJUG - Introduction to data streaming
vJUG - Introduction to data streaming
 
Timing verification of real-time automotive Ethernet networks: what can we ex...
Timing verification of real-time automotive Ethernet networks: what can we ex...Timing verification of real-time automotive Ethernet networks: what can we ex...
Timing verification of real-time automotive Ethernet networks: what can we ex...
 
Mario Fusco - Reactive programming in Java - Codemotion Milan 2017
Mario Fusco - Reactive programming in Java - Codemotion Milan 2017Mario Fusco - Reactive programming in Java - Codemotion Milan 2017
Mario Fusco - Reactive programming in Java - Codemotion Milan 2017
 
Smart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the CloudSmart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the Cloud
 

More from Emanuele Della Valle

More from Emanuele Della Valle (20)

Stream reasoning
Stream reasoningStream reasoning
Stream reasoning
 
Work in progress on Inductive Stream Reasoning
Work in progress on Inductive Stream ReasoningWork in progress on Inductive Stream Reasoning
Work in progress on Inductive Stream Reasoning
 
Big Data and Data Science W's
Big Data and Data Science W'sBig Data and Data Science W's
Big Data and Data Science W's
 
Knowledge graphs in search engines
Knowledge graphs in search enginesKnowledge graphs in search engines
Knowledge graphs in search engines
 
La città dei balocchi 2017 in numeri - Fluxedo
La città dei balocchi 2017 in numeri - FluxedoLa città dei balocchi 2017 in numeri - Fluxedo
La città dei balocchi 2017 in numeri - Fluxedo
 
Stream Reasoning: a summary of ten years of research and a vision for the nex...
Stream Reasoning: a summary of ten years of research and a vision for the nex...Stream Reasoning: a summary of ten years of research and a vision for the nex...
Stream Reasoning: a summary of ten years of research and a vision for the nex...
 
ACQUA: Approximate Continuous Query Answering over Streams and Dynamic Linked...
ACQUA: Approximate Continuous Query Answering over Streams and Dynamic Linked...ACQUA: Approximate Continuous Query Answering over Streams and Dynamic Linked...
ACQUA: Approximate Continuous Query Answering over Streams and Dynamic Linked...
 
Stream reasoning: an approach to tame the velocity and variety dimensions of ...
Stream reasoning: an approach to tame the velocity and variety dimensions of ...Stream reasoning: an approach to tame the velocity and variety dimensions of ...
Stream reasoning: an approach to tame the velocity and variety dimensions of ...
 
Big Data: how to use it to create value
Big Data: how to use it to create valueBig Data: how to use it to create value
Big Data: how to use it to create value
 
Listening to the pulse of our cities with Stream Reasoning (and few more tech...
Listening to the pulse of our cities with Stream Reasoning (and few more tech...Listening to the pulse of our cities with Stream Reasoning (and few more tech...
Listening to the pulse of our cities with Stream Reasoning (and few more tech...
 
Ist16-04 An introduction to RDF
Ist16-04 An introduction to RDF Ist16-04 An introduction to RDF
Ist16-04 An introduction to RDF
 
Ist16-03 An Introduction to the Semantic Web
Ist16-03 An Introduction to the Semantic Web Ist16-03 An Introduction to the Semantic Web
Ist16-03 An Introduction to the Semantic Web
 
Ist16-02 HL7 from v2 (syntax) to v3 (semantics)
Ist16-02 HL7 from v2 (syntax) to v3 (semantics)Ist16-02 HL7 from v2 (syntax) to v3 (semantics)
Ist16-02 HL7 from v2 (syntax) to v3 (semantics)
 
IST16-01 - Introduction to Interoperability and Semantic Technologies
IST16-01 - Introduction to Interoperability and Semantic TechnologiesIST16-01 - Introduction to Interoperability and Semantic Technologies
IST16-01 - Introduction to Interoperability and Semantic Technologies
 
Stream reasoning: mastering the velocity and the variety dimensions of Big Da...
Stream reasoning: mastering the velocity and the variety dimensions of Big Da...Stream reasoning: mastering the velocity and the variety dimensions of Big Da...
Stream reasoning: mastering the velocity and the variety dimensions of Big Da...
 
On Stream Reasoning
On Stream ReasoningOn Stream Reasoning
On Stream Reasoning
 
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
 
Social listener-brera-design-district-2015-03
Social listener-brera-design-district-2015-03Social listener-brera-design-district-2015-03
Social listener-brera-design-district-2015-03
 
City Data Fusion for Event Management (in Italiano)
City Data Fusion for Event Management (in Italiano)City Data Fusion for Event Management (in Italiano)
City Data Fusion for Event Management (in Italiano)
 
Semantic technologies and Interoperability
Semantic technologies and InteroperabilitySemantic technologies and Interoperability
Semantic technologies and Interoperability
 

Recently uploaded

一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
yulianti213969
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
aqpto5bt
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Valters Lauzums
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontangobat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
siskavia95
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
mikehavy0
 

Recently uploaded (20)

Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Unsatisfied Bhabhi ℂall Girls Vadodara Book Esha 7427069034 Top Class ℂall Gi...
Unsatisfied Bhabhi ℂall Girls Vadodara Book Esha 7427069034 Top Class ℂall Gi...Unsatisfied Bhabhi ℂall Girls Vadodara Book Esha 7427069034 Top Class ℂall Gi...
Unsatisfied Bhabhi ℂall Girls Vadodara Book Esha 7427069034 Top Class ℂall Gi...
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
 
Solution manual for managerial accounting 8th edition by john wild ken shaw b...
Solution manual for managerial accounting 8th edition by john wild ken shaw b...Solution manual for managerial accounting 8th edition by john wild ken shaw b...
Solution manual for managerial accounting 8th edition by john wild ken shaw b...
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontangobat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
 

Taming velocity - a tale of four streams

  • 1. Taming Velocity A Tale of Four Streams Emanuele Della Valle prof @ Politecnico di Milano founder @ Quantia Consulting
  • 2. Streams take many forms …
  • 3. @manudellavalle - http://emanueledellavalle.org 3 Myriads of tiny flows that you can collect
  • 4. @manudellavalle - http://emanueledellavalle.org 4 Continuous massive flows than you cannot stop
  • 5. 5 Con4nuous numerous flows that can turn into a torrent @manudellavalle - http://emanueledellavalle.org
  • 6. 6 Myriads of continuous flows of any size and speed that form an immense delta @manudellavalle - http://emanueledellavalle.org
  • 8. Can we tame their velocity?
  • 9. 1st premises: continuity matters 9 @manudellavalle - http://emanueledellavalle.org
  • 10. 1st premises: continuity matters 10 @manudellavalle - http://emanueledellavalle.org
  • 11. 1st premises: continuity matters 11 @manudellavalle - http://emanueledellavalle.org
  • 12. @manudellavalle - h=p://emanueledellavalle.org 12 2nd premises: time matters!
  • 13. 13 Causal Time Model • The order can be exploited to perform queries • Does Alice meet Bob before Carl? • Who does Carl meet first? S e1 isWith(Alice,Bob) e2 isWith(Alice,Carl) e3 isWith(Bob,Diana) e4 isWith(Daina,Carl) @manudellavalle - http://emanueledellavalle.org
  • 14. 14 Absolute Time Model • We can ask the queries in the previous slide • We can start to compose queries taking into account the time • How many people has Alice met in the last 5m? • Does Diana meet Bob and then Carl within 5m? e1 e2 e3 e4 S t 3 6 9 1 isWith(Alice,Bob) isWith(Alice,Carl) isWith(Bob,Diana) isWith(Daina,Carl) @manudellavalle - http://emanueledellavalle.org
  • 15. 15 Interval-base Time Model • We can ask the queries in the previous two slides • It is possible to write even more complex queries: • Which are the meetings the last less than 5m? • Which are the meetings with conflicts? S t 3 6 9 1 isWith(Alice,Bob) isWith(Alice,Carl) isWith(Bob,Diana) isWith(Daina,Carl) e1 e2 e3 e4 @manudellavalle - http://emanueledellavalle.org
  • 16. @manudellavalle - http://emanueledellavalle.org 16 Taming Con$nuous massive flows than you cannot stop with Data Stream Mng. Systems
  • 17. 17 Data Stream Management Systems (DSMS) • Data streams are (unbounded) sequences of time-varying data elements • Represent: • an (almost) “continuous” flow of information • with the recent information being more relevant as it describes the current state of a dynamic system time @manudellavalle - http://emanueledellavalle.org
  • 18. 18 Data Stream Management Systems (DSMS) • The nature of streams requires a paradigma=c change* • from persistent data • one <me seman<cs • to transient data • con<nuous seman<cs * This paradigma-c change first arose in DB community in the late '90s @manudellavalle - http://emanueledellavalle.org
  • 19. 19 Continuous Semantics • Continuous queries registered over streams that are observed trough windows window input streams streams of answer Registered Con<nuous Query Dynamic System E. Della Valle @manudellavalle - http://emanueledellavalle.org
  • 20. 20 DSMS Semantics Arasu, A., Babu, S. and Widom, J., 2006. The CQL continuous query language: semantic foundations and query execution. The VLDB Journal, 15(2), pp.121-142. @manudellavalle - http://emanueledellavalle.org
  • 21. @manudellavalle - h=p://emanueledellavalle.org 21 SQL Extensions Windowed aggregation SELECT x, COUNT(*), SUM(y) FROM s1 WINDOW TUMBLING (SIZE 1 MINUTE) GROUP BY x EMIT CHANGES; Join two streams together CREATE STREAM s3 AS SELECT s1.c1, s2.c2 FROM s1 JOIN s2 WITHIN 5 MINUTES ON s1.c1 = s2.c1 EMIT CHANGES;
  • 22. 22 Typically queries are placed in a network [src: https://docs.confluent.io/platform/current/control-center/ksql.html ] @manudellavalle - h=p://emanueledellavalle.org
  • 23. @manudellavalle - http://emanueledellavalle.org 23 Image by Pexels from Pixabay Taming Myriads of tiny flows that you can collect with Time-series DB
  • 24. @manudellavalle - http://emanueledellavalle.org 24 Time-series • Time series is just a sequence of data elements • The typical use case is a set of measurements made over a time interval • Much of the data generated by sensors, in a machine to machine communication, in Internet of Things area
  • 25. @manudellavalle - http://emanueledellavalle.org 25 SQL & Time Series • Extremely simple schema CREATE TABLE TS AS ( ts_time TIMESTAMP NOT NULL PRIMARY KEY, ts_id INT, ts_value FLOAT ) • High-frequent INSERTs prevail • UPDATEs are uncommon • DELETEs are massive and time-based (time-to-leave, TTL)
  • 26. @manudellavalle - http://emanueledellavalle.org 26 Special Indexes! (a.k.a. fractal tree index) Bender, M. A.; Farach-Colton, M.; Fineman, J.; Fogel, Y.; Kuszmaul, B.; Nelson, J. (2007). "Cache-Oblivious streaming B-trees". Proceedings of the 19th ACM Symp. on Parallelism in Algorithms and Architectures. CA: ACM Press: 81–92.
  • 27. @manudellavalle - http://emanueledellavalle.org 27 High insertion throughput
  • 30. @manudellavalle - http://emanueledellavalle.org 30 SQL extentions SELECT item,slice_time, ts_first_value(price, 'const') price FROM ts_test WHERE price_time BETWEEN timestamp '2015-04-14 09:00' AND timestamp '2015-04-14 09:25’ TIMESERIES slice_time AS '1 minute' OVER (PARTITION BY item ORDER BY price_time) ORDER BY item, slice_time, price;
  • 31. @manudellavalle - http://emanueledellavalle.org 31 Dedicated languages from(bucket: "training") |> range(start: v.timeRangeStart, stop: v.timeRangeStop) |> filter(fn: (r) => r._measurement == "fill_level") |> filter(fn: (r) => r._field == "value") |> filter(fn: (r) => r.tank == "B2") |> aggregateWindow(every: 1m, fn: first) |> holtWinters(n: 10, seasonality: 8, interval: 1m)
  • 32. @manudellavalle - http://emanueledellavalle.org 32 Taming Continuous numerous flows that can turn into a torrent with Event-based Systems
  • 33. 33 Event-based systems • Components • collaborate by exchanging events • publish events they observe • subscribe to the events they are interested in • Communication is: • Purely message based • Asynchronous • Multicast • Implicit • Anonymous topic=fire* & place=* topic=* & place=1st floor topic=fire alarm & place=* fire alarm at 1st floor fire alarm at 1st floor fire alarm at 1st floor fire alarm at 1st floor fire training at 1st floor fire training at 1st floor fire training at 1st floor @manudellavalle - http://emanueledellavalle.org Eugster, P. T., Felber, P. A., Guerraoui, R., & Kermarrec, A. M. (2003). The many faces of publish/subscribe. ACM computing surveys (CSUR), 35(2), 114-131.
  • 34. 34 Complex Event Processing (CEP) • CEP systems adds the ability to deploy rules that describe how composite events can be generated from primitive (or composite) ones • Typical CEP rules search for sequences of events • Raise C if A®B • Time is a key aspect in CEP Rules ----- ----- ----- ----- @manudellavalle - http://emanueledellavalle.org Cugola, G., & Margara, A. (2012). Processing flows of information: From data stream to complex event processing. ACM Computing Surveys (CSUR), 44(3), 1-62.
  • 35. @manudellavalle - http://emanueledellavalle.org 35 Detec4ng Languages • Event Processing Language (EPL) insert into alertIBM select * from pattern [ every ( StockTick(name="IBM",price>100) -> (StockTick(name="IBM",price<100) where timer:within(60 seconds)) ]; • Supported since 2006 Esper by EsperTech Inc. and Oracle CEP Write a query which alerts on each IBM stock tick with a price greater then 100 followed by an IBM stock tick lower than 100 within 60 seconds
  • 36. 36 More Expressive Detecting Languages • Match Regex Complex Events: MatchRegex Martin Hirzel, IBM Research AI Partition and Compose: Parallel Complex Event Processing [DEBS'12] 14 Composite events Simple events Regular expression Aggregation Key ! Operator only, no extensions to SPL syntax @manudellavalle - http://emanueledellavalle.org
  • 37. @manudellavalle - http://emanueledellavalle.org 37 Photo by Nikolaj Jepsen from Pexels [src: https://www.nasa.gov/content/okavango-inland-delta-in-northern-botswana/ ] Taming Myriads of continuous flows of any size and speed that form an immense delta with Event-Driven Architectures
  • 38. 38 Service-oriented architecture • DeterminisNc • Rigid • Thigh coupling @manudellavalle - http://emanueledellavalle.org
  • 39. 39 Event-driven Architecture • Responsive • Flessible • Light coupling @manudellavalle - http://emanueledellavalle.org
  • 40. 40 Event-based systems Time-series DB Data Stream Mng. Sys. @manudellavalle - http://emanueledellavalle.org
  • 41. 41 Event-driven enterprise @manudellavalle - http://emanueledellavalle.org
  • 42. Thank you for your a=en>on! Ques>ons?
  • 43. Taming Velocity A Tale of Four Streams Emanuele Della Valle prof @ Politecnico di Milano founder @ Quantia Consulting
  • 44. 44 Credits • These slides are partially based on • "A modeling framework for DSMS and CEP" by G. Cugola and A. Margara presented in the PhD course on "Stream and complex event processing" offered by Politecnico di Milano in 2015. • http://www.streamreasoning.org/courses/scep2015 • "Tutorial: Stream Processing Languages" by Martin Hirzel, IBM Research AI. 1 November 2017 Dagstuhl Seminar on Big Stream Processing Systems • https://materials.dagstuhl.de/files/17/17441/17441.MartinHirzel1.Slides.pdf • "The Death and Rebirth of the Event Driven Architecture" by Jay Kreps, Confluent, Kafka Summit 2018 • https://www.confluent.io/kafka-summit-london18/keynote-the-death-and-rebirth-of-the-event- driven-architecture • “Time Series Databases” by Dmitry Namiot • https://www.slideshare.net/coldbeans/on-timeseries-databases • “Fractal Tree Indexes : From Theory to Practice” by Tim Callaghan • https://www.slideshare.net/tmcallaghan/20131112-plukfractaltreestheorytopractice @manudellavalle - http://emanueledellavalle.org