{MAKING SENSE OF THE FIRE-HOSE IN REAL-TIME} 
{EVENT PROCESSING} 
Dr Andy Piper 
Push Technology 
Reappt, a Push Technology product, offers the enterprise-grade Diffusion technology as a service.
Time @cobbscomedyclub 
The past, the present and the future walked into a bar 
It was tense
Copyright Push Technology 2014 
Will the real Andy Piper please stand up? 
– CTO at Push Technology 
– Ex-BEA/Oracle 
– Spring contributor (Spring DM) and Author 
– Standards contributor – OMG, JCP etc 
– PhD, Cambridge, Distributed Systems 
Agenda in 140 characters 
What is it - What not? Why? History. Measure 
infinity. Windows. Queries. Going fast – 
reliably, distributed, distributed and fast and big 
What is Event Stream Processing? 
• It’s not stream processing 
– Typically focused on local parallelism 
– I have opinions but they get me in trouble 
What is Event Stream Processing? 
• Not event passing 
– Event exchange not processing, e.g. JMS 
– Stateless 
What is Event Stream Processing? 
• Not event mediation (brokering) 
– Filtering, routing, and enrichment, e.g. ESB 
– Stateless 
What is Event Stream Processing? 
“Event Stream Processing deals with the 
task of processing streams of event data 
with the goal of identifying the meaningful 
pattern within those streams” – Wikipedia 
What is Event Stream Processing? 
• ESP is about querying data streams 
– Looking for something 
– Haystack won’t stay still! 
– Answers depend on multiple events 
– Extremely stateful 
Where the interesting questions are!
Meta-analogy 
“Producing thrust with a scramjet has been compared to lighting a match in a hurricane and keeping it burning” - NASA 
“Event stream processing is like looking for a needle in a haystack in a hurricane” 
It’s like an inverted database 
RDBMS (Query → Data) 
• Data is ‘static’ 
• Queries are ‘dynamic’ 
CEP (Event → Query) 
• Data is ‘dynamic’ 
• Queries are ‘static’ 
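The inversion can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CEP engine works: the `StandingQueries` class, the query names and the event fields are all made up for the example. Queries are registered once and stay put; events flow past them.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]


class StandingQueries:
    """Holds long-lived queries and evaluates each arriving event against them."""

    def __init__(self) -> None:
        self._queries: Dict[str, Callable[[Event], bool]] = {}

    def register(self, name: str, predicate: Callable[[Event], bool]) -> None:
        # The query is stored once and stays put; this is the 'static' part.
        self._queries[name] = predicate

    def push(self, event: Event) -> List[str]:
        # The event is transient; return the names of the queries it matches.
        return [name for name, pred in self._queries.items() if pred(event)]


engine = StandingQueries()
engine.register("big_trades", lambda e: e["qty"] >= 1000)
engine.register("acme_only", lambda e: e["symbol"] == "ACME")

matches = engine.push({"symbol": "ACME", "qty": 5000})
print(matches)
```

These two queries are stateless filters, which keeps the sketch short; the interesting ESP queries also carry state between events.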
Why bother? 
• Too much data 
• Time is integral to the questions 
• Data is moving too fast 
• Databases assume static datasets 
History – Two schools of thought 
• Start from a database and make it time-driven 
• Start from a logic approach and add time constraints 
Stream Processing History 
• Tapestry – ’92 
– Early inverted database (not Apache!) 
• Materialized views – ’95 
– [A. Gupta and I. S. Mumick. “Maintenance of materialized views: Problems, techniques, and applications.” 1995] 
• David Luckham coined the term CEP – “The Power of Events”. 2001 
– Logic-based CEP 
– Company acquired by Avaya 
• Michael Franklin 
– Dataflow processing in PostgreSQL 
– [“TelegraphCQ: continuous dataflow processing.” 2003] 
• Aurora – ’03 
– [Cherniack et al – “Scalable distributed stream processing.” 2003] 
• STREAM – ’03 
– [Arasu et al – “STREAM: The Stanford Stream Data Manager.” 2003] 
• Borealis – ’05 
– [Abadi et al – “The design of the Borealis stream processing engine.” 2005] 
Some definitions 
• Tuple – an ordered list of elements (e1, e2, …, en) 
– A single tuple is a monad! 
• Event or Data Stream $S_n$ – any ordered pair $(s, \Delta)_n$ where 
– $s$ is an unbounded sequence of tuples and 
– $\Delta$ is an unbounded sequence of positive real time intervals 
– $s$ and $\Delta$ are of equal length 
• Event stream processing transforms event streams into new event streams through queries 
• Outputs and inputs are continuous 
– Operators are continuous queries 
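A rough Python rendering of these definitions, as a sketch only: each stream element pairs a tuple with the time interval since the previous element, and an operator is a generator that consumes one stream and lazily produces another. The names `to_timestamps` and `select_price_above` and the sample ticks are illustrative assumptions.

```python
from typing import Iterable, Iterator, Tuple

# One stream element: (payload tuple, time interval since the previous element).
Element = Tuple[tuple, float]


def to_timestamps(stream: Iterable[Element]) -> Iterator[Tuple[float, tuple]]:
    """Turn the interval sequence into absolute times by accumulating it."""
    now = 0.0
    for payload, delta in stream:
        if delta <= 0:
            raise ValueError("intervals must be positive")
        now += delta
        yield now, payload


def select_price_above(stream: Iterable[Element], limit: float) -> Iterator[Element]:
    """A continuous operator: consumes a stream and lazily emits a new stream."""
    carried = 0.0  # time carried over from elements that were filtered out
    for payload, delta in stream:
        carried += delta
        if payload[1] > limit:
            yield payload, carried
            carried = 0.0


ticks = [(("ACME", 10.0), 1.0), (("ACME", 12.5), 0.5),
         (("ACME", 9.0), 2.0), (("ACME", 13.0), 1.5)]
filtered = list(select_price_above(ticks, 11.0))
print(list(to_timestamps(filtered)))
```

Because the output is again a sequence of (tuple, interval) pairs, it can be fed straight into another operator.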
How do you measure infinity? 
How do you measure an event stream if it’s unbounded?
Measuring infinity 
• Don’t do it 
– But then it is just event passing – where is the fun in that?! 
• Synopses – store summary information 
– Continuous average = running total + item count 
• Windows – define working set 
– Continuous average over last N items 
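The two approaches can be compared side by side in Python. This is an illustrative sketch; the class names are invented for the example. The synopsis keeps two numbers regardless of stream length, while the window keeps at most N items.

```python
from collections import deque
from typing import Deque


class RunningAverage:
    """Synopsis: two numbers summarise the whole stream seen so far."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def add(self, value: float) -> float:
        self.total += value
        self.count += 1
        return self.total / self.count


class LastNAverage:
    """Window: only the most recent n items form the working set."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("window size must be at least 1")
        self.items: Deque[float] = deque(maxlen=n)
        self.total = 0.0

    def add(self, value: float) -> float:
        if len(self.items) == self.items.maxlen:
            self.total -= self.items[0]  # value about to be evicted
        self.items.append(value)
        self.total += value
        return self.total / len(self.items)


running, last3 = RunningAverage(), LastNAverage(3)
prices = [10.0, 20.0, 30.0, 40.0]
running_out = [running.add(p) for p in prices]
window_out = [last3.add(p) for p in prices]
print(running_out, window_out)
```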
Types of window 
• Sliding 
• Jumping (batching) 
• Partitioned 
• Time-based 
• Others 
What to do with a working set? 
• Windows define the scope of interest 
• Run queries against working set as it changes 
– Continuous Queries 
When should you run queries? 
• Run queries when the output would change, i.e. is not idempotent 
• When is that? 
– Contents of the window change – maybe? 
– Time advances – possibly? 
– Depends on window and query 
Linking cause and effect in an efficient manner lies at the heart of CEP and is why the answer is not simply programming
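A small Python sketch of why both triggers matter, assuming a hypothetical `CountInRange` query over a 30-second window. It is re-evaluated on new events and on clock ticks, and it emits only when the answer differs from the last one reported.

```python
from collections import deque
from typing import Deque, Optional


class CountInRange:
    """Counts events in the last `range_s` seconds; reports only when the count changes."""

    def __init__(self, range_s: float) -> None:
        self.range_s = range_s
        self.times: Deque[float] = deque()
        self.last_reported: Optional[int] = None

    def _evaluate(self, now: float) -> Optional[int]:
        # Expire events that have fallen out of the window (age >= range).
        while self.times and now - self.times[0] >= self.range_s:
            self.times.popleft()
        count = len(self.times)
        if count == self.last_reported:
            return None  # same answer as before, nothing to emit
        self.last_reported = count
        return count

    def on_event(self, timestamp: float) -> Optional[int]:
        self.times.append(timestamp)
        return self._evaluate(timestamp)

    def on_tick(self, now: float) -> Optional[int]:
        # No new event, but time alone can change the answer.
        return self._evaluate(now)


q = CountInRange(30.0)
outputs = [q.on_event(0.0), q.on_event(10.0), q.on_tick(20.0),
           q.on_tick(31.0), q.on_tick(45.0)]
print(outputs)
```

The tick at 20 s produces nothing because the answer is unchanged, while the ticks at 31 s and 45 s produce output although no event arrived.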
How can we define queries on windows? 
• Describe queries on windows using a SQL-like syntax 
SELECT AVG(price) FROM stream [ROWS N] 
• [Arasu et al. – “The CQL Continuous Query Language: Semantic Foundations and Query Execution” 2003] 
Querying windows 
• Sliding (a slide equal to the window size gives a jumping window) 
SELECT * FROM s [ROWS 4 SLIDE 4] 
• Partitioned 
SELECT a, b FROM s 
[PARTITION BY b ROWS 3] 
• Time-based 
SELECT * FROM s [RANGE 30 SECONDS] 
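For readers who think in code rather than CQL, the three window clauses can be approximated with Python generators. This is a sketch of the window semantics only, under simplifying assumptions (in-order arrival, timestamps carried in the event); the function names are invented and no real CQL engine is implied.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, Iterable, Iterator, List, Tuple


def rows_window(stream: Iterable, rows: int, slide: int = 1) -> Iterator[List]:
    """[ROWS rows SLIDE slide]: emit the last `rows` items after every `slide` arrivals."""
    window: Deque = deque(maxlen=rows)
    for i, item in enumerate(stream, start=1):
        window.append(item)
        if i % slide == 0:
            yield list(window)


def partitioned_rows(stream: Iterable[Tuple], key_index: int, rows: int) -> Iterator[List[Tuple]]:
    """[PARTITION BY key ROWS rows]: keep the last `rows` items per key, emit their union."""
    parts: Dict[Hashable, Deque[Tuple]] = defaultdict(lambda: deque(maxlen=rows))
    for item in stream:
        parts[item[key_index]].append(item)
        yield [row for part in parts.values() for row in part]


def range_window(stream: Iterable[Tuple[float, object]], seconds: float) -> Iterator[List]:
    """[RANGE seconds]: keep items whose timestamp is within `seconds` of the newest one."""
    window: Deque[Tuple[float, object]] = deque()
    for ts, value in stream:
        window.append((ts, value))
        while ts - window[0][0] >= seconds:
            window.popleft()
        yield [v for _, v in window]


print(list(rows_window(range(1, 9), rows=4, slide=4)))
print(list(range_window([(0, "a"), (10, "b"), (35, "c")], 30)))
```

With `rows=4, slide=4` the window jumps in batches of four; with the default `slide=1` it moves one item at a time.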
How do you make it fast? 
• Generally, in-memory is the only way 
• Operate as a gigantic state machine and optimize like crazy 
– Go reactive! 
– Talk to Martin 
Why must it be fast? 
• Not reactive streams! 
• Flow control causes causal paradox 
• Stream processing must keep up 
How do you make it resilient? 
• Making stateful systems resilient has challenges 
• State is generally changing extremely quickly 
Resiliency approaches 
• Save all the things and replay 
– But infinite data?! 
– Sometimes possible because append-only 
• Save all the state 
– Assumes there is less of it 
– State is changing rapidly 
– Too rapid to be effective 
Resiliency approaches 
• Elsewhere checkpoint and record changes 
– Maybe we can record state and things 
– Many commercial systems do 
• No recording - identical parallel systems 
– Synchronization an issue 
– Catch-up an issue 
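The checkpoint-and-record-changes approach might look like this in miniature. It is a sketch with invented names; the in-memory `checkpoint` and `log` attributes stand in for durable storage, which a real system would keep on disk or on a peer.

```python
import copy
from typing import Dict, List


class CheckpointedCounter:
    """Per-key event counter that checkpoints periodically and logs events in between."""

    def __init__(self, checkpoint_every: int) -> None:
        self.checkpoint_every = checkpoint_every
        self.state: Dict[str, int] = {}
        # Stand-ins for durable storage (assumption for this sketch).
        self.checkpoint: Dict[str, int] = {}
        self.log: List[str] = []

    def _apply(self, key: str) -> None:
        self.state[key] = self.state.get(key, 0) + 1

    def on_event(self, key: str) -> None:
        self.log.append(key)  # record the change before applying it
        self._apply(key)
        if len(self.log) >= self.checkpoint_every:
            self.checkpoint = copy.deepcopy(self.state)
            self.log.clear()  # everything logged is now covered by the checkpoint

    def recover(self) -> None:
        """Rebuild state after a crash: last checkpoint plus replay of the short log."""
        self.state = copy.deepcopy(self.checkpoint)
        for key in self.log:
            self._apply(key)


op = CheckpointedCounter(checkpoint_every=3)
for k in ["a", "b", "a", "c", "a"]:
    op.on_event(k)
before = dict(op.state)
op.state = {}  # simulate losing the in-memory state
op.recover()
print(op.state == before)
```

The checkpoint interval trades recovery time (length of the log to replay) against the cost of copying rapidly changing state.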
How do you scale stream processing? 
• Follow the crowd 
• Distribute processing 
• Multiple input sources 
– If independent 
– Flume 
– Kafka 
How do you distribute stream processing? 
• DAG of event streams 
– Inputs and outputs are event streams 
– Nodes are operators or groups of operators 
– Nodes can be distributed 
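A toy, single-process version of such a graph in Python. The `Node` class and the operators are assumptions made for illustration; in a distributed deployment the call to `push` on a downstream node would become a network send.

```python
from typing import Callable, Iterable, List


class Node:
    """One operator in the graph; its output stream feeds every downstream node."""

    def __init__(self, name: str, fn: Callable[[object], Iterable[object]]) -> None:
        self.name = name
        self.fn = fn  # maps one input event to zero or more output events
        self.downstream: List["Node"] = []

    def to(self, other: "Node") -> "Node":
        self.downstream.append(other)
        return other

    def push(self, event: object) -> None:
        for out in self.fn(event):
            for node in self.downstream:
                node.push(out)


collected: List[object] = []

source = Node("source", lambda e: [e])
evens = Node("evens", lambda e: [e] if e % 2 == 0 else [])
squares = Node("squares", lambda e: [e * e])
sink = Node("sink", lambda e: collected.append(e) or [])

# source fans out to two branches that both feed the same sink.
source.to(evens).to(sink)
source.to(squares).to(sink)

for n in [1, 2, 3]:
    source.push(n)
print(collected)
```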
Apache Storm 
• Toolkit for creating distributed event flows 
• Bolts (operators) and spouts (sources) 
• Composed using a Clojure DSL 
• Storm runs topologies 
– Map-Reduce jobs finish – batch 
– Topologies process forever – continuous 
Apache Storm – a toolkit for distribution 
(topology 
{"1" (spout-spec twitter-feed-spout)} 
{"2" (bolt-spec {"1"} filter :p "status" )} 
{"3" (spout-spec database :p "retail" )} 
{"4" (bolt-spec {"2"} top-n)} 
{"5" (bolt-spec {"3" "4"} join :p "item" )} 
... 
) 
How do you reliably distribute? 
• State is now distributed 
– Synchronization all but impossible 
– Deterministic if relative order is preserved 
• Depends on operators and their effect 
• [L. Lamport - “Time, Clocks, and the Ordering of Events in a Distributed System.” 1978] 
– In theory a replay of things through the network will recover the state 
– Alternative of storing the state for all the operators is harder 
How do you reliably distribute? 
• Different classes of recovery 
– [Hwang et al. – “High-Availability Algorithms for Distributed Stream Processing”. 2005] 
• Precise recovery – failure effects hidden perfectly 
• Rollback recovery – no data loss, but outputs may be duplicated 
• Gap recovery – data lost during recovery 
• Reliable distribution overlaps distribution 
– Upstream backup, reactive streams? 
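Upstream backup can be sketched as follows. This is a simplified, hypothetical illustration (the `UpstreamBackup` class and its sequence numbering are invented here, not taken from the cited paper): the upstream node retains each output until the downstream node acknowledges it, and resends the unacknowledged tail after a failure.

```python
from collections import deque
from typing import Deque, List, Tuple


class UpstreamBackup:
    """Upstream node keeps every output until the downstream node acknowledges it."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.unacked: Deque[Tuple[int, str]] = deque()

    def emit(self, event: str) -> Tuple[int, str]:
        record = (self.next_seq, event)
        self.next_seq += 1
        self.unacked.append(record)
        return record

    def ack(self, seq: int) -> None:
        # Downstream has safely processed everything up to and including seq.
        while self.unacked and self.unacked[0][0] <= seq:
            self.unacked.popleft()

    def replay(self) -> List[Tuple[int, str]]:
        # After a downstream failure, resend everything not yet acknowledged.
        return list(self.unacked)


up = UpstreamBackup()
delivered = [up.emit(e) for e in ["a", "b", "c", "d"]]
up.ack(1)             # downstream confirmed "a" and "b"
resent = up.replay()  # downstream crashed before acknowledging "c" and "d"
print(resent)
```

If the downstream node had already produced output for "c" before crashing, the replay produces it again, which is the duplicate output that rollback recovery allows; hiding it (precise recovery) would need de-duplication, for example by sequence number.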
Reactive stream processing 
• Message/event driven 
• Discussed resiliency 
• Continuous queries == responsive 
– Push towards on-line queries 
• Elasticity – harder 
Stream Processing with Data 
• Time dimension to data problems 
• Data dimension to stream problems 
• JOIN streams to tables 
• Easy when small 
• Large datasets harder 
– Cache join data in memory? 
– Push query into datastore? 
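A sketch of the cache-in-memory option in Python, assuming a hypothetical `lookup` function that stands in for a datastore query and a made-up `retail_table`. Each distinct key costs one lookup; repeated keys are served from the cache.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple


def join_with_table(
    stream: Iterable[Tuple[str, int]],
    lookup: Callable[[str], Optional[str]],
) -> Iterator[Tuple[str, int, str]]:
    """Enrich each (item_id, qty) event with a table column, caching lookups in memory."""
    cache: Dict[str, Optional[str]] = {}
    for item_id, qty in stream:
        if item_id not in cache:
            cache[item_id] = lookup(item_id)  # one datastore round trip per distinct key
        category = cache[item_id]
        if category is not None:  # inner join: drop events with no matching row
            yield item_id, qty, category


# Hypothetical reference table; in practice `lookup` would query the datastore.
retail_table = {"i1": "books", "i2": "games"}
calls: List[str] = []


def lookup(item_id: str) -> Optional[str]:
    calls.append(item_id)
    return retail_table.get(item_id)


orders = [("i1", 2), ("i2", 1), ("i1", 5), ("i9", 1)]
joined = list(join_with_table(orders, lookup))
print(joined, calls)
```

The cache here never expires, so it assumes the table does not change while the query runs; a large or changing table is where the harder cases begin.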
Stream Processing with Big Data 
• Time dimension to Big Data problems 
– Velocity (vvv) implies stream processing 
• Large dataset problem domain 
• But now the data is distributed! 
Shortcomings of Big Data 
Fast Data Architecture 
• Similar to recoverable architectures: 
– Snapshot (queries) + incremental updates 
– Current state = known state + changes 
– Requires static queries - cached results 
• Spark does this quite well 
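The "current state = known state + changes" idea, reduced to a per-key count in Python. This is only a sketch with invented names, and it assumes each batch snapshot covers every event received so far, so the deltas can be discarded when a new snapshot is loaded.

```python
from collections import Counter
from typing import Iterable


class FastView:
    """Answers a fixed query (count per key) from a batch snapshot plus live deltas."""

    def __init__(self) -> None:
        self.snapshot: Counter = Counter()  # cached result of the last batch run
        self.delta: Counter = Counter()     # changes seen since that run

    def on_event(self, key: str) -> None:
        self.delta[key] += 1

    def load_snapshot(self, history: Iterable[str]) -> None:
        # Assumption: the batch job covered all events so far, including those
        # behind the current deltas, so the deltas can be dropped.
        self.snapshot = Counter(history)
        self.delta.clear()

    def query(self, key: str) -> int:
        return self.snapshot[key] + self.delta[key]


view = FastView()
view.load_snapshot(["a", "a", "b"])
view.on_event("a")
view.on_event("c")
print(view.query("a"), view.query("b"), view.query("c"))
```

The query is fixed in advance, which is what makes caching its result possible.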
Fast data technology 
• Storm – topology deployment 
• Spark – logic queries on RDDs 
• Spark streaming 
– repeating snapshots / micro-batch 
– Fast data-ish 
• Flume – fast ingest of log data 
• Kafka 
– pub-sub messaging as distributed commit log 
• Hadoop streaming 
– create M-R jobs using executable scripts 
• Hive 
• Cloudera Impala 
– MPP SQL query engine on top of Hadoop 
Summary 
Future Stream Processing 
• Ease-of-use 
– CQL or Graphical - both have drawbacks 
– Queries get really complicated really quickly 
• Ease-of-use + distribution 
– Real systems challenge 
• Fast data architectures 
• Real-time machine learning 
– Spark ML Library 
– Hadoop Mahout 
• Interactive streaming queries – declarative and caching 
– Hive and Spark 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 

Recently uploaded (20)

SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 

Reactconf 2014 - Event Stream Processing

  • 1. {MAKING SENSE OF THE FIRE-HOSE IN REAL-TIME} {EVENT PROCESSING} Dr Andy Piper, Push Technology. Reappt, a Push Technology product, offers the enterprise-grade Diffusion technology as a service.
  • 2. Time @cobbscomedyclub: “The past, the present and the future walked into a bar. It was tense.”
  • 3. Will the real Andy Piper please stand up? – CTO at Push Technology – Ex-BEA/Oracle – Spring contributor (Spring DM) and Author – Standards contributor – OMG, JCP etc – PhD, Cambridge, Distributed Systems
  • 4. Agenda in 140 characters: What is it - What not? Why? History. Measure infinity. Windows. Queries. Going fast – reliably, distributed, distributed and fast and big
  • 5. What is Event Stream Processing?
  • 6. What is Event Stream Processing? • It’s not stream processing – Typically focused on local parallelism – I have opinions but they get me in trouble
  • 7. What is Event Stream Processing? • Not event passing – Event exchange not processing, e.g. JMS – Stateless
  • 8. What is Event Stream Processing? • Not event mediation (brokering) – Filtering, routing, and enrichment, e.g. ESB – Stateless
  • 9. What is Event Stream Processing? “Event Stream Processing deals with the task of processing streams of event data with the goal of identifying the meaningful pattern within those streams” – Wikipedia
  • 10. What is Event Stream Processing? • ESP is about querying data streams – Looking for something – Haystack won’t stay still! – Answers depend on multiple events – Extremely stateful Where the interesting questions are!
  • 11. Meta-analogy “Producing thrust with a scramjet has been compared to lighting a match in a hurricane and keeping it burning” - NASA “Event stream processing is like looking for a needle in a haystack in a hurricane”
  • 12. It’s like an inverted database • RDBMS: data is ‘static’, queries are ‘dynamic’ • CEP: data is ‘dynamic’, queries are ‘static’ [Diagram labels: RDBMS – Query, Data; CEP – Event, Query]
  • 13. Why bother? • Too much data • Time is integral to the questions • Data is moving too fast • Databases assume static datasets
  • 14. History – Two schools of thought • Take a database and make it time-driven • Logic approach with time constraints
  • 15. Stream Processing History • Tapestry – ’92 – Early inverted database (not Apache!) • Materialized views – ’95 – [A. Gupta and I. S. Mumick. “Maintenance of materialized views: Problems, techniques, and applications.” 1995] • David Luckham coined the term CEP – “The Power of Events”. 2001 – Logic-based CEP – Company acquired by Avaya • Michael Franklin – Dataflow processing in PostgreSQL – [“TelegraphCQ: continuous dataflow processing.” 2003] • Aurora – ’03 – [Cherniack et al – “Scalable distributed stream processing.” 2003] • STREAM – ’03 – [Arasu et al – “STREAM: The Stanford Stream Data Manager.” 2003] • Borealis – ’05 – [Abadi et al – “The design of the Borealis stream processing engine.” 2005]
  • 16. Some definitions • Tuple – a multi-set of elements (e1, e2, … en) – A single tuple is a monad! • Event or Data Stream S_n – any ordered pair (s, Δ)_n – s is an unbounded sequence of tuples and – Δ is an unbounded sequence of positive real time intervals – s and Δ are of equal length • Event stream processing transforms event streams into new event streams through queries • Outputs and inputs are continuous – Operators are continuous queries
  • 17. How do you measure infinity? How do you measure an event stream if it’s unbounded?
  • 18. Measuring infinity • Don’t do it – But then it is just event passing – where is the fun in that?! • Synopses – store summary information – Continuous average = running total + item count • Windows – define working set – Continuous average over last N items
  • 19. Measuring infinity
  • 20. Types of window • Sliding • Jumping (batching) • Partitioned • Time-based • Others
  • 21. What to do with a working set? • Windows define the scope of interest • Run queries against working set as it changes – Continuous Queries
  • 22. When should you run queries? • Run queries when output is not idempotent • When is that? – Contents of the window change – maybe? – Time advances – possibly? – Depends on window and query Linking cause and effect in an efficient manner lies at the heart of CEP and is why the answer is not simply programming
  • 23. How can we define queries on windows? • Describe queries on windows using a SQL-like syntax SELECT AVG(price) FROM stream [ROWS N] • [Arasu et al. – “The CQL Continuous Query Language: Semantic Foundations and Query Execution” 2003]
  • 24. Querying windows • Sliding SELECT * FROM s [ROWS 4 SLIDE 4] • Partitioned SELECT a, b FROM s [PARTITION BY b ROWS 3] • Time-based SELECT * FROM s [RANGE 30 SECONDS]
  • 25. How do you make it fast? • Generally, in-memory is the only way • Operate as a gigantic state machine and optimize like crazy – Go reactive! – Talk to Martin
  • 26. Why must it be fast? • Not reactive streams! • Flow control causes causal paradox • Stream processing must keep up
  • 27. How do you make it resilient? • Making stateful systems resilient has challenges • State is generally changing extremely quickly
  • 28. Resiliency approaches • Save all the things and replay – But infinite data?! – Sometimes possible because append-only • Save all the state – Assumes there is less of it – State is changing rapidly – Too rapid to be effective
  • 29. Resiliency approaches • Elsewhere checkpoint and record changes – Maybe we can record state and things – Many commercial systems do • No recording - identical parallel systems – Synchronization an issue – Catch-up an issue
  • 30. How do you scale stream processing? • Follow the crowd • Distribute processing • Multiple input sources – If independent – Flume – Kafka
  • 31. How do you distribute stream processing? • DAG of event streams – Inputs and outputs are event streams – Nodes are operators or groups of operators – Nodes can be distributed
  • 32. Apache Storm • Toolkit for creating distributed event flows • Bolts (operators) and spouts (sources) • Composed using a Clojure DSL • Storm runs topologies – Map-Reduce jobs finish – batch – Topologies process forever – continuous
  • 33. Apache Storm – a toolkit for distribution (topology {"1" (spout-spec twitter-feed-spout)} {"2" (bolt-spec {"1"} filter :p "status" )} {"3" (spout-spec database :p "retail" )} {"4" (bolt-spec {"2"} top-n)} {"5" (bolt-spec {"3" "4"} join :p "item" )} ... )
  • 34. How do you reliably distribute? • State is now distributed – Synchronization all but impossible – Deterministic if relative order is preserved • Depends on operators and their effect • [L. Lamport - “Time, Clocks, and the Ordering of Events in a Distributed System.” 1978] – In theory a replay of things through the network will recover the state – Alternative of storing the state for all the operators is harder
  • 35. How do you reliably distribute? • Different classes of recovery – [Hwang et al. – “High-Availability Algorithms for Distributed Stream Processing”. 2005] • Precise recovery – failure effects hidden perfectly • Rollback recovery – no data loss, but outputs may be duplicated • Gap recovery – data lost during recovery • Reliable distribution overlaps distribution – Upstream backup, reactive streams?
  • 36. Reactive stream processing • Message/event driven • Discussed resiliency • Continuous queries == responsive – Push towards on-line queries • Elasticity – harder
  • 37. Stream Processing with Data • Time dimension to data problems • Data dimension to stream problems • JOIN streams to tables • Easy when small • Large datasets harder – Cache join data in memory? – Push query into datastore?
  • 38. Stream Processing with Big Data • Time dimension to Big Data problems – Velocity (vvv) implies stream processing • Large dataset problem domain • But now the data is distributed!
  • 39. Shortcomings of Big Data
  • 40. Fast Data Architecture
  • 41. Fast Data Architecture • Similar to recoverable architectures: – Snapshot (queries) + incremental updates – Current state = known state + changes – Requires static queries - cached results • Spark does this quite well
  • 42. Fast data technology • Storm – topology deployment • Spark – logic queries on RDDs • Spark streaming – repeating snapshots / micro-batch – Fast data-ish • Flume – fast ingest of log data • Kafka – pub-sub messaging as distributed commit log • Hadoop streaming – create M-R jobs using executable scripts • Hive • Cloudera Impala – MPP SQL query engine on top of Hadoop
  • 43. Summary
  • 44. Future Stream Processing • Ease-of-use – CQL or Graphical - both have drawbacks – Queries get really complicated really quickly • Ease-of-use + distribution – Real systems challenge • Fast data architectures • Real-time machine learning – Spark ML Library – Hadoop Mahout • Interactive streaming queries – declarative and caching – Hive and Spark
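
The "Measuring infinity" and "Querying windows" slides describe two ways to keep a continuous average over an unbounded stream: a synopsis (summary state only) and a row-based window (the last N items). The Python sketch below is one possible illustration, not code from the talk. The names `RunningAverage`, `SlidingAverage` and `continuous_query` are hypothetical, and the window class only approximates what a CQL-style `SELECT AVG(price) FROM stream [ROWS N]` might compute.

```python
from collections import deque


class RunningAverage:
    """Synopsis: keeps only a running total and an item count (hypothetical name)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def push(self, value):
        self.total += value
        self.count += 1
        return self.total / self.count


class SlidingAverage:
    """Row-based sliding window: average over the last n items (hypothetical name)."""

    def __init__(self, n):
        self.window = deque(maxlen=n)
        self.total = 0.0

    def push(self, value):
        if len(self.window) == self.window.maxlen:
            # The oldest item is about to be evicted by append(), so drop it from the total.
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


def continuous_query(stream, operator):
    """Transform an input event stream into an output event stream, one result per event."""
    for value in stream:
        yield operator.push(value)


if __name__ == "__main__":
    prices = [10, 20, 30, 40]
    print(list(continuous_query(prices, RunningAverage())))
    print(list(continuous_query(prices, SlidingAverage(2))))
```

Because `continuous_query` is a generator, its output is itself a stream and can be fed to another operator, which mirrors the point that operators on streams compose. This sketch re-evaluates on every arriving item; a real engine would decide when to re-run the query based on the window type and the query.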

Editor's Notes

  1. Architect for WebLogic Server Core. Architect and then Engineering Director for Oracle Event Processing.
  2. Effectively infinite when varying over all time. Theory assumes unbounded streams. The data is moving. Slow-moving data can be stored and queried (somewhat inefficiently).
  3. Although conceptually similar to an RDBMS query against a small data set, in practice it is very different. Many optimizations based on how the data is changing are possible.
  4. Thus the output of the query is another event stream. Streams and operators on streams are thus composable. Query plans.
  5. Can be written in a regular programming language, but often declarative.
  6. If you can’t, you have to throw things overboard.
  7. And performance is a key aspect of event stream processing.
  8. E.g. commit log in a database.
  9. event streams flow over communication links
  10. event streams flow over communication links
  11. Requires that you use the time associated with the tuple, not the current time. Some operators make this hard.
  12. Requires that you use the time associated with the tuple, not the current time. Some operators make this hard.
  13. Determinism requirements? Independence of tuples within streams? Queries?
  14. For instance, calculate all the late-running Tube (BART) trains in London (SF). Want to join the stream of time-based location data with a static timetable. Effectively working set state. Can be held in memory. Data locality very important.
  15. Lambda architecture. As you get results from M-R, because the data is immutable and the ops are associative and commutative, it is easy to add to them without losing semantics. Break the same question into 2 parts, for deep data and recent data, if the ops are associative and commutative, and then merge the results.
  16. Can combine the two! Data for static querying and recovery is the same.
  17. Move towards on-line querying.
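
The Lambda architecture note and the "current state = known state + changes" bullet can be illustrated with a small Python sketch. It assumes the question being asked is a per-item count, which is associative and commutative, so a batch snapshot and a set of recent changes can be merged without losing semantics. The function names and the event shape (`{"item": ...}`) are hypothetical and not taken from the talk or from any particular framework.

```python
from collections import Counter


def batch_view(events):
    """Snapshot computed over the deep, immutable data (for example by an M-R job)."""
    return Counter(e["item"] for e in events)


def apply_change(view, event):
    """Incremental update: fold one recent event into the view of recent changes."""
    view[event["item"]] += 1
    return view


def current_state(known_state, recent_changes):
    """Current state = known state + changes (valid because counting is associative and commutative)."""
    return known_state + recent_changes


if __name__ == "__main__":
    deep = [{"item": "a"}, {"item": "b"}, {"item": "a"}]
    recent = [{"item": "a"}, {"item": "c"}]

    snapshot = batch_view(deep)
    changes = Counter()
    for event in recent:
        apply_change(changes, event)

    print(current_state(snapshot, changes))
```

Merging the snapshot with the recent changes gives the same answer as recomputing over all events, which is the property that lets the same data serve both static querying and recovery. An operation that is not associative and commutative (a median, say) would not merge this simply.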