BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Introduction to Streaming Analytics
Guido Schmutz
@gschmutz
Guido Schmutz
Working for Trivadis for more than 19 years
Oracle ACE Director for Fusion Middleware and SOA
Co-Author of different books
Consultant, Trainer, Software Architect for Java, SOA & Big Data / Fast Data
Member of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 25 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Our company.
Trivadis is a market leader in IT consulting, system integration, solution engineering
and the provision of IT services focusing on and
technologies
in Switzerland, Germany, Austria and Denmark. We offer our services in the following
strategic business fields:
Trivadis Services takes over the interacting operation of your IT systems.
OPERATION
With over 600 specialists and IT experts in your region.
14 Trivadis branches and more than
600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget:
CHF 5.0 million
Financially self-supporting and
sustainably profitable
Experience from more than 1,900
projects per year at over 800
customers
Technology on its own won't help you.
You need to know how to use it properly.
Agenda
1. Introduction / Motivation
2. Concepts (1 – 11)
3. Stream Processing Platforms
Introduction / Motivation
Traditional Data Processing - Challenges
• Introduces too much “decision latency”
• Responses are delivered “after the fact”
• Maximum value of the identified situation is lost
• Decisions are made on old and stale data
• “Data at Rest”
Customer 360° View – Flow with Big Data Hub
[Diagram: the data sources Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Weather Data and Mobile Apps feed Event Hubs, which load a Hadoop cluster (Distributed Filesystem, Parallel Processing, NoSQL). The cluster supports Machine Learning, Graph Algorithms and Natural Language Processing and serves SQL, Search, BI Tools, the Enterprise Data Warehouse, and Online & Mobile Apps.]
The New Era: Streaming Data Analytics / Fast Data
• Events are analyzed and processed in
real-time as they arrive
• Decisions are timely, contextual and
based on fresh data
• Decision latency is eliminated
• “Data in motion”
Customer Event Hub – taking Velocity into account
[Diagram: the data sources Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Weather Data and Mobile Apps feed Event Hubs. The events are consumed by a Batch Analytics path (Hadoop cluster: Distributed Filesystem, Parallel Processing, NoSQL) and a Streaming Analytics path (Stream Analytics, NoSQL, Reference / Models). File Import / SQL Import is shown as an additional ingestion path. Results are served through SQL, Search, Dashboard, BI Tools, the Enterprise Data Warehouse, and Online & Mobile Apps.]
When to Stream / When not?
Real-Time: constant low latency, milliseconds and under
Near-Real-Time: low milliseconds to seconds, delay in case of failures
Batch: tens of seconds or more, re-run in case of failures
Source: adapted from Cloudera
“No free lunch”
Trade-off: “difficult” architectures with lower latency (towards Real-Time) vs. “easier” architectures with higher latency (towards Batch)
Real Time Analytics Use Cases
• Algorithmic Trading
• Online Fraud Detection
• Geo Fencing
• Proximity/Location Tracking
• Intrusion detection systems
• Traffic Management
• Recommendations
• Churn detection
• Internet of Things (IoT) / Intelligent
Sensors
• Social Media/Data Analytics
• Gaming Data Feed
• …
Concepts (1 – 11)
1) Categories of Stream/Event Processing
Simple Event Processing (SEP)
Event Stream Processing (ESP)
Complex Event Processing (CEP)
2) Streaming Model – Continuous Streaming
[Diagram: several event sources feed an ingestion layer; each individual event is processed (P) on its own]
• Events processed as they arrive
• Low latency
• Less throughput
• Fault tolerance expensive
2) Streaming Model - Micro-Batch
[Diagram: several event sources feed an ingestion layer; events are grouped into small batches before processing (P)]
• Splits incoming stream into small batches
• Could provide high(er) throughput
• Fault tolerance easier to achieve
• Higher latency
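The difference between the two streaming models can be sketched in a few lines of Python. This is only an illustration, not how any particular platform implements it: the function names are made up for this example, and the batches are cut by event count, whereas real micro-batch engines usually cut them by time interval.

```python
from typing import Callable, Iterable, Iterator, List


def process_continuously(
    events: Iterable[int], handle: Callable[[int], int]
) -> Iterator[int]:
    """Continuous streaming: the handler runs once per event, as it arrives."""
    for event in events:
        yield handle(event)


def process_in_micro_batches(
    events: Iterable[int], batch_size: int, handle_batch: Callable[[List[int]], int]
) -> Iterator[int]:
    """Micro-batch: events are collected first, then handled as a group.

    Assumption for this sketch: a batch closes after `batch_size` events.
    """
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield handle_batch(batch)
            batch = []
    if batch:  # flush the last, possibly smaller, batch
        yield handle_batch(batch)


if __name__ == "__main__":
    readings = [3, 1, 4, 1, 5, 9, 2]
    # One result per event: lowest latency, one handler call per event.
    print(list(process_continuously(readings, lambda e: e * 2)))
    # One result per batch: fewer handler calls, but each event waits for its batch.
    print(list(process_in_micro_batches(readings, 3, sum)))
```

In the micro-batch variant the first result only appears after the third event has arrived, which is the latency cost named on the slide.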
3) Delivery Guarantees
• At most once [0 | 1] (fire-and-forget) - the message is sent, but the sender
doesn’t care whether it is received or lost. No additional overhead is imposed to
ensure message delivery, such as requiring acknowledgments from consumers. Hence, it
is the easiest and most performant behavior to support.
• At least once [1+] - the message is retransmitted until an
acknowledgment is received. Since a delayed acknowledgment from the receiver
could be in flight when the sender retransmits the message, the message may be
received one or more times.
• Exactly once [1] - the message is received once and only once; it is
never lost and never repeated. The system must implement whatever
mechanisms are required to ensure that a message is received and processed
just once.
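The three guarantees can be made concrete with a small simulation. The sketch below is an assumption-laden illustration: the `Receiver` class, the sender functions and the sets describing which deliveries and acknowledgments get lost are all invented for this example. It also shows one common way to get an exactly-once effect, namely at-least-once delivery combined with de-duplication by message id on the receiving side; real platforms may use other mechanisms.

```python
class Receiver:
    """Collects payloads; optionally ignores message ids it has already seen."""

    def __init__(self, deduplicate=False):
        self.deduplicate = deduplicate
        self.seen_ids = set()
        self.processed = []

    def receive(self, msg_id, payload):
        if self.deduplicate and msg_id in self.seen_ids:
            return  # duplicate from a retransmission, drop it
        self.seen_ids.add(msg_id)
        self.processed.append(payload)


def send_at_most_once(receiver, messages, lost_deliveries):
    """Fire-and-forget: one attempt per message, no acknowledgment, no retry."""
    for msg_id, payload in messages:
        if (msg_id, 1) not in lost_deliveries:
            receiver.receive(msg_id, payload)


def send_at_least_once(receiver, messages, lost_deliveries, lost_acks, max_attempts=5):
    """Retransmit until an acknowledgment arrives (or attempts run out)."""
    for msg_id, payload in messages:
        for attempt in range(1, max_attempts + 1):
            if (msg_id, attempt) in lost_deliveries:
                continue  # message never arrived, try again
            receiver.receive(msg_id, payload)
            if (msg_id, attempt) in lost_acks:
                continue  # message arrived, but the sender never learns it
            break  # acknowledged


if __name__ == "__main__":
    messages = [(1, "a"), (2, "b"), (3, "c")]
    lost_deliveries = {(2, 1)}  # first attempt of message 2 is lost
    lost_acks = {(3, 1)}        # first acknowledgment of message 3 is lost

    r = Receiver()
    send_at_most_once(r, messages, lost_deliveries)
    print("at most once: ", r.processed)

    r = Receiver()
    send_at_least_once(r, messages, lost_deliveries, lost_acks)
    print("at least once:", r.processed)

    r = Receiver(deduplicate=True)
    send_at_least_once(r, messages, lost_deliveries, lost_acks)
    print("exactly once (effect):", r.processed)
```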
4) API
Compositional / Programmatic
• Highly customizable operator based
on basic building blocks
• Manual topology definition and
optimization
Declarative
• High-Level, fluent API
• Higher-order functions as operators
(filter, mapWithState …)
• Logical plan optimization
Statistical
• Data Scientist friendly
• Dynamic typing
SQL
• Query language that allows a stream to be used
in the FROM clause
• Extensions supporting windowing,
pattern matching, spatial, … operators
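To contrast the compositional and the declarative API style, here is a minimal Python sketch. Everything in it (`FilterOperator`, `MapOperator`, `run_topology`, `Stream`) is hypothetical and exists only for this illustration; it does not mirror the API of any of the platforms discussed later.

```python
from typing import Callable, Iterable, List


# Compositional style: operators are building blocks, wired together by hand.
class FilterOperator:
    def __init__(self, predicate: Callable) -> None:
        self.predicate = predicate

    def process(self, events: Iterable) -> Iterable:
        return (e for e in events if self.predicate(e))


class MapOperator:
    def __init__(self, fn: Callable) -> None:
        self.fn = fn

    def process(self, events: Iterable) -> Iterable:
        return (self.fn(e) for e in events)


def run_topology(source: Iterable, operators: List) -> List:
    """The developer defines the operator order (the topology) explicitly."""
    stream = iter(source)
    for operator in operators:
        stream = operator.process(stream)
    return list(stream)


# Declarative style: a fluent API with higher-order functions as operators.
class Stream:
    def __init__(self, events: Iterable) -> None:
        self._events = events

    def filter(self, predicate: Callable) -> "Stream":
        return Stream(e for e in self._events if predicate(e))

    def map(self, fn: Callable) -> "Stream":
        return Stream(fn(e) for e in self._events)

    def collect(self) -> List:
        return list(self._events)


if __name__ == "__main__":
    temperatures = [18.5, 31.0, 22.4, 35.2]

    compositional = run_topology(
        temperatures,
        [FilterOperator(lambda t: t > 30), MapOperator(lambda t: f"ALERT {t}")],
    )
    declarative = (
        Stream(temperatures).filter(lambda t: t > 30).map(lambda t: f"ALERT {t}").collect()
    )
    print(compositional)
    print(declarative)
```

Both variants compute the same result; the difference lies in who owns the wiring. In the declarative form the platform sees the whole chain and is free to optimize the logical plan before running it.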
5) Windowing
Because a stream is large and never-ending, it’s not
feasible to keep the entire stream of data in memory.
Computations over multiple events can be
done using windows of data:
• Sliding Window - uses eviction and trigger
policies that are based on time: window length
and sliding interval length
• Tumbling Window - eviction policy is always
based on the window being full and the trigger
policy is based on either the count of items in
the window or time
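Both window types can be sketched in plain Python. The functions below are illustrative assumptions, not a platform API: the tumbling window is count-based and emits only full windows, and the sliding window is time-based, takes events as `(timestamp, value)` pairs sorted by non-negative timestamp, and treats each window as the half-open interval `[start, start + length)`.

```python
from typing import Iterable, Iterator, List, Tuple


def tumbling_count_windows(events: Iterable[int], size: int) -> Iterator[List[int]]:
    """Emit a window whenever it is full, then start an empty one (no overlap)."""
    window: List[int] = []
    for event in events:
        window.append(event)
        if len(window) == size:
            yield list(window)
            window.clear()


def sliding_time_windows(
    events: List[Tuple[int, int]], length: int, slide: int
) -> Iterator[Tuple[int, List[int]]]:
    """Emit (window_start, values) every `slide` time units.

    Windows overlap whenever `slide` is smaller than `length`.
    """
    if not events:
        return
    last_timestamp = events[-1][0]
    start = 0
    while start <= last_timestamp:
        end = start + length
        yield start, [value for ts, value in events if start <= ts < end]
        start += slide


if __name__ == "__main__":
    print(list(tumbling_count_windows([1, 2, 3, 4, 5, 6, 7], size=3)))

    timed = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
    for start, values in sliding_time_windows(timed, length=4, slide=2):
        print(f"window starting at t={start}: {values}")
```

With a window length of 4 and a sliding interval of 2, each event falls into up to two windows, which is what distinguishes the sliding from the tumbling case.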
6) Pattern Matching
• Streaming Data often contains interesting patterns that only emerge as new
streaming data arrives, e.g.
• Absence Pattern: event A not followed by event B within time window
• Sequence Pattern: event A followed by event B followed by event C
• Increasing Pattern: up trend of a value of a certain attribute
• Decreasing Pattern: down trend of a value of a certain attribute
• …
• Pattern operators allow developers to define complex relationships between
streaming events
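The patterns listed above can be expressed as small predicates. The following Python sketch is a simplification under stated assumptions: it works on an already collected, timestamp-sorted list of `(timestamp, event_type)` pairs, whereas a real engine evaluates patterns incrementally and can only report an absence once the time window has elapsed. All function names are invented for this example.

```python
from typing import Iterable, List, Sequence, Tuple


def find_absences(
    events: List[Tuple[int, str]], first: str, expected: str, within: int
) -> List[int]:
    """Absence pattern: timestamps of `first` events that are NOT followed
    by an `expected` event within `within` time units."""
    absences = []
    for i, (ts, kind) in enumerate(events):
        if kind != first:
            continue
        followed = any(
            later_kind == expected and later_ts - ts <= within
            for later_ts, later_kind in events[i + 1:]
        )
        if not followed:
            absences.append(ts)
    return absences


def contains_sequence(kinds: Iterable[str], pattern: Sequence[str]) -> bool:
    """Sequence pattern: the pattern's steps occur in order (gaps allowed)."""
    remaining = iter(kinds)
    # `step in remaining` consumes the iterator up to the first match.
    return all(step in remaining for step in pattern)


def is_increasing(values: Sequence[float]) -> bool:
    """Increasing pattern: every value is larger than its predecessor."""
    return all(a < b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    events = [(0, "order"), (3, "payment"), (10, "order"), (25, "payment"), (30, "order")]
    print(find_absences(events, first="order", expected="payment", within=10))
    print(contains_sequence(["A", "x", "B", "C"], ["A", "B", "C"]))
    print(is_increasing([1.0, 2.0, 5.0]))
```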
7) Back Pressure
• Backpressure refers to the situation where a
system is receiving data at a higher rate than it can
process
• Backpressure, if not dealt with correctly, can lead
to exhaustion of resources, or even, in the worst
case, data loss
• A slow consumer should slow down the producer
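One simple way to let a slow consumer slow down the producer is a bounded buffer between the two. The sketch below uses Python's standard `queue.Queue` with a `maxsize`; the function name `run_pipeline` and its parameters are assumptions made for this example, and distributed platforms implement backpressure with their own protocols.

```python
import queue
import threading
import time


def run_pipeline(num_events, buffer_size, consumer_delay=0.001):
    """Run a fast producer against a slow consumer over a bounded buffer.

    Returns the consumed events and the largest buffer depth the producer saw.
    """
    buffer = queue.Queue(maxsize=buffer_size)
    consumed = []
    peak_depth = [0]

    def producer():
        for i in range(num_events):
            buffer.put(i)  # blocks while the buffer is full: this is the backpressure
            peak_depth[0] = max(peak_depth[0], buffer.qsize())
        buffer.put(None)  # sentinel: no more events

    def consumer():
        while True:
            item = buffer.get()
            if item is None:
                break
            time.sleep(consumer_delay)  # simulate slow processing
            consumed.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return consumed, peak_depth[0]


if __name__ == "__main__":
    consumed, peak = run_pipeline(num_events=20, buffer_size=2)
    print(f"consumed {len(consumed)} events, buffer never held more than {peak}")
```

Because `put` blocks, memory use stays bounded and no event is dropped; the price is that the producer runs at the consumer's pace.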
Other Concepts
7) Scalability
• Scale-Up and/or Scale-Out
8) Resource Management
• Proprietary Server, YARN, Mesos,
….
7) Auto-Scaling
• Does the platform support scale-up
and scale-down without stopping?
8) In-Flight Modifications
• Are modifications to a running topology
supported without redeployment?
9) Contributors
• How many active contributors?
10) Language Options
• What languages can be used?
11) ”Self-Service”
• Business user friendliness
Stream Processing Platforms
[Diagram: Stream Processing & Analytics products arranged by category (Simple Event Processing, Event Stream Processing, Complex Event Processing) and by license (Open Source vs. Closed Source)]
Source: adapted from Tibco
Oracle Stream Analytics
What it does
• Compelling, friendly and visually stunning real time
streaming analytics user experience for Business users to
dynamically create and implement Instant Insight solutions
Key Features
• Analyze simulated or live data feeds to determine event
patterns, correlation, aggregation & filtering
• Pattern library for industry specific solutions
• Streams, References, Maps & Explorations
Benefits
• Accelerated delivery time
• Hides all challenges & complexities of underlying real-time
event-driven infrastructure
Oracle Stream Analytics
[Timeline diagram with milestones in 2007, 2008, 2012, 2013, 2015 and 2016, showing: BEA Weblogic Event Server, Oracle Complex Event Processing (OCEP), Oracle CQL, Oracle Event Processing (OEP), Oracle Event Processing for Java Embedded, Oracle Stream Explorer (SX), Oracle Stream Analytics (OSA), Oracle Edge Analytics (OEA), Oracle IoT Cloud Service]
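[Diagram: the path from a “sea of data” at Devices / Gateways (computing edge, FOG) through Services to the Enterprise, where macro-events are high-value, actionable and in-context. OEA is labeled with Filtering, Correlation, Aggregation and Pattern matching; Edge Analytics and Stream Analytics are shown as the two processing tiers.]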
• High Volume
• Continuous Streaming
• Extreme Low Latency
• Disparate Sources
• Temporal Processing
• Pattern Matching
• Machine Learning
Oracle Stream Analytics
• High Volume
• Continuous Streaming
• Sub-Millisecond Latency
• Disparate Sources
• Time-Window Processing
• Pattern Matching
• High Availability / Scalability
• Coherence Integration
• Geospatial, Geofencing
• Big Data Integration
• Business Event Visualization
• Action!
Source: Oracle
Oracle Stream Analytics
Category: CEP
Streaming Model: Continuous Streaming
Delivery Guarantees: At Least Once
API: Declarative/SQL
Windowing: Yes
Pattern detection: Yes
Back Pressure: No
Scalability: Scale-up
Resource Management: OEP server, Spark Streaming
Auto-Scaling: No
In-Flight modifications: No
Contributors: n.a.
Language Options: Java, CQL
“Self-service”: Yes
Apache Spark
Apache Spark is a fast and general engine for large-scale data processing
• The hot trend in Big Data!
• Originally developed in 2009 at UC Berkeley’s AMPLab
• Based on the 2007 Microsoft Dryad paper
• Written in Scala, supports Java, Python, SQL and R
• Can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x
faster on disk
• One of the largest OSS communities in big data with over 200 contributors in 50+
organizations
• Open sourced in 2010 – since 2014 part of the Apache Software Foundation
Apache Spark – Stack
Libraries: Spark SQL (Batch Processing), BlinkDB (Approximate Querying), Spark Streaming (Real-Time), MLlib and SparkR (Machine Learning), GraphX (Graph Processing)
Core Runtime: Spark Core API and Execution Model
Cluster Resource Managers: Spark Standalone, Mesos, YARN
Data Stores: HDFS, Elasticsearch, NoSQL, S3
Apache Spark - Batch vs. Real-Time Processing
[Diagram: batch processing handles petabytes of data; real-time processing handles gigabytes per second]
Apache Spark – Resilient Distributed Dataset (RDD)
[Diagram: data from an input source is loaded into an RDD; an action such as .count() returns a result (here 100)]
Input Source
• File
• Database
• Stream
• Collection
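Apache Spark - Workflow
[Diagram: Input HDFS File → HadoopRDD → MappedRDD → MappedRDD → ShuffledRDD → Text File Output, with partitions (P0, P1, P3) scheduled by the DAG Scheduler on the Master]
• sc.hadoopFile() creates the HadoopRDD from the input file
• flatMap(), map() and reduceByKey() are transformations (lazy)
• saveAsTextFile() is an action (executes the transformations)
• flatMap() + map() run together in one stage; reduceByKey() requires a shuffle and runs in the following stage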
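The split between lazy transformations and an action that triggers execution can be imitated without Spark. The `MiniRDD` class below is a hypothetical, single-machine stand-in written for this illustration; its method names are not Spark's API, and it has no partitions, stages or fault tolerance.

```python
class MiniRDD:
    """Records transformations and runs them only when an action is called."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def _with(self, step):
        # Each transformation returns a new MiniRDD; nothing is computed yet.
        return MiniRDD(self._source, self._steps + (step,))

    def flat_map(self, fn):
        return self._with(lambda rows: (out for row in rows for out in fn(row)))

    def map(self, fn):
        return self._with(lambda rows: (fn(row) for row in rows))

    def reduce_by_key(self, fn):
        def step(rows):
            totals = {}
            for key, value in rows:
                totals[key] = fn(totals[key], value) if key in totals else value
            return iter(totals.items())

        return self._with(step)

    def collect(self):
        """Action: executes the recorded transformations in order."""
        rows = iter(self._source)
        for step in self._steps:
            rows = step(rows)
        return list(rows)


if __name__ == "__main__":
    calls = []

    def to_pair(word):
        calls.append(word)  # lets us observe when the function really runs
        return (word, 1)

    counts = (
        MiniRDD(["to be or", "not to be"])
        .flat_map(str.split)
        .map(to_pair)
        .reduce_by_key(lambda a, b: a + b)
    )
    print("calls before the action:", len(calls))
    print(counts.collect())
    print("calls after the action:", len(calls))
```

Defining the pipeline does no work; only `collect()` pulls data through the recorded steps, which mirrors the "Transformations (Lazy)" and "Action" labels in the diagram.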
Apache Spark Streaming
[Diagram: the input stream is cut into batches of X seconds, forming a DStream; transformations such as .countByValue(), .reduceByKey(), .join and .map produce a new DStream]
Apache Spark Streaming
[Diagram: as time increases (time 1, time 2, time 3, … time n), the messages arriving in each interval of the input stream form one RDD of the Event DStream (message 1 … message n). map() applies f to every message, yielding the corresponding RDD of the MappedDStream (f(message 1) … f(message n)); saveAsHadoopFiles() writes result 1 … result n for each interval.]
• The chain of DStream transformations forms the lineage
• Actions trigger Spark jobs
Adapted from Chris Fregly: http://slidesha.re/11PP7FV
Spark Streaming
Category: ESP
Streaming Model: Micro-Batch
Delivery Guarantees: Exactly Once
API: Declarative/SQL
Windowing: Yes
Pattern detection: No
Back Pressure: Yes
Scalability: Scale-out
Resource Management: YARN, Mesos, Standalone
Auto-Scaling: Yes
In-Flight modifications: No
Contributors: > 280 contributors
Language Options: Java, Scala, Python
“Self-service”: No
Apache Storm
A platform for doing analysis on streams of data as they come in, so you can react to
data as it happens.
• highly distributed real-time computation system
• Provides general primitives to do
real-time computation
• To simplify working with queues & workers
• scalable and fault-tolerant
Originated at Backtype, acquired by Twitter in 2011
Open Sourced late 2011
Part of Apache since September 2013
Apache Storm
Tuple
• Immutable Set of Key/value pairs
Stream
• an unbounded sequence of tuples that can be processed in parallel by Storm
Topology
• Wires data and functions via a DAG (directed acyclic graph)
• Executes on many machines similar to a MR job in Hadoop
Spout
• Source of data streams (tuples)
• can be run in “reliable” and “unreliable” mode
Bolt
• Consumes 1+ streams and produces new streams
• Complex operations often require multiple
steps and thus multiple bolts
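[Diagram: example topology. Two spouts act as stream sources (one is labeled as the source of stream B). One bolt subscribes to A and emits C; one bolt subscribes to A and emits D; one bolt subscribes to A & B and emits nothing; one bolt subscribes to C & D and emits nothing. Tuples (T) travel along the streams.]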
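How spouts, bolts and streams fit together can be sketched in a single Python process. The `Topology` class below is a toy written for this illustration, not Storm's API: it runs synchronously on one machine, uses plain dicts as tuples (Storm's tuples are immutable), and offers no reliability or parallelism. Stream names and bolt functions are invented.

```python
from collections import defaultdict


class Topology:
    """Routes tuples from named streams to the bolts that subscribe to them."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # stream name -> list of bolts

    def add_bolt(self, bolt, subscribes_to):
        for stream in subscribes_to:
            self.subscribers[stream].append(bolt)

    def emit(self, stream, tup):
        # A bolt returns a list of (stream, tuple) pairs to emit downstream.
        for bolt in self.subscribers[stream]:
            for out_stream, out_tuple in bolt(stream, tup):
                self.emit(out_stream, out_tuple)


def sentence_spout():
    """Spout: the source of the 'sentences' stream."""
    yield {"sentence": "storm streams tuples"}
    yield {"sentence": "storm bolts"}


def split_bolt(stream, tup):
    """Bolt: subscribes to 'sentences', emits one tuple per word on 'words'."""
    return [("words", {"word": word}) for word in tup["sentence"].split()]


class CountBolt:
    """Bolt: subscribes to 'words', keeps counts, emits nothing."""

    def __init__(self):
        self.counts = defaultdict(int)

    def __call__(self, stream, tup):
        self.counts[tup["word"]] += 1
        return []


if __name__ == "__main__":
    counter = CountBolt()
    topology = Topology()
    topology.add_bolt(split_bolt, subscribes_to=["sentences"])
    topology.add_bolt(counter, subscribes_to=["words"])

    for tup in sentence_spout():
        topology.emit("sentences", tup)
    print(dict(counter.counts))
```

The word count needs two bolts because splitting and counting are separate steps, which matches the remark that complex operations often require multiple bolts.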
Apache Storm
Category: ESP
Streaming Model: Event-Streaming
Delivery Guarantees: At most once / At least once
API: Compositional
Windowing: No
Pattern detection: No
Back Pressure: Yes
Scalability: Scale-out
Resource Management: Storm Cluster, YARN
Auto-Scaling: No
In-Flight modifications: Partial
Contributors: > 100 contributors
Language Options: Java, Clojure, Scala, …
“Self-service”: No
Summary
Comparison
Criterion | Oracle Stream Analytics | Spark Streaming | Apache Storm
Category | CEP | ESP | ESP
Streaming Model | Continuous Streaming | Micro-Batch | Event-Streaming
Delivery Guarantees | At Least Once | Exactly Once | At most once / At least once
API | Declarative/SQL | Declarative/SQL | Compositional
Windowing | Yes | Yes | No
Pattern detection | Yes | No | No
Back Pressure | No | Yes | Yes
Scalability | Scale-up | Scale-out | Scale-out
Resource Management | OEP server, Spark Streaming | YARN, Mesos, Standalone | Storm Cluster, YARN
Auto-Scaling | No | Yes | No
In-Flight modifications | No | No | Partial
Contributors | n.a. | > 280 contributors | > 100 contributors
Language Options | Java, CQL | Java, Scala, Python | Java, Clojure, Scala, …
“Self-service” | Yes | No | No
Summary
More and more use cases (such as IoT) make Streaming Analytics necessary
Treat events as events! Infrastructures for handling lots of events are available!
Platforms such as Oracle Stream Analytics enable the business to work directly on
streaming data (empower the business analyst) => User Experience of an Excel Sheet
on streaming data
Platforms such as Apache Storm and Apache Spark Streaming provide a highly scalable
and fault-tolerant infrastructure for streaming analytics => Oracle Stream Analytics can
use Spark Streaming as the runtime infrastructure
Platforms such as Kafka provide a high volume event broker infrastructure, a.k.a. Event
Hub
Trivadis @ DOAG 2016
Booth: 3rd Floor – next to the escalator
Know how, T-Shirts, Contest and Trivadis Power to go
We look forward to your visit
Because with Trivadis you always win !
Introduction to Streaming Analytics

More Related Content

What's hot

Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleDatabricks
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®confluent
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...HostedbyConfluent
 
Customer-Centric Data Management for Better Customer Experiences
Customer-Centric Data Management for Better Customer ExperiencesCustomer-Centric Data Management for Better Customer Experiences
Customer-Centric Data Management for Better Customer ExperiencesInformatica
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine LearningLogical Clocks
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
The Journey to Data Mesh with Confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluentconfluent
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleAdam Doyle
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPDatabricks
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 

What's hot (20)

Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML Lifecycle
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
 
Customer-Centric Data Management for Better Customer Experiences
Customer-Centric Data Management for Better Customer ExperiencesCustomer-Centric Data Management for Better Customer Experiences
Customer-Centric Data Management for Better Customer Experiences
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine Learning
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
The Journey to Data Mesh with Confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluent
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at Scale
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 

Viewers also liked

Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming AnalyticsGuido Schmutz
 
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014StampedeCon
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data ArchitecturesGuido Schmutz
 
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...DataStax Academy
 

Viewers also liked (6)

Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
Importance of Big Data Analytics
Importance of Big Data AnalyticsImportance of Big Data Analytics
Importance of Big Data Analytics
 
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
 
Big Data Technology
Big Data TechnologyBig Data Technology
Big Data Technology
 
Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data Architectures
 
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
Cassandra Summit 2014: META — An Efficient Distributed Data Hub with Batch an...
 

Similar to Introduction to Streaming Analytics Platforms

Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Oracle Stream Explorer
Oracle Stream ExplorerOracle Stream Explorer
Oracle Stream ExplorerTrivadis
 
Oracle Stream Explorer - Simplifying Event/Stream Processing
Oracle Stream Explorer - Simplifying Event/Stream ProcessingOracle Stream Explorer - Simplifying Event/Stream Processing
Oracle Stream Explorer - Simplifying Event/Stream ProcessingGuido Schmutz
 
Oracle Stream Explorer Guido Schmutz
Oracle Stream Explorer Guido SchmutzOracle Stream Explorer Guido Schmutz
Oracle Stream Explorer Guido SchmutzDésirée Pfister
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming VisualizationGuido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksGuido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Oracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingOracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingGuido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...MSAdvAnalytics
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...
Emerging Prevalence of Data Streaming in Analytics and it's Business Signific...Amazon Web Services
 
Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointconfluent
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
AWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
AWS Webcast - Sales Productivity Solutions with MicroStrategy and RedshiftAWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
AWS Webcast - Sales Productivity Solutions with MicroStrategy and RedshiftAmazon Web Services
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
 

Introduction to Streaming Analytics Platforms

  • 1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH Introduction to Streaming Analytics Guido Schmutz @gschmutz
  • 2. Guido Schmutz: Working for Trivadis for more than 19 years. Oracle ACE Director for Fusion Middleware and SOA. Co-author of different books. Consultant, Trainer, Software Architect for Java, SOA & Big Data / Fast Data. Member of Trivadis Architecture Board. Technology Manager @ Trivadis. More than 25 years of software development experience. Contact: guido.schmutz@trivadis.com Blog: http://guidoschmutz.wordpress.com Slideshare: http://www.slideshare.net/gschmutz Twitter: gschmutz
  • 3. Our company. Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields: Trivadis Services takes over the interacting operation of your IT systems. OPERATION
  • 4. COPENHAGEN, MUNICH, LAUSANNE, BERN, ZURICH, BRUGG, GENEVA, HAMBURG, DÜSSELDORF, FRANKFURT, STUTTGART, FREIBURG, BASEL, VIENNA. With over 600 specialists and IT experts in your region. 14 Trivadis branches and more than 600 employees. 200 Service Level Agreements. Over 4,000 training participants. Research and development budget: CHF 5.0 million. Financially self-supporting and sustainably profitable. Experience from more than 1,900 projects per year at over 800 customers.
  • 5. Technology on its own won't help you. You need to know how to use it properly.
  • 6. Agenda: 1. Introduction / Motivation 2. Concepts (1 – 11) 3. Stream Processing Platforms
  • 7. Introduction / Motivation
  • 8. Traditional Data Processing - Challenges • Introduces too much “decision latency” • Responses are delivered “after the fact” • Maximum value of the identified situation is lost • Decisions are made on old and stale data • “Data at Rest”
  • 9. [Diagram] Customer 360° View – Flow with Big Data Hub. Labels: Location, Social, Click stream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Weather Data, Mobile Apps; Event Hub; Hadoop Cluster (Distributed Filesystem, Parallel Processing, NoSQL, SQL, Search); Machine Learning, Graph Algorithms, Natural Language Processing; Enterprise Data Warehouse, BI Tools, Search, Online & Mobile Apps; Data Flow
  • 10. The New Era: Streaming Data Analytics / Fast Data • Events are analyzed and processed in real-time as they arrive • Decisions are timely, contextual and based on fresh data • Decision latency is eliminated • “Data in motion”
  • 11. [Diagram] Customer Event Hub – taking Velocity into account. Labels: Location, Social, Click stream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Weather Data, Mobile Apps; Event Hub; File Import / SQL Import; Batch Analytics: Hadoop Cluster (Distributed Filesystem, Parallel Processing, NoSQL, SQL, Search); Streaming Analytics: Stream Analytics, NoSQL, Reference / Models; Dashboard, BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps
  • 12. When to Stream / When not? Real-Time: constant low milliseconds & under. Near-Real-Time: low milliseconds to seconds, delay in case of failures. Batch: 10s of seconds or more, re-run in case of failures. Source: adapted from Cloudera
  • 13. “No free lunch”. Real-Time: constant low milliseconds & under. Near-Real-Time: low milliseconds to seconds, delay in case of failures. Batch: 10s of seconds or more, re-run in case of failures. “Difficult” architectures, lower latency vs. “Easier architectures”, higher latency
  • 14. Real Time Analytics Use Cases • Algorithmic Trading • Online Fraud Detection • Geo Fencing • Proximity/Location Tracking • Intrusion detection systems • Traffic Management • Recommendations • Churn detection • Internet of Things (IoT) / Intelligent Sensors • Social Media/Data Analytics • Gaming Data Feed • …
  • 15. Concepts (1 – 11)
  • 16. 1) Categories of Stream/Event Processing: Simple Event Processing (SEP), Event Stream Processing (ESP)
  • 17. 1) Categories of Stream/Event Processing: Complex Event Processing (CEP)
  • 18. 2) Streaming Model – Continuous Streaming [Diagram labels: Event Source, Ingestion, Individual Event] • Events processed as they arrive • Low latency • Less throughput • Fault tolerance expensive
  • 19. 2) Streaming Model – Micro-Batch [Diagram labels: Event Source, Ingestion] • Splits incoming stream into small batches • Could provide high(er) throughput • Fault tolerance easier to achieve • Higher latency
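The difference between the two streaming models on slides 18 and 19 can be sketched in a few lines of plain Python. This is only an illustration: the function names and the count-based batch size are assumptions made for the sketch, and real micro-batch engines typically cut batches by time interval rather than by count.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def process_continuously(events: Iterable[int],
                         handler: Callable[[int], int]) -> Iterator[int]:
    """Continuous model: every event is handled individually as it arrives."""
    for event in events:
        yield handler(event)


def micro_batches(events: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Micro-batch model: split the incoming stream into small batches."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_micro_batches(events: Iterable[int],
                          handler: Callable[[int], int],
                          batch_size: int) -> Iterator[List[int]]:
    """Results become visible only once a whole batch has been processed."""
    for batch in micro_batches(events, batch_size):
        yield [handler(event) for event in batch]
```

In the continuous variant a result is available after each event, which is where the lower latency comes from. In the micro-batch variant the first result appears only after a full batch has been collected, but work can be amortized per batch.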
  • 20. 3) Delivery Guarantees • At most once (fire-and-forget) [ 0 | 1 ] - means the message is sent, but the sender doesn’t care if it’s received or lost. It imposes no additional overhead to ensure message delivery, such as requiring acknowledgments from consumers. Hence, it is the easiest and most performant behavior to support. • At least once [ 1+ ] - means that retransmission of a message will occur until an acknowledgment is received. Since a delayed acknowledgment from the receiver could be in flight when the sender retransmits the message, the message may be received one or more times. • Exactly once [ 1 ] - ensures that a message is received once and only once, and is never lost and never repeated. The system must implement whatever mechanisms are required to ensure that a message is received and processed just once.
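A small simulation can make the three guarantees concrete. The sketch below is hypothetical throughout: `Receiver`, `send_at_most_once` and `send_at_least_once` are names invented for illustration, and message or acknowledgment loss is scripted rather than random. It also shows one common way to approximate exactly-once processing, namely at-least-once delivery combined with a receiver that deduplicates by message id; real platforms may rely on other mechanisms.

```python
from typing import List, Sequence, Set, Tuple


class Receiver:
    """Records every processed message; optionally deduplicates by message id."""

    def __init__(self, deduplicate: bool = False) -> None:
        self.deduplicate = deduplicate
        self.seen_ids: Set[int] = set()
        self.processed: List[Tuple[int, str]] = []

    def deliver(self, msg_id: int, payload: str) -> None:
        if self.deduplicate and msg_id in self.seen_ids:
            return  # duplicate caused by a retransmission: ignore it
        self.seen_ids.add(msg_id)
        self.processed.append((msg_id, payload))


def send_at_most_once(receiver: Receiver, msg_id: int, payload: str,
                      message_lost: bool) -> None:
    """Fire-and-forget: a single attempt, no acknowledgment, no retry."""
    if not message_lost:
        receiver.deliver(msg_id, payload)


def send_at_least_once(receiver: Receiver, msg_id: int, payload: str,
                       ack_lost_pattern: Sequence[bool]) -> int:
    """Retransmit until an acknowledgment reaches the sender.

    ack_lost_pattern[i] is True when the acknowledgment for attempt i is lost,
    so the message has arrived but the sender does not know it yet.
    Returns the number of attempts made.
    """
    attempts = 0
    for ack_lost in ack_lost_pattern:
        attempts += 1
        receiver.deliver(msg_id, payload)
        if not ack_lost:
            break
    return attempts
```

With a lost acknowledgment, the plain receiver processes the same message twice (the `[ 1+ ]` case), while the deduplicating receiver processes it once.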
  • 21. 4) API • Compositional / Programmatic: highly customizable operators based on basic building blocks; manual topology definition and optimization • Declarative: high-level, fluent API; higher-order functions as operators (filter, mapWithState, …); logical plan optimization • Statistical: data scientist friendly; dynamic typing • SQL: query language that allows a stream in the FROM clause; extensions supporting windowing, pattern matching, spatial, … operators
  • 22. 5) Windowing: Due to the size and never-ending nature of a stream, it is not feasible to keep the entire stream of data in memory. Computations over multiple events can be done using windows of data. • Sliding Window - uses eviction and trigger policies that are based on time: window length and sliding interval length • Tumbling Window - eviction policy is always based on the window being full and the trigger policy is based on either the count of items in the window or time
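The two window types can be sketched as follows. This is a minimal illustration over an in-memory, time-ordered list, not the API of any particular platform; the function names, the `(timestamp, value)` event shape and the choice of half-open intervals `(end - length, end]` are assumptions of the sketch.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[float, int]  # (timestamp in seconds, value)


def tumbling_count_windows(values: Iterable[int], size: int) -> Iterator[List[int]]:
    """Tumbling window: emit when the window is full, then start an empty one."""
    window: List[int] = []
    for value in values:
        window.append(value)
        if len(window) == size:
            yield window
            window = []


def sliding_time_windows(events: List[Event], length: float,
                         slide: float) -> Iterator[Tuple[float, List[int]]]:
    """Sliding window: every `slide` seconds, emit the values whose timestamp
    lies in (end - length, end]. Consecutive windows overlap when slide < length."""
    if not events:
        return
    last_timestamp = events[-1][0]
    end = slide
    while True:
        yield end, [value for ts, value in events if end - length < ts <= end]
        if end >= last_timestamp:
            return
        end += slide
```

With a window length of 4 seconds and a sliding interval of 2 seconds, an event is usually part of two consecutive windows, whereas a tumbling window assigns every event to exactly one window.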
  • 23. 6) Pattern Matching • Streaming data often contains interesting patterns that only emerge as new streaming data arrives, e.g. • Absence Pattern: event A not followed by event B within time window • Sequence Pattern: event A followed by event B followed by event C • Increasing Pattern: up trend of a value of a certain attribute • Decreasing Pattern: down trend of a value of a certain attribute • … • Pattern operators allow developers to define complex relationships between streaming events
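Two of the patterns listed above, the sequence pattern and the absence pattern, can be sketched over a finite, time-ordered list of events. The helper names and the `(timestamp, type)` event shape are assumptions for illustration. A real engine evaluates such patterns incrementally and can only report an absence once the time window has elapsed.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, event type)


def matches_sequence(events: Sequence[Event], pattern: Sequence[str]) -> bool:
    """Sequence pattern: the types in `pattern` occur in this order.
    Unrelated events in between are allowed."""
    position = 0
    for _, event_type in events:
        if position < len(pattern) and event_type == pattern[position]:
            position += 1
    return position == len(pattern)


def find_absences(events: Sequence[Event], first: str, expected: str,
                  within: float) -> List[float]:
    """Absence pattern: timestamps of `first` events that are NOT followed
    by an `expected` event within `within` seconds."""
    absences: List[float] = []
    for index, (timestamp, event_type) in enumerate(events):
        if event_type != first:
            continue
        followed = any(
            later_type == expected and timestamp < later_ts <= timestamp + within
            for later_ts, later_type in events[index + 1:]
        )
        if not followed:
            absences.append(timestamp)
    return absences
```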
  • 24. 7) Back Pressure • Backpressure refers to the situation where a system is receiving data at a higher rate than it can process • Backpressure, if not dealt with correctly, can lead to exhaustion of resources, or even, in the worst case, data loss • A slow consumer should slow down the producer
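One simple way for a slow consumer to slow down the producer is a bounded buffer between the two: when the buffer is full, the producer blocks instead of piling up data. The sketch below uses Python's standard `queue.Queue` with a `maxsize`; the function name `run_pipeline`, the capacity and the artificial consumer delay are assumptions for illustration, and streaming platforms implement backpressure in their own ways.

```python
import queue
import threading
import time
from typing import List, Tuple


def run_pipeline(item_count: int, capacity: int,
                 consumer_delay: float = 0.001) -> Tuple[List[int], int]:
    """Run a fast producer and a slow consumer connected by a bounded buffer.

    Returns the consumed items and the largest buffer depth the producer saw.
    """
    buffer: queue.Queue = queue.Queue(maxsize=capacity)
    consumed: List[int] = []
    max_depth = 0
    done = object()  # sentinel that tells the consumer to stop

    def producer() -> None:
        nonlocal max_depth
        for item in range(item_count):
            buffer.put(item)  # blocks while the buffer is full: backpressure
            max_depth = max(max_depth, buffer.qsize())
        buffer.put(done)

    def consumer() -> None:
        while True:
            item = buffer.get()
            if item is done:
                return
            time.sleep(consumer_delay)  # simulate slow processing
            consumed.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return consumed, max_depth
```

Because `put` blocks, memory use stays bounded by `capacity` and no item is dropped. The cost is that the producer runs only as fast as the consumer.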
  • 25. Other Concepts 7) Scalability • Scale-Up and/or Scale-Out 8) Resource Management • Proprietary Server, YARN, Mesos, … 7) Auto-Scaling • Does the platform support scale-up and scale-down without stopping? 8) In-Flight Modifications • Are modifications on a running topology supported without redeployment? 9) Contributors • How many active contributors? 10) Language Options • What languages can be used? 11) “Self-Service” • Business user friendliness
  • 26. Stream Processing Platforms
  • 27. [Diagram] Stream Processing & Analytics. Labels: Simple Event Processing, Event Stream Processing, Complex Event Processing; Open Source, Closed Source. Source: adapted from Tibco
  • 28. Oracle Stream Analytics. What it does: • Compelling, friendly and visually stunning real time streaming analytics user experience for Business users to dynamically create and implement Instant Insight solutions. Key Features: • Analyze simulated or live data feeds to determine event patterns, correlation, aggregation & filtering • Pattern library for industry specific solutions • Streams, References, Maps & Explorations. Benefits: • Accelerated delivery time • Hides all challenges & complexities of underlying real-time event-driven infrastructure
  • 30. [Diagram] OEA • Filtering • Correlation • Aggregation • Pattern matching. Devices / Gateways, Services, Computing Edge, Enterprise. “Sea of data”, Macro-event, High-value, Actionable, In-context. EDGE Analytics, Stream Analytics. FOG • High Volume • Continuous Streaming • Extreme Low Latency • Disparate Sources • Temporal Processing • Pattern Matching • Machine Learning. Oracle Stream Analytics • High Volume • Continuous Streaming • Sub-Millisecond Latency • Disparate Sources • Time-Window Processing • Pattern Matching • High Availability / Scalability • Coherence Integration • Geospatial, Geofencing • Big Data Integration • Business Event Visualization • Action! Source: Oracle
  • 31. Oracle Stream Analytics
  • 32. Oracle Stream Analytics. Category: CEP; Streaming Model: Continuous Streaming; Delivery Guarantees: At Least Once; API: Declarative/SQL; Windowing: Yes; Pattern detection: Yes; Back Pressure: No; Scalability: Scale-up; Resource Management: OEP server, Spark Streaming; Auto-Scaling: no; In-Flight modifications: no; Contributors: n.a.; Language Options: Java, CQL; “self-service”: Yes
  • 33. Apache Spark Apache Spark is a fast and general engine for large-scale data processing • The hot trend in Big Data! • Originally developed 2009 in UC Berkley’s AMPLab • Based on 2007 Microsoft Dryad paper • Written in Scala, supports Java, Python, SQL and R • Can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk • One of the largest OSS communities in big data with over 200 contributors in 50+ organizations • Open Sourced in 2010 – since 2014 part of Apache Software foundation Introduction to Streaming Analytics
  • 34. Apache Spark – Stack Spark SQL (Batch Processing) Blink DB (Approximate Querying) Spark Streaming (Real-Time) MLlib, Spark R (Machine Learning) GraphX (Graph Processing) Spark Core API and Execution Model Spark Standalone MESOS YARN HDFS Elastic Search NoSQL S3 Libraries Core Runtime Cluster Resource Managers Data Stores Introduction to Streaming Analytics
  • 35. Apache Spark - Batch vs. Real-Time Processing
      Batch: Petabytes of Data
      Real-Time: Gigabytes Per Second
  • 36. Apache Spark – Resilient Distributed Dataset (RDD)
      Input Source: • File • Database • Stream • Collection
      Input Source -> RDD (Data) -> .count() -> 100
  • 37. Apache Spark - Workflow
      Pipeline: Input HDFS File -> HadoopRDD -> MappedRDD -> ShuffledRDD -> Text File Output
      Calls: sc.hadoopFile() -> flatMap() -> map() -> reduceByKey() -> saveAsTextFile()
      Transformations (Lazy): flatMap(), map(), reduceByKey()
      Action (Execute Transformations): saveAsTextFile()
      Stages: Stage 1 – flatMap() + map(); Stage 2 – reduceByKey() (the shuffle marks the stage boundary)
      Master (DAG Scheduler): MappedRDD partitions P0, P1, P3; ShuffledRDD partition P0
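The split between lazy transformations and an executing action can be illustrated without a cluster. The following is only a minimal sketch in plain Python, not the Spark API: the class `LazyDataset` and its method names are hypothetical stand-ins for an RDD, and the `calls` list exists solely to show when work actually happens.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, NOT the Spark API)."""

    def __init__(self, source, steps=()):
        self._source = source        # any iterable, e.g. the lines of a file
        self._steps = tuple(steps)   # recorded transformations (the lineage)

    def _with(self, step):
        # Transformations only record a step; nothing is computed here.
        return LazyDataset(self._source, self._steps + (step,))

    def flat_map(self, fn):
        return self._with(lambda items: (out for item in items for out in fn(item)))

    def map(self, fn):
        return self._with(lambda items: (fn(item) for item in items))

    def reduce_by_key(self, fn):
        def step(pairs):
            acc = {}
            for key, value in pairs:
                acc[key] = fn(acc[key], value) if key in acc else value
            return iter(acc.items())
        return self._with(step)

    def collect(self):
        # Action: replay the recorded steps over the source.
        items = iter(self._source)
        for step in self._steps:
            items = step(items)
        return list(items)

    def count(self):
        # Action: re-runs the whole lineage, then counts the results.
        return len(self.collect())


calls = []  # records every line that split_words actually processes


def split_words(line):
    calls.append(line)
    return line.split()


lines = ["to be or", "not to be"]
counts = (LazyDataset(lines)
          .flat_map(split_words)
          .map(lambda word: (word, 1))
          .reduce_by_key(lambda a, b: a + b))

calls_before_action = len(calls)   # still 0: only the lineage was built
result = dict(counts.collect())    # the action triggers the computation
```

Because every action replays the lineage, calling `counts.count()` afterwards processes the input lines a second time; real Spark avoids this with caching, which this sketch does not model.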
  • 38. Apache Spark Streaming
      A DStream is cut into batches of X seconds; each Transform step yields a new DStream
      Transformations: .countByValue(), .reduceByKey(), .join, .map
  • 39. Apache Spark Streaming
      Diagram: the input stream of messages is divided along the time axis (time 1, time 2, time 3, ... time n) into one RDD per interval (RDD @time 1 ... RDD @time n), each holding message 1 ... message n. Together these RDDs form the Event DStream.
      map() applies f to every message of every RDD (f(message 1) ... f(message n)) and yields the MappedDStream; saveAsHadoopFiles() writes result 1 ... result n for each interval.
      DStream transformations build the lineage; actions trigger Spark jobs.
      Adapted from Chris Fregly: http://slidesha.re/11PP7FV
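The micro-batch idea in the diagram (one RDD per time interval, the same function applied to each) can be sketched in a few lines of plain Python. This is an illustration under assumptions, not Spark Streaming code: `micro_batches`, the sample events and the one-second interval are all made up for the example, and intervals without messages are simply skipped.

```python
from collections import Counter, defaultdict


def micro_batches(events, batch_seconds):
    """Group (timestamp, message) pairs into batches of fixed duration.

    Hypothetical helper: each returned list plays the role of one
    'RDD @time i'. Intervals that received no message are omitted.
    """
    batches = defaultdict(list)
    for timestamp, message in events:
        batches[int(timestamp // batch_seconds)].append(message)
    return [batches[index] for index in sorted(batches)]


# (timestamp in seconds, message) pairs; values are invented for illustration
events = [(0.2, "a"), (0.9, "b"), (1.1, "a"), (2.5, "c"), (2.7, "c")]

event_stream = micro_batches(events, batch_seconds=1)

# map(): apply the same function f to every message of every batch
mapped_stream = [[message.upper() for message in batch] for batch in event_stream]

# countByValue(): one result per batch, never across batch boundaries
counts_per_batch = [dict(Counter(batch)) for batch in mapped_stream]
# counts_per_batch == [{"A": 1, "B": 1}, {"A": 1}, {"C": 2}]
```

Note that "A" is counted once in each of the first two batches rather than twice overall: results are per interval unless a window or stateful operation combines them.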
  • 40. Apache Spark Streaming
  • 41. Spark Streaming
      Category: ESP
      Streaming Model: Continuous Streaming
      Delivery Guarantees: Exactly Once
      API: Declarative/SQL
      Windowing: Yes
      Pattern detection: No
      Back Pressure: Yes
      Scalability: Scale-out
      Resource Management: YARN, Mesos, Standalone
      Auto-Scaling: yes
      In-Flight modifications: no
      Contributors: > 280 contributors
      Language Options: Java, Scala, Python
      "self-service": No
  • 42. Apache Storm
      A platform for doing analysis on streams of data as they come in, so you can react to data as it happens.
      • Highly distributed real-time computation system
      • Provides general primitives to do real-time computation
      • Simplifies working with queues & workers
      • Scalable and fault-tolerant
      Originated at BackType, acquired by Twitter in 2011
      Open-sourced in late 2011
      Part of Apache since September 2013
  • 43. Apache Storm
      Tuple
      • Immutable set of key/value pairs
      Stream
      • An unbounded sequence of tuples that can be processed in parallel by Storm
      Topology
      • Wires data and functions via a DAG (directed acyclic graph)
      • Executes on many machines, similar to a MapReduce job in Hadoop
      Spout
      • Source of data streams (tuples)
      • Can be run in “reliable” and “unreliable” mode
      Bolt
      • Consumes 1+ streams and produces new streams
      • Complex operations often require multiple steps and thus multiple bolts
      Diagram: two spouts (one labelled "Source of Stream B") feed four bolts with tuples (T): one bolt subscribes to A and emits C; one subscribes to A and emits D; one subscribes to A & B and emits nothing; one subscribes to C & D and emits nothing.
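The spout/bolt vocabulary above maps naturally onto chained generators. The sketch below is plain Python and deliberately not the Storm API: the function names, the sample sentences and the single-process wiring are assumptions made for illustration, whereas a real topology runs its spouts and bolts in parallel across many machines.

```python
from collections import Counter
from types import MappingProxyType


def make_tuple(**fields):
    """Build a read-only key/value record (stand-in for an immutable tuple)."""
    return MappingProxyType(dict(fields))


def sentence_spout():
    """Spout: source of a stream. Finite here; unbounded in a real system."""
    for sentence in ["the cow jumped", "the moon"]:
        yield make_tuple(sentence=sentence)


def split_bolt(stream):
    """Bolt: consumes the sentence stream and emits one tuple per word."""
    for tup in stream:
        for word in tup["sentence"].split():
            yield make_tuple(word=word)


def count_bolt(stream):
    """Bolt: keeps running counts and emits the updated count per word."""
    counts = Counter()
    for tup in stream:
        counts[tup["word"]] += 1
        yield make_tuple(word=tup["word"], count=counts[tup["word"]])


# "Topology": wire spout -> split bolt -> count bolt as a linear DAG.
topology = count_bolt(split_bolt(sentence_spout()))

emitted = [dict(tup) for tup in topology]
final_counts = {tup["word"]: tup["count"] for tup in emitted}
# final_counts == {"the": 2, "cow": 1, "jumped": 1, "moon": 1}
```

Splitting the work into two bolts mirrors the slide's point that complex operations usually take several steps: the count bolt never needs to know how words were produced.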
  • 44. Apache Storm
  • 45. Apache Storm
      Category: ESP
      Streaming Model: Event-Streaming
      Delivery Guarantees: At most once / At least once
      API: Compositional
      Windowing: No
      Pattern detection: No
      Back Pressure: Yes
      Scalability: Scale-out
      Resource Management: Storm Cluster, YARN
      Auto-Scaling: no
      In-Flight modifications: partial
      Contributors: > 100 contributors
      Language Options: Java, Clojure, Scala, …
      "self-service": No
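The difference between the two delivery guarantees listed for Storm comes down to whether the sender retries when no acknowledgement arrives. The following deterministic simulation is a sketch only and does not use Storm's acking mechanism: `FlakyConsumer`, `send_all` and the attempt numbers on which failures occur are hypothetical and chosen to make the outcome reproducible.

```python
class FlakyConsumer:
    """Hypothetical consumer that fails on chosen attempt numbers.

    drop_on:     attempts on which the message is lost before processing
    lose_ack_on: attempts on which the message is processed but the
                 acknowledgement never reaches the sender
    """

    def __init__(self, drop_on=(), lose_ack_on=()):
        self.drop_on = set(drop_on)
        self.lose_ack_on = set(lose_ack_on)
        self.attempt = 0
        self.processed = []

    def receive(self, message):
        """Return True if the sender sees an acknowledgement."""
        self.attempt += 1
        if self.attempt in self.drop_on:
            return False                      # message never arrived
        self.processed.append(message)
        return self.attempt not in self.lose_ack_on


def send_all(messages, consumer, retry):
    """Send each message once; with retry=True, resend until acknowledged."""
    for message in messages:
        acked = consumer.receive(message)
        while retry and not acked:
            acked = consumer.receive(message)


messages = ["m1", "m2", "m3"]

# At most once: no retries, so a dropped message is gone for good.
at_most_once = FlakyConsumer(drop_on={2})
send_all(messages, at_most_once, retry=False)
# at_most_once.processed == ["m1", "m3"]

# At least once: retries recover the drop, but a lost ack causes a duplicate.
at_least_once = FlakyConsumer(drop_on={2}, lose_ack_on={4})
send_all(messages, at_least_once, retry=True)
# at_least_once.processed == ["m1", "m2", "m3", "m3"]
```

Exactly-once processing, as listed for Spark Streaming, additionally requires that such duplicates have no effect, for example through idempotent or transactional output; that part is not modelled here.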
  • 46. Summary
  • 47. Comparison (values in column order: Oracle Stream Analytics | Spark Streaming | Storm)
      Category: CEP | ESP | ESP
      Streaming Model: Continuous Streaming | Continuous Streaming | Event-Streaming
      Delivery Guarantees: At Least Once | Exactly Once | At most once / At least once
      API: Declarative/SQL | Declarative/SQL | Compositional
      Windowing: Yes | Yes | No
      Pattern detection: Yes | No | No
      Back Pressure: No | Yes | Yes
      Scalability: Scale-up | Scale-out | Scale-out
      Resource Management: OEP server, Spark Streaming | YARN, Mesos, Standalone | Storm Cluster, YARN
      Auto-Scaling: no | yes | no
      In-Flight modifications: no | no | partial
      Contributors: n.a. | > 280 contributors | > 100 contributors
      Language Options: Java, CQL | Java, Scala, Python | Java, Clojure, Scala, …
      "self-service": Yes | No | No
  • 48. Summary
      • More and more use cases (such as IoT) make Streaming Analytics necessary
      • Treat events as events! Infrastructures for handling lots of events are available!
      • Platforms such as Oracle Stream Analytics enable the business to work directly on streaming data (empower the business analyst) => the user experience of an Excel sheet on streaming data
      • Platforms such as Apache Storm and Apache Spark Streaming provide a highly scalable and fault-tolerant infrastructure for streaming analytics => Oracle Stream Analytics can use Spark Streaming as the runtime infrastructure
      • Platforms such as Kafka provide a high-volume event broker infrastructure, a.k.a. Event Hub
  • 49. Trivadis @ DOAG 2016
      Booth: 3rd Floor – next to the escalator
      Know-how, T-Shirts, Contest and Trivadis Power to go
      We look forward to your visit
      Because with Trivadis you always win!