Modern data architectures and their implementations contain an Event Hub. What are the benefits of placing an Event Hub in a Modern Data (Analytics) Architecture? What exactly is an Event Hub and what capabilities should it provide? Why is Apache Kafka the most popular realization of an Event Hub? These and many other questions will be answered in this session. The talk will start with a vendor-neutral definition of the capabilities of an Event Hub. The session will then highlight the different architecture styles that an Event Hub (Kafka) can support, such as Streaming Data Integration, Stream Analytics and Decoupled Event-Driven Applications, and show how these can be combined into a unified architecture, making the Event Hub the central nervous system of an enterprise architecture. We will end with an overview of the Kafka ecosystem and a placement of the various components onto the Modern Data (Analytics) Architecture.
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
1. BASEL | BERN | BRUGG | BUCHAREST | COPENHAGEN | DÜSSELDORF | FRANKFURT A.M. | FREIBURG I.BR.
GENEVA | HAMBURG | LAUSANNE | MANNHEIM | MUNICH | STUTTGART | VIENNA | ZURICH
http://guidoschmutz.wordpress.com @gschmutz
Event Hub in Modern Data Architecture
Guido Schmutz
Data Analytics Summit 2020 – Santa Clara
2. Guido Schmutz
Working at Trivadis for more than 23 years
Consultant, Trainer, Platform Architect for Java, Oracle, SOA and Big Data / Fast Data
Oracle Groundbreaker Ambassador & Oracle ACE Director
@gschmutz guidoschmutz.wordpress.com
3. Agenda
1. What exactly is an Event Hub?
2. Kafka – the most popular Event Hub
3. Event Hub - core building block of a modern data architecture
4. Event Hub – Kafka Alternatives? Cloud Services?
5. Summary
6. Event Hub – Key Capabilities
1. topic semantics (publish/subscribe) – a message can be consumed by 0 to n consumers
2. queue semantics – messages can be consumed by exactly one consumer
3. horizontally scalable – throughput increases with more resources
4. auto-scaling – up and down-scaling upon load
5. highly available – no single point of failure
6. Control/handle back-pressure
7. durable – messages must not be lost
8. schema-less – no knowledge of message content and format
9. Efficient support of Stream and Batch Consumers (offline and with large Backlog)
10. (Unlimited) Retention of messages (long term storage)
11. Guaranteed ordering of messages
12. Support re-consumption of events
13. Access control – control over who can produce and consume which events
14. interoperable – support for different clients
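The difference between topic semantics (capability 1) and queue semantics (capability 2) can be illustrated with a toy in-memory log. This is only a sketch: `ToyLog` and the group names are hypothetical and model the idea of per-group read positions, not the API of any real Event Hub.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for one topic; not a real Event Hub client."""

    def __init__(self):
        self.messages = []               # append-only list of messages
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, message):
        self.messages.append(message)

    def poll(self, group):
        """Return the next unread message for this group, or None."""
        offset = self.offsets[group]
        if offset >= len(self.messages):
            return None
        self.offsets[group] = offset + 1
        return self.messages[offset]


log = ToyLog()
for m in ["a", "b"]:
    log.publish(m)

# Topic semantics: two independent groups each see every message.
billing = [log.poll("billing"), log.poll("billing")]
audit = [log.poll("audit"), log.poll("audit")]

# Queue semantics: two workers sharing one group split the messages.
worker_1 = log.poll("shipping")
worker_2 = log.poll("shipping")
```

Both behaviours come from the same log; only the way consumers share a read position differs.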
8. Kafka as an Event Hub
[Diagram: Kafka Cluster (Broker 1, Broker 2, Broker 3); Zookeeper Ensemble (ZK 1, ZK 2, ZK 3); Schema Registry; Service 1; Producer 1, Producer 2, Producer 3; Consumer 1, Consumer 2, Consumer 3; Management (Control Center, Kafka Manager, KAdmin); kafkacat. Numbered callouts mark capabilities 1, 3, 5, 6, 7, 8, 9, 10, 11, 12 and 14 from the list below.]
Data Retention:
• Never
• Time (TTL) or Size-based
• Log-Compacted based
1. topic semantics
2. queue semantics
3. horizontally scalable
4. auto-scaling
5. highly available
6. back-pressure
7. durable
8. schema-less/opaque
9. Stream and Batch Consumers
10. (Unlimited) Retention
11. Guaranteed ordering
12. re-consumption of events
13. Access Control
14. Interoperable
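The retention options on this slide (size-based and log-compacted) can be sketched on a list of keyed records. This is a simplified illustration, not Kafka's implementation: the function names and the truck keys are hypothetical, and real log compaction runs in the background on log segments rather than on the whole log at once.

```python
def retain_by_size(log, max_records):
    """Size-based retention: keep only the newest max_records records."""
    return log[-max_records:] if max_records > 0 else []


def compact(log):
    """Log compaction: keep only the latest record per key, in original order."""
    latest_index = {}
    for i, (key, _value) in enumerate(log):
        latest_index[key] = i
    return [rec for i, rec in enumerate(log) if latest_index[rec[0]] == i]


log = [("truck-1", "A"), ("truck-2", "B"), ("truck-1", "C"), ("truck-1", "D")]

trimmed = retain_by_size(log, 2)   # [("truck-1", "C"), ("truck-1", "D")]
compacted = compact(log)           # [("truck-2", "B"), ("truck-1", "D")]
```

Size-based retention loses the only record for truck-2, while compaction keeps the latest state for every key.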
9. Kafka as an Event Hub
• storage
• Horizontally scalable, guaranteed order of messages
[Diagram callouts: capabilities 3 and 10]
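How horizontal scaling and guaranteed ordering fit together can be sketched with key-based partitioning: records with the same key always go to the same partition, so order is preserved per key while partitions can be spread over brokers. This is a minimal sketch under assumptions: CRC32 is used as a stand-in hash (Kafka's default partitioner uses a different hash function), and the keys and helper names are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    # Stand-in hash for illustration only; not the hash Kafka uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append(partitions, key, value):
    """Append a record to the partition chosen by its key."""
    p = partition_for(key, len(partitions))
    partitions[p].append((key, value))
    return p


def values_for(partitions, key):
    """Read back all values of one key, in the order they were appended."""
    p = partition_for(key, len(partitions))
    return [v for k, v in partitions[p] if k == key]


partitions = [[] for _ in range(3)]
events = [("truck-1", 1), ("truck-2", 1), ("truck-1", 2), ("truck-2", 2), ("truck-1", 3)]
for key, value in events:
    append(partitions, key, value)
```

Ordering holds within a partition only; there is no global order across partitions in this model.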
10. Kafka as an Event Hub
• Consumer receives messages via polling
• no single point of failure, high availability
[Diagram callouts: capabilities 5, 6 and 7]
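Polling is what lets a consumer handle back-pressure: it asks for at most as many records as it can process and tracks its own position. The sketch below models this with hypothetical classes (`ToyPartition`, `PollingConsumer`); it is not the Kafka consumer API.

```python
class ToyPartition:
    """Immutable list of records addressed by offset."""

    def __init__(self, records):
        self.records = list(records)

    def fetch(self, offset, max_records):
        return self.records[offset:offset + max_records]


class PollingConsumer:
    """Pull-based consumer: the consumer, not the broker, sets the pace."""

    def __init__(self, partition, max_records=2):
        self.partition = partition
        self.max_records = max_records
        self.position = 0

    def poll(self):
        batch = self.partition.fetch(self.position, self.max_records)
        self.position += len(batch)
        return batch

    def seek(self, offset):
        # Moving the position back allows re-consumption of earlier events.
        self.position = offset


partition = ToyPartition(["e0", "e1", "e2", "e3", "e4"])
consumer = PollingConsumer(partition, max_records=2)

first = consumer.poll()     # ["e0", "e1"]
second = consumer.poll()    # ["e2", "e3"]
consumer.seek(0)
replayed = consumer.poll()  # ["e0", "e1"]
```

Because records stay in the log after being read, a slow consumer simply falls behind instead of forcing the producer to wait.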
11. Event Hub – Key Capabilities supported by Kafka
1. topic semantics (publish/subscribe) – a message can be consumed by 0 to n consumers
2. queue semantics – messages can be consumed by exactly one consumer
3. horizontally scalable – throughput increases with more resources
4. auto-scaling – up and down-scaling upon load
5. highly available – no single point of failure
6. Control/handle back-pressure
7. durable – messages must not be lost
8. schema-less – no knowledge of message content and format
9. Efficient support of Stream and Batch Consumers (offline and with large Backlog)
10. (Unlimited) Retention of messages (long term storage)
11. Guaranteed ordering of messages
12. Support re-consumption of events
13. Access control – control over who can produce and consume which events
14. interoperable – support for different clients
12. Event Hub - core building block of a modern data architecture
15. Stream Data Integration – Kafka Connect / StreamSets
Kafka Connect:
• declarative style, simple data flows
• framework is part of Apache Kafka
• Many connectors available
• Single Message Transforms (SMT)
StreamSets:
• GUI-based, drag-and-drop Data Flow Pipelines
• Both stream and batch processing (micro-batching)
• custom sources, sinks, processors
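The declarative style of Kafka Connect means a data flow is described as configuration rather than code. The sketch below builds such a configuration, including one Single Message Transform, and prints the JSON that would typically be submitted to the Kafka Connect REST API. The connector name, file path and topic name are hypothetical, and the example assumes the file source connector and the `RegexRouter` transform are available in the Connect installation.

```python
import json

connector = {
    "name": "demo-file-source",  # hypothetical connector name
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/demo-input.txt",  # hypothetical input file
        "topic": "demo_topic",          # hypothetical target topic
        # Single Message Transform: rewrite the topic name to "raw_<topic>"
        "transforms": "route",
        "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
        "transforms.route.regex": "(.*)",
        "transforms.route.replacement": "raw_$1",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

No data flow logic is written here; the connector class and the transform are selected and parameterized by name.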
16. Using Edge Computing and Stream Data Integration
[Diagram: Streaming Data Sources (Vehicle, Environmental, Shop Floor), Gateway, Stream Data Integration, Event Hub]
• MQTT as a gateway to Kafka
17. Using Stream Analytics
[Diagram: Streaming Data Sources (Vehicle, Environmental, Shop Floor), Gateway, Stream Data Integration, Event Hub, Stream Analytics]
• Stream-to-Stream Joins
• Stream-to-Table Joins
• Time Windowed State Management
• Event Pattern Detection
• Machine Learning Model Execution (Inference)
[1]
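A stream-to-table join enriches each incoming event with the latest known value for its key. The sketch below models this on a single time-ordered feed; the feed layout, the truck and driver values and the `enrich` helper are hypothetical and only illustrate the idea.

```python
def enrich(feed):
    """Inner stream-to-table join over a time-ordered feed of (kind, key, value)."""
    table = {}  # latest value per key (the "table" side)
    out = []
    for kind, key, value in feed:
        if kind == "table":
            table[key] = value                    # update the table state
        elif key in table:
            out.append((key, value, table[key]))  # join event with current row
        # stream events without a matching table row are dropped
    return out


feed = [
    ("table", "truck-1", "driver: Ann"),
    ("stream", "truck-1", "speeding"),
    ("stream", "truck-2", "braking"),
    ("table", "truck-1", "driver: Bob"),
    ("stream", "truck-1", "lane change"),
]

joined = enrich(feed)
# [("truck-1", "speeding", "driver: Ann"), ("truck-1", "lane change", "driver: Bob")]
```

Each event is joined against the table as it was at the time the event arrived, which is why the two truck-1 events see different drivers.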
18. Using Stream Analytics
[Diagram: Streaming Data Sources (Vehicle, Environmental, Shop Floor), Gateway, Stream Data Integration, Event Hub, Stream Analytics]
• Push results back to new topic so other interested parties can use it too!
19. Stream Analytics - Kafka Streams
Kafka Streams:
• Programmatic API, “just” a Java library
• Native streaming
• fault-tolerant local state
• Fixed, Sliding and Session Windowing
• Stream-Stream / Stream-Table Joins
• At-least-once and exactly-once
ksqlDB:
• Stream Processing with zero coding using SQL-like language (now supporting push and pull queries)
• built on top of Kafka Streams
• interactive (CLI) and headless (cmd file)
[Diagram, Kafka Streams: trucking_driver, Kafka Broker, Java Application, Kafka Streams]
[Diagram, ksqlDB: trucking_driver, Kafka Broker, ksqlDB Engine, Kafka Streams, ksqlDB REST, Commands, ksqlDB CLI, push / pull]
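The difference between fixed and session windowing can be shown on plain integer timestamps. This is a sketch of the concepts only, not of the Kafka Streams API: the function names are hypothetical, and late or out-of-order events are ignored.

```python
def tumbling_counts(timestamps, size):
    """Fixed windows: count events per window start, windows never overlap."""
    counts = {}
    for ts in timestamps:
        start = (ts // size) * size
        counts[start] = counts.get(start, 0) + 1
    return counts


def session_windows(timestamps, gap):
    """Session windows: a new session starts after more than `gap` of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1][1] = ts       # extend the current session
        else:
            sessions.append([ts, ts])  # open a new session
    return [tuple(s) for s in sessions]


timestamps = [1, 2, 9, 11, 30]

fixed = tumbling_counts(timestamps, 10)    # {0: 3, 10: 1, 30: 1}
sessions = session_windows(timestamps, 5)  # [(1, 2), (9, 11), (30, 30)]
```

Fixed windows split the events at 9 and 11 because of the boundary at 10, while session windows group them because they are close together in time.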
31. (Right-Time) Legacy Systems Integration
[Diagram: Streaming Data Sources (Vehicle, Environmental, Shop Floor), Gateway, Legacy Data Sources (Legacy App, Machine, IIoT), CDC, Stream Data Integration, Batch Data Integration, Event Hub, Stream Analytics, Data Lake / DWH, Batch Analytics, Streaming Visualize, Batch Visualize]
32. Persistent Storage Integration
[Diagram: Streaming Data Sources (Vehicle, Environmental, Shop Floor), Gateway, Legacy Data Sources (Legacy App, Machine, IIoT), CDC, Stream Data Integration, Batch Data Integration, Event Hub, Stream Analytics, NoSQL, RDBMS, Data Lake / DWH, Batch Analytics, Streaming Visualize, Micro-Batch Visualize, Batch Visualize]
33. Event-Driven Apps (aka. Microservices)
[Diagram: Streaming Data Sources (Vehicle, Environmental, Shop Floor), Gateway, Legacy Data Sources (Legacy App, Machine, IIoT), CDC, Stream Data Integration, Batch Data Integration, Event Hub, Stream Analytics, Microservice, NoSQL, RDBMS, Data Lake / DWH, Batch Analytics, Streaming Visualize, Micro-Batch Visualize, Batch Visualize]
• Microservice participates as both a consumer and producer of events
34. Event-Driven, highly decoupled Apps (aka. Microservices)
[Diagram: Streaming Data Sources (Vehicle, Environmental, Shop Floor), Gateway, Legacy Data Sources (Legacy App, Machine, IIoT), CDC, Stream Data Integration, Batch Data Integration, Event Hub, Stream Analytics, two Microservices, NoSQL, RDBMS, Data Lake / DWH, Batch Analytics, Streaming Visualize, Micro-Batch Visualize, Batch Visualize]
• 2nd microservice consumes events from 1st microservice
• Bootstrap new microservices from event history
• System-wide CQRS
[3]
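Bootstrapping a new microservice from event history amounts to replaying the retained events into a fresh read model and then continuing with live events. The sketch below illustrates this with hypothetical order events; the event names, fields and helper functions are assumptions made for the example.

```python
def apply(view, event):
    """Update the read model (query side) with one event."""
    kind, order_id, payload = event
    if kind == "OrderPlaced":
        view[order_id] = {"status": "placed", "item": payload}
    elif kind == "OrderShipped":
        view[order_id]["status"] = "shipped"
    return view


def bootstrap(history):
    """Build a fresh read model by replaying the full event history."""
    view = {}
    for event in history:
        apply(view, event)
    return view


history = [
    ("OrderPlaced", "o-1", "tyres"),
    ("OrderPlaced", "o-2", "brake pads"),
    ("OrderShipped", "o-1", None),
]

# A new service starts with an empty store and replays the history...
view = bootstrap(history)
# ...then keeps its view current by applying new events as they arrive.
apply(view, ("OrderShipped", "o-2", None))
```

The service that produced the events does not need to know the new consumer exists, which is the decoupling the slide refers to.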
35. Bi-Directional Legacy Systems Integration
[Diagram: Streaming Data Sources (Vehicle, Environmental, Shop Floor), Gateway, Legacy Data Sources (Legacy App, Machine, IIoT), CDC, AQ, Stream Data Integration, Batch Data Integration, Event Hub, Stream Analytics, two Microservices, NoSQL, RDBMS, Data Lake / DWH, Batch Analytics, Streaming Visualize, Micro-Batch Visualize, Batch Visualize]
• 2nd microservice consumes events from 1st microservice
• Bootstrap new microservices from event history
• System-wide CQRS
[4]
36. Serverless/Function as a Service (FaaS)
[Diagram: Streaming Data Sources (Vehicle, Environmental, Shop Floor), Gateway, Legacy Data Sources (Legacy App, Machine, IIoT), CDC, Stream Data Integration, Batch Data Integration, Event Hub, Stream Analytics, two Microservices, Serverless FaaS, NoSQL, RDBMS, Data Lake / DWH, Batch Analytics, Streaming Visualize, Micro-Batch Visualize, Batch Visualize]
37. Event Hub becomes the central nervous system for your information!
[Diagram: Streaming Data Sources (Vehicle, Environmental, Shop Floor), Gateway, Legacy Data Sources (Legacy App, Machine, IIoT), CDC, Stream Data Integration, Batch Data Integration, Event Hub, Stream Analytics, two Microservices, Serverless FaaS, NoSQL, RDBMS, Data Lake / DWH, Batch Analytics, Streaming Visualize, Micro-Batch Visualize, Batch Visualize]
38. Event Hub becomes the central nervous system for your information!
[Diagram: Streaming Data Sources (Vehicle, Environmental, Shop Floor), Gateway, Legacy Data Sources (Legacy App, Machine, IIoT), CDC, Stream Data Integration, Batch Data Integration, Event Hub, Stream Analytics, two Microservices, Serverless FaaS, NoSQL, RDBMS, Data Lake / DWH, Batch Analytics, Streaming Visualize, Micro-Batch Visualize, Batch Visualize]
Log as a first-class citizen!
Turning the database inside out!
42. Ref Architecture
[Diagram: Modern Data Platform. Labels recovered from the figure:
• Bulk Source: Location DB, Extract File, Weather DB (File Import / SQL Import, Bulk Data Flow)
• Event Source: IoT Data, Mobile Apps, Social (Event Stream)
• Edge Node: Rules, Event Hub, Storage
• Event Hub
• Stream Processing Cluster: Stream Processor, Model / State
• Microservice Cluster: Microservice Data, API
• Big Data Platform: Storage (Raw, Refined/UsageOpt), Parallel Processing, Query Engine
• Enterprise Data Warehouse, RDBMS, Rules Engine
• Governance: Data Catalog
• Consumer: BI Apps, Data Science Workbench, Enterprise App (SQL / Search, Service, SQL Export)]
43. Reference
1. Stream Processing Concepts and Frameworks
2. Streaming Visualization
3. Building event-driven (Micro)Services with Apache Kafka
4. Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka