The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures that have proven their suitability over years. This session discusses the different Big Data Architectures that have evolved over time, including the traditional Big Data Architecture, the Streaming Analytics architecture, and the Lambda and Kappa architectures, and presents how components from both the open source and the Oracle stack map onto these architectures.
The right architecture is key for any IT project. This also holds for big data projects, but there are not yet many standard architectures that have proven their suitability over years.
This session discusses different Big Data Architectures that have evolved over time, including the traditional Big Data Architecture, the Event-Driven Architecture, and the Lambda and Kappa architectures.
Each architecture is presented in a vendor- and technology-independent way using a standard architecture blueprint. In a second step, these architecture blueprints are used to show how a given architecture can support certain use cases and which popular open source technologies can help to implement a solution based on a given architecture.
Big Data Architectures
1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Big Data Solution Architectures
29.9.2016 – DOAG 2016 Big Data Days
Guido Schmutz
Trivadis
2. Guido Schmutz
Working for Trivadis for more than 19 years
Oracle ACE Director for Fusion Middleware and SOA
Co-Author of different books
Consultant, Trainer, Software Architect for Java, SOA & Big Data / Fast Data
Member of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 25 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
29.9.2016 Big Data Solution Architectures2
3. Agenda
1. Introduction
2. Big Data Reference Architectures
• Traditional Big Data
• Event / Stream-Processing
• Lambda Architecture
• Kappa Architecture
• Unified Architecture
3. Big Data Ecosystem – many choices sorted!
5. Why Talk About Big Data Architectures?
Choosing the right architecture is key for any (big data) project
Big Data is still a rather young field and therefore a “moving target”
no standard architectures are available that have proven themselves over years
In the past years, some architectures and best practices have evolved
Know your use cases before choosing your architecture / technologies
Having a reference architecture in place helps in choosing the right/matching technologies
6. How to do Big Data? Why is a structure / architecture important?
7. Big Data Ecosystem – many choices sorted!
8. Important Properties for choosing (Big) Data Architecture
Latency
Keep raw and uninterpreted data “forever”?
Volume, Velocity, Variety, Veracity
Ad-Hoc Query Capabilities needed?
Robustness & Fault Tolerance
Scalability
…
9. Big Data Reference Architectures – Traditional Big Data
10. “Traditional Architecture” for Big Data
[Diagram: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine) → Data Ingestion (pushing or pulling ingestion via a channel) → Raw Data (Reservoir) → (Analytical) Data Processing (batch compute) → Result Store (Computed Information) with Query Engine → Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion / Data at Rest.]
11. “Traditional Architecture” for Big Data – Hadoop Technology Mapping
[Diagram: the traditional big data blueprint of slide 10 with Hadoop technologies mapped onto its components; product logos not recoverable.]
12. “Traditional Architecture” for Big Data – Spark Technology Mapping
[Diagram: the traditional big data blueprint of slide 10 with Spark technologies mapped onto its components; product logos not recoverable.]
13. “Traditional Architecture” for Big Data – Feeding in High-Volume Event Streams
[Diagram: the traditional big data blueprint of slide 10; two question marks mark the open question of how high-volume event streams can be fed into it.]
14. Traditional Architecture for Big Data
• Batch Processing - “Data at Rest”
• Not for low latency use cases
• Responses are delivered “after the fact”
• Maximum value of the identified situation is lost
• Decisions are made on old and stale data
• Spark Core is a faster alternative to Hadoop MapReduce, but still Batch Processing
• The Spark ecosystem offers a lot of additional advanced analytic capabilities (machine learning, graph processing, …)
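A minimal sketch of why batch results arrive “after the fact”, assuming raw sensor readings have already landed in a raw data reservoir (modelled here as an in-memory list; `Reading` and `batch_job` are hypothetical names). The job scans the complete data set and writes computed information to a result store; an event that arrives after the run stays invisible until the job is rerun.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    timestamp: int  # seconds, illustrative only
    value: float


def batch_job(raw_reservoir: list) -> dict:
    """Scan the complete raw data set and compute the average value per sensor."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in raw_reservoir:
        totals[r.sensor_id] += r.value
        counts[r.sensor_id] += 1
    return {sid: totals[sid] / counts[sid] for sid in totals}


# Hypothetical raw data reservoir (data at rest)
raw = [Reading("s1", 0, 10.0), Reading("s1", 60, 20.0), Reading("s2", 0, 5.0)]
result_store = batch_job(raw)    # computed information, only available after the run
print(result_store)              # {'s1': 15.0, 's2': 5.0}

raw.append(Reading("s2", 120, 95.0))  # a new, possibly critical event arrives
print(result_store["s2"])             # still 5.0: stale until the next batch run
print(batch_job(raw)["s2"])           # 50.0 once the job is rerun
```

In a real deployment the list would be files in a distributed file system and the loop a Hadoop MapReduce or Spark job, but the latency characteristic is the same.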
15. Big Data Reference Architectures – Event/Stream Processing
16. Event / Stream Processing – “Data in Motion”
“Data in motion”
Events are analyzed and processed in real time as they arrive
Decisions are timely, contextual and based on fresh data
Decision latency is eliminated
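A minimal sketch of “data in motion”, assuming an in-process `queue.Queue` as a stand-in for a real messaging system and a hypothetical alert threshold. The consumer decides on each event as soon as it arrives, without storing it first.

```python
import queue
import threading

SENTINEL = None   # marks the end of this finite demo stream
THRESHOLD = 90.0  # hypothetical alert threshold


def process_stream(channel: "queue.Queue", alerts: list) -> None:
    """Consume events one by one and decide immediately (no data at rest)."""
    while True:
        event = channel.get()
        if event is SENTINEL:
            break
        sensor_id, value = event
        if value > THRESHOLD:
            alerts.append(f"ALERT {sensor_id}: {value}")


channel: "queue.Queue" = queue.Queue()  # stands in for the messaging channel
alerts: list = []
consumer = threading.Thread(target=process_stream, args=(channel, alerts))
consumer.start()
for event in [("s1", 42.0), ("s2", 95.5), ("s1", 88.0)]:
    channel.put(event)  # the producer pushes events as they happen
channel.put(SENTINEL)
consumer.join()
print(alerts)  # ['ALERT s2: 95.5']
```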
17. Event / Stream Processing Architecture
[Diagram: Data Sources (Content, Logfiles, Social, RDBMS, ERP, Sensor, Machine) → Data Ingestion → Messaging channel → (Analytical) Real-Time Data Processing (Stream/Event Processing) → Result Store → Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion / Data at Rest.]
19. Challenges for Ingesting Sensor Data
Multitude of sensors
Real-Time Streaming
Multiple Firmware versions
Bad Data from damaged sensors
Regulatory Constraints
Data Quality
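One way to handle several of these challenges at ingestion time is to normalize and validate each message before it enters the pipeline. This is a sketch under loudly stated assumptions: the two firmware message formats (`temp_f` for v1, `temp_c` for v2) and the valid temperature range are hypothetical, chosen only to illustrate multiple firmware versions and bad data from damaged sensors.

```python
from typing import Optional

VALID_RANGE_C = (-40.0, 125.0)  # hypothetical plausible range for this sensor type


def normalize(msg: dict) -> Optional[dict]:
    """Map messages from different (hypothetical) firmware versions onto one schema
    and drop readings that are missing or physically implausible."""
    fw = msg.get("fw")
    if fw == 1:                          # hypothetical firmware v1 reports Fahrenheit
        raw = msg.get("temp_f")
        celsius = None if raw is None else (raw - 32.0) * 5.0 / 9.0
    elif fw == 2:                        # hypothetical firmware v2 reports Celsius
        celsius = msg.get("temp_c")
    else:
        return None                      # unknown firmware: reject (or route elsewhere)
    if celsius is None or not (VALID_RANGE_C[0] <= celsius <= VALID_RANGE_C[1]):
        return None                      # bad data, e.g. from a damaged sensor
    return {"sensor_id": msg["id"], "celsius": round(celsius, 2)}


messages = [
    {"id": "a", "fw": 1, "temp_f": 212.0},
    {"id": "b", "fw": 2, "temp_c": 21.5},
    {"id": "c", "fw": 2, "temp_c": 999.0},  # damaged sensor
    {"id": "d", "fw": 3, "temp": 20.0},     # unknown firmware
]
clean = [m for m in (normalize(x) for x in messages) if m is not None]
print(clean)  # [{'sensor_id': 'a', 'celsius': 100.0}, {'sensor_id': 'b', 'celsius': 21.5}]
```

Rejected messages are simply dropped here; in practice they are often kept separately so data quality problems can be analyzed later.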
20. Enabling Continuous Data Ingestion
SQL Polling
Change Data Capture (CDC)
File Stream (File Tailing)
File Stream (Streaming Appender)
Sensor Stream
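SQL polling can be sketched with Python's built-in `sqlite3` module, assuming a hypothetical `orders` table whose increasing primary key serves as a high-water mark. Each poll fetches only rows added since the previous one; a real poller would repeat this on a timer.

```python
import sqlite3


def poll_new_rows(conn: sqlite3.Connection, last_id: int):
    """SQL polling: fetch only rows added since the last poll, using an
    increasing primary key as high-water mark."""
    rows = conn.execute(
        "SELECT id, payload FROM orders WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    new_last_id = rows[-1][0] if rows else last_id
    return rows, new_last_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")  # hypothetical table
conn.executemany("INSERT INTO orders (payload) VALUES (?)", [("o1",), ("o2",)])

last_id = 0
batch1, last_id = poll_new_rows(conn, last_id)  # first poll picks up o1 and o2
conn.execute("INSERT INTO orders (payload) VALUES ('o3')")
batch2, last_id = poll_new_rows(conn, last_id)  # second poll only sees o3
print(batch1, batch2, last_id)  # [(1, 'o1'), (2, 'o2')] [(3, 'o3')] 3
```

Polling on a key like this only detects inserts; updates and deletes are what log-based Change Data Capture picks up, at the cost of reading the database's transaction log.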
21. Event / Stream Processing Architecture – Open Source Technology Mapping
[Diagram: the event/stream processing blueprint of slide 17 with open source technologies mapped onto its components; product logos not recoverable.]
22. Event / Stream Processing Architecture – Oracle Technology Mapping
[Diagram: the event/stream processing blueprint of slide 17 with Oracle technologies mapped onto its components; product logos not recoverable.]
23. Event / Stream Processing Architecture
The solution for low latency use cases
Process each event separately => low latency
Process events in micro-batches => increases latency but offers better reliability
Previously known as “Complex Event Processing”
Keep the data moving / Data in Motion instead of Data at Rest => raw events were not stored
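The two processing styles above can be contrasted in a few lines. This is a simplified sketch: the micro-batches here are size-based for determinism, whereas engines such as Spark Streaming typically cut batches by a time interval. The function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def per_event(events: Iterable[int], handle: Callable[[List[int]], None]) -> None:
    """Process each event on its own as soon as it is read: lowest latency."""
    for e in events:
        handle([e])


def micro_batches(events: Iterable[int], size: int) -> Iterator[List[int]]:
    """Group events into fixed-size micro-batches. An event waits until its batch
    is full (or the stream ends), which adds latency, but each batch can then be
    handled, committed or retried as one unit."""
    batch: List[int] = []
    for e in events:
        batch.append(e)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the last, partially filled batch
        yield batch


calls: List[List[int]] = []
per_event(range(5), calls.append)
print(calls)                             # [[0], [1], [2], [3], [4]]
print(list(micro_batches(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```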
24. Event / Stream Processing Architecture – Keep raw event data
[Diagram: the event/stream processing blueprint of slide 17, extended with a Raw Data (Reservoir) that keeps the raw events and an (Analytical) Batch Data Processing layer (batch compute) working on it. Legend: Data in Motion / Data at Rest.]
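A sketch of keeping raw event data next to real-time processing, assuming an append-only JSON Lines file as a stand-in for the raw data reservoir (the file name and functions are hypothetical). For simplicity the same handler writes the raw event and updates the real-time result; in the blueprint the raw events could equally be written by a separate consumer of the messaging channel.

```python
import json
import tempfile
from pathlib import Path


def handle_event(event: dict, reservoir: Path, counts: dict) -> None:
    # 1) keep the raw, uninterpreted event (data at rest) in an append-only file
    with reservoir.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    # 2) process the same event in real time (data in motion)
    counts[event["type"]] = counts.get(event["type"], 0) + 1


def batch_reprocess(reservoir: Path) -> dict:
    # later, batch processing can re-read the complete raw history
    counts: dict = {}
    with reservoir.open(encoding="utf-8") as f:
        for line in f:
            event_type = json.loads(line)["type"]
            counts[event_type] = counts.get(event_type, 0) + 1
    return counts


with tempfile.TemporaryDirectory() as tmp:
    reservoir = Path(tmp) / "raw_events.jsonl"  # hypothetical reservoir location
    realtime_counts: dict = {}
    for ev in [{"type": "click"}, {"type": "view"}, {"type": "click"}]:
        handle_event(ev, reservoir, realtime_counts)
    batch_counts = batch_reprocess(reservoir)

print(realtime_counts, batch_counts)  # {'click': 2, 'view': 1} {'click': 2, 'view': 1}
```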
25. Big Data Reference Architectures – Lambda Architecture for Big Data
26. “Lambda Architecture” for Big Data
[Diagram: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine) → Data Ingestion (messaging channel, pulling ingestion), feeding two paths: (Analytical) Batch Data Processing (batch compute over the Raw Data (Reservoir)) and (Analytical) Real-Time Data Processing (Stream/Event Processing). Both produce Computed Information in Result Stores, accessed through a Query Engine by Data Consumers (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion / Data at Rest.]
27. “Lambda Architecture” for Big Data
[Diagram: same labels as slide 26 (animation step); no additional text recoverable.]
28. Lambda Architecture for Big Data
Combines (Big) Data at Rest with (Fast) Data in Motion
Closes the gap from high-latency batch processing
Keeps the raw information forever
Makes it possible to rerun analytics operations on the whole data set if necessary
=> because the old run had an error or
=> because we have found a better algorithm we want to apply
Functionality has to be implemented twice
• Once for batch
• Once for real-time streaming
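The trade-off above can be made concrete with a toy count per key. This is a sketch under simplifying assumptions (in-memory data, `Counter` objects as batch and real-time views, a hypothetical `query` function as the serving layer); it shows both the merge at query time and the duplicated logic.

```python
from collections import Counter

# Batch layer: recompute a view from the complete master data set (raw data kept forever)
master_dataset = [{"key": "a"}, {"key": "b"}, {"key": "a"}]
batch_view = Counter(e["key"] for e in master_dataset)

# Speed layer: incrementally update a real-time view with events that arrived
# since the last batch run; this is the second implementation of the same logic
speed_view = Counter()
for e in [{"key": "a"}, {"key": "c"}]:
    speed_view[e["key"]] += 1


def query(key):
    """Serving layer: merge batch and real-time views at query time."""
    return batch_view[key] + speed_view[key]


print(query("a"), query("b"), query("c"))  # 3 1 1
```

When the next batch run has absorbed the recent events, the corresponding part of the real-time view is typically discarded, so each event is counted exactly once.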
29. Big Data Reference Architectures – “Kappa” Architecture
30. “Kappa Architecture” for Big Data
[Diagram: Data Sources (Content, RDBMS, Social, ERP, Logfiles, Sensor, Machine) → Data Ingestion → Messaging, labelled “Raw Data Reservoir” → (Analytical) Real-Time Data Processing (Stream/Event Processing) → Result Store (Computed Information) → Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion / Data at Rest.]
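A minimal sketch of the Kappa idea, assuming an in-memory list as a stand-in for a replayable, append-only messaging log (for example a Kafka topic with long retention). There is only one code path, a stream job; reprocessing means replaying the log from the beginning with a new version of the job and then switching consumers to the new result. `run_job`, `v1` and `v2` are hypothetical names.

```python
from typing import Callable, List

# The messaging layer doubles as the raw data reservoir: an append-only, replayable log
event_log: List[dict] = [
    {"user": "u1", "amount": 30},
    {"user": "u2", "amount": 5},
    {"user": "u1", "amount": 70},
]


def run_job(log: List[dict], from_offset: int,
            process: Callable[[dict, dict], None]) -> dict:
    """The single stream-processing code path: read from an offset, build a result table."""
    state: dict = {}
    for event in log[from_offset:]:
        process(event, state)
    return state


def v1(event: dict, state: dict) -> None:  # original logic: total amount per user
    state[event["user"]] = state.get(event["user"], 0) + event["amount"]


def v2(event: dict, state: dict) -> None:  # improved logic: ignore amounts below 10
    if event["amount"] >= 10:
        state[event["user"]] = state.get(event["user"], 0) + event["amount"]


current = run_job(event_log, 0, v1)
reprocessed = run_job(event_log, 0, v2)  # replay the whole log with the new version
print(current, reprocessed)  # {'u1': 100, 'u2': 5} {'u1': 100}
```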
31. Big Data Reference Architectures – “Unified” Architecture
32. “Unified Architecture” for Big Data
[Diagram: the Lambda blueprint of slide 26, where (Analytical) Batch Data Processing calculates Prediction Models of incoming data, next to (Analytical) Real-Time Data Processing (Stream/Event Processing), Result Stores with Query Engine and Raw Data (Reservoir). Legend: Data in Motion / Data at Rest.]
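A sketch of one reading of the unified architecture, assuming that the batch layer calculates a model from historical data and the real-time layer applies that model to each incoming event. The “model” here is deliberately trivial (mean and population standard deviation with a hypothetical 3-sigma anomaly rule); real deployments would train e.g. a machine learning model in Spark and score it in the stream processor.

```python
import statistics
from typing import List, Tuple


def train_model(history: List[float]) -> Tuple[float, float]:
    """Batch layer: calculate a (very simple) model of incoming data from history."""
    return statistics.mean(history), statistics.pstdev(history)


def score(value: float, model: Tuple[float, float], k: float = 3.0) -> bool:
    """Real-time layer: apply the model to each event as it arrives; True = anomaly."""
    mean, std = model
    return std > 0 and abs(value - mean) > k * std


history = [10.0, 11.0, 9.0, 10.0, 10.0]  # hypothetical historical readings
model = train_model(history)             # recomputed periodically by the batch layer
print([score(v, model) for v in [10.5, 25.0]])  # [False, True]
```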
33. Big Data Ecosystem – many choices sorted!
34. Building Blocks for (Big) Data Processing
Data Acquisition
Format
File System
Stream Processing
Batch SQL
Graph DBMS
Document DBMS
Relational DBMS
Visualization
IoT
Messaging
Analytics
OLAP DBMS
Query Federation
Table-Style DBMS
Key Value DBMS
Batch Processing
In-Memory
35. Big Data Ecosystem – many choices sorted!
37. Organizing NoSQL Datastores – Different Types
[Diagram: four NoSQL datastore types with example layouts. Key-value store: K1 → V1, K2 → V2, K3 → V3. Document store: { k1: v1, k2: v2, k3: [v1, v2, v3] }. Wide-column store: rows identified by a row key, each with its own set of column keys and values (RK1: CK1 = V1, CK2 = V2, CK3 = V3, CK4 = V4, …; RK2: CK1 = V1, CK4 = V4, CK6 = V6, …). Graph store: illustration not recoverable.]
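The four data models can be mimicked with plain Python structures. This is only a conceptual sketch of how each store organizes data, reusing the keys from the slide; it says nothing about how any product stores data physically, and the graph nodes (`alice`, `bob`, `carol`) and `two_hops` query are hypothetical.

```python
# Key-value store: opaque values, accessed only by key
kv = {"K1": "V1", "K2": "V2", "K3": "V3"}

# Document store: the value is a structured, nested document
doc = {"k1": "v1", "k2": "v2", "k3": ["v1", "v2", "v3"]}

# Wide-column store: row key -> {column key -> value}; rows can have different columns
wide = {
    "RK1": {"CK1": "V1", "CK2": "V2", "CK3": "V3", "CK4": "V4"},
    "RK2": {"CK1": "V1", "CK4": "V4", "CK6": "V6"},
}

# Graph store: nodes and the edges between them (here as an adjacency list)
graph = {"alice": ["bob"], "bob": ["carol"], "carol": []}


def two_hops(g, start):
    """Typical graph query: nodes reachable in exactly two steps."""
    return {n2 for n1 in g[start] for n2 in g[n1]}


print(kv["K2"], doc["k3"][2], wide["RK2"].get("CK2"), two_hops(graph, "alice"))
# V2 v3 None {'carol'}
```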
38. Organizing NoSQL Datastores – and the Products
[Diagram: example products for each store type (Key Value Store, Wide-column store, Document store, Graph store); product logos not recoverable.]