Big Data and Fast Data – Big and Fast Combined, is it Possible?
- 1. WELCOME
Big Data and Fast Data – Big and Fast Combined, is it Possible?
Guido Schmutz
UKOUG Tech 2013
2.12.2013
2013 © Trivadis
BASEL, BERN, LAUSANNE, ZÜRICH, DÜSSELDORF, FRANKFURT A.M., FREIBURG I.BR., HAMBURG, MÜNCHEN, STUTTGART, WIEN
- 2. Guido Schmutz
• Working for Trivadis for more than 16 years
• Oracle ACE Director for Fusion Middleware and SOA
• Co-author of different books
• Consultant, Trainer, Software Architect for Java, Oracle, SOA and EDA
• Member of Trivadis Architecture Board
• Technology Manager @ Trivadis
• More than 20 years of software development experience
• Contact: guido.schmutz@trivadis.com
• Blog: http://guidoschmutz.wordpress.com
• Twitter: gschmutz
- 3. Our company
Trivadis is a market leader in IT consulting, system integration,
solution engineering and the provision of IT services focusing
on
and
technologies in Switzerland,
Germany and Austria.
We offer our services in the following strategic business fields:
OPERATION
Trivadis Services takes over the interacting operation of your IT systems.
- 4. With over 600 specialists and IT experts in your region
Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Wien, München, Basel, Brugg, Bern, Zurich, Lausanne
• 12 Trivadis branches and more than 600 employees
• 200 Service Level Agreements
• Over 4,000 training participants
• Research and development budget: CHF 5.0 / EUR 4 million
• Financially self-supporting and sustainably profitable
• Experience from more than 1,900 projects per year at over 800 customers
- 5. Agenda
1. Big Data, what is it?
2. Motivation
3. The Lambda Architecture
4. Implementing the Lambda Architecture
5. Summary
- 6. Big Data Definition (Gartner et al)
Characteristics of Big Data: its Volume, Velocity and Variety in combination
• Volume: Tera-, Peta-, Exa-, Zetta-, Yottabytes and constantly growing
• Velocity: “Traditional” computing in RDBMS is not scalable enough. We search for “linear scalability”
• Variety: “Only … structured information is not enough” – “95% of produced data is unstructured”
+ Veracity (IBM) - information uncertainty
+ Time to action ? – Big Data + Event Processing = Fast Data
- 7. Big Data Definition (4 Vs)
Characteristics of Big Data: its Volume, Velocity and Variety in combination
+ Time to action ? – Big Data + Event Processing = Fast Data
- 8. Volume Development
[Chart: Global Data Volume in Exabytes (axis 0 to 8000) and Aggregate Uncertainty % (axis 0 to 100), plotted by Year from 2005 to 2015. Data categories shown: Sensors (“internet of things”); Social Media (video, audio, text); VoIP (Skype, MSN, ICQ, ...); Enterprise Data (data dictionary, ERD, ...)]
- 10. Internet Of Things
There are more devices tapping into
the internet than people on earth
How do we prepare our systems/
architecture for the future?
Source: The Economist
Source: Cisco
- 11. Big Data in Context
NoSQL databases
• The storage for Big Data → Polyglot Persistence
Complex Event Processing (CEP)
• An architectural style for Fast Data
Lots of new terms
§ HDFS, Hive, Hadoop, MapReduce, HBase, Pig, Cascading, Flume, Oozie
Not only Open Source
• Oracle Big Data Appliance & Microsoft HD Insight
No longer a clear distinction between Software Development and Business Intelligence !?
• Java, Python, Clojure, R, … know-how needed
• Data Scientists: Natural Language Processing, Statistics, Network Analysis
- 12. Big Data Use Cases / Scenarios
General
§ Analyzing social media data for service optimization, sentiment analysis, ...
Retail
§ Personalized travel and shopping guidance depending on location detection (mobile, tablets, previous purchases)
Automotive
§ Analyzing telemetric data (e.g. for insurance: “Pay how you drive”, warranty, recall, warnings etc.)
Finance
§ Fraud detection for payments (real time)
Telco
§ Mobile user location analytics for “behavior mining”
- 13. Velocity
§ Velocity requirement examples:
  § Recommendation Engine
  § Predictive Analytics
  § Marketing Campaign Analysis
  § Customer Retention and Churn Analysis
  § Social Graph Analysis
  § Capital Markets Analysis
  § Risk Management
  § Rogue Trading
  § Fraud Detection
  § Retail Banking
  § Network Monitoring
  § Research and Development
- 14. Agenda
1. Big Data, what is it?
2. Motivation
3. The Lambda Architecture
4. Implementing the Lambda Architecture
5. Summary
- 15. What is a data system?
• A system that manages the storage and querying of data with a
lifetime measured in years encompassing every version of the
application to ever exist, every hardware failure and every human
mistake ever made.
• A data system answers questions based on information that was
acquired in the past
- 16. Desired Properties of a (Big) Data System
Robust and fault-tolerant
Low latency reads and updates
Scalable
General
Extensible
Allows ad hoc queries
Minimal maintenance
Debug-able
- 17. Complexity in today‘s architecture/systems
Lack of Human Fault Tolerance
Same structure for write/query
Schemas done wrong
- 18. Typical problem in today’s architecture/systems
Lack of Human Fault Tolerance
Bugs will be deployed to production over the lifetime of a data system
Operational mistakes will be made
Humans are part of the overall system
• Just like hard disks, CPUs, memory, software
• Design for human error like you design for any other fault
Examples of human error
• Deploy a bug that increments counters by two instead of by one
• Accidentally delete data from database
• Accidental DOS on important internal service
Worst two consequences: data loss or data corruption
As long as an error doesn’t lose or corrupt good data, you can fix what went wrong
- 19. Lack of Human Fault Tolerance
Mutability
The U and D in CRUD
A mutable system updates the current state of the world
Mutable systems inherently lack human fault-tolerance
Easy to corrupt or lose data
Capturing change traditionally
Before:
  Name    City
  Guido   Berne
  Albert  Zurich
After:
  Name    City
  Guido   Basel
  Albert  Zurich
- 20. Immutability
Lack of Human Fault Tolerance
An immutable system captures historical records of events
Each event happens at a particular time and is always true
Capturing change by storing events
Before:
  Name    City    Timestamp
  Guido   Berne   1.8.1999
  Albert  Zurich  10.5.1988
After:
  Name    City    Timestamp
  Guido   Berne   1.8.1999
  Albert  Zurich  10.5.1988
  Guido   Basel   1.4.2013
- 21. Immutability
Lack of Human Fault Tolerance
Immutability greatly restricts the range of errors that can cause data loss or
data corruption
Vastly more human fault-tolerant
Much easier to reason about systems based on immutability
Conclusion: Your source of truth should always be immutable
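A minimal sketch of the idea, using the Guido/Albert example from the preceding slides: instead of overwriting a city, each change is appended as a timestamped fact, and the "current" city is derived from the log. The names `ResidenceFact`, `record` and `current_city` are hypothetical, and a plain in-memory list stands in for a real append-only store.

```python
from datetime import date
from typing import NamedTuple, Optional


class ResidenceFact(NamedTuple):
    """One immutable fact: a person lives in a city since a given date."""
    name: str
    city: str
    since: date


# Append-only list standing in for the immutable source of truth (assumption).
facts: list = []


def record(name: str, city: str, since: date) -> None:
    # Only ever append; existing facts are never updated or deleted.
    facts.append(ResidenceFact(name, city, since))


def current_city(name: str, log: list) -> Optional[str]:
    # The current state is derived: the most recent fact for that person.
    matching = [f for f in log if f.name == name]
    if not matching:
        return None
    return max(matching, key=lambda f: f.since).city


record("Guido", "Berne", date(1999, 8, 1))
record("Albert", "Zurich", date(1988, 5, 10))
record("Guido", "Basel", date(2013, 4, 1))   # a move is a new fact, not an update

print(current_city("Guido", facts))   # Basel
print(len(facts))                     # 3, the Berne fact is still there
```

Because the earlier Berne fact is never overwritten, a mistaken write can be corrected by appending another fact, and the history stays available for recomputation.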
- 22. What about traditional/today’s architectures ?
Rather than build systems like this …
[Diagram: applications (Mobile, Web, RIA, Rich Client) query a mutable database (RDBMS, NoSQL, NewSQL), which is the Source of Truth]
Source of Truth is mutable!
- 23. A different kind of architecture with immutable source of truth
… why not build them like this?
[Diagram: immutable data as the Source of Truth, views on data derived from it, and applications (Mobile, Web, RIA, Rich Client) that query the views. Storage technologies named in the diagram: HDFS, NoSQL, NewSQL, RDBMS]
- 24. How to create the views on the Immutable data?
On the fly?
[Diagram: Immutable data → View → Query]
Materialized, i.e. pre-computed?
[Diagram: Immutable data → Pre-Computed Views → Query]
- 25. Data = the most raw information
Data is information which is not derived from anywhere else
• The most raw form of information
• From which everything else is derived
Questions on data can be answered by running functions that take data as input
The most general purpose data system can answer questions by running functions that take the entire dataset as input
query = function (all data)
The lambda architecture provides a general purpose approach for implementing arbitrary functions on arbitrary datasets
- 26. Data = the most raw information
Favorite Product List Changes (raw information => data)
  Date     Action  Product
  1.2.13   Add     iPAD 64GB
  10.3.13  Add     Sony RX-100
  11.3.13  Add     Canon GX-10
  11.3.13  Remove  Sony RX-100
  12.3.13  Add     Nikon S-100
  14.4.13  Add     BoseQC-15
  15.4.13  Add     MacBook Pro 15
  20.4.13  Remove  Canon GX-10
derive => Current Favorite Product List (information => derived)
  iPAD 64GB
  Nikon S-100
  BoseQC-15
  MacBook Pro 15
derive => Current Product Count (information => derived)
  4
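The derivation on this slide can be sketched as "query = function (all data)": both derived values are computed by a function that reads the complete list of changes. This is only an illustration. The function names are hypothetical, the events are assumed to be in chronological order, and the product spelling is normalized to "Canon GX-10" so that the Remove matches the earlier Add.

```python
# Raw data: every change to the favorite product list, in chronological order.
EVENTS = [
    ("1.2.13", "Add", "iPAD 64GB"),
    ("10.3.13", "Add", "Sony RX-100"),
    ("11.3.13", "Add", "Canon GX-10"),
    ("11.3.13", "Remove", "Sony RX-100"),
    ("12.3.13", "Add", "Nikon S-100"),
    ("14.4.13", "Add", "BoseQC-15"),
    ("15.4.13", "Add", "MacBook Pro 15"),
    ("20.4.13", "Remove", "Canon GX-10"),
]


def current_favorites(events):
    """Derive the current favorite list by replaying all events."""
    favorites = []
    for _date, action, product in events:
        if action == "Add" and product not in favorites:
            favorites.append(product)
        elif action == "Remove" and product in favorites:
            favorites.remove(product)
    return favorites


def favorite_count(events):
    """Derive the current product count from the same raw events."""
    return len(current_favorites(events))


print(current_favorites(EVENTS))
# ['iPAD 64GB', 'Nikon S-100', 'BoseQC-15', 'MacBook Pro 15']
print(favorite_count(EVENTS))  # 4
```

Neither function stores any state of its own, so both results can be thrown away and recomputed from the raw events at any time.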
- 27. Big Data and Batch Processing
[Diagram: Incoming Data → Immutable data → (?) → Batch View → (?) → Query]
How to compute the batch views?
How to compute queries from the views?
- 28. Big Data and Batch Processing
But we are not done yet …
[Diagram: timeline up to “now”, divided into batch-processed data and non-processed data. Segments are labeled “Fully processed data”, “Last full batch period” and “Time for batch job”. Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20]
§ Using only batch processing always leaves you with a portion of non-processed data.
- 29. Big Data and Batch Processing
[Diagram: Events from Stream 1 and Stream 2 → HDFS (Hadoop Distributed File System) on a Hadoop cluster → Map/Reduce in Pig → Data Store optimized for appending large results → Queries]
- 31. Immutable data
Adding Real-Time Processing
Immutable data: Favorite Product List Changes
  Date     Action  Product
  1.2.13   Add     iPAD 64GB
  10.3.13  Add     Sony RX-100
  11.3.13  Add     Canon GX-10
  11.3.13  Remove  Sony RX-100
  12.3.13  Add     Nikon S-100
  14.4.13  Add     BoseQC-15
  15.4.13  Add     MacBook Pro 15
  20.4.13  Remove  Canon GX-10
  Now      Add     Canon Scanner
Data Stream: Stream of Favorite Product List Changes
  Now      Add     Canon Scanner (incoming)
Views (compute)
  Current Favorite Product List: iPAD 64GB, Nikon S-100, BoseQC-15, MacBook Pro 15, Canon Scanner
  Current Product Count: 5
Query
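The real-time step on this slide can be sketched as an incremental update: rather than replaying every change, the view that was already computed is adjusted by the single incoming event. The class name `RealtimeFavorites` and its methods are hypothetical, and the starting list is the four products that the slide shows before the new event arrives.

```python
class RealtimeFavorites:
    """Keeps a favorites view up to date by applying one event at a time."""

    def __init__(self, batch_view):
        # Start from a copy of the already computed view.
        self._favorites = list(batch_view)

    def apply(self, action, product):
        # Incremental update: only the incoming event is looked at.
        if action == "Add":
            if product not in self._favorites:
                self._favorites.append(product)
        elif action == "Remove":
            if product in self._favorites:
                self._favorites.remove(product)
        else:
            raise ValueError(f"unknown action: {action}")

    def favorites(self):
        return list(self._favorites)

    def count(self):
        return len(self._favorites)


batch_view = ["iPAD 64GB", "Nikon S-100", "BoseQC-15", "MacBook Pro 15"]
view = RealtimeFavorites(batch_view)
view.apply("Add", "Canon Scanner")   # the event arriving "Now"

print(view.favorites())
print(view.count())  # 5
```

The trade-off compared with a full recomputation is that the incremental view is mutable state, so an error in the update logic is not corrected until the view is rebuilt from the raw data.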
- 32. Big Data and Real Time Processing
[Diagram: timeline up to “now” with a blended view for the end user. Batch processing (e.g. Hadoop) worked fine for the “Fully processed data” segment; real time processing works for the “Last full batch period” and “Time for batch job” segments. Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20]
- 33. Agenda
1. Big Data, what is it?
2. Motivation
3. The Lambda Architecture
4. Implementing the Lambda Architecture
5. Summary
- 34. Lambda Architecture
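[Diagram: Incoming Data (A) is sent to both the Batch Layer and the Speed Layer. Batch Layer: Immutable data (B) → Batch View (C). Serving Layer (D) holds the batch views. Speed Layer (E): Data Stream → Realtime View (F). A Query (G) reads from both the Batch View and the Realtime View. The labels A to G are explained on the next slide.]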
- 35. Lambda Architecture
A. All data is sent to both the batch and speed layer
B. Master data set is an immutable, append-only set of data
C. Batch layer pre-computes query functions from scratch, result is called Batch Views. Batch layer constantly re-computes the batch views.
D. Batch views are indexed and stored in a scalable database to get particular values very quickly. Swaps in new batch views when they are available
E. Speed layer compensates for the high latency of updates to the Batch Views
F. Uses fast incremental algorithms and read/write databases to produce real-time views
G. Queries are resolved by getting results from both batch and real-time views
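Steps A to G can be sketched in a few lines of single-process Python. This is a toy illustration under stated assumptions, not the implementation used in the talk: the class `LambdaSketch`, its methods and the `upto` parameter are hypothetical, the "query function" is a simple count per key, and in-memory `Counter` objects stand in for the batch and real-time databases.

```python
from collections import Counter


class LambdaSketch:
    """Toy model of the flow A to G, counting events per key."""

    def __init__(self):
        self.master = []                 # B: immutable, append-only master data set
        self.batch_view = Counter()      # D: indexed batch view served to queries
        self.realtime_view = Counter()   # F: real-time view kept by the speed layer

    def ingest(self, key):
        # A: every event goes to both the batch layer and the speed layer.
        self.master.append(key)
        # E/F: the speed layer updates its view incrementally.
        self.realtime_view[key] += 1

    def run_batch(self, upto=None):
        # C: recompute the batch view from scratch. `upto` simulates a batch
        # job that only saw the first `upto` records of the master data set.
        if upto is None:
            upto = len(self.master)
        new_view = Counter(self.master[:upto])
        # D: swap in the new batch view once it is available.
        self.batch_view = new_view
        # The real-time view now only has to cover what the batch run missed.
        self.realtime_view = Counter(self.master[upto:])

    def query(self, key):
        # G: a query merges the batch and real-time results.
        return self.batch_view[key] + self.realtime_view[key]


system = LambdaSketch()
for product in ["iPAD 64GB", "Sony RX-100", "iPAD 64GB"]:
    system.ingest(product)

before_batch = system.query("iPAD 64GB")   # answered by the speed layer only
system.run_batch(upto=2)                   # batch run covers the first two events
after_batch = system.query("iPAD 64GB")    # one from batch view, one from real-time view
print(before_batch, after_batch)           # 2 2
```

The query result stays the same before and after the batch run. What changes is which layer contributes to it, which is why the real-time view can be treated as transient.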
- 36. Layered Architecture
Batch Layer
• Stores the immutable, constantly growing dataset
• Computes arbitrary views from this dataset using BigData technologies (can take hours)
• Can always be recreated
Speed Layer
• Computes the views from the constant stream of data it receives
• Needed to compensate for the high latency of the batch layer
• Incremental model and views are transient
Serving Layer
• Responsible for indexing and exposing the pre-computed batch views so that they can be queried
• Exposes the incremented real-time views
• Merges the batch and the real-time views into a consistent result
- 37. Agenda
1. Big Data, what is it?
2. Motivation
3. The Lambda Architecture
4. Implementing the Lambda Architecture
5. Summary
- 39. Lambda Architecture in Action
Implementation in ongoing Proof-of-concept (after completion of phase 1)
[Diagram: Incoming Data feeds both layers. Batch Layer: All data → Precompute (Batch recompute) → Precomputed information (Views) → batch views. Speed Layer: Process stream (Realtime increment) → Incremented information → real time views. Serving Layer: Merge of batch views and real time views → query]
- 40. Lambda Architecture in Action
Twitter Hosebird Client (hbc)
• Twitter Java API over Streaming API
Spring Framework
• Popular Java Framework used to modularize part of the logic (sensor and serving layer)
Apache Kafka
• Simple messaging framework based on file system to distribute information to both batch and speed layer
Apache Avro
• Serialization system for efficient cross-language RPC and persistent data storage
JSON
• Open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs
Cloudera Distribution
• Distribution of Apache Hadoop: HDFS, MapReduce, Hive, Flume, Pig, Impala
Cloudera Impala
• Distributed query execution engine that runs against data stored in HDFS and HBase
Apache Zookeeper
• Distributed, highly available coordination service. Provides primitives such as distributed locks
Apache Storm & Trident
• Distributed, fault-tolerant realtime computation system
Apache Cassandra
• Distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
- 41. Lambda Architecture with Oracle Product Stack
Possible implementation with Oracle Product stack
[Diagram: the Lambda Architecture (Incoming Data, Batch Layer with All data / Precompute / Precomputed information, Speed Layer with Process stream / Incremented information, Serving Layer with batch views, real time views, Merge and query) annotated with Oracle products: Oracle BigData Appliance, Oracle Big Data Connectors, Oracle Data Integrator, Oracle GoldenGate, Oracle Service Bus, Oracle Event Processing, Oracle Coherence, Oracle NoSQL, Oracle RDBMS, Oracle Endeca, OBIEE, Oracle Web Logic Server, Oracle ADF]
- 42. Agenda
1. Big Data, what is it?
2. Motivation
3. The Lambda Architecture
4. Implementing the Lambda Architecture
5. Summary
- 43. Summary – The lambda architecture
§ The Lambda Architecture
  § Can discard batch views and real-time views and recreate everything from scratch
  § Mistakes corrected via re-computation
  § Data storage layer optimized independently from query resolution layer
  § Still in a very early …. But a very interesting idea!
    - Today a zoo of technologies is needed => Operations won’t like it
§ The technology/implementation
  § Different query language for batch and real time
  § An abstraction over batch and speed layer needed
    - Cascading and Trident are already similar
  § Not everything works out-of-the-box and together
  § Industry standards needed!
- 44. THANK YOU.
Trivadis AG
Guido Schmutz
Europa-Strasse 5
CH-8095 Glattbrugg
info@trivadis.com
www.trivadis.com