WELCOME

Big Data and Fast Data – Big and Fast Combined, is it Possible?
Guido Schmutz
UKOUG Tech 2013
2.12.2013

BASEL, BERN, LAUSANNE, ZÜRICH, DÜSSELDORF, FRANKFURT A.M., FREIBURG I.BR., HAMBURG, MÜNCHEN, STUTTGART, WIEN

2013 © Trivadis
Guido Schmutz
•  Working for Trivadis for more than 16 years
•  Oracle ACE Director for Fusion Middleware and SOA
•  Co-Author of different books
•  Consultant, Trainer, Software Architect for Java, Oracle, SOA and EDA
•  Member of Trivadis Architecture Board
•  Technology Manager @ Trivadis
•  More than 20 years of software development experience
•  Contact: guido.schmutz@trivadis.com
•  Blog: http://guidoschmutz.wordpress.com
•  Twitter: gschmutz
Our company
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on [logo] and [logo] technologies in Switzerland, Germany and Austria.
We offer our services in the following strategic business fields:

OPERATION

Trivadis Services takes over the interacting operation of your IT systems.
Trivadis – the company
With over 600 specialists and IT experts in your region
[Map: Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Wien, München, Basel, Brugg, Bern, Zurich, Lausanne]

•  12 Trivadis branches and more than 600 employees
•  200 Service Level Agreements
•  Over 4,000 training participants
•  Research and development budget: CHF 5.0 / EUR 4 million
•  Financially self-supporting and sustainably profitable
•  Experience from more than 1,900 projects per year at over 800 customers
Agenda
1.  Big Data, what is it?
2.  Motivation
3.  The Lambda Architecture
4.  Implementing the Lambda Architecture
5.  Summary

Big Data Definition (Gartner et al)

Characteristics of Big Data: Its Volume, Velocity and Variety in combination

Volume
Tera-, Peta-, Exa-, Zetta-, Yotta-bytes and constantly growing

Velocity
“Traditional” computing in RDBMS is not scalable enough. We search for “linear scalability”

Variety
“Only … structured information is not enough” – “95% of produced data is unstructured”

+ Veracity (IBM) - information uncertainty
+ Time to action ? – Big Data + Event Processing = Fast Data
Big Data Definition (4 Vs)

Characteristics of Big Data: Its Volume, Velocity and Variety in combination

+ Time to action ? – Big Data + Event Processing = Fast Data
Volume Development
[Chart: Global Data Volume in Exabytes (axis 0 to 8000) and Aggregate Uncertainty % (axis 0 to 100) by Year, 2005 to 2015. Data sources shown: Sensors (“internet of things”); Social Media (video, audio, text); VoIP (Skype, MSN, ICQ, ...); Enterprise Data (data dictionary, ERD, ...)]
Internet Of Things
There are more devices tapping into the internet than people on earth
How do we prepare our systems/architecture for the future?

Source: The Economist
Source: Cisco
Big Data in Context
NoSQL databases
•  The storage for Big Data → Polyglot Persistence

Complex Event Processing (CEP)
•  An architectural style for Fast Data

Lots of new terms
•  HDFS, Hive, Hadoop, MapReduce, HBase, Pig, Cascading, Flume, Oozie

Not only Open Source
•  Oracle Big Data Appliance & Microsoft HDInsight

No longer a clear distinction between Software Development and Business Intelligence !?
•  Java, Python, Clojure, R, … know-how needed
•  Data Scientists: Natural Language Processing, Statistics, Network Analysis
Big Data Use Cases / Scenarios
General
•  Analyzing social media data for service optimization, sentiment analysis, ...

Retail
•  Personalized travel and shopping guidance depending on location detection (mobile, tablets, previous purchases)

Automotive
•  Analyzing telemetric data (e.g. for insurance: "Pay how you drive", warranty, recall, warnings etc.)

Finance
•  Fraud detection for payments (real time)

Telco
•  Mobile user location analytics for "behavior mining"
Velocity
§  Velocity requirement examples:
   §  Recommendation Engine
   §  Predictive Analytics
   §  Marketing Campaign Analysis
   §  Customer Retention and Churn Analysis
   §  Social Graph Analysis
   §  Capital Markets Analysis
   §  Risk Management
   §  Rogue Trading
   §  Fraud Detection
   §  Retail Banking
   §  Network Monitoring
   §  Research and Development
Agenda
1.  Big Data, what is it?
2.  Motivation
3.  The Lambda Architecture
4.  Implementing the Lambda Architecture
5.  Summary

What is a data system?
•  A system that manages the storage and querying of data with a
lifetime measured in years encompassing every version of the
application to ever exist, every hardware failure and every human
mistake ever made.
•  A data system answers questions based on information that was
acquired in the past

Desired Properties of a (Big) Data System
Robust and fault-tolerant
Low latency reads and updates
Scalable
General
Extensible
Allows ad hoc queries
Minimal maintenance
Debug-able

Complexity in today’s architecture/systems

Lack of Human Fault Tolerance
Same structure for write/query
Schemas done wrong

Typical problem in today’s architecture/systems

Lack of Human Fault Tolerance

Bugs will be deployed to production over the lifetime of a data system
Operational mistakes will be made
Humans are part of the overall system
•  Just like hard disks, CPUs, memory, software
•  Design for human error like you design for any other fault

Examples of human error
•  Deploy a bug that increments counters by two instead of by one
•  Accidentally delete data from database
•  Accidental DoS on important internal service

Worst two consequences: data loss or data corruption
As long as an error doesn’t lose or corrupt good data, you can fix what went wrong
Lack of Human Fault Tolerance

Mutability
The U and D in CRUD

A mutable system updates the current state of the world
Mutable systems inherently lack human fault-tolerance
Easy to corrupt or lose data
Capturing change traditionally

Name    City           Name    City
Guido   Berne     →    Guido   Basel
Albert  Zurich         Albert  Zurich
Immutability

Lack of Human Fault Tolerance

An immutable system captures historical records of events
Each event happens at a particular time and is always true

Capturing change by storing events

Name    City    Timestamp          Name    City    Timestamp
Guido   Berne   1.8.1999           Guido   Berne   1.8.1999
Albert  Zurich  10.5.1988     →    Albert  Zurich  10.5.1988
                                   Guido   Basel   1.4.2013
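The difference between the mutable and the immutable table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record type `LocationEvent` and the helper `city_as_of` are hypothetical names that do not appear in the slides, and the dates are read as day.month.year.

```python
from dataclasses import dataclass
from datetime import date

# Mutable approach: the update overwrites the old value, so history is lost.
mutable_table = {"Guido": "Berne", "Albert": "Zurich"}
mutable_table["Guido"] = "Basel"  # "Berne" can no longer be recovered


# Immutable approach: every change is appended as a timestamped fact.
@dataclass(frozen=True)
class LocationEvent:  # hypothetical record type, not from the slides
    name: str
    city: str
    timestamp: date


events = [
    LocationEvent("Guido", "Berne", date(1999, 8, 1)),
    LocationEvent("Albert", "Zurich", date(1988, 5, 10)),
    LocationEvent("Guido", "Basel", date(2013, 4, 1)),  # appended, nothing overwritten
]


def city_as_of(events, name, as_of):
    """Return the latest known city for `name` at date `as_of`, or None."""
    known = [e for e in events if e.name == name and e.timestamp <= as_of]
    return max(known, key=lambda e: e.timestamp).city if known else None


print(city_as_of(events, "Guido", date(2013, 12, 2)))  # Basel
print(city_as_of(events, "Guido", date(2005, 1, 1)))   # Berne
```

Because each event stays true forever, the earlier city is still available to any query that asks for it, which the overwritten dictionary entry cannot offer.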
Immutability

Lack of Human Fault Tolerance

Immutability greatly restricts the range of errors that can cause data loss or
data corruption
Vastly more human fault-tolerant
Much easier to reason about systems based on immutability
Conclusion: Your source of truth should always be immutable

What about traditional/today’s architectures ?

[Diagram: Application (Query) on Mobile, Web, RIA and Rich Client → Mutable Database (RDBMS, NoSQL, NewSQL) = Source of Truth]

Source of Truth is mutable!

Rather than build systems like this ….
A different kind of architecture with immutable source of truth
… why not build them like this?

[Diagram: Immutable data (HDFS, NoSQL, NewSQL, RDBMS) = Source of Truth → View on Data (one or more) → Application (Query) on Mobile, Web, RIA and Rich Client]
How to create the views on the Immutable data?
On the fly ?
[Diagram: Immutable data → View → Query]

Materialized, i.e. Pre-computed ?
[Diagram: Immutable data → Pre-Computed Views → Query]
Data = the most raw information
Data is information which is not derived from anywhere else
•  The most raw form of information
•  from which everything else is derived

Questions on data can be answered by running functions that take data as input
The most general purpose data system can answer questions by running functions that take the entire dataset as input
query = function (all data)
The lambda architecture provides a general purpose approach for implementing arbitrary functions on arbitrary datasets
Data = the most raw information

Favorite Product List Changes (Raw information => data)

Date      Action   Product
1.2.13    Add      iPAD 64GB
10.3.13   Add      Sony RX-100
11.3.13   Add      Canon GX-10
11.3.13   Remove   Sony RX-100
12.3.13   Add      Nikon S-100
14.4.13   Add      BoseQC-15
15.4.13   Add      MacBook Pro 15
20.4.13   Remove   Canon GX-10

derive => Current Favorite Product List (Information => derived)
iPAD 64GB
Nikon S-100
BoseQC-15
MacBook Pro 15

derive => Current Product Count (Information => derived)
4
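The derivation on this slide is an instance of query = function (all data). The following Python sketch replays the raw change events to derive both views; the function names are hypothetical, and the two spellings of the Canon product on the slide are assumed to refer to the same product.

```python
# Raw events copied from the slide: (date, action, product).
CHANGES = [
    ("1.2.13", "Add", "iPAD 64GB"),
    ("10.3.13", "Add", "Sony RX-100"),
    ("11.3.13", "Add", "Canon GX-10"),
    ("11.3.13", "Remove", "Sony RX-100"),
    ("12.3.13", "Add", "Nikon S-100"),
    ("14.4.13", "Add", "BoseQC-15"),
    ("15.4.13", "Add", "MacBook Pro 15"),
    ("20.4.13", "Remove", "Canon GX-10"),
]


def current_favorites(changes):
    """query = function(all data): replay every event, in order."""
    favorites = []
    for _date, action, product in changes:
        if action == "Add" and product not in favorites:
            favorites.append(product)
        elif action == "Remove" and product in favorites:
            favorites.remove(product)
    return favorites


def current_product_count(changes):
    """A second derived view, computed from the same raw data."""
    return len(current_favorites(changes))


print(current_favorites(CHANGES))
print(current_product_count(CHANGES))  # 4
```

Both results are derived information: they can be thrown away and recomputed at any time, as long as the raw events are kept.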
Big Data and Batch Processing

[Diagram: Incoming Data → Immutable data → ? → Batch View → ? → Query]

How to compute the batch views ?
How to compute queries from the views ?
Big Data and Batch Processing
But we are not done yet …
[Diagram: timeline up to "now", split into batch-processed data (fully processed data up to the last full batch period) and non-processed data (arriving during the time for the batch job)]
Adapted from Ted Dunning (March 2012):
http://www.youtube.com/watch?v=7PcmbI5aC20

§  Using only batch processing always leaves you with a portion of non-processed data.
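The gap left by batch-only processing can be made concrete with a small Python sketch. Everything here is a hypothetical illustration: timestamps are arbitrary integer ticks, and `batch_view` stands in for whatever batch job computes the view.

```python
def batch_view(events, cutoff):
    """Count events per key, considering only events up to `cutoff`."""
    counts = {}
    for timestamp, key in events:
        if timestamp <= cutoff:
            counts[key] = counts.get(key, 0) + 1
    return counts


# (timestamp, key) pairs; timestamps are arbitrary integer ticks.
events = [(1, "click"), (2, "click"), (5, "click"), (6, "click")]

last_batch_cutoff = 4  # assumption: the most recent batch job read data up to tick 4
now = 6

view = batch_view(events, cutoff=last_batch_cutoff)
truth = batch_view(events, cutoff=now)

print(view["click"])                   # 2, what a batch-only query sees
print(truth["click"] - view["click"])  # 2 events not yet processed
```

However often the batch job runs, the events that arrive while it is running are missing from the view it produces.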
Big Data and Batch Processing
[Diagram: Stream 1 and Stream 2 deliver events into HDFS (Hadoop Distributed File System); a Hadoop cluster runs Map/Reduce in Pig and writes to a Data Store optimized for appending large results, which serves the Queries]
Adding Real-Time Processing

[Diagram: Incoming Data → Immutable data → Batch Views → Query; Incoming Data → Data Stream → ? → Realtime Views → Query]

How to compute real-time views ?
How to compute queries from the views ?
Adding Real-Time Processing

Immutable data: Favorite Product List Changes

Date      Action   Product
1.2.13    Add      iPAD 64GB
10.3.13   Add      Sony RX-100
11.3.13   Add      Canon GX-10
11.3.13   Remove   Sony RX-100
12.3.13   Add      Nikon S-100
14.4.13   Add      BoseQC-15
15.4.13   Add      MacBook Pro 15
20.4.13   Remove   Canon GX-10
Now       Add      Canon Scanner    (incoming)

Data Stream: Stream of Favorite Product List Changes

Now       Add      Canon Scanner

compute => Views (Query)

Current Favorite Product List
iPAD 64GB
Nikon S-100
BoseQC-15
MacBook Pro 15
Canon Scanner

Current Product Count
5
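In contrast to replaying all events, a real-time view is updated incrementally as each event arrives. The following Python sketch shows the idea for the "Canon Scanner" event; the class `RealtimeFavoritesView` is a hypothetical name chosen for this illustration, not something from the slides.

```python
class RealtimeFavoritesView:
    """Hypothetical speed-layer view, updated one event at a time."""

    def __init__(self, favorites):
        self.favorites = list(favorites)  # start from the last computed state

    def apply(self, action, product):
        # Incremental update: only the new event is touched, no replay of history.
        if action == "Add" and product not in self.favorites:
            self.favorites.append(product)
        elif action == "Remove" and product in self.favorites:
            self.favorites.remove(product)

    @property
    def count(self):
        return len(self.favorites)


view = RealtimeFavoritesView(
    ["iPAD 64GB", "Nikon S-100", "BoseQC-15", "MacBook Pro 15"]
)
view.apply("Add", "Canon Scanner")  # the event arriving "Now"
print(view.favorites)
print(view.count)  # 5
```

The update cost depends only on the incoming event, which is what keeps the latency low, at the price of holding mutable state in the speed layer.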
Big Data and Real Time Processing

[Diagram: timeline up to "now"; batch processing (e.g. Hadoop) worked fine for the fully processed data up to the last full batch period, real time processing works for the span covered by the time for the batch job; both together form the blended view for end user]
Adapted from Ted Dunning (March 2012):
http://www.youtube.com/watch?v=7PcmbI5aC20
Agenda
1.  Big Data, what is it?
2.  Motivation
3.  The Lambda Architecture
4.  Implementing the Lambda Architecture
5.  Summary

Lambda Architecture
[Diagram: Incoming Data (A) → Batch Layer: Immutable data (B) → (C) → Serving Layer: Batch View (D) → Query (G); Incoming Data (A) → Speed Layer: Data Stream (E) → Realtime View (F) → Query (G)]
Lambda Architecture

A.  All data is sent to both the batch and speed layer
B.  Master data set is an immutable, append-only set of data
C.  Batch layer pre-computes query functions from scratch, result is called Batch Views. Batch layer constantly re-computes the batch views.
D.  Batch views are indexed and stored in a scalable database to get particular values very quickly. Swaps in new batch views when they are available
E.  Speed layer compensates for the high latency of updates to the Batch Views
F.  Uses fast incremental algorithms and read/write databases to produce realtime views
G.  Queries are resolved by getting results from both batch and real-time views
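Steps A to G can be tied together in a single in-memory toy. The Python sketch below is an assumption-laden illustration rather than a real implementation: the class `LambdaSketch` and its methods are hypothetical, everything runs in one process, and the "query function" is a simple count per key.

```python
from collections import Counter


class LambdaSketch:
    """In-memory toy; names are hypothetical and map to steps A-G."""

    def __init__(self):
        self.master = []                # B: immutable, append-only master data set
        self.batch_view = Counter()     # D: batch view served to queries
        self.realtime_view = Counter()  # F: incrementally updated real-time view

    def ingest(self, key):
        # A: every record goes to both the batch layer and the speed layer.
        self.master.append(key)
        self.realtime_view[key] += 1    # E/F: cheap incremental update

    def run_batch(self):
        # C: recompute the view from scratch over all data seen so far.
        snapshot = len(self.master)
        new_view = Counter(self.master[:snapshot])
        # D: swap in the new batch view; the real-time view then only needs
        # to cover records the batch view does not contain yet.
        self.batch_view = new_view
        self.realtime_view = Counter(self.master[snapshot:])

    def query(self, key):
        # G: merge the results of the batch view and the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


arch = LambdaSketch()
for page in ["home", "cart", "home"]:
    arch.ingest(page)
arch.run_batch()
arch.ingest("home")         # arrives after the batch run
print(arch.query("home"))   # 3
```

In a real system the batch run takes hours and new data keeps arriving meanwhile, so the real-time view must cover exactly the records that the currently served batch view has not yet absorbed.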
Layered Architecture

Batch Layer
•  Stores the immutable constantly growing dataset
•  Computes arbitrary views from this dataset using BigData technologies (can take hours)
•  Can always be recreated

Speed Layer
•  Computes the views from the constant stream of data it receives
•  Needed to compensate for the high latency of the batch layer
•  Incremental model and views are transient

Serving Layer
•  Responsible for indexing and exposing the pre-computed batch views so that they can be queried
•  Exposes the incremented real-time views
•  Merges the batch and the real-time views into a consistent result
Agenda
1.  Big Data, what is it?
2.  Motivation
3.  The Lambda Architecture
4.  Implementing the Lambda Architecture
5.  Summary

Lambda Architecture

[Diagram: Incoming Data → Batch Layer: All data → Precompute (Batch recompute) → Precomputed information → Serving Layer: batch views; Incoming Data → Speed Layer: Process stream (Realtime increment) → Incremented information → real time views; query → Merge of batch views and real time views]

Source: Marz, N. & Warren, J. (2013) Big Data. Manning.
Lambda Architecture in Action
Implementation in ongoing Proof-of-concept (after completion of phase 1)

[Diagram: Incoming Data → Batch Layer: All data → Precompute (Batch recompute) → Precomputed information → Serving Layer: batch views; Incoming Data → Speed Layer: Process stream (Realtime increment) → Incremented information → real time views; query → Merge of batch views and real time views]
Lambda Architecture in Action
Twitter Hosebird Client (hbc)
•  Twitter Java API over Streaming API

Spring Framework
•  Popular Java Framework used to modularize part of the logic (sensor and serving layer)

Apache Kafka
•  Simple messaging framework based on file system to distribute information to both batch and speed layer

Apache Avro
•  Serialization system for efficient cross-language RPC and persistent data storage

JSON
•  open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs.

Cloudera Distribution
•  Distribution of Apache Hadoop: HDFS, MapReduce, Hive, Flume, Pig, Impala

Cloudera Impala
•  distributed query execution engine that runs against data stored in HDFS and HBase

Apache Zookeeper
•  Distributed, highly available coordination service. Provides primitives such as distributed locks

Apache Storm & Trident
•  distributed, fault-tolerant realtime computation system

Apache Cassandra
•  distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
Lambda Architecture with Oracle Product Stack

Possible implementation with Oracle Product stack

[Diagram: the Lambda Architecture (Incoming Data; Batch Layer with All data, Precompute / Batch recompute and Precomputed information; Speed Layer with Process stream / Realtime increment and Incremented information; Serving Layer with batch views, real time views, Merge and query) annotated with Oracle products: Oracle BigData Appliance, Oracle Big Data Connectors, Oracle Data Integrator, Oracle Event Processing, Oracle GoldenGate, Oracle Service Bus, Oracle Coherence, Oracle RDBMS, Oracle NoSQL, Oracle Endeca, OBIEE, Oracle WebLogic Server, Oracle ADF]
Agenda
1.  Big Data, what is it?
2.  Motivation
3.  The Lambda Architecture
4.  Implementing the Lambda Architecture
5.  Summary

Summary – The lambda architecture
§  The Lambda Architecture
   §  Can discard batch views and real-time views and recreate everything from scratch
   §  Mistakes corrected via re-computation
   §  Data storage layer optimized independently from query resolution layer
   §  Still in a very early …. But a very interesting idea!
      -  Today a zoo of technologies is needed => Operations won’t like it

§  The technology/implementation
   §  Different query language for batch and real time
   §  An abstraction over batch and speed layer needed
      -  Cascading and Trident are already similar
   §  Not everything works out-of-the-box and together
   §  Industry standards needed!
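The point that mistakes are corrected via re-computation can be illustrated with a short Python sketch. It reuses the example of a bug that increments counters by two instead of by one; the function names and the page-view data are hypothetical.

```python
from collections import Counter

master = ["home", "cart", "home"]  # immutable raw page-view events


def buggy_batch(events):
    """Bug: counts every event twice (increments by two instead of by one)."""
    view = Counter()
    for key in events:
        view[key] += 2
    return view


def fixed_batch(events):
    """Corrected batch function: one increment per event."""
    view = Counter()
    for key in events:
        view[key] += 1
    return view


view = buggy_batch(master)  # the wrong view gets deployed
print(view["home"])         # 4

view = fixed_batch(master)  # discard the view, recompute from scratch
print(view["home"])         # 2
```

The bug corrupted only derived information. Since the raw events were never modified, deploying the fix and re-running the batch job is enough to repair the views.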
THANK YOU.

Trivadis AG
Guido Schmutz
Europa-Strasse 5

CH-8095 Glattbrugg
info@trivadis.com

www.trivadis.com

BASEL

44

BERN

LAUSANNE

ZÜRICH

DÜSSELDORF

FRANKFURT A.M.

FREIBURG I.BR.

2013 © Trivadis
Big Data and Fast Data – Big and Fast Combined, is it Possible?
2.12.2013

HAMBURG

MÜNCHEN

STUTTGART

WIEN


More Related Content

What's hot

Introduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big DataIntroduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big Datawaheed751
 
Big data trends challenges opportunities
Big data trends challenges opportunitiesBig data trends challenges opportunities
Big data trends challenges opportunitiesMohammed Guller
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Chris Dagdigian
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Chris Dagdigian
 
Practical Petabyte Pushing
Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte PushingChris Dagdigian
 
Introduction to big data and apache spark
Introduction to big data and apache sparkIntroduction to big data and apache spark
Introduction to big data and apache sparkMohammed Guller
 
Fixing data science & Accelerating Artificial Super Intelligence Development
 Fixing data science & Accelerating Artificial Super Intelligence Development Fixing data science & Accelerating Artificial Super Intelligence Development
Fixing data science & Accelerating Artificial Super Intelligence DevelopmentManojKumarR41
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataMohammed Guller
 
The rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingMinhazul Arefin
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopDavid Yahalom
 
2015 CDC Workshop on ScienceDMZ
2015 CDC Workshop on ScienceDMZ2015 CDC Workshop on ScienceDMZ
2015 CDC Workshop on ScienceDMZChris Dagdigian
 
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...Kai Wähner
 
Cloud Security for Life Science R&D
Cloud Security for Life Science R&DCloud Security for Life Science R&D
Cloud Security for Life Science R&DChris Dagdigian
 
The Evolution of Big Data Frameworks
The Evolution of Big Data FrameworksThe Evolution of Big Data Frameworks
The Evolution of Big Data FrameworkseXascale Infolab
 
How to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersHow to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersAkmal Chaudhri
 
Trends from the Trenches: 2019
Trends from the Trenches: 2019Trends from the Trenches: 2019
Trends from the Trenches: 2019Chris Dagdigian
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Cloud-Based Big Data Analytics
Cloud-Based Big Data AnalyticsCloud-Based Big Data Analytics
Cloud-Based Big Data AnalyticsSateeshreddy N
 
Cloud Computing and Big Data
Cloud Computing and Big DataCloud Computing and Big Data
Cloud Computing and Big DataRobert Keahey
 

What's hot (20)

Introduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big DataIntroduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big Data
 
Big data trends challenges opportunities
Big data trends challenges opportunitiesBig data trends challenges opportunities
Big data trends challenges opportunities
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)
 
Practical Petabyte Pushing
Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte Pushing
 
Introduction to big data and apache spark
Introduction to big data and apache sparkIntroduction to big data and apache spark
Introduction to big data and apache spark
 
Fixing data science & Accelerating Artificial Super Intelligence Development
 Fixing data science & Accelerating Artificial Super Intelligence Development Fixing data science & Accelerating Artificial Super Intelligence Development
Fixing data science & Accelerating Artificial Super Intelligence Development
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
The rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computing
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera Hadoop
 
2015 CDC Workshop on ScienceDMZ
2015 CDC Workshop on ScienceDMZ2015 CDC Workshop on ScienceDMZ
2015 CDC Workshop on ScienceDMZ
 
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
 
Cloud Security for Life Science R&D
Cloud Security for Life Science R&DCloud Security for Life Science R&D
Cloud Security for Life Science R&D
 
Data vault what's Next: Part 2
Data vault what's Next: Part 2Data vault what's Next: Part 2
Data vault what's Next: Part 2
 
The Evolution of Big Data Frameworks
The Evolution of Big Data FrameworksThe Evolution of Big Data Frameworks
The Evolution of Big Data Frameworks
 
How to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contendersHow to build streaming data applications - evaluating the top contenders
How to build streaming data applications - evaluating the top contenders
 
Trends from the Trenches: 2019
Trends from the Trenches: 2019Trends from the Trenches: 2019
Trends from the Trenches: 2019
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Cloud-Based Big Data Analytics
Cloud-Based Big Data AnalyticsCloud-Based Big Data Analytics
Cloud-Based Big Data Analytics
 
Cloud Computing and Big Data
Cloud Computing and Big DataCloud Computing and Big Data
Cloud Computing and Big Data
 

Viewers also liked

Big Data Visualization Problem in IT Management
Big Data Visualization Problem in IT ManagementBig Data Visualization Problem in IT Management
Big Data Visualization Problem in IT Managementbigdataviz_bay
 
Creating a Delivery Unit at Government Level
Creating a Delivery Unit at Government Level Creating a Delivery Unit at Government Level
Creating a Delivery Unit at Government Level Sajjad Ahmed
 
fluent-plugin-beats at Elasticsearch meetup #14
fluent-plugin-beats at Elasticsearch meetup #14fluent-plugin-beats at Elasticsearch meetup #14
fluent-plugin-beats at Elasticsearch meetup #14N Masahiro
 
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyScaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyRohit Kulkarni
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Regunath B
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaarRegunath B
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Casesboorad
 
Fluentd and Kafka
Fluentd and KafkaFluentd and Kafka
Fluentd and KafkaN Masahiro
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architecturesDaniel Marcous
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBernard Marr
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise ArchitectureMapR Technologies
 

Viewers also liked (14)

Big Data Visualization Problem in IT Management
Big Data Visualization Problem in IT ManagementBig Data Visualization Problem in IT Management
Big Data Visualization Problem in IT Management
 
Creating a Delivery Unit at Government Level
Creating a Delivery Unit at Government Level Creating a Delivery Unit at Government Level
Creating a Delivery Unit at Government Level
 
Learning styles
Learning stylesLearning styles
Learning styles
 
fluent-plugin-beats at Elasticsearch meetup #14
fluent-plugin-beats at Elasticsearch meetup #14fluent-plugin-beats at Elasticsearch meetup #14
fluent-plugin-beats at Elasticsearch meetup #14
 
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyScaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaar
 
BIG DATA and USE CASES
BIG DATA and USE CASESBIG DATA and USE CASES
BIG DATA and USE CASES
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Fluentd and Kafka
Fluentd and KafkaFluentd and Kafka
Fluentd and Kafka
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architectures
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must Know
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise Architecture
 

Big Data and Fast Data – Big and Fast Combined, is it Possible?

  • 1. WELCOME. Big Data and Fast Data – Big and Fast Combined, is it Possible? Guido Schmutz, UKOUG Tech 2013, 2.12.2013. Trivadis locations: Basel, Bern, Lausanne, Zürich, Düsseldorf, Frankfurt a.M., Freiburg i.Br., Hamburg, München, Stuttgart, Wien. 2013 © Trivadis

  • 2. Guido Schmutz
      • Working for Trivadis for more than 16 years
      • Oracle ACE Director for Fusion Middleware and SOA
      • Co-author of several books
      • Consultant, trainer and software architect for Java, Oracle, SOA and EDA
      • Member of the Trivadis Architecture Board
      • Technology Manager @ Trivadis
      • More than 20 years of software development experience
      • Contact: guido.schmutz@trivadis.com
      • Blog: http://guidoschmutz.wordpress.com
      • Twitter: gschmutz
  • 3. Our company
      Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany and Austria.
      We offer our services in the following strategic business fields:
      • OPERATION: Trivadis Services takes over the interacting operation of your IT systems.
  • 4. With over 600 specialists and IT experts in your region
      Locations: Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Wien, München, Basel, Brugg, Bern, Zurich, Lausanne
      • 12 Trivadis branches and more than 600 employees
      • 200 Service Level Agreements
      • Over 4,000 training participants
      • Research and development budget: CHF 5.0 / EUR 4 million
      • Financially self-supporting and sustainably profitable
      • Experience from more than 1,900 projects per year at over 800 customers
  • 5. Agenda
      1. Big Data, what is it?
      2. Motivation
      3. The Lambda Architecture
      4. Implementing the Lambda Architecture
      5. Summary
  • 6. Big Data Definition (Gartner et al)
      Characteristics of Big Data: its Volume, Velocity and Variety in combination
      • Volume: Tera-, Peta-, Exa-, Zetta-, Yottabytes and constantly growing
      • Velocity: "Traditional" computing in RDBMS is not scalable enough. We search for "linear scalability"
      • Variety: "Only … structured information is not enough", "95% of produced data is unstructured"
      • + Veracity (IBM): information uncertainty
      • + Time to action? Big Data + Event Processing = Fast Data
  • 7. Big Data Definition (4 Vs)
      Characteristics of Big Data: its Volume, Velocity and Variety in combination
      • + Time to action? Big Data + Event Processing = Fast Data
  • 8. Volume Development
      [Chart: global data volume in exabytes (0 to 8000) and aggregate uncertainty in % (0 to 100), by year from 2005 to 2015. Data sources shown: Sensors ("internet of things"); Social Media (video, audio, text); VoIP (Skype, MSN, ICQ, ...); Enterprise Data (data dictionary, ERD, ...)]
  • 10. Internet of Things
      • There are more devices tapping into the internet than people on earth
      • How do we prepare our systems/architecture for the future?
      Sources: The Economist, Cisco
  • 11. Big Data in Context
      NoSQL databases
      • The storage for Big Data → Polyglot Persistence
      Complex Event Processing (CEP)
      • An architectural style for Fast Data
      Lots of new terms
      • HDFS, Hive, Hadoop, MapReduce, HBase, Pig, Cascading, Flume, Oozie
      Not only Open Source
      • Oracle Big Data Appliance & Microsoft HDInsight
      No longer a clear distinction between Software Development and Business Intelligence!?
      • Java, Python, Clojure, R, … know-how needed
      • Data Scientists: Natural Language Processing, Statistics, Network Analysis
  • 12. Big Data Use Cases / Scenarios
      General
      • Analyzing social media data for service optimization, sentiment analysis, ...
      Retail
      • Personalized travel and shopping guidance depending on location detection (mobile, tablets, previous purchases)
      Automotive
      • Analyzing telemetric data (e.g. for insurance: "Pay how you drive", warranty, recall, warnings etc.)
      Finance
      • Fraud detection for payments (real time)
      Telco
      • Mobile user location analytics for "behavior mining"
  • 13. Velocity
      Velocity requirement examples:
      • Recommendation Engine
      • Predictive Analytics
      • Marketing Campaign Analysis
      • Customer Retention and Churn Analysis
      • Social Graph Analysis
      • Capital Markets Analysis
      • Risk Management
      • Rogue Trading
      • Fraud Detection
      • Retail Banking
      • Network Monitoring
      • Research and Development
  • 14. Agenda
      1. Big Data, what is it?
      2. Motivation
      3. The Lambda Architecture
      4. Implementing the Lambda Architecture
      5. Summary
  • 15. What is a data system?
      • A system that manages the storage and querying of data with a lifetime measured in years, encompassing every version of the application to ever exist, every hardware failure and every human mistake ever made.
      • A data system answers questions based on information that was acquired in the past.
  • 16. Desired Properties of a (Big) Data System
      • Robust and fault-tolerant
      • Low latency reads and updates
      • Scalable
      • General
      • Extensible
      • Allows ad hoc queries
      • Minimal maintenance
      • Debuggable
  • 17. Complexity in today's architecture/systems
      • Lack of Human Fault Tolerance
      • Same structure for write/query
      • Schemas done wrong
  • 18. Typical problem in today's architecture/systems: Lack of Human Fault Tolerance
      • Bugs will be deployed to production over the lifetime of a data system
      • Operational mistakes will be made
      • Humans are part of the overall system
          • Just like hard disks, CPUs, memory, software
          • Design for human error like you design for any other fault
      • Examples of human error
          • Deploy a bug that increments counters by two instead of by one
          • Accidentally delete data from database
          • Accidental DOS on important internal service
      • Worst two consequences: data loss or data corruption
      • As long as an error doesn't lose or corrupt good data, you can fix what went wrong
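The counter example on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the `page_views` list and both function names are hypothetical and not part of the talk. The point it tries to show is that a buggy derivation is repairable as long as the raw records themselves were never lost or corrupted.

```python
from collections import Counter

# Hypothetical raw events (one record per page view); they are never modified.
page_views = ["/home", "/pricing", "/home", "/home"]

def buggy_counts(events):
    """The deployed bug: every event increments its counter by two."""
    counts = Counter()
    for url in events:
        counts[url] += 2
    return counts

def fixed_counts(events):
    """The corrected logic: every event increments its counter by one."""
    counts = Counter()
    for url in events:
        counts[url] += 1
    return counts

corrupted_view = buggy_counts(page_views)   # wrong numbers reached production
repaired_view = fixed_counts(page_views)    # recomputed from the intact raw events
print(corrupted_view["/home"], repaired_view["/home"])
```

Had the counters been the only stored state, the doubled values could not be told apart from correct ones after the fact.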
  • 19. Lack of Human Fault Tolerance: Mutability
      • The U and D in CRUD
      • A mutable system updates the current state of the world
      • Mutable systems inherently lack human fault-tolerance
      • Easy to corrupt or lose data
      Capturing change traditionally
          Before:  Name    City        After:  Name    City
                   Guido   Berne               Guido   Basel
                   Albert  Zurich              Albert  Zurich
  • 20. Lack of Human Fault Tolerance: Immutability
      • An immutable system captures historical records of events
      • Each event happens at a particular time and is always true
      Capturing change by storing events
          Before:  Name    City    Timestamp       After:  Name    City    Timestamp
                   Guido   Berne   1.8.1999                Guido   Berne   1.8.1999
                   Albert  Zurich  10.5.1988               Albert  Zurich  10.5.1988
                                                           Guido   Basel   1.4.2013
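A minimal Python sketch of capturing change by storing events, using the records from this slide. The `Residence` type and the `city_as_of` helper are hypothetical names introduced for illustration, and the timestamps are assumed to be in day.month.year order.

```python
from datetime import date
from typing import NamedTuple, Optional

class Residence(NamedTuple):
    """One immutable fact: a person lives in a city from a given date."""
    name: str
    city: str
    since: date

# Append-only log; existing records are never updated or deleted.
log = [
    Residence("Guido", "Berne", date(1999, 8, 1)),
    Residence("Albert", "Zurich", date(1988, 5, 10)),
]

# A move is captured as a new record instead of overwriting the old one.
log.append(Residence("Guido", "Basel", date(2013, 4, 1)))

def city_as_of(name: str, when: date) -> Optional[str]:
    """Derive the city of residence at a point in time from the full log."""
    facts = [r for r in log if r.name == name and r.since <= when]
    return max(facts, key=lambda r: r.since).city if facts else None

print(city_as_of("Guido", date(2013, 12, 2)))
print(city_as_of("Guido", date(2005, 1, 1)))
```

Because the earlier record is still present, the state at any past date remains answerable, which the mutable table from the previous slide cannot offer.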
  • 21. Lack of Human Fault Tolerance: Immutability
      • Immutability greatly restricts the range of errors that can cause data loss or data corruption
      • Vastly more human fault-tolerant
      • Much easier to reason about systems based on immutability
      Conclusion: Your source of truth should always be immutable
  • 22. What about traditional/today's architectures?
      - [Diagram: clients (Mobile, Web, RIA, Rich Client) -> Application (Query) -> mutable database (RDBMS, NoSQL, NewSQL) as Source of Truth]
      - The Source of Truth is mutable!
      - Rather than build systems like this …
  • 23. A different kind of architecture with immutable source of truth
      - … why not build them like this?
      - [Diagram: immutable data (HDFS, NoSQL, NewSQL, RDBMS) as Source of Truth -> View on Data -> Application (Query) -> clients (Mobile, Web, RIA, Rich Client)]
  • 24. How to create the views on the immutable data?
      - On the fly? (Immutable data -> View -> Query)
      - Materialized, i.e. pre-computed? (Immutable data -> Pre-Computed Views -> Query)
  • 25. Data = the most raw information
      - Data is information which is not derived from anywhere else
          - The most raw form of information
          - From which everything else is derived
      - Questions on data can be answered by running functions that take data as input
      - The most general-purpose data system can answer questions by running functions that take the entire dataset as input:
          query = function(all data)
      - The lambda architecture provides a general-purpose approach for implementing arbitrary functions on arbitrary datasets
  • 26. Data = the most raw information
      - Raw information => data: Favorite Product List Changes
          Date      Action  Product
          1.2.13    Add     iPAD 64GB
          10.3.13   Add     Sony RX-100
          11.3.13   Add     Canon GX-10
          11.3.13   Remove  Sony RX-100
          12.3.13   Add     Nikon S-100
          14.4.13   Add     BoseQC-15
          15.4.13   Add     MacBook Pro 15
          20.4.13   Remove  Canon GX-10
      - Information => derived:
          - Current Favorite Product List: iPAD 64GB, Nikon S-100, BoseQC-15, MacBook Pro 15
          - Current Product Count: 4
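The idea "query = function(all data)" can be made concrete with the favorite product example. The sketch below is a minimal illustration, not the presenter's implementation: the function names are hypothetical, the records are assumed to be in chronological order, and the last record's product is assumed to be the same "Canon GX-10" that was added earlier.

```python
# Raw data: (date, action, product) records transcribed from the slide.
CHANGES = [
    ("1.2.13", "Add", "iPAD 64GB"),
    ("10.3.13", "Add", "Sony RX-100"),
    ("11.3.13", "Add", "Canon GX-10"),
    ("11.3.13", "Remove", "Sony RX-100"),
    ("12.3.13", "Add", "Nikon S-100"),
    ("14.4.13", "Add", "BoseQC-15"),
    ("15.4.13", "Add", "MacBook Pro 15"),
    ("20.4.13", "Remove", "Canon GX-10"),
]


def current_favorites(all_data):
    """Derive the current favorite list by replaying every change in order."""
    favorites = []
    for _date, action, product in all_data:
        if action == "Add" and product not in favorites:
            favorites.append(product)
        elif action == "Remove" and product in favorites:
            favorites.remove(product)
    return favorites


def current_product_count(all_data):
    """A second query over the same raw data: how many favorites remain."""
    return len(current_favorites(all_data))


favorites = current_favorites(CHANGES)
count = current_product_count(CHANGES)
```

Both results are derived information: they can be thrown away and recomputed at any time because the change records themselves are never modified.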
  • 27. Big Data and Batch Processing
      - [Diagram: Incoming Data -> Immutable data -> Batch View -> Query]
      - How to compute the batch views?
      - How to compute queries from the views?
  • 28. Big Data and Batch Processing
      - But we are not done yet …
      - [Timeline diagram: batch-processed data, then non-processed data up to "now"; spans labelled "Fully processed data", "Last full batch period" and "Time for batch job"]
      - Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20
      - Using only batch processing always leaves you with a portion of non-processed data
  • 29. Big Data and Batch Processing
      - [Diagram, elements in order of appearance: Stream 1, Stream 2, Event, HDFS (Hadoop Distributed File System), Hadoop cluster, Map/Reduce in Pig, data store optimized for appending large results, Queries]
  • 30. Adding Real-Time Processing
      - [Diagram: Incoming Data -> Immutable data -> Batch Views -> Query; Incoming Data -> Data Stream -> Realtime Views -> Query]
      - How to compute real-time views?
      - How to compute queries from the views?
  • 31. Adding Real-Time Processing
      - Immutable data: Favorite Product List Changes
          1.2.13 Add iPAD 64GB; 10.3.13 Add Sony RX-100; 11.3.13 Add Canon GX-10; 11.3.13 Remove Sony RX-100; 12.3.13 Add Nikon S-100; 14.4.13 Add BoseQC-15; 15.4.13 Add MacBook Pro 15; 20.4.13 Remove Canon GX-10
      - View computed from the immutable data: iPAD 64GB, Nikon S-100, BoseQC-15, MacBook Pro 15
      - Data stream: Stream of Favorite Product List Changes; now incoming: Add Canon Scanner
      - View computed from the stream: Canon Scanner
      - Query: Current Favorite Product List; Current Product Count: 5
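How the newly arrived "Add Canon Scanner" event might reach the query result without waiting for the next batch run can be sketched as follows. This is an assumption-laden illustration: `RealtimeFavorites` and `query_favorites` are hypothetical names, and the real-time view is modelled as a small delta that only tracks changes newer than the batch view.

```python
# Batch view as computed from the immutable data on the slide.
BATCH_VIEW = ["iPAD 64GB", "Nikon S-100", "BoseQC-15", "MacBook Pro 15"]


class RealtimeFavorites:
    """Hypothetical real-time view: only the changes newer than the batch view."""

    def __init__(self):
        self.added = []       # products added since the last batch run
        self.removed = set()  # products removed since the last batch run

    def apply(self, action, product):
        """Update the view incrementally for one incoming event."""
        if action == "Add":
            self.removed.discard(product)
            if product not in self.added:
                self.added.append(product)
        elif action == "Remove":
            if product in self.added:
                self.added.remove(product)
            self.removed.add(product)
        else:
            raise ValueError(f"unknown action: {action}")


def query_favorites(batch_view, realtime):
    """Answer the query by combining the batch view with the real-time delta."""
    merged = [p for p in batch_view if p not in realtime.removed]
    merged += [p for p in realtime.added if p not in merged]
    return merged


realtime = RealtimeFavorites()
realtime.apply("Add", "Canon Scanner")  # the event arriving "now"

favorites = query_favorites(BATCH_VIEW, realtime)
count = len(favorites)
```

Each incoming event touches only the small delta, so the update cost does not depend on the size of the historical data.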
  • 32. Big Data and Real Time Processing
      - [Timeline diagram: blended view for end user; batch processing worked fine for the fully processed data (e.g. Hadoop); real time processing works for the span up to "now"; spans labelled "Fully processed data", "Last full batch period" and "Time for batch job"]
      - Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20
  • 33. Agenda: 1. Big Data, what is it? 2. Motivation 3. The Lambda Architecture 4. Implementing the Lambda Architecture 5. Summary
  • 34. Lambda Architecture
      - [Diagram: Incoming Data feeds the Batch Layer (Immutable data) and the Speed Layer (Data Stream, Realtime View); the Serving Layer holds the Batch View; Query reads from the views; the elements are labelled A to G]
  • 35. Lambda Architecture
      A. All data is sent to both the batch and the speed layer
      B. The master data set is an immutable, append-only set of data
      C. The batch layer pre-computes query functions from scratch; the results are called batch views. The batch layer constantly re-computes the batch views
      D. Batch views are indexed and stored in a scalable database to get particular values very quickly. New batch views are swapped in when they are available
      E. The speed layer compensates for the high latency of updates to the batch views
      F. The speed layer uses fast incremental algorithms and read/write databases to produce real-time views
      G. Queries are resolved by getting results from both batch and real-time views
  • 36. Layered Architecture
      - Batch Layer
          - Stores the immutable, constantly growing dataset
          - Computes arbitrary views from this dataset using Big Data technologies (can take hours)
          - Can always be recreated
      - Speed Layer
          - Computes the views from the constant stream of data it receives
          - Needed to compensate for the high latency of the batch layer
          - Incremental model; views are transient
      - Serving Layer
          - Responsible for indexing and exposing the pre-computed batch views so that they can be queried
          - Exposes the incremented real-time views
          - Merges the batch and the real-time views into a consistent result
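The interplay of the three layers can be illustrated with a toy, single-process model that counts events per key. Everything in it is an assumption made for illustration: the `LambdaSketch` class, its method names and the page-name keys are hypothetical, and a real system would run each layer on separate, distributed infrastructure.

```python
from collections import Counter


class LambdaSketch:
    """Toy model of batch, speed and serving layer; counts events per key."""

    def __init__(self):
        self.master = []                # immutable, append-only master data set
        self.batch_view = Counter()     # pre-computed view exposed for queries
        self.realtime_view = Counter()  # transient, incrementally updated view

    def ingest(self, key):
        """All incoming data goes to both the batch and the speed layer."""
        self.master.append(key)        # batch layer: store the raw record
        self.realtime_view[key] += 1   # speed layer: incremental update

    def run_batch(self):
        """Recompute the batch view from scratch, then swap it in."""
        snapshot = len(self.master)
        new_view = Counter(self.master[:snapshot])
        self.batch_view = new_view
        # Keep only records newer than the snapshot in the real-time view
        # (none in this single-threaded toy, so the view is simply emptied).
        self.realtime_view = Counter(self.master[snapshot:])

    def query(self, key):
        """Serving layer: merge the batch and the real-time view."""
        return self.batch_view[key] + self.realtime_view[key]


arch = LambdaSketch()
for page in ["home", "home", "cart"]:
    arch.ingest(page)
before_batch = arch.query("home")  # answered from the real-time view only

arch.run_batch()                   # batch view now covers all three records
arch.ingest("home")                # arrives after the batch run
after_batch = arch.query("home")   # batch part plus real-time part
```

The real-time view is discarded after every batch run, which is why the slides describe it as transient: it only needs to cover the gap since the last completed batch.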
  • 37. Agenda: 1. Big Data, what is it? 2. Motivation 3. The Lambda Architecture 4. Implementing the Lambda Architecture 5. Summary
  • 38. Lambda Architecture
      - [Diagram: Incoming Data -> Batch Layer (All data -> Batch recompute / Precompute -> Precomputed information) -> Serving Layer (batch views); Incoming Data -> Speed Layer (Process stream -> Realtime increment -> Incremented information -> real time views); the query merges batch views and real time views]
      - Source: Marz, N. & Warren, J. (2013) Big Data. Manning.
  • 39. Lambda Architecture in Action
      - Implementation in ongoing proof-of-concept (after completion of phase 1)
      - [Diagram: same Lambda Architecture layout as on the previous slide, with Batch Layer, Speed Layer, Serving Layer, batch views, real time views, merge and query]
  • 40. Lambda Architecture in Action
      - Twitter Hosebird Client (hbc): Twitter Java API over the Streaming API
      - Spring Framework: popular Java framework used to modularize part of the logic (sensor and serving layer)
      - Apache Kafka: simple messaging framework based on the file system, used to distribute information to both batch and speed layer
      - Apache Avro: serialization system for efficient cross-language RPC and persistent data storage
      - JSON: open standard format that uses human-readable text to transmit data objects consisting of attribute-value pairs
      - Cloudera Distribution: distribution of Apache Hadoop: HDFS, MapReduce, Hive, Flume, Pig, Impala
      - Cloudera Impala: distributed query execution engine that runs against data stored in HDFS and HBase
      - Apache Zookeeper: distributed, highly available coordination service; provides primitives such as distributed locks
      - Apache Storm & Trident: distributed, fault-tolerant realtime computation system
      - Apache Cassandra: distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
  • 41. Lambda Architecture with Oracle Product Stack
      - Possible implementation with the Oracle product stack
      - [Diagram: the Lambda Architecture layout (Incoming Data, Batch Layer, Speed Layer, Serving Layer, batch views, real time views, merge, query) annotated with Oracle products: Oracle Data Integrator, Oracle Event Processing, Oracle GoldenGate, Oracle Service Bus, Oracle Coherence, Oracle RDBMS, Oracle NoSQL, Oracle Endeca, OBIEE, Oracle BigData Appliance, Oracle Big Data Connectors, Oracle WebLogic Server, Oracle ADF]
  • 42. Agenda: 1. Big Data, what is it? 2. Motivation 3. The Lambda Architecture 4. Implementing the Lambda Architecture 5. Summary
  • 43. Summary: The Lambda Architecture
      - The Lambda Architecture
          - Can discard batch views and real-time views and recreate everything from scratch
          - Mistakes corrected via re-computation
          - Data storage layer optimized independently from query resolution layer
          - Still in a very early …. But a very interesting idea!
              - Today a zoo of technologies is needed => Operations won't like it
      - The technology/implementation
          - Different query language for batch and real time
          - An abstraction over batch and speed layer needed
              - Cascading and Trident are already similar
          - Not everything works out-of-the-box and together
          - Industry standards needed!
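"Mistakes corrected via re-computation" can be shown with the counter bug mentioned earlier in the deck. The sketch below is hypothetical throughout: the event names and both view functions are invented for illustration, and the only point is that a corrupted view is repaired by rerunning a corrected function over the untouched master data.

```python
# Hypothetical immutable master data set; it is never modified.
MASTER = ["click", "click", "view", "click"]


def buggy_count(events):
    """Deployed by mistake: increments each counter by two instead of by one."""
    counts = {}
    for event in events:
        counts[event] = counts.get(event, 0) + 2
    return counts


def fixed_count(events):
    """Corrected view function."""
    counts = {}
    for event in events:
        counts[event] = counts.get(event, 0) + 1
    return counts


corrupted_view = buggy_count(MASTER)  # wrong numbers are being served
repaired_view = fixed_count(MASTER)   # discard the view and recompute from scratch
```

With a mutable counter table, the doubled values would have overwritten the only copy of the truth; here the raw events survive the bug unchanged.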
  • 44. THANK YOU.
      Trivadis AG, Guido Schmutz
      Europa-Strasse 5, CH-8095 Glattbrugg
      info@trivadis.com
      www.trivadis.com