SlideShare a Scribd company logo
1 of 37
May 16, 2014
1
Team: Undergrad Tim Hegeman, … Grad Yong Guo, Mihai Capota, Bogdan Ghit
Researchers Marcin Biczak, Otto Visser Staff Henk Sips, Dick Epema
Collaborators* Ana Lucia Varbanescu (UvA, Ams), Claudio Martella (VU, Giraph), KIT,
Intel Research Labs, IBM TJ Watson, SAP, Google Inc. MV, Salesforce SF, …
* Not their fault for any mistakes in this presentation. Or so they wish.
Big Data in the Cloud: Enabling the
Fourth Paradigm by Matching SMEs
with Datacenters
Alexandru Iosup
Delft University of Technology
The Netherlands
2nd ISO/IEC JTC 1 Study Group on Big Data, Amsterdam
(We are here)
60km
35mi
founded 1842
pop: 13,000
Data at the Core of Our Society:
The LinkedIn Example
2
Feb 2012
100M Mar 2011, 69M May 2010
Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/
via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/
A very good resource for matchmaking
workforce and prospective employers
Vital for your company’s life,
as your Head of HR would tell you
Vital for the prospective employees
Data at the Core of Our Society:
The LinkedIn Example
3
Feb 2012
100M Mar 2011, 69M May 2010
Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/
via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/
3-4 new users
every second
Great, if you can
process this graph:
opinion mining,
hub detection, etc.
Data at the Core of Our Society:
The LinkedIn Example
4
Feb 2012
100M Mar 2011, 69M May 2010
Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/
via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/
but fewer visitors
(and page views)
139/277 million
questions of customer
retention, so
time-based analytics
3-4 new users
every second
Great, if you can
process this graph:
opinion mining,
hub detection, etc.
LinkedIn Is Part of the
“Data Deluge”
May 2014 5
Sources: IDC, EMC.
Data Deluge =
data generated
by humans and
devices (IoT)
• Interacting
• Understanding
• Deciding
• Creating
The Data Deluge Is
A Challenge for Tech
But Good for Us[ers]
• All human knowledge
• Until 2005: 150 Exa-Bytes
• 2010: 1,200 Exa-Bytes
• Online gaming (Consumer)
• 2002: 20TB/year/game
• 2008: 1.4PB/year/game (only stats)
• Public archives (Science)
• 2006: GBs/archive
• 2011: TBs/year/archive
6
Dataset
Size
Year
1GB
10GB
100GB
1TB
1TB/yr
P2PTA
GTA
‘09 ‘10 ‘11‘06
Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/
via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/
The Challenge: The Three “V”s of Big Data
When You Can, Keep and Process Everything
• Volume
• More data vs. better models
• Exponential growth + iterative models
• Scalable storage and distributed queries
• Velocity
• Speed of the feedback loop
• Gain competitive advantage: fast recommendations
• Analysis in near-real time to extract value
• Variety
• The data can become messy: text, video, audio, etc.
• Difficult to integrate into applications
2011-2012 7
Adapted from: Doug Laney, “3D data management”, META Group/Gartner report,
Feb 2001. http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-
Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
Too big, too fast,
does not comply
with traditional DB
* New queries later
The Opportunity, via a Detour (An Anecdotal Example)
The Overwhelming Growth of Knowledge
“When 12 men founded the
Royal Society in 1660, it was
possible for an educated
person to encompass all of
scientific knowledge. […] In
the last 50 years, such has
been the pace of scientific
advance that even the best
scientists cannot keep up
with discoveries at frontiers
outside their own field.”
Tony Blair,
PM Speech, May 2002
1997
2001
1993
1997
Number of
Publications
Data: King,The scientific impact of nations,Nature’04.
Professionals already know
they don’t know [it all]
The Opportunity, via a Detour
• Thousand years ago:
science was empirical describing natural phenomena
• Last few hundred years:
theoretical branch using models, generalizations
• Last few decades:
a computational branch simulating complex phenomena
• Today (the Fourth Paradigm):
data exploration
unify theory, experiment, and simulation
• Data captured by instruments or generated by simulator
• Processed by software
• Information/Knowledge stored in computer
• Scientist analyzes results using data management and statistics
9
Source: Jim Gray and “The Fourth Paradigm”,
http://research.microsoft.com/en-us/collaboration/fourthparadigm/
2
2
2
.
3
4
a
cG
a
a












From Hypothesis to Data
The Fourth Paradigm is suitable for
professionals who already know they
don’t know [enough to formulate good
hypotheses], yet need to deliver quickly
The Vision: Everyone Is a Scientist!
(the Fourth Paradigm)
• Data as individual right, enabling private lifestyle and
modern societal services
• Data as workhorse in creating services for SMEs
(~60% gross value added, for many years)
May 2014 10
Sources: European Commission Annual Reports 2012 & 2013, ECORYS,
Eurostat, National Statistical Offices, DIW, DIW econ, London Economics.
EC reasons to address Big Data challenges
>500 million people
>85 million employees
>3 trillion euros / year gross value added
Can We Afford This Vision, with the Current
Technology and Resources? (An Anecdote)
May 2014 11
Time magazine reported that it
takes 0.0002kWh to stream 1
minute of video from the
YouTube data centre…
Based on Jay Walker’s recent
TED talk, 0.01kWh of energy is
consumed on average in
downloading 1MB over the
Internet.
The average Internet device
energy consumption is around
0.001kWh for 1 minute of video
streaming
For 1.6B downloads of this 17MB
file and streaming for 4 minutes
gives the overall energy for this
one pop video in one year…
Source: Ian Bitterlin and Jon Summers, UoL, UK.
312GWh = more than some countries in a year,
36MW of 24/7/365 diesel, 100M liters of Oil,
80,000 cars running for a year, ...
Can We Afford This Vision, with the
Current Technology and Resources?
• Not with the current technology (in this presentation)
• Not with the current resources (energy, human, …)
May 2014 12
Sources: DatacenterDynamics and Jon Summers, UoL, UK.
Global power
consumption
EC
Breakdown of EC
power consumption
Our Big Data Team, PDS Group at TU Delft
(http://www.pds.ewi.tudelft.nl/)
May 16, 2014 13
Dick Epema
TU Delft
Big Data & Clouds
Res. management
Systems
Alexandru Iosup
TU Delft
Big Data & Clouds
Res. management
Systems, Benchmarking
Bogdan Ghit
TU Delft
Systems
Workloads
Mihai Capota
TU Delft
Big Data apps
Benchmarking
Ana Lucia Varbanescu
U. Amsterdam
Graph processing
Benchmarking
Yong Guo
TU Delft
Graph processing
Benchmarking
Marcin Biczak
TU Delft
Big Data & Clouds
Performance & Development
Claudio Martella
VU Amsterdam
Graph processing
Agenda
1. Big Data, Our Vision, Our Team
2. Big Data on Clouds
1. The Big Data ecosystem
2. Understanding workloads
3. Benchmarking
4. How can clouds help?
Elastic systems
3. Summary
2012-2013 14
Elastic Systems
Modeling
Benchmarking
Ecosystem
The Current Technology
Big Data = Systems of Systems
2012-2013 15
Hive
MapReduce Model
Hadoop/
YARN
HDFS
Adapted from: Dagstuhl Seminar on Information Management in the Cloud,
http://www.dagstuhl.de/program/calendar/partlist/?semnr=11321&SUOG
Storage Engine
Execution Engine
High-Level Language
Programming Model
Asterix
B-tree
Algebrix
Hyracks
AQL
Dremel
Service
Tree
SQL PigJAQL
PACT
MPI/
Erlang
L
F
S
Nephele DryadHaloop
DryadLINQScope
Pregel
CosmosFS
Azure
Engine
Tera
Data
Engine
Azure
Data
Store
Tera
Data
Store
VoldemortGFS
BigQueryFlume
Flume
Engine
S3
Dataflow
Giraph
SawzallMeteor
* Plus Zookeeper, CDN, etc.
The Problem:
Monolithic Systems
• Monolithic
• Integrated stack
(can still learn from decades of sw.eng.)
• Fixed set of homogeneous resources
(we forgot 2 decades of distrib.sys.)
• Execution engines do not coexist
(we’re running now MPI inside Hadoop Maps,
Hadoop jobs inside MPI processes, etc.)
• Little performance information is exposed
(we forgot 4 decades of par.sys.)
• …
Stuck in stacks!
2012-2013 16
Hive
MapReduce Model
Hadoop/
YARN
HDFS
Storage Engine
Execution Engine
High-Level Language
Programming Model
A. L. Varbanescu and A. Iosup, On Many-Task Big Data Processing: from
GPUs to Clouds. Proc. of SC|12 (MTAGS).
http://www.pds.ewi.tudelft.nl/~iosup/many-tasks-big-data-vision13mtags_v100.pdf
Instead…
Many-Task Big-Data Processing on Heterogeneous
Resources: from GPUs to Clouds
1. Take Big-Data Processing applications
2. Split into Many Tasks
3. Each of the tasks parallelized to match resources
4. Execute each Task on the most efficient resource
5. Exploiting the massive parallelism available now and
increasing in the combination multi-core CPUs & GPUs
6. Using the set of resources provided by local clusters
7. And exploiting the efficient elasticity of IaaS clouds
2012-2013 17
A. L. Varbanescu and A. Iosup, On Many-Task Big Data Processing: from
GPUs to Clouds. Proc. of SC|12 (MTAGS).
http://www.pds.ewi.tudelft.nl/~iosup/many-tasks-big-data-vision13mtags_v100.pdf
Agenda
1. Big Data, Our Vision, Our Team
2. Big Data on Clouds
1. The Big Data ecosystem
2. Understanding workloads
3. Benchmarking
4. How can clouds help?
Elastic systems
3. Summary
2012-2013 20
Elastic Systems
Modeling
Benchmarking
Ecosystem
Statistical MapReduce Models From
Long-Term Usage Traces
• Real traces
• Yahoo
• Google
• 2 x Social Network Provider
May 16, 2014 21
Q0
de Ruiter and Iosup. A workload model for MapReduce.
MSc thesis at TU Delft. Jun 2012. Available online via
TU Delft Library, http://library.tudelft.nl .
MapReduce Is Now Part of Workflows
Use Case: Monitoring Large-Scale Distributed
Computing System with 160M users
Inter-query
dependencies
Hegeman, Ghit, Capotã, Hidders, Epema, Iosup. The BTWorld Use
Case for Big Data Analytics: Description, MapReduce Logical
Workflow, and Empirical Evaluation.IEEE BigData’13
logscale
Diverse queries
New queries during project
Agenda
1. Big Data, Our Vision, Our Team
2. Big Data on Clouds
1. The Big Data ecosystem
2. Understanding workloads
3. Benchmarking
4. How can clouds help?
Elastic systems
3. Summary
2012-2013 26
Elastic Systems
Modeling
Benchmarking
Ecosystem
Performance: Our Team Also Includes...
May 16, 2014 27
Alexandru Iosup
TU Delft
Performance modeling
Performance evauluation
Ana Lucia Varbanescu
U. Amsterdam
Performance modeling
Parallel systems
Multi-core systems
Jianbin Fang
TU Delft
Parallel systems
Multi-core systems
Tianhe/Xeon Phi
Jie Shen
TU Delft
Performance evaluation
Parallel systems
Multi-core systems
Benchmarking MapReduce Systems
May 2014 28
Non-linear
scaling
Provide a platform for collaborative research efforts in
the areas of computer benchmarking and quantitative
system analysis
Provide metrics, tools and benchmarks for evaluating
early prototypes and research results as well as full-
blown implementations
Foster interactions and collaborations btw. industry and
academia
Mission Statement
The Research Group of the
Standard Performance Evaluation Corporation
SPEC Research Group (RG)
More information: http://research.spec.org
Ad: Join us!
May 16, 2014 37
Benchmarking suite
Platforms and Process
• Platforms
• Process
• Evaluate baseline (out of the box) and tuned performance
• Evaluate performance on fixed-size system
• Future: evaluate performance on elastic-size system
• Evaluate scalability
YARN
Giraph
Guo, Biczak, Varbanescu, Iosup, Martella, Willke.
How Well do Graph-Processing Platforms Perform?
An Empirical Performance Evaluation and Analysis
http://bit.ly/10hYdIU
Guo, Biczak, Varbanescu, Iosup, Martella, Willke. Benchmarking Graph-
Processing Platforms: A Vision. Proc. of ICPE 2014.
May 16, 2014 39
BFS: results for all platforms, all data sets
• No platform runs fastest for every graph
• Not all platforms can process all graphs
• Hadoop is the worst performer
Guo, Biczak, Varbanescu, Iosup, Martella, Willke.
How Well do Graph-Processing Platforms Perform?
An Empirical Performance Evaluation and Analysis
http://bit.ly/10hYdIU
May 16, 2014 40
Giraph: results for
all algorithms, all data sets
• Storing the whole graph in memory helps Giraph perform well
• Giraph may crash when graphs or number of messages large
Guo, Biczak, Varbanescu, Iosup, Martella, Willke.
How Well do Graph-Processing Platforms Perform?
An Empirical Performance Evaluation and Analysis
http://bit.ly/10hYdIU
Agenda
1. Big Data, Our Vision, Our Team
2. Big Data on Clouds
1. The Big Data ecosystem
2. Understanding workloads
3. Benchmarking
4. How can clouds help?
Elastic systems
3. Summary
2012-2013 45
Elastic Systems
Modeling
Benchmarking
Ecosystem
Elasticity: Our Team Elastically Includes ...
May 16, 2014 46
Alexandru Iosup
TU Delft
Provisioning
Allocation
Elasticity
Portfolio Scheduling
Isolation
Multi-Tenancy
Athanasios Antoniou
TU Delft
Provisioning
Allocation
Isolation
Utility
Orna Agmon-Ben Yehuda
Technion
Elasticity, Utility
David Villegas
FIU/IBM
Elasticity, Utility
Kefeng Deng
NUDT
Portfolio Scheduling
Cloud Computing, the useful IT service
“Use only when you want! Pay only for what you use!”
May 16, 2014 47
IaaS Cloud Computing:
Energy-Efficient IT Infrastructure Service
Elasticity, Performance and Cost-Awareness
Why Dynamic Data Processing Clusters?
• Improve resource utilization
 Grow when the workload is too heavy
 Shrink when resources are idle
• Fairness across multiple
data processing clusters
 Redistribute idle resources
 Allocate resources for new MR clusters
49
cluster
Ghit and Epema. Resource Management for Dynamic
MapReduce Clusters in Multicluster Systems.
MTAGS 2012. Best Paper Award.
Isolation
• Performance
• Failure
• Data
• Version
Elastic MapReduce, TUD version
• Two types of nodes
• Core nodes: compute and data storage (DataNode)
• Transient nodes: only compute / + data storage
55
Ghit and Epema. Resource Management for Dynamic
MapReduce Clusters in Multicluster Systems.
MTAGS 2012. Best Paper Award.
Timeline
Sgrow
Sshrink
Sgrow
Sshrink
Performance of Resizing using
Static, Transient, and Core Nodes
57
+20x+20x
Sort + WordCount
(50 jobs, 1-50GB)
better
B. Ghit, N. Yigitbasi, A. Iosup, and D. Epema.
Balanced Resource Allocations Across Multiple
Dynamic MapReduce Clusters, SIGMETRICS 2014.
20 x
Big Data processing: possible to get better
performance using elastic data processing
+
we understand how for many scenarios
(key is balanced allocations)
Agenda
1. Big Data, Our Vision, Our Team
2. Big Data on Clouds
1. The Big Data ecosystem
2. Understanding workloads
3. Benchmarking
4. How can clouds help?
Elastic systems
3. Summary
2012-2013 62
Elastic Systems
Modeling
Benchmarking
Ecosystem
May 16, 2014 63
Conclusion Take-Home Message
• Big Data is necessary but grand challenge
• Big Data = Systems of Systems
• Big data programming models have ecosystems
• Stuck in stacks!
• Many trade-offs, many programming models, many problems
• Towards a Generic Big-Data Processing System
• Looking at the Execution Engine—thrilling moment for this!
• Predictability challenges: Understanding workload (modeling) and
performance (benchmarking)
• Performance challenges: distrib/parallel from the beginning
• Elasticity challenges: elastic data processing, portfolio scheduling, etc.
• etc.
May 16, 2014 64
Thank you for your attention! Questions?
Suggestions? Observations?
Alexandru Iosup
A.Iosup@tudelft.nl
http://www.pds.ewi.tudelft.nl/~iosup/ (or google “iosup”)
Parallel and Distributed Systems Group
Delft University of Technology
- http://www.st.ewi.tudelft.nl/~iosup/research.html
- http://www.st.ewi.tudelft.nl/~iosup/research_cloud.html
- http://www.pds.ewi.tudelft.nl/
More Info:
Do not hesitate
to contact me…

More Related Content

What's hot

Big Data Talent in Academic and Industry R&D
Big Data Talent in Academic and Industry R&DBig Data Talent in Academic and Industry R&D
Big Data Talent in Academic and Industry R&DUniversity of Washington
 
Semantic Web: The Inside Story
Semantic Web: The Inside StorySemantic Web: The Inside Story
Semantic Web: The Inside StoryJames Hendler
 
Applications of Machine Learning at USC
Applications of Machine Learning at USCApplications of Machine Learning at USC
Applications of Machine Learning at USCSri Ambati
 
Visual Data Analytics in the Cloud for Exploratory Science
Visual Data Analytics in the Cloud for Exploratory ScienceVisual Data Analytics in the Cloud for Exploratory Science
Visual Data Analytics in the Cloud for Exploratory ScienceUniversity of Washington
 
Big Data, Beyond the Data Center
Big Data, Beyond the Data CenterBig Data, Beyond the Data Center
Big Data, Beyond the Data CenterGilles Fedak
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)James Hendler
 
BioIT World 2016 - HPC Trends from the Trenches
BioIT World 2016 - HPC Trends from the TrenchesBioIT World 2016 - HPC Trends from the Trenches
BioIT World 2016 - HPC Trends from the TrenchesChris Dagdigian
 
The Semantic Web: It's for Real
The Semantic Web: It's for RealThe Semantic Web: It's for Real
The Semantic Web: It's for RealJames Hendler
 
Big Data + Big Sim: Query Processing over Unstructured CFD Models
Big Data + Big Sim: Query Processing over Unstructured CFD ModelsBig Data + Big Sim: Query Processing over Unstructured CFD Models
Big Data + Big Sim: Query Processing over Unstructured CFD ModelsUniversity of Washington
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)James Hendler
 
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015Jonathan Woodward
 
Damss scurt v2 dss an evolving class ...
Damss scurt  v2 dss an evolving class ...Damss scurt  v2 dss an evolving class ...
Damss scurt v2 dss an evolving class ...ISSIP
 
The Unreasonable Effectiveness of Metadata
The Unreasonable Effectiveness of MetadataThe Unreasonable Effectiveness of Metadata
The Unreasonable Effectiveness of MetadataJames Hendler
 
The Future(s) of the World Wide Web
The Future(s) of the World Wide WebThe Future(s) of the World Wide Web
The Future(s) of the World Wide WebJames Hendler
 
Rabobank - There is something about Data
Rabobank - There is something about DataRabobank - There is something about Data
Rabobank - There is something about DataBigDataExpo
 
Machine Learning Introduction for Digital Business Leaders
Machine Learning Introduction for Digital Business LeadersMachine Learning Introduction for Digital Business Leaders
Machine Learning Introduction for Digital Business LeadersSudha Jamthe
 
Taming Big Science Data Growth with Converged Infrastructure
Taming Big Science Data Growth with Converged InfrastructureTaming Big Science Data Growth with Converged Infrastructure
Taming Big Science Data Growth with Converged InfrastructureThe BioTeam Inc.
 
2014 BioIT World - Trends from the trenches - Annual presentation
2014 BioIT World - Trends from the trenches - Annual presentation2014 BioIT World - Trends from the trenches - Annual presentation
2014 BioIT World - Trends from the trenches - Annual presentationChris Dagdigian
 

What's hot (20)

Big Data Talent in Academic and Industry R&D
Big Data Talent in Academic and Industry R&DBig Data Talent in Academic and Industry R&D
Big Data Talent in Academic and Industry R&D
 
Semantic Web: The Inside Story
Semantic Web: The Inside StorySemantic Web: The Inside Story
Semantic Web: The Inside Story
 
Applications of Machine Learning at USC
Applications of Machine Learning at USCApplications of Machine Learning at USC
Applications of Machine Learning at USC
 
Broad Data
Broad DataBroad Data
Broad Data
 
Visual Data Analytics in the Cloud for Exploratory Science
Visual Data Analytics in the Cloud for Exploratory ScienceVisual Data Analytics in the Cloud for Exploratory Science
Visual Data Analytics in the Cloud for Exploratory Science
 
Big Data, Beyond the Data Center
Big Data, Beyond the Data CenterBig Data, Beyond the Data Center
Big Data, Beyond the Data Center
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)
 
BioIT World 2016 - HPC Trends from the Trenches
BioIT World 2016 - HPC Trends from the TrenchesBioIT World 2016 - HPC Trends from the Trenches
BioIT World 2016 - HPC Trends from the Trenches
 
End-to-End eScience
End-to-End eScienceEnd-to-End eScience
End-to-End eScience
 
The Semantic Web: It's for Real
The Semantic Web: It's for RealThe Semantic Web: It's for Real
The Semantic Web: It's for Real
 
Big Data + Big Sim: Query Processing over Unstructured CFD Models
Big Data + Big Sim: Query Processing over Unstructured CFD ModelsBig Data + Big Sim: Query Processing over Unstructured CFD Models
Big Data + Big Sim: Query Processing over Unstructured CFD Models
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)
 
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015Data Culture Series  - Keynote & Panel - Birmingham - 8th April 2015
Data Culture Series - Keynote & Panel - Birmingham - 8th April 2015
 
Damss scurt v2 dss an evolving class ...
Damss scurt  v2 dss an evolving class ...Damss scurt  v2 dss an evolving class ...
Damss scurt v2 dss an evolving class ...
 
The Unreasonable Effectiveness of Metadata
The Unreasonable Effectiveness of MetadataThe Unreasonable Effectiveness of Metadata
The Unreasonable Effectiveness of Metadata
 
The Future(s) of the World Wide Web
The Future(s) of the World Wide WebThe Future(s) of the World Wide Web
The Future(s) of the World Wide Web
 
Rabobank - There is something about Data
Rabobank - There is something about DataRabobank - There is something about Data
Rabobank - There is something about Data
 
Machine Learning Introduction for Digital Business Leaders
Machine Learning Introduction for Digital Business LeadersMachine Learning Introduction for Digital Business Leaders
Machine Learning Introduction for Digital Business Leaders
 
Taming Big Science Data Growth with Converged Infrastructure
Taming Big Science Data Growth with Converged InfrastructureTaming Big Science Data Growth with Converged Infrastructure
Taming Big Science Data Growth with Converged Infrastructure
 
2014 BioIT World - Trends from the trenches - Annual presentation
2014 BioIT World - Trends from the trenches - Annual presentation2014 BioIT World - Trends from the trenches - Annual presentation
2014 BioIT World - Trends from the trenches - Annual presentation
 

Viewers also liked

Linked indiscussion2014 crucialdialoguemodel
Linked indiscussion2014 crucialdialoguemodelLinked indiscussion2014 crucialdialoguemodel
Linked indiscussion2014 crucialdialoguemodelJohan Roels
 
The fourth paradigm in safety
The fourth paradigm in safetyThe fourth paradigm in safety
The fourth paradigm in safetyJohan Roels
 
Dsd int 2014 - data science symposium - 4th paradigm a research perspective, ...
Dsd int 2014 - data science symposium - 4th paradigm a research perspective, ...Dsd int 2014 - data science symposium - 4th paradigm a research perspective, ...
Dsd int 2014 - data science symposium - 4th paradigm a research perspective, ...Deltares
 
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016Jisc
 
Cybersecurity 1. intro to cybersecurity
Cybersecurity 1. intro to cybersecurityCybersecurity 1. intro to cybersecurity
Cybersecurity 1. intro to cybersecuritysommerville-videos
 
Introduction to Cyber Security
Introduction to Cyber SecurityIntroduction to Cyber Security
Introduction to Cyber SecurityStephen Lahanas
 
Network Security
Network SecurityNetwork Security
Network SecurityMAJU
 
Cyber security presentation
Cyber security presentationCyber security presentation
Cyber security presentationBijay Bhandari
 
Top Cyber Security Trends for 2016
Top Cyber Security Trends for 2016Top Cyber Security Trends for 2016
Top Cyber Security Trends for 2016Imperva
 
Cyber security
Cyber securityCyber security
Cyber securitySiblu28
 
Cyber crime and security ppt
Cyber crime and security pptCyber crime and security ppt
Cyber crime and security pptLipsita Behera
 

Viewers also liked (14)

Linked indiscussion2014 crucialdialoguemodel
Linked indiscussion2014 crucialdialoguemodelLinked indiscussion2014 crucialdialoguemodel
Linked indiscussion2014 crucialdialoguemodel
 
The Fourth Paradigm Book
The Fourth Paradigm BookThe Fourth Paradigm Book
The Fourth Paradigm Book
 
Data Science and the Fourth Paradigm by Torben Bach Pedersen
Data Science and the Fourth Paradigm by Torben Bach PedersenData Science and the Fourth Paradigm by Torben Bach Pedersen
Data Science and the Fourth Paradigm by Torben Bach Pedersen
 
The fourth paradigm in safety
The fourth paradigm in safetyThe fourth paradigm in safety
The fourth paradigm in safety
 
Dsd int 2014 - data science symposium - 4th paradigm a research perspective, ...
Dsd int 2014 - data science symposium - 4th paradigm a research perspective, ...Dsd int 2014 - data science symposium - 4th paradigm a research perspective, ...
Dsd int 2014 - data science symposium - 4th paradigm a research perspective, ...
 
Introduction to Cyber Security
Introduction to Cyber SecurityIntroduction to Cyber Security
Introduction to Cyber Security
 
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016
 
Cybersecurity 1. intro to cybersecurity
Cybersecurity 1. intro to cybersecurityCybersecurity 1. intro to cybersecurity
Cybersecurity 1. intro to cybersecurity
 
Introduction to Cyber Security
Introduction to Cyber SecurityIntroduction to Cyber Security
Introduction to Cyber Security
 
Network Security
Network SecurityNetwork Security
Network Security
 
Cyber security presentation
Cyber security presentationCyber security presentation
Cyber security presentation
 
Top Cyber Security Trends for 2016
Top Cyber Security Trends for 2016Top Cyber Security Trends for 2016
Top Cyber Security Trends for 2016
 
Cyber security
Cyber securityCyber security
Cyber security
 
Cyber crime and security ppt
Cyber crime and security pptCyber crime and security ppt
Cyber crime and security ppt
 

Similar to Big Data in the Cloud: Enabling the Fourth Paradigm by Matching SMEs with Datacenters

Soderstrom
SoderstromSoderstrom
SoderstromNASAPMC
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudOla Spjuth
 
Ibm and innovation overview 20150326 v15 short
Ibm and innovation overview 20150326 v15 shortIbm and innovation overview 20150326 v15 short
Ibm and innovation overview 20150326 v15 shortISSIP
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridEvert Lammerts
 
From DARPA to Shakespeare: All the Data we Can Handle
From DARPA to Shakespeare: All the Data we Can Handle From DARPA to Shakespeare: All the Data we Can Handle
From DARPA to Shakespeare: All the Data we Can Handle Kimberly Hoffman
 
Spohrer smarter service systems transdisciplinarity 20141121 v7
Spohrer smarter service systems transdisciplinarity  20141121 v7Spohrer smarter service systems transdisciplinarity  20141121 v7
Spohrer smarter service systems transdisciplinarity 20141121 v7ISSIP
 
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012Preferred Networks
 
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIMAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIBig Data Week
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsGeoffrey Fox
 
Sharing Advisory Board newsletter #8
Sharing Advisory Board newsletter #8Sharing Advisory Board newsletter #8
Sharing Advisory Board newsletter #8Carlo Vaccari
 
Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerMicrosoft
 
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...Rida Qayyum
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
BSC and Integrating Persistent Data and Parallel Programming Models
BSC and Integrating Persistent Data and Parallel Programming ModelsBSC and Integrating Persistent Data and Parallel Programming Models
BSC and Integrating Persistent Data and Parallel Programming Modelsinside-BigData.com
 
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...AlmereDataCapital
 

Similar to Big Data in the Cloud: Enabling the Fourth Paradigm by Matching SMEs with Datacenters (20)

Soderstrom
SoderstromSoderstrom
Soderstrom
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
 
Ibm and innovation overview 20150326 v15 short
Ibm and innovation overview 20150326 v15 shortIbm and innovation overview 20150326 v15 short
Ibm and innovation overview 20150326 v15 short
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
Big Data et eGovernment
Big Data et eGovernmentBig Data et eGovernment
Big Data et eGovernment
 
Rdaeu russia_fg_1_july2014_final
Rdaeu  russia_fg_1_july2014_finalRdaeu  russia_fg_1_july2014_final
Rdaeu russia_fg_1_july2014_final
 
From DARPA to Shakespeare: All the Data we Can Handle
From DARPA to Shakespeare: All the Data we Can Handle From DARPA to Shakespeare: All the Data we Can Handle
From DARPA to Shakespeare: All the Data we Can Handle
 
Big data mining
Big data miningBig data mining
Big data mining
 
Spohrer smarter service systems transdisciplinarity 20141121 v7
Spohrer smarter service systems transdisciplinarity  20141121 v7Spohrer smarter service systems transdisciplinarity  20141121 v7
Spohrer smarter service systems transdisciplinarity 20141121 v7
 
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
Jubatus: Realtime deep analytics for BIgData@Rakuten Technology Conference 2012
 
HadoopWorkshopJuly2014
HadoopWorkshopJuly2014HadoopWorkshopJuly2014
HadoopWorkshopJuly2014
 
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIMAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
 
Sharing Advisory Board newsletter #8
Sharing Advisory Board newsletter #8Sharing Advisory Board newsletter #8
Sharing Advisory Board newsletter #8
 
Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringer
 
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
BSC and Integrating Persistent Data and Parallel Programming Models
BSC and Integrating Persistent Data and Parallel Programming ModelsBSC and Integrating Persistent Data and Parallel Programming Models
BSC and Integrating Persistent Data and Parallel Programming Models
 
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...
Maurice Bouwhuis (SARA/Vancis) - Hoe big data te begrijpen door ze te visuali...
 

Recently uploaded

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 

Recently uploaded (20)

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 

Big Data in the Cloud: Enabling the Fourth Paradigm by Matching SMEs with Datacenters

  • 1. May 16, 2014 1 Team: Undergrad Tim Hegeman, … Grad Yong Guo, Mihai Capota, Bogdan Ghit Researchers Marcin Biczak, Otto Visser Staff Henk Sips, Dick Epema Collaborators* Ana Lucia Varbanescu (UvA, Ams), Claudio Martella (VU, Giraph), KIT, Intel Research Labs, IBM TJ Watson, SAP, Google Inc. MV, Salesforce SF, … * Not their fault for any mistakes in this presentation. Or so they wish. Big Data in the Cloud: Enabling the Fourth Paradigm by Matching SMEs with Datacenters Alexandru Iosup Delft University of Technology The Netherlands 2nd ISO/IEC JTC 1 Study Group on Big Data, Amsterdam (We are here) 60km 35mi founded 1842 pop: 13,000
  • 2. Data at the Core of Our Society: The LinkedIn Example 2 Feb 2012 100M Mar 2011, 69M May 2010 Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/ via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/ A very good resource for matchmaking workforce and prospective employers Vital for your company’s life, as your Head of HR would tell you Vital for the prospective employees
  • 3. Data at the Core of Our Society: The LinkedIn Example 3 Feb 2012 100M Mar 2011, 69M May 2010 Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/ via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/ 3-4 new users every second Great, if you can process this graph: opinion mining, hub detection, etc.
  • 4. Data at the Core of Our Society: The LinkedIn Example 4 Feb 2012 100M Mar 2011, 69M May 2010 Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/ via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/ but fewer visitors (and page views) 139/277 million questions of customer retention, so time-based analytics 3-4 new users every second Great, if you can process this graph: opinion mining, hub detection, etc.
  • 5. LinkedIn Is Part of the “Data Deluge” May 2014 5 Sources: IDC, EMC. Data Deluge = data generated by humans and devices (IoT) • Interacting • Understanding • Deciding • Creating
  • 6. The Data Deluge Is A Challenge for Tech But Good for Us[ers] • All human knowledge • Until 2005: 150 Exa-Bytes • 2010: 1,200 Exa-Bytes • Online gaming (Consumer) • 2002: 20TB/year/game • 2008: 1.4PB/year/game (only stats) • Public archives (Science) • 2006: GBs/archive • 2011: TBs/year/archive 6 Dataset Size Year 1GB 10GB 100GB 1TB 1TB/yr P2PTA GTA ‘09 ‘10 ‘11‘06 Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/ via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/
  • 7. The Challenge: The Three “V”s of Big Data When You Can, Keep and Process Everything • Volume • More data vs. better models • Exponential growth + iterative models • Scalable storage and distributed queries • Velocity • Speed of the feedback loop • Gain competitive advantage: fast recommendations • Analysis in near-real time to extract value • Variety • The data can become messy: text, video, audio, etc. • Difficult to integrate into applications 2011-2012 7 Adapted from: Doug Laney, “3D data management”, META Group/Gartner report, Feb 2001. http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data- Management-Controlling-Data-Volume-Velocity-and-Variety.pdf Too big, too fast, does not comply with traditional DB * New queries later
  • 8. The Opportunity, via a Detour (An Anecdotal Example) The Overwhelming Growth of Knowledge “When 12 men founded the Royal Society in 1660, it was possible for an educated person to encompass all of scientific knowledge. […] In the last 50 years, such has been the pace of scientific advance that even the best scientists cannot keep up with discoveries at frontiers outside their own field.” Tony Blair, PM Speech, May 2002 1997 2001 1993 1997 Number of Publications Data: King,The scientific impact of nations,Nature’04. Professionals already know they don’t know [it all]
  • 9. The Opportunity, via a Detour • Thousand years ago: science was empirical describing natural phenomena • Last few hundred years: theoretical branch using models, generalizations • Last few decades: a computational branch simulating complex phenomena • Today (the Fourth Paradigm): data exploration unify theory, experiment, and simulation • Data captured by instruments or generated by simulator • Processed by software • Information/Knowledge stored in computer • Scientist analyzes results using data management and statistics 9 Source: Jim Gray and “The Fourth Paradigm”, http://research.microsoft.com/en-us/collaboration/fourthparadigm/ 2 2 2 . 3 4 a cG a a             From Hypothesis to Data The Fourth Paradigm is suitable for professionals who already know they don’t know [enough to formulate good hypotheses], yet need to deliver quickly
  • 10. The Vision: Everyone Is a Scientist! (the Fourth Paradigm) • Data as individual right, enabling private lifestyle and modern societal services • Data as workhorse in creating services for SMEs (~60% gross value added, for many years) May 2014 10 Sources: European Commission Annual Reports 2012 & 2013, ECORYS, Eurostat, National Statistical Offices, DIW, DIW econ, London Economics. EC reasons to address Big Data challenges >500 million people >85 million employees >3 trillion euros / year gross value added
  • 11. Can We Afford This Vision, with the Current Technology and Resources? (An Anecdote) May 2014 11 Time magazine reported that it takes 0.0002kWh to stream 1 minute of video from the YouTube data centre… Based on Jay Walker’s recent TED talk, 0.01kWh of energy is consumed on average in downloading 1MB over the Internet. The average Internet device energy consumption is around 0.001kWh for 1 minute of video streaming For 1.6B downloads of this 17MB file and streaming for 4 minutes gives the overall energy for this one pop video in one year… Source: Ian Bitterlin and Jon Summers, UoL, UK. 312GWh = more than some countries in a year, 36MW of 24/7/365 diesel, 100M liters of Oil, 80,000 cars running for a year, ...
  • 12. Can We Afford This Vision, with the Current Technology and Resources? • Not with the current technology (in this presentation) • Not with the current resources (energy, human, …) May 2014 12 Sources: DatacenterDynamics and Jon Summers, UoL, UK. Global power consumption EC Breakdown of EC power consumption
  • 13. Our Big Data Team, PDS Group at TU Delft (http://www.pds.ewi.tudelft.nl/) May 16, 2014 13 Dick Epema TU Delft Big Data & Clouds Res. management Systems Alexandru Iosup TU Delft Big Data & Clouds Res. management Systems, Benchmarking Bogdan Ghit TU Delft Systems Workloads Mihai Capota TU Delft Big Data apps Benchmarking Ana Lucia Varbanescu U. Amsterdam Graph processing Benchmarking Yong Guo TU Delft Graph processing Benchmarking Marcin Biczak TU Delft Big Data & Clouds Performance & Development Claudio Martella VU Amsterdam Graph processing
  • 14. Agenda 1. Big Data, Our Vision, Our Team 2. Big Data on Clouds 1. The Big Data ecosystem 2. Understanding workloads 3. Benchmarking 4. How can clouds help? Elastic systems 3. Summary 2012-2013 14 Elastic Systems Modeling Benchmarking Ecosystem
  • 15. The Current Technology Big Data = Systems of Systems 2012-2013 15 Hive MapReduce Model Hadoop/ YARN HDFS Adapted from: Dagstuhl Seminar on Information Management in the Cloud, http://www.dagstuhl.de/program/calendar/partlist/?semnr=11321&SUOG Storage Engine Execution Engine High-Level Language Programming Model Asterix B-tree Algebrix Hyracks AQL Dremel Service Tree SQL PigJAQL PACT MPI/ Erlang L F S Nephele DryadHaloop DryadLINQScope Pregel CosmosFS Azure Engine Tera Data Engine Azure Data Store Tera Data Store VoldemortGFS BigQueryFlume Flume Engine S3 Dataflow Giraph SawzallMeteor * Plus Zookeeper, CDN, etc.
  • 16. The Problem: Monolithic Systems • Monolithic • Integrated stack (can still learn from decades of sw.eng.) • Fixed set of homogeneous resources (we forgot 2 decades of distrib.sys.) • Execution engines do not coexist (we’re running now MPI inside Hadoop Maps, Hadoop jobs inside MPI processes, etc.) • Little performance information is exposed (we forgot 4 decades of par.sys.) • … Stuck in stacks! 2012-2013 16 Hive MapReduce Model Hadoop/ YARN HDFS Storage Engine Execution Engine High-Level Language Programming Model A. L. Varbanescu and A. Iosup, On Many-Task Big Data Processing: from GPUs to Clouds. Proc. of SC|12 (MTAGS). http://www.pds.ewi.tudelft.nl/~iosup/many-tasks-big-data-vision13mtags_v100.pdf
  • 17. Instead… Many-Task Big-Data Processing on Heterogeneous Resources: from GPUs to Clouds 1. Take Big-Data Processing applications 2. Split into Many Tasks 3. Each of the tasks parallelized to match resources 4. Execute each Task on the most efficient resource 5. Exploiting the massive parallelism available now and increasing in the combination multi-core CPUs & GPUs 6. Using the set of resources provided by local clusters 7. And exploiting the efficient elasticity of IaaS clouds 2012-2013 17 A. L. Varbanescu and A. Iosup, On Many-Task Big Data Processing: from GPUs to Clouds. Proc. of SC|12 (MTAGS). http://www.pds.ewi.tudelft.nl/~iosup/many-tasks-big-data-vision13mtags_v100.pdf
  • 18. Agenda 1. Big Data, Our Vision, Our Team 2. Big Data on Clouds 1. The Big Data ecosystem 2. Understanding workloads 3. Benchmarking 4. How can clouds help? Elastic systems 3. Summary 2012-2013 20 Elastic Systems Modeling Benchmarking Ecosystem
  • 19. Statistical MapReduce Models From Long-Term Usage Traces • Real traces • Yahoo • Google • 2 x Social Network Provider May 16, 2014 21 Q0 de Ruiter and Iosup. A workload model for MapReduce. MSc thesis at TU Delft. Jun 2012. Available online via TU Delft Library, http://library.tudelft.nl .
  • 20. MapReduce Is Now Part of Workflows Use Case: Monitoring Large-Scale Distributed Computing System with 160M users Inter-query dependencies Hegeman, Ghit, Capotã, Hidders, Epema, Iosup. The BTWorld Use Case for Big Data Analytics: Description, MapReduce Logical Workflow, and Empirical Evaluation.IEEE BigData’13 logscale Diverse queries New queries during project
  • 21. Agenda 1. Big Data, Our Vision, Our Team 2. Big Data on Clouds 1. The Big Data ecosystem 2. Understanding workloads 3. Benchmarking 4. How can clouds help? Elastic systems 3. Summary 2012-2013 26 Elastic Systems Modeling Benchmarking Ecosystem
  • 22. Performance: Our Team Also Includes... May 16, 2014 27 Alexandru Iosup TU Delft Performance modeling Performance evauluation Ana Lucia Varbanescu U. Amsterdam Performance modeling Parallel systems Multi-core systems Jianbin Fang TU Delft Parallel systems Multi-core systems Tianhe/Xeon Phi Jie Shen TU Delft Performance evaluation Parallel systems Multi-core systems
  • 23. Benchmarking MapReduce Systems May 2014 28 Non-linear scaling
  • 24. Provide a platform for collaborative research efforts in the areas of computer benchmarking and quantitative system analysis Provide metrics, tools and benchmarks for evaluating early prototypes and research results as well as full- blown implementations Foster interactions and collaborations btw. industry and academia Mission Statement The Research Group of the Standard Performance Evaluation Corporation SPEC Research Group (RG) More information: http://research.spec.org Ad: Join us!
  • 25. May 16, 2014 37 Benchmarking suite Platforms and Process • Platforms • Process • Evaluate baseline (out of the box) and tuned performance • Evaluate performance on fixed-size system • Future: evaluate performance on elastic-size system • Evaluate scalability YARN Giraph Guo, Biczak, Varbanescu, Iosup, Martella, Willke. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis http://bit.ly/10hYdIU Guo, Biczak, Varbanescu, Iosup, Martella, Willke. Benchmarking Graph- Processing Platforms: A Vision. Proc. of ICPE 2014.
  • 26. May 16, 2014 39 BFS: results for all platforms, all data sets • No platform runs fastest for every graph • Not all platforms can process all graphs • Hadoop is the worst performer Guo, Biczak, Varbanescu, Iosup, Martella, Willke. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis http://bit.ly/10hYdIU
  • 27. May 16, 2014 40 Giraph: results for all algorithms, all data sets • Storing the whole graph in memory helps Giraph perform well • Giraph may crash when graphs or number of messages large Guo, Biczak, Varbanescu, Iosup, Martella, Willke. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis http://bit.ly/10hYdIU
  • 28. Agenda 1. Big Data, Our Vision, Our Team 2. Big Data on Clouds 1. The Big Data ecosystem 2. Understanding workloads 3. Benchmarking 4. How can clouds help? Elastic systems 3. Summary 2012-2013 45 Elastic Systems Modeling Benchmarking Ecosystem
  • 29. Elasticity: Our Team Elastically Includes ... May 16, 2014 46 Alexandru Iosup TU Delft Provisioning Allocation Elasticity Portfolio Scheduling Isolation Multi-Tenancy Athanasios Antoniou TU Delft Provisioning Allocation Isolation Utility Orna Agmon-Ben Yehuda Technion Elasticity, Utility David Villegas FIU/IBM Elasticity, Utility Kefeng Deng NUDT Portfolio Scheduling
  • 30. Cloud Computing, the useful IT service “Use only when you want! Pay only for what you use!” May 16, 2014 47
  • 31. IaaS Cloud Computing: Energy-Efficient IT Infrastructure Service
  • 32. Elasticity, Performance and Cost-Awareness Why Dynamic Data Processing Clusters? • Improve resource utilization  Grow when the workload is too heavy  Shrink when resources are idle • Fairness across multiple data processing clusters  Redistribute idle resources  Allocate resources for new MR clusters 49 cluster Ghit and Epema. Resource Management for Dynamic MapReduce Clusters in Multicluster Systems. MTAGS 2012. Best Paper Award. Isolation • Performance • Failure • Data • Version
  • 33. Elastic MapReduce, TUD version • Two types of nodes • Core nodes: compute and data storage (DataNode) • Transient nodes: only compute / + data storage 55 Ghit and Epema. Resource Management for Dynamic MapReduce Clusters in Multicluster Systems. MTAGS 2012. Best Paper Award. Timeline Sgrow Sshrink Sgrow Sshrink
  • 34. Performance of Resizing using Static, Transient, and Core Nodes 57 +20x+20x Sort + WordCount (50 jobs, 1-50GB) better B. Ghit, N. Yigitbasi, A. Iosup, and D. Epema. Balanced Resource Allocations Across Multiple Dynamic MapReduce Clusters, SIGMETRICS 2014. 20 x Big Data processing: possible to get better performance using elastic data processing + we understand how for many scenarios (key is balanced allocations)
  • 35. Agenda 1. Big Data, Our Vision, Our Team 2. Big Data on Clouds 1. The Big Data ecosystem 2. Understanding workloads 3. Benchmarking 4. How can clouds help? Elastic systems 3. Summary 2012-2013 62 Elastic Systems Modeling Benchmarking Ecosystem
  • 36. May 16, 2014 63 Conclusion Take-Home Message • Big Data is necessary but grand challenge • Big Data = Systems of Systems • Big data programming models have ecosystems • Stuck in stacks! • Many trade-offs, many programming models, many problems • Towards a Generic Big-Data Processing System • Looking at the Execution Engine—thrilling moment for this! • Predictability challenges: Understanding workload (modeling) and performance (benchmarking) • Performance challenges: distrib/parallel from the beginning • Elasticity challenges: elastic data processing, portfolio scheduling, etc. • etc.
  • 37. May 16, 2014 64 Thank you for your attention! Questions? Suggestions? Observations? Alexandru Iosup A.Iosup@tudelft.nl http://www.pds.ewi.tudelft.nl/~iosup/ (or google “iosup”) Parallel and Distributed Systems Group Delft University of Technology - http://www.st.ewi.tudelft.nl/~iosup/research.html - http://www.st.ewi.tudelft.nl/~iosup/research_cloud.html - http://www.pds.ewi.tudelft.nl/ More Info: Do not hesitate to contact me…