Cloud and Big Data
Sebastien Goasguen,
January 29th
@sebgoa
A view on Big Data
http://www.economist.com/node/15557443?story_id=15557443
SKA (Square Kilometre Array)
How did we get there?
A natural evolution
New distributed systems for:
Large-scale datasets
• From scientific instruments
• From web application logs

Complex datasets
• Not necessarily large

Object stores
• S3 clones
Big Data and Map-Reduce
• Big Data is often associated with HDFS, but Map-Reduce is the programming model used to parallelize the data processing.
• Big Data ≠ Map-Reduce ≠ HDFS
• Map-Reduce is a way to express embarrassingly parallel work easily.
• You can do Map-Reduce without HDFS, e.g. Basho's map-reduce on riakCS.
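The point that Map-Reduce is a programming model, independent of HDFS, can be made concrete with a minimal in-memory word count in Python. This is a sketch under simplifying assumptions: a local thread pool stands in for cluster workers and a Python list stands in for the storage layer. It illustrates the model only, not how Hadoop or Riak implement it.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(records: List[str], workers: int = 4) -> Dict[str, int]:
    # Records are mapped independently of each other: the embarrassingly parallel step.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, records))
    grouped = shuffle(pair for chunk in mapped for pair in chunk)
    # Each key is reduced independently too, so this step could also be parallelized.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


log_lines = ["GET /index GET", "POST /login", "get /index"]
print(map_reduce(log_lines))  # {'get': 3, '/index': 2, 'post': 1, '/login': 1}
```

Replacing the list of records with objects read from any store (a local file, an S3-compatible bucket) leaves the map and reduce functions unchanged, which is the sense in which Map-Reduce does not depend on HDFS.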
A very quick look at clouds
Open Source IaaS
Today: Big Data at its peak
History
2003 – Google File System
2005 – Hadoop
2006 – Hadoop becomes an Apache Lucene subproject (Feb)
2006 – S3 launched
2007 – Paper on Amazon Dynamo
2009 – EMR launched
2013 – CloudStack becomes an ASF TLP (March)
2013 – Spark enters ASF incubator; Mesos graduates to an ASF TLP
The Apache Software Foundation
35 projects in incubation:
• 12 Hadoop related
• ~30% Big Data related
• Spark

117 top-level projects:
• ~16 cloud or Big Data related (over 10%)
• Deltacloud, Libcloud, Whirr, jclouds
• Hadoop, CouchDB, Cassandra, Mesos
• Bigtop, Accumulo, Lucene, UIMA
• CloudStack
Hadoop Ecosystem
+ Upcoming next-generation Big Data systems
Big Data and Cloud(Stack)s
Clouds and Big Data
• Object store + compute IaaS to build an EC2 + S3 clone
• Big Data solutions as storage backends for the image catalogue and for large-scale instance storage
• Big Data solutions as workloads on CloudStack-based clouds
EC2, S3 clone
• An open source IaaS with an EC2 wrapper, e.g. OpenNebula
• Deploy an S3-compatible object store separately, e.g. riakCS
• Two independent distributed systems deployed

Cloud = EC2 + S3
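Because the object store is deployed separately and only has to speak the S3 protocol, clients reach it by changing the endpoint rather than the code. As a sketch, the following builds the Date and Authorization headers of AWS signature version 2 for a path-style GET, using only the Python standard library. Assumptions: the host `riakcs.example.com`, the bucket `images` and the object name are hypothetical; whether a given riakCS release accepts V2 signing with path-style URLs should be checked against its documentation; Amazon S3 itself has since deprecated V2 in favor of signature version 4.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from typing import Dict, Optional


def s3_v2_headers(method: str, bucket: str, key: str, access_key: str,
                  secret_key: str, date: Optional[str] = None) -> Dict[str, str]:
    """Build Date and Authorization headers for an S3 request (signature version 2).

    Simplifying assumptions: no request body (empty Content-MD5 and Content-Type),
    no x-amz-* headers, and an object key that needs no URL encoding.
    """
    date = date or formatdate(usegmt=True)
    # Path-style canonical resource: /bucket/key
    resource = f"/{bucket}/{key}"
    string_to_sign = f"{method}\n\n\n{date}\n{resource}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return {"Date": date, "Authorization": f"AWS {access_key}:{signature}"}


# Hypothetical self-hosted endpoint and placeholder credentials.
ENDPOINT = "http://riakcs.example.com:8080"
url = f"{ENDPOINT}/images/centos.qcow2"
headers = s3_v2_headers("GET", "images", "centos.qcow2",
                        access_key="PLACEHOLDER_KEY", secret_key="PLACEHOLDER_SECRET",
                        date="Wed, 29 Jan 2014 12:00:00 GMT")
print(url, headers["Authorization"])
```

The headers can then be sent with any HTTP client, e.g. `urllib.request.Request(url, headers=headers)`; pointing the same code at Amazon S3 would only change `ENDPOINT` (and, today, the signature version).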
Big Data as IaaS backend
“Big Data” solutions can be used as secondary storage.
Example
• Open source IaaS + EC2 wrapper, e.g. CloudStack
• Deploy an S3-compatible object store, e.g. riakCS, Ceph or GlusterFS
• Use S3 as the image store
• Your EC2 service is a client of your S3 service
• Logstash + Elasticsearch for logs/monitoring
Even use bare metal
Big Data as a Workload on the Cloud
Mesos and Spark are EC2 native:
• ec2_deploy.py
• ec2_deploy.sh
• …
Tools
“PaaS”
Dev Pipeline
Conclusions
• Big Data is “catching up”
• Tackle the big three head-on: Big Data, Cloud and DevOps
• Add a Big Data backend to your cloud from the start
• Provide Big Data services on your cloud
Still behind!
Final Thoughts
Who manages my data transfers?
Event
ApacheCon + CloudStack Collaboration Conference
Denver, April 7-11

Get Involved with Apache CloudStack
Web: http://cloudstack.apache.org/
Mailing Lists: cloudstack.apache.org/mailing-lists.html
IRC: irc.freenode.net:6667 #cloudstack #cloudstack-dev
Twitter: @cloudstack
LinkedIn: www.linkedin.com/groups/CloudStack-Users-Group-3144859
If it didn’t happen on the mailing list, it didn’t happen.

Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 

Cloud and Big Data trends

  • 1. Cloud and Big Data Sebastien Goasguen, January 29th @sebgoa
  • 2. A view on Big Data
  • 4. SKA
  • 8. How did we get there?
  • 10. New distributed systems for: • Large-scale datasets: from scientific instruments, from web app logs • Complex datasets: not necessarily large • Object stores: S3 clones
  • 11. Big Data and MapReduce • While Big Data is often associated with HDFS, MapReduce is the programming model used to parallelize the data processing. • Big Data ≠ MapReduce ≠ HDFS • MapReduce is a way to express embarrassingly parallel work easily. • You can do MapReduce without HDFS, e.g. Basho's MapReduce on riakCS.
  • 12. A really quick view on Clouds
  • 16. Today
  • 18. History • 2003 – Google File System • 2005 – Hadoop • 2006 – Hadoop enters ASF incubator (Feb) • 2006 – S3 launched • 2007 – Paper on Amazon Dynamo • 2009 – EMR launched • 2013 – CloudStack becomes an ASF TLP (March) • 2013 – Spark enters the ASF incubator; Mesos graduates to TLP
  • 19. The Apache Software Foundation
  • 21. 35 projects in incubation: • 12 Hadoop related • ~30% Big Data related • Spark. 117 top-level projects: • ~16 cloud or Big Data related (over 10%) • Deltacloud, Libcloud, Whirr, jclouds • Hadoop, CouchDB, Cassandra, Mesos • Bigtop, Accumulo, Lucene, UIMA • CloudStack
  • 22. Hadoop ecosystem + upcoming next-generation Big Data systems
  • 23. Big Data and Cloud (Stack)s
  • 24. Clouds and Big Data • Object store + compute IaaS to build an EC2 + S3 clone • Big Data solutions as storage backends for the image catalogue and large-scale instance storage • Big Data solutions as workloads on CloudStack-based clouds
  • 25. EC2, S3 clone • An open source IaaS with an EC2 wrapper, e.g. OpenNebula • Deploy an S3-compatible object store separately, e.g. riakCS • Two independent distributed systems deployed • Cloud = EC2 + S3
  • 26. Big Data as IaaS backend: “Big Data” solutions can be used as secondary storage.
  • 27. Example • Open source IaaS + EC2 wrapper, e.g. CloudStack • Deploy an S3-compatible object store, e.g. riakCS, Ceph or GlusterFS • Use S3 as the image store • Your EC2 service is a customer of your S3 service • Logstash + Elasticsearch for logs/monitoring
  • 28. Even use Bare Metal
  • 29. Big Data as a Workload to the Cloud
  • 30. Mesos and Spark are EC2-native: • ec2_deploy.py • ec2_deploy.sh • …
  • 31. Tools
  • 34. Conclusions • Big Data is “catching up” • Tackle the big three head-on: Big Data, Cloud and DevOps • Add a Big Data backend to your cloud from the start • Provide Big Data services on your cloud
  • 36. Final thoughts: who manages my data transfers?
  • 37. Event: ApacheCon + CloudStack Collaboration Conference, Denver, April 7–11. Cloud and Big Data
  • 38. Get involved with Apache CloudStack • Web: http://cloudstack.apache.org/ • Mailing lists: cloudstack.apache.org/mailing-lists.html • IRC: irc.freenode.net:6667 #cloudstack #cloudstack-dev • Twitter: @cloudstack • LinkedIn: www.linkedin.com/groups/CloudStack-Users-Group-3144859 • If it didn't happen on the mailing list, it didn't happen.
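Slide 11 argues that MapReduce is a programming model that does not depend on HDFS. A minimal sketch of that idea, assuming a word count over a plain in-memory list of strings, is shown here. A thread pool stands in for a cluster of workers. The function names `map_phase`, `shuffle_phase`, `reduce_phase` and `map_reduce` are hypothetical and do not belong to any framework.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key before reducing.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: fold all values seen for one key into a single result.
    return key, sum(values)


def map_reduce(records: Iterable[str], workers: int = 4) -> Dict[str, int]:
    # Map calls share no state, so they can run concurrently. The thread pool
    # stands in for remote workers, and the input comes from memory, not HDFS.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, records))
    grouped = shuffle_phase(chain.from_iterable(mapped))
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(map_reduce(["the cloud", "The data", "big data"]))
# {'the': 2, 'cloud': 1, 'data': 2, 'big': 1}
```

The storage layer is the only thing that changes between this toy and a real deployment. The records could just as well come from an object store such as riakCS.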
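Slides 25 and 27 rely on the object store being "S3 compatible." For clients such as the EC2-side image service, that means they sign requests exactly as they would for Amazon S3. The sketch below builds an AWS Signature Version 2 `Authorization` header with the standard library only. It assumes that the store accepts Signature V2, which many S3 clones of that period did (riakCS is assumed here, not verified). The access key, secret and bucket/key names in the usage are hypothetical. AWS itself has since moved to Signature Version 4.

```python
import base64
import hashlib
import hmac
from typing import Dict, Optional


def string_to_sign(verb: str, resource: str, date: str,
                   content_md5: str = "", content_type: str = "",
                   amz_headers: Optional[Dict[str, str]] = None) -> str:
    # Signature V2 StringToSign: verb, Content-MD5, Content-Type and Date on
    # separate lines, then the canonicalized x-amz-* headers, then the resource.
    # Simplification: duplicate x-amz-* headers are not merged here, and the
    # caller must pass an already canonicalized resource such as "/bucket/key".
    amz = amz_headers or {}
    canonical_amz = "".join(
        f"{name.lower()}:{value.strip()}\n"
        for name, value in sorted(amz.items(), key=lambda item: item[0].lower())
    )
    return f"{verb}\n{content_md5}\n{content_type}\n{date}\n{canonical_amz}{resource}"


def authorization_header(access_key: str, secret_key: str, to_sign: str) -> str:
    # Signature = Base64(HMAC-SHA1(secret key, UTF-8 encoded StringToSign)).
    digest = hmac.new(secret_key.encode("utf-8"), to_sign.encode("utf-8"), hashlib.sha1).digest()
    return f"AWS {access_key}:{base64.b64encode(digest).decode('ascii')}"
```

Usage, with hypothetical values: `authorization_header("AKID", "SECRET", string_to_sign("GET", "/images/centos.qcow2", date))`. Here `date` is the request's `Date` header, for example from `email.utils.formatdate(usegmt=True)`. The same header then goes on the HTTP request to the riakCS, Ceph or GlusterFS endpoint instead of to Amazon.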
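Slide 27 pairs Logstash with Elasticsearch for logs and monitoring. Logstash normally ships events to Elasticsearch for you. The sketch below only shows the shape of what reaches Elasticsearch's `_bulk` API: newline-delimited JSON, with an action line before each document and a trailing newline at the end of the body. The log line format and the index name `cloud-logs` are hypothetical, and no HTTP request is made.

```python
import json
from typing import Dict, Iterable


def parse_log_line(line: str) -> Dict[str, str]:
    # Hypothetical format: "<timestamp> <level> <component> <message...>"
    timestamp, level, component, message = line.split(" ", 3)
    return {"@timestamp": timestamp, "level": level,
            "component": component, "message": message}


def bulk_payload(docs: Iterable[Dict[str, str]], index: str) -> str:
    # Each document becomes two NDJSON lines: an "index" action, then the source.
    # Every line, including the last, ends with "\n"; no documents gives "".
    body = []
    for doc in docs:
        body.append(json.dumps({"index": {"_index": index}}) + "\n")
        body.append(json.dumps(doc) + "\n")
    return "".join(body)


events = [parse_log_line("2014-01-29T10:00:00Z INFO mgmt-server VM i-2-3 started")]
print(bulk_payload(events, "cloud-logs"), end="")
```

The body would then be POSTed to the cluster's `_bulk` endpoint with a header such as `Content-Type: application/x-ndjson`.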

Editor's Notes

  1. Walmart: 1M customer transactions every hour, database of 2.5 PB in 2010. http://www.economist.com/node/15557443?story_id=15557443
  2. Square Kilometre Array: 10–500 TB per second … 1 exabyte per day. Facebook, June 2012: 100 PB Hadoop cluster, ½ PB per day = 180 PB per year -> ~350 PB now? CERN: ~20 PB in EOS.
  3. 250k cables. War and Peace: 450k words; 260M words in Cablegate = 500x War and Peace.
  4. 200 Million pages, 4 TB
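The figures in note 2 can be sanity-checked with a short unit conversion, assuming decimal units (1 EB = 1,000 PB = 10^6 TB). The Facebook ingest rate matches the stated yearly total, and even the low end of the SKA range comes close to an exabyte per day:

```latex
\begin{aligned}
0.5\ \tfrac{\text{PB}}{\text{day}} \times 365\ \text{days} &= 182.5\ \text{PB} \approx 180\ \text{PB per year} \\
10\ \tfrac{\text{TB}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}} &= 864\,000\ \tfrac{\text{TB}}{\text{day}} \approx 0.86\ \tfrac{\text{EB}}{\text{day}}
\end{aligned}
```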