Cloud Big Data Architectures
Lynn Langit
QCon Sao Paulo, Brazil 2016
About this Workshop
Real-world Cloud Scenarios w/AWS, Azure and GCP
1.  Big Data Solution Types
2.  Data Pipelines
3.  ETL and Visualization
4.  Bonus…(if time allows)
Save ALL of your Data
What is the ACTUAL Cost of…
✘  Saving all Data
✘  Using newer technologies
✘  Going beyond Relational
About this Workshop
Real-world Cloud Scenarios w/AWS, Azure and GCP
1.  When to use which type of Big Data Solution
2.  The new world of Data Pipelines
3.  ETL and Visualization Practicalities
4.  Bonus…(if time allows)
1. Big Data – Yes!
But what kind?
Pattern 1
✘ Which type(s) of Big Data work best?
-- when to use Hadoop
-- when to use NoSQL
and which type, i.e. key-value, document, graph, etc.
-- when to use Big Relational
and what type of workload for hot, warm or cold data
Choice… is good, right?
When do I use…?
✘  Hadoop
✘  NoSQL
✘  Big Relational
Size Matters
One Vendor’s View
Where is Hadoop Used?
Hadoop is your LAST CHOICE
✘ Volume
✘ 10 TB or greater to start
✘ Growth of 25% YOY
✘ Where FROM
✘ Where TO
✘ Velocity and Variety
✘ Spark over HIVE
✘ Kafka and Samza
✘ Veracity
✘ Pay, train and hire team
✘ Top $$$ for talent
✘ IF you can find it
✘ WATCH OUT for Cloud Vendors who promise ‘easy access’
✘ Complexity of ecosystem
✘ Cloudera knows best
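A rough way to sanity-check the volume criteria above is to compound a starting volume at a fixed year-over-year rate. The sketch below does only that; the function name `project_volume_tb` and the three-year horizon are illustrative assumptions, while the 10 TB starting point and 25% growth rate are the figures from the slide.

```python
from typing import List


def project_volume_tb(start_tb: float, yoy_growth: float, years: int) -> List[float]:
    """Return the projected volume in TB at the end of each year, compounding annually."""
    volumes = []
    current = start_tb
    for _ in range(years):
        current *= 1 + yoy_growth  # apply one year of growth
        volumes.append(current)
    return volumes


# Slide figures: 10 TB to start, 25% growth year over year.
# The three-year horizon is an assumption chosen for illustration.
projection = project_volume_tb(start_tb=10, yoy_growth=0.25, years=3)
for year, volume in enumerate(projection, start=1):
    print(f"Year {year}: {volume:.2f} TB")
```

If the projected volume stays well below the starting threshold for several years, that may be a hint that a relational or NoSQL store is still the simpler fit.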
When do I use…?
✘  Hadoop
✘  NoSQL
✘  Big Relational
225 NoSQL Database Types to Choose From
Let’s review some NoSQL concepts
Key-Value
Redis, Riak, Aerospike
Graph
Neo4j
Document
MongoDB
Wide-Column
Cassandra, HBase
Key Questions - Storage
✘ Volume – how much now, what growth rate?
✘ Variety – what type(s) of data? ‘rectangular’, ‘graph’, ‘k-v’, etc…
✘ Velocity – batches, streams, both, what ingest rate?
✘ Veracity – current state (quality) of data, amount of duplication of data stores, existence of authoritative (master) data management?
✘ Open Source is Free
§  Rapid iteration, innovation
§  Can start up for free (on premise)
§  Can ‘rent’ for cheap or free on the cloud
§  Can use with the command line for free
§  Some vendors offer free online training
§  Ex. www.neo4j.org
✘ Not Free
§  Constant releases
§  Can be deceptively hard to set up (time is money)
§  Don’t forget to turn it off if on the cloud!
§  GUI tools, support, training cost $$$
§  Ex. www.neo4j.com
NoSQL Example
Practice
Applying Concepts - NoSQL
NoSQL Applied
Log Files
•  ???
Product Catalogs
•  ???
Social Games
•  ???
Social aggregators
•  ???
Line-of-Business
•  ???
NoSQL Applied
Log Files
•  Columnstore
•  HBase
Product Catalogs
•  Key/Value
•  Redis
Social Games
•  Document
•  MongoDB
Social aggregators
•  Graph
•  Neo4j
Line-of-Business
•  RDBMS
•  SQL Server
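The pairings above can be written down as a small lookup table, which is sometimes handy as a starting point for an architecture checklist. This is a minimal sketch: the names `STORE_BY_WORKLOAD` and `suggest_store` are hypothetical, and the mapping simply restates the slide rather than encoding any selection logic.

```python
from typing import Optional, Tuple

# Workload -> (store type, example product), exactly as listed on the slide.
STORE_BY_WORKLOAD = {
    "log files": ("Columnstore", "HBase"),
    "product catalogs": ("Key/Value", "Redis"),
    "social games": ("Document", "MongoDB"),
    "social aggregators": ("Graph", "Neo4j"),
    "line-of-business": ("RDBMS", "SQL Server"),
}


def suggest_store(workload: str) -> Optional[Tuple[str, str]]:
    """Return the (store type, example product) pair for a workload, or None if unlisted."""
    return STORE_BY_WORKLOAD.get(workload.strip().lower())


print(suggest_store("Log Files"))
print(suggest_store("Social aggregators"))
```

Returning `None` for unlisted workloads is a deliberate choice here: the slide gives no default, so the sketch does not invent one.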
More than NoSQL
NoSQL
✘  Non-relational
✘  Can be optimized in-memory
✘  Eventually consistent
✘  Schema on Read
✘  Example: Aerospike
NewSQL
✘  Relational plus more
✘  Often in-memory
✘  Some kind of SQL-layer
✘  Schema on Write
✘  Example: MemSQL
U-SQL
✘  What???
✘  Microsoft’s universal SQL language
✘  Example: Azure Data Lake
Focus
How Best to Store your Data?
         Complexity   Scalability   Developer Cost
RDBMS    easy         medium        low
NoSQL    medium       big           high
Hadoop   hard         huge          very high
Real World Big Data -- When do I use what?
RDBMS    65%
NoSQL    30%
Hadoop   5%
Do the Cloud Vendors Understand Big Data Realities?
Cloud Big Data Vendors - Storage
AWS
✘  5-10X market share of next competitor
✘  Most complete offering
✘  Most mature offering
✘  Notable: Big Relational
GCP
✘  Lean, mean and cheap
✘  Fastest player
✘  Requires top developers
✘  Notable: Query as a Service
Azure
✘  Catching up
✘  Best tooling integration
✘  Notable: On-premise integration
AWS Console: 17 Data Services
GCP Console: 8 Data Services
Azure Console: 15 Data Services
Cloud Offerings – Big Data
                               AWS                            Google                         Microsoft
Managed RDBMS                  RDS, Aurora                    Cloud SQL                      Azure SQL
Data Warehouse                 Redshift                       BigQuery                       Azure SQL Data Warehouse
NoSQL buckets                  S3, Glacier                    Cloud Storage, Nearline        Azure Blobs, StorSimple
NoSQL Key-Value, Wide Column   DynamoDB                       Big Table, Cloud Datastore     Azure Tables
NoSQL Document, Graph          MongoDB on EC2, Neo4j on EC2   MongoDB on GCE, Neo4j on GCE   DocumentDB, Neo4j on Azure
Hadoop                         Elastic MapReduce              DataProc                       Data Lake, HDInsight
Practice
Applying Concepts – Real Cost of Storage Types
Cloud NoSQL Applied – AWS
Log Files
Product Catalogs
Social Games
Social aggregators
Line-of-Business
Cloud NoSQL Applied – AWS
Log Files
•  Stream or Hadoop
•  Kinesis or EMR
Product Catalogs
•  Key/Value
•  DynamoDB
Social Games
•  Document
•  MongoDB
Social aggregators
•  Graph
•  Neo4j
Line-of-Business
•  RDBMS
•  RDS
The fastest growing cloud-based Big Data products are… ???
The fastest growing cloud-based Big Data products are… Relational
When do I use…?
✘  Hadoop
✘  NoSQL
✘  Big Relational
Practice
Applying Concepts – Real Cost of Storage Types
Reasons to use Big Relational Cloud Services
Developers DevOps Cloud Vendors – AWS
Developers DevOps Cloud Vendors – GCP
Reasons to use Big Relational Cloud Services
Developers
Most know RDBMS query patterns
Many know basic administration
DevOps
Most know RDBMS administration
Many know basic RDBMS queries
Many know query optimization
Cloud Vendors - AWS
Aurora – RDBMS up to 64 TB
Redshift - $ 1k USD / 1 TB / year
Rich partner ecosystem – ETL
Integration with AWS products
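The two AWS figures above lend themselves to a quick back-of-the-envelope check. The sketch below treats the quoted Redshift price as a flat rate per TB per year and the Aurora figure as a hard size ceiling; both are simplifying assumptions, the function names are hypothetical, and the numbers are the 2016 slide values, not current pricing.

```python
# Figures quoted on the slide (2016). Treating them as a flat rate and a hard
# ceiling is an assumption; check current vendor pricing and limits before use.
REDSHIFT_USD_PER_TB_YEAR = 1_000
AURORA_MAX_TB = 64


def redshift_annual_cost_usd(volume_tb: float) -> float:
    """Estimate yearly Redshift cost for a given data volume at the quoted flat rate."""
    return volume_tb * REDSHIFT_USD_PER_TB_YEAR


def fits_in_aurora(volume_tb: float) -> bool:
    """Check whether a data volume is within the Aurora size quoted on the slide."""
    return volume_tb <= AURORA_MAX_TB


for volume in (5, 64, 100):
    print(
        f"{volume} TB: Redshift ~ ${redshift_annual_cost_usd(volume):,.0f}/year, "
        f"fits in Aurora: {fits_in_aurora(volume)}"
    )
```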
Developers
Most know coding language patterns to interact with RDBMS systems
DevOps
Familiar RDBMS security patterns
Familiar auditing
Partner tooling integration
Cloud Vendors - GCP
BigQuery – familiar SQL queries
No hassle streaming ingest
No hassle pay-as-you-go
Zero administration
My top Big Data Cloud Services
ETL is 75% of all Big Data Projects
Surveying, cleaning and loading data is the majority of the billable time for new Big Data projects.
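To make "surveying, cleaning and loading" concrete, here is a minimal extract-transform-load sketch using only the Python standard library. Everything in it is an illustrative assumption: the sample CSV, the `sales` table, and the cleaning rules (trim names, drop rows with unparseable values, drop duplicate ids) stand in for whatever a real project would need.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with typical problems: stray whitespace,
# missing values, non-numeric amounts and a duplicated row.
RAW_CSV = """id,name,amount
1, Alice ,10.5
2,Bob,
3,Carol,abc
4,Dave,7
4,Dave,7
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Trim names, skip rows that fail type conversion, and drop duplicate ids."""
    clean, seen_ids = [], set()
    for row in rows:
        try:
            record = (int(row["id"]), row["name"].strip(), float(row["amount"]))
        except (TypeError, ValueError, AttributeError):
            continue  # unparseable row: skipped here, a real pipeline would log it
        if record[0] in seen_ids:
            continue  # duplicate id: keep the first occurrence only
        seen_ids.add(record[0])
        clean.append(record)
    return clean


def load(records):
    """Load cleaned records into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


conn = load(transform(extract(RAW_CSV)))
summary = conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
print(summary)
```

Even in this toy case most of the code is cleaning logic rather than loading, which is the point the slide makes about where project time goes.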
2. Data Pipelines
Build vs. Buy
Pattern 2
✘ How to build optimized cloud-based data pipelines?
-- Cloud-based ETL tools and processes
-- includes load-testing patterns and security practices
-- including connecting between different vendor clouds
Key Questions – Ingestion and ETL
✘ Volume – how much and how fast, now and future?
✘ Variety – what type(s) of data, any pre-processing needed?
✘ Velocity – batches or streaming?
✘ Veracity – verification on ingest needed? new data needed?
Together
How does your data pipeline flow?
Considering…
✘  Initial Load/Transform
✘  Data Quality
✘  Batch vs. Stream
Pipeline Phases
Phase 0
Eval Current Data - Quality & Quantity
Phase 1
Get New Data - Free or Premium
Phase 2
Build MVP & Forecast volume and growth
Phase 3
Load test at scale
Phase 4
Deploy – secure, audit and monitor
Cloud Big Data Vendors - ETL
AWS
✘  5X market share of next competitor
✘  Notable: Many, strong ETL Partners
GCP
✘  Lean, mean and cheap
✘  Fastest player
✘  Notable: DataFlow requires Java or Python developers
Azure
✘  Difficulty with scale
✘  Best tooling integration
✘  Notable: Nothing
How Best to Ingest and ETL your Data?
         Complexity   Scalability   Developer Cost
RDBMS    medium       medium        low
NoSQL    medium       big           high
Hadoop   hard         huge          very high
Considering…
✘  Initial Load/Transform
✘  Data Quality
✘  Batch vs. Stream
Building a Streaming Pipeline
Stream Interval Window
Near Real-time Streams
Load Test All The Things
Key Questions - Streaming
✘ Volume – how much data now and predicted over next 12 months?
✘ Variety – what types of data now and future?
✘ Velocity – volume of input data / time now and near future?
✘ Veracity – volume of EXISTING data now
Cloud Big Data Vendors - Streaming
AWS
✘  5X market share of next competitor
✘  Most complete offering
✘  Most mature offering
✘  Notable: Kinesis Firehose
GCP
✘  Lean, mean and cheap
✘  Fastest player
✘  Requires top developers
✘  Notable: DataFlow flexible
Azure
✘  Catching up
✘  Best tooling integration
✘  Notable: Stream Analytics integration with other products
Cloud Offerings – Data and Pipelines
                               AWS                            Google                              Microsoft
Managed RDBMS                  RDS, Aurora                    Cloud SQL                           Azure SQL
Data Warehouse                 Redshift                       BigQuery                            Azure SQL Data Warehouse
NoSQL buckets                  S3, Glacier                    Cloud Storage, Nearline             Azure Blobs, StorSimple
NoSQL Key-Value, Wide Column   DynamoDB                       Big Table, Cloud Datastore          Azure Tables
Streaming or ML                Kinesis, AWS Machine Learning  DataFlow, Google Machine Learning   StreamInsight, Azure ML
NoSQL Document, Graph          MongoDB on EC2, Neo4j on EC2   MongoDB on GCE, Neo4j on GCE        DocumentDB, Neo4j on Azure
Hadoop                         Elastic MapReduce              DataProc                            Data Lake, HDInsight
Cloud ETL                      Data Pipelines                 DataFlow                            Azure Data Pipeline
How Best to Stream your Data?
            Complexity       Scalability   Developer Cost
Batches     easy             medium        low
Windows     difficult        big           high
Real-time   very difficult   huge          high
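To illustrate the "Windows" option in the comparison above, the sketch below groups timestamped events into fixed, non-overlapping (tumbling) windows and counts events per key in each window. It is a single-process toy under stated assumptions: the event shape `(timestamp_seconds, key)`, the 60-second interval and the function name are all hypothetical, and a managed service such as Kinesis or DataFlow would additionally handle late data, state and scale.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) tuples.
    Returns {window_start_seconds: {key: count}}, ordered by window start.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Align each event to the start of the window it falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical sample events: (seconds since start, event type).
events = [(0, "click"), (12, "click"), (59, "view"), (60, "click"), (125, "view")]
result = tumbling_window_counts(events, window_seconds=60)
for start, counts in result.items():
    print(f"window starting at {start}s: {counts}")
```

A batch job would process all of these events at once after the fact; the windowed version emits one result per interval, which is where the extra complexity and cost in the comparison come from.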
Practice
Applying Concepts
Designing Cloud Data Pipelines
Log Files
Product Catalogs
Social Games
Social aggregators
Line-of-Business
3. Making Sense of Data
Analytics and Presentation
Pattern 3
✘ How best to Query and Visualize
-- when to use business analytics vs. predictive analytics (machine learning)
-- how best to present data to clients - partner visualization products or roll your own
Making Sense of Data
Machine Learning
Reports
Presentation
Key Questions - Query
✘ Volume
✘ Variety
✘ Velocity
✘ Veracity
Graphs
What is nature of your questions?
Cloud Big Data Vendors - Query
AWS
✘  5X market share of next competitor
✘  Most complete offering
✘  Most mature offering
✘  Notable: Big Relational
GCP
✘  Lean, mean and cheap
✘  Fastest player
✘  Notable: Flexible, powerful machine learning
Azure
✘  WATCH OUT – Cost!
✘  Notable: Developer Tooling
Query Languages
SQL
Everyone knows it
But how well do they know it?
NoSQL Vendor Language
Too many to list
How will you learn it?
Cypher
Query language for graph databases
The future?
ORM
Good, bad or horrible?
Again, how well do they know it?
HIVE
Shown in too many vendor demos
Really hard to make performant
Machine Learning Queries
SciPy, NumPy or Python
R Language
Julia Language
Many more…
Practice
Applying Concepts – Understanding D3
How Best to Query your Data?
         Business Analytics   Predictive Analytics   Developer Cost
RDBMS
NoSQL
Hadoop
How Best to Query your Data?
         Business Analytics   Predictive Analytics   Developer Cost
RDBMS    easy                 medium                 low
NoSQL    hard                 very hard              very high
Hadoop   hard                 hard                   very high
Machine Learning aka Predictive Analytics
AWS
ML for developers
GUI-based
GCP
3 Flavors of ML
Python-based languages
Azure
ML for Data Scientists
R Language
Presentation
If you can’t see it, it’s not worth it.
Dashboards
✘  More than KPIs
✘  Mobile
✘  Alerts
✘  Data Stories
Innovation in Data Visualization
Reports
✘  Level of Detail
✘  Meaningful Taxonomies
✘  Fast enough
✘  Drill for Data
D3
The language of Data Visualization
Cloud Big Data Vendors - Visualization
AWS
✘  Most complete offering
✘  Notable: Partners & QuickSight
GCP
✘  BigQuery Partners
✘  Notable: New Dashboards
Azure
✘  Integrated
✘  Notable: PowerBI
4. About IoT
It’s happening now
Data Generation Device
IoT is Big Data Realized
The IoT Market: $235,000,000,000
By the year: 2017
20 Billion devices. And a lot of users
IoT all the Things
Cloud Big Data Vendors - IoT
AWS
✘  First to market
✘  Most complete offering
✘  Most mature offering
✘  Notable: AWS IoT Rules
GCP
✘  Still in Beta
✘  Fastest player
✘  Requires top developers
✘  Notable: Weave
Azure
✘  Catching up
✘  Best tooling integration
✘  Notable: Device Mgmt.
Save ALL of your Data
The Next Generation…
‘brigada! (Thank you!)
Any questions?
You can find me at
@lynnlangit

More Related Content

What's hot

Suburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data LakeSuburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data LakeTorsten Steinbach
 
Data Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringData Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringAnant Corporation
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing ArchitectureGang Tao
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseData Con LA
 
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Databricks
 
How Azure Databricks helped make IoT Analytics a Reality with Janath Manohara...
How Azure Databricks helped make IoT Analytics a Reality with Janath Manohara...How Azure Databricks helped make IoT Analytics a Reality with Janath Manohara...
How Azure Databricks helped make IoT Analytics a Reality with Janath Manohara...Databricks
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Con LA
 
Spark Streaming with Azure Databricks
Spark Streaming with Azure DatabricksSpark Streaming with Azure Databricks
Spark Streaming with Azure DatabricksDustin Vannoy
 
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleEbooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleVasu S
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataDataWorks Summit/Hadoop Summit
 
5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for AnalyticsJen Stirrup
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story Roman Chukh
 
Big Data - in the cloud or rather on-premises?
Big Data - in the cloud or rather on-premises?Big Data - in the cloud or rather on-premises?
Big Data - in the cloud or rather on-premises?Guido Schmutz
 
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloudJeff Hung
 

What's hot (20)

Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Suburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data LakeSuburface 2021 IBM Cloud Data Lake
Suburface 2021 IBM Cloud Data Lake
 
Data Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data EngineeringData Engineer's Lunch #55: Get Started in Data Engineering
Data Engineer's Lunch #55: Get Started in Data Engineering
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing Architecture
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
How Azure Databricks helped make IoT Analytics a Reality with Janath Manohara...
How Azure Databricks helped make IoT Analytics a Reality with Janath Manohara...How Azure Databricks helped make IoT Analytics a Reality with Janath Manohara...
How Azure Databricks helped make IoT Analytics a Reality with Janath Manohara...
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
Spark Streaming with Azure Databricks
Spark Streaming with Azure DatabricksSpark Streaming with Azure Databricks
Spark Streaming with Azure Databricks
 
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleEbooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch data
 
5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story
 
Big Data - in the cloud or rather on-premises?
Big Data - in the cloud or rather on-premises?Big Data - in the cloud or rather on-premises?
Big Data - in the cloud or rather on-premises?
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
[DataCon.TW 2017] Data Lake: centralize in on-prem vs. decentralize on cloud
 

Viewers also liked

Overview of big data in cloud computing
Overview of big data in cloud computingOverview of big data in cloud computing
Overview of big data in cloud computingViet-Trung TRAN
 
SQL Server on Google Cloud Platform
SQL Server on Google Cloud PlatformSQL Server on Google Cloud Platform
SQL Server on Google Cloud PlatformLynn Langit
 
Azure vs AWS Best Practices: What You Need to Know
Azure vs AWS Best Practices: What You Need to KnowAzure vs AWS Best Practices: What You Need to Know
Azure vs AWS Best Practices: What You Need to KnowRightScale
 
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingMicrosoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingIlyas F ☁☁☁
 
Compare Clouds: Aws vs Azure vs Google vs SoftLayer
Compare Clouds: Aws vs Azure vs Google vs SoftLayerCompare Clouds: Aws vs Azure vs Google vs SoftLayer
Compare Clouds: Aws vs Azure vs Google vs SoftLayerRightScale
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond RelationalLynn Langit
 
Using Premium Data - for Business Analysts
Using Premium Data - for Business AnalystsUsing Premium Data - for Business Analysts
Using Premium Data - for Business AnalystsLynn Langit
 
Why Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataWhy Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataData Con LA
 
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonDell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonGreg Kirchoff
 
How the IoT market may change our digital life thanks to the Data Tsunami it ...
How the IoT market may change our digital life thanks to the Data Tsunami it ...How the IoT market may change our digital life thanks to the Data Tsunami it ...
How the IoT market may change our digital life thanks to the Data Tsunami it ...Dataconomy Media
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation BriefBoni Bruno
 
Aws vs. azure key parameters for decision making
Aws vs. azure   key parameters for decision makingAws vs. azure   key parameters for decision making
Aws vs. azure key parameters for decision makingAspire Systems
 
A tale of two clouds
A tale of two cloudsA tale of two clouds
A tale of two cloudsAndrew Siemer
 
Tips For a Successful Cloud Proof-of-Concept - RightScale Compute 2013
Tips For a Successful Cloud Proof-of-Concept - RightScale Compute 2013Tips For a Successful Cloud Proof-of-Concept - RightScale Compute 2013
Tips For a Successful Cloud Proof-of-Concept - RightScale Compute 2013RightScale
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData, Inc.
 
Azure and/or AWS: How to Choose the best cloud platform for your project
Azure and/or AWS: How to Choose the best cloud platform for your projectAzure and/or AWS: How to Choose the best cloud platform for your project
Azure and/or AWS: How to Choose the best cloud platform for your projectEastBanc Tachnologies
 
Big Data & the Cloud
Big Data & the CloudBig Data & the Cloud
Big Data & the CloudDATAVERSITY
 
PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015Krishna-Kumar
 
BlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData, Inc.
 
AWS vs AZURE : Public Cloud Comparison
AWS vs AZURE : Public Cloud ComparisonAWS vs AZURE : Public Cloud Comparison
AWS vs AZURE : Public Cloud ComparisonInApp
 

Viewers also liked (20)

Overview of big data in cloud computing
Overview of big data in cloud computingOverview of big data in cloud computing
Overview of big data in cloud computing
 
SQL Server on Google Cloud Platform
SQL Server on Google Cloud PlatformSQL Server on Google Cloud Platform
SQL Server on Google Cloud Platform
 
Azure vs AWS Best Practices: What You Need to Know
Azure vs AWS Best Practices: What You Need to KnowAzure vs AWS Best Practices: What You Need to Know
Azure vs AWS Best Practices: What You Need to Know
 
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingMicrosoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
 
Compare Clouds: Aws vs Azure vs Google vs SoftLayer
Compare Clouds: Aws vs Azure vs Google vs SoftLayerCompare Clouds: Aws vs Azure vs Google vs SoftLayer
Compare Clouds: Aws vs Azure vs Google vs SoftLayer
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
 
Using Premium Data - for Business Analysts
Using Premium Data - for Business AnalystsUsing Premium Data - for Business Analysts
Using Premium Data - for Business Analysts
 
Why Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataWhy Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueData
 
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonDell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with Isilon
 
How the IoT market may change our digital life thanks to the Data Tsunami it ...
How the IoT market may change our digital life thanks to the Data Tsunami it ...How the IoT market may change our digital life thanks to the Data Tsunami it ...
How the IoT market may change our digital life thanks to the Data Tsunami it ...
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation Brief
 
Aws vs. azure key parameters for decision making
Aws vs. azure   key parameters for decision makingAws vs. azure   key parameters for decision making
Aws vs. azure key parameters for decision making
 
A tale of two clouds
A tale of two cloudsA tale of two clouds
A tale of two clouds
 
Tips For a Successful Cloud Proof-of-Concept - RightScale Compute 2013
Tips For a Successful Cloud Proof-of-Concept - RightScale Compute 2013Tips For a Successful Cloud Proof-of-Concept - RightScale Compute 2013
Tips For a Successful Cloud Proof-of-Concept - RightScale Compute 2013
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for Hadoop
 
Azure and/or AWS: How to Choose the best cloud platform for your project
Azure and/or AWS: How to Choose the best cloud platform for your projectAzure and/or AWS: How to Choose the best cloud platform for your project
Azure and/or AWS: How to Choose the best cloud platform for your project
 
Big Data & the Cloud
Big Data & the CloudBig Data & the Cloud
Big Data & the Cloud
 
PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015
 
BlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData EPIC 2.0 Overview
BlueData EPIC 2.0 Overview
 
AWS vs AZURE : Public Cloud Comparison
AWS vs AZURE : Public Cloud ComparisonAWS vs AZURE : Public Cloud Comparison
AWS vs AZURE : Public Cloud Comparison
 

Similar to Cloud Big Data Architectures

AWS vs Azure vs Google (GCP) - Slides
AWS vs Azure vs Google (GCP) - SlidesAWS vs Azure vs Google (GCP) - Slides
AWS vs Azure vs Google (GCP) - SlidesTobyWilman
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data AnalyticsAmazon Web Services
 
Database Choices
Database ChoicesDatabase Choices
Database ChoicesLynn Langit
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaHelen Rogers
 
NoSQL Options Compared
NoSQL Options ComparedNoSQL Options Compared
NoSQL Options ComparedSergey Bushik
 
Building Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptxBuilding Big Data Solutions with Azure Data Lake.10.11.17.pptx
Building Big Data Solutions with Azure Data Lake.10.11.17.pptxthando80
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2Raul Chong
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
Cloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GoogleCloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GooglePatrick Pierson
 
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCPSimpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCPDaniel Zivkovic
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudChris Dagdigian
 
Managing data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudManaging data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudKaran Singh
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
Not only SQL - Database Choices
Not only SQL - Database ChoicesNot only SQL - Database Choices
Not only SQL - Database ChoicesLynn Langit
 
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...Amazon Web Services
 


Cloud Big Data Architectures

  • 1. Cloud Big Data Architectures Lynn Langit QCon Sao Paulo, Brazil 2016
  • 2. About this Workshop Real-world Cloud Scenarios w/AWS, Azure and GCP 1.  Big Data Solution Types 2.  Data Pipelines 3.  ETL and Visualization 4.  Bonus…(if time allows)
  • 4. “What is the ACTUAL Cost of ✘  Saving all Data ✘  Using newer technologies ✘  Going beyond Relational
  • 5. About this Workshop Real-world Cloud Scenarios w/AWS, Azure and GCP 1.  When to use which type of Big Data Solution 2.  The new world of Data Pipelines 3.  ETL and Visualization Practicalities 4.  Bonus…(if time allows)
  • 6. 1. Big Data – Yes! But what kind?
  • 7. Pattern 1 ✘ Which type(s) of Big Data work best? -- when to use Hadoop -- when to use NoSQL and which type, i.e. key-value, document, graph, etc. -- when to use Big Relational and what type of workload for hot, warm or cold data
  • 9. “When do I use…? ✘  Hadoop ✘  NoSQL ✘  Big Relational
  • 11.
  • 12. One Vendor’s View I don’t Want Text here
  • 13.
  • 15. Hadoop is your LAST CHOICE ✘ Volume ✘ 10 TB or greater to start ✘ Growth of 25% YOY ✘ Where FROM ✘ Where TO ✘ Velocity and Variety ✘ Spark over HIVE ✘ Kafka and Samza ✘ Veracity ✘ Pay, train and hire team ✘ Top $$$ for talent ✘ IF you can find it ✘ WATCH OUT for Cloud Vendors who promise ‘easy access’ ✘ Complexity of ecosystem ✘ Cloudera knows best
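The volume bullets on slide 15 (10 TB or greater to start, growth of 25% YOY) can be turned into a quick projection. This is a minimal sketch, assuming growth compounds once per year; the function name `project_volume_tb` and the five-year horizon are illustrative and do not come from the slides.

```python
def project_volume_tb(start_tb: float, yearly_growth: float, years: int) -> list[float]:
    """Return the projected volume in TB for year 0 through `years`.

    Assumes simple compound growth applied once per year (an assumption,
    not something the slides specify).
    """
    volumes = [start_tb]
    for _ in range(years):
        volumes.append(volumes[-1] * (1 + yearly_growth))
    return volumes


if __name__ == "__main__":
    # Slide figures: 10 TB to start, 25% year-over-year growth.
    for year, tb in enumerate(project_volume_tb(10.0, 0.25, 5)):
        print(f"year {year}: {tb:.1f} TB")
```

Running a projection like this before choosing a platform helps answer the "how much now, what growth rate?" question with numbers rather than guesses.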
  • 16. “When do I use…? ✘  Hadoop ✘  NoSQL ✘  Big Relational
  • 17. 225 NoSQL Database Types to Choose From
  • 18. Let’s review some NoSQL concepts Key-Value Redis, Riak, Aerospike Graph Neo4j Document MongoDB Wide-Column Cassandra, HBase
  • 19.
  • 20. Key Questions - Storage ✘ Volume – how much now, what growth rate? ✘ Variety – what type(s) of data? ‘rectangular’, ‘graph’, ‘k-v’, etc… ✘ Velocity – batches, streams, both, what ingest rate? ✘ Veracity – current state (quality) of data, amount of duplication of data stores, existence of authoritative (master) data management?
  • 21. NoSQL Example
      ✘ Open Source is Free
        §  Rapid iteration, innovation
        §  Can start up for free (on premise)
        §  Can ‘rent’ for cheap or free on the cloud
        §  Can use with the command line for free
        §  Some vendors offer free online training
        §  Ex. www.neo4j.org
      ✘ Not Free
        §  Constant releases
        §  Can be deceptively hard to set up (time is money)
        §  Don’t forget to turn it off if on the cloud!
        §  GUI tools, support, training cost $$$
        §  Ex. www.neo4j.com
  • 23. NoSQL Applied Log Files •  ??? Product Catalogs •  ??? Social Games •  ??? Social aggregators •  ??? Line-of-Business •  ???
  • 24. NoSQL Applied Log Files •  Columnstore •  HBase Product Catalogs •  Key/Value •  Redis Social Games •  Document •  MongoDB Social aggregators •  Graph •  Neo4j Line-of-Business •  RDBMS •  SQL Server
  • 25. More than NoSQL NoSQL ✘  Non-relational ✘  Can be optimized in-memory ✘  Eventually consistent ✘  Schema on Read ✘  Example: Aerospike NewSQL ✘  Relational plus more ✘  Often in-memory ✘  Some kind of SQL-layer ✘  Schema on Write ✘  Example: MemSQL U-SQL ✘  What??? ✘  Microsoft’s universal SQL language ✘  Example: Azure Data Lake
  • 26. Focus
  • 27. How Best to Store your Data?
                  Complexity   Scalability   Developer Cost
        RDBMS     easy         medium        low
        NoSQL     medium       big           high
        Hadoop    hard         huge          very high
  • 28. Real World Big Data -- When do I use what? RDBMS 65% NoSQL 30% Hadoop 5%
  • 29. “Do the Cloud Vendors Understand Big Data Realities?
  • 30. Cloud Big Data Vendors - Storage AWS ✘  5-10X market share of next competitor ✘  Most complete offering ✘  Most mature offering ✘  Notable: Big Relational GCP ✘  Lean, mean and cheap ✘  Fastest player ✘  Requires top developers ✘  Notable: Query as a Service Azure ✘  Catching up ✘  Best tooling integration ✘  Notable: On-premise integration
  • 31. AWS Console 17 Data services
  • 32. GCP Console 8 Data Services
  • 33. Azure Console 15 Data Services
  • 34. Cloud Offerings – Big Data
                                        AWS                             Google                          Microsoft
        Managed RDBMS                   RDS, Aurora                     Cloud SQL                       Azure SQL
        Data Warehouse                  Redshift                        BigQuery                        Azure SQL Data Warehouse
        NoSQL buckets                   S3, Glacier                     Cloud Storage, Nearline         Azure Blobs, StorSimple
        NoSQL Key-Value / Wide Column   DynamoDB                        Big Table, Cloud Datastore      Azure Tables
        NoSQL Document / Graph          MongoDB on EC2, Neo4j on EC2    MongoDB on GCE, Neo4j on GCE    DocumentDB, Neo4j on Azure
        Hadoop                          Elastic MapReduce               DataProc                        Data Lake, HDInsight
  • 35. Practice Applying Concepts – Real Cost of Storage Types
  • 36. Cloud NoSQL Applied – AWS Log Files Product Catalogs Social Games Social aggregators Line-of-Business
  • 37. Cloud NoSQL Applied – AWS Log Files •  Stream or Hadoop •  Kinesis or EMR Product Catalogs •  Key/Value •  DynamoDB Social Games •  Document •  MongoDB Social aggregators •  Graph •  Neo4j Line-of-Business •  RDBMS •  RDS
  • 38. The fastest growing cloud-based Big Data products are… ???
  • 39. The fastest growing cloud-based Big Data products are… Relational
  • 40. “When do I use…? ✘  Hadoop ✘  NoSQL ✘  Big Relational
  • 41. Practice Applying Concepts – Real Cost of Storage Types
  • 42. Reasons to use Big Relational Cloud Services Developers DevOps Cloud Vendors – AWS Developers DevOps Cloud Vendors – GCP
  • 43. Reasons to use Big Relational Cloud Services Developers Most know RDBMS query patterns Many know basic administration DevOps Most know RDBMS administration Many know basic RDBMS queries Many know query optimization Cloud Vendors - AWS Aurora – RDBMS up to 64 TB Redshift - $ 1k USD / 1 TB / year Rich partner ecosystem – ETL Integration with AWS products Developers Most know coding language patterns to interact with RDBMS systems DevOps Familiar RDBMS security patterns Familiar auditing Partner tooling integration Cloud Vendors - GCP Big Query – familiar SQL queries No hassle streaming ingest No hassle pay-as-you-go Zero administration
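The Redshift figure on slide 43 ($1k USD / 1 TB / year) lends itself to a back-of-the-envelope cost estimate. The sketch below is an assumption-laden illustration: it treats the quoted rate as a flat per-TB price, bills each year on the volume held at the start of that year, and uses a hypothetical function name `cumulative_cost_usd`. Actual pricing depends on node type, region and reservation terms, so check current vendor pricing before relying on it.

```python
# Rate quoted on the slide; treat it as illustrative, not as current pricing.
USD_PER_TB_YEAR = 1_000


def cumulative_cost_usd(
    start_tb: float,
    yearly_growth: float,
    years: int,
    usd_per_tb_year: float = USD_PER_TB_YEAR,
) -> float:
    """Sum the yearly cost over `years`, growing the volume after each year.

    Assumes each year is billed on the volume at the start of that year.
    """
    total = 0.0
    volume_tb = start_tb
    for _ in range(years):
        total += volume_tb * usd_per_tb_year
        volume_tb *= 1 + yearly_growth
    return total


if __name__ == "__main__":
    # Example inputs are hypothetical: 10 TB growing 25% per year for 3 years.
    print(f"${cumulative_cost_usd(10.0, 0.25, 3):,.0f}")
```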
  • 44. My top Big Data Cloud Services
  • 45. ETL is 75% of all Big Data Projects Surveying, cleaning and loading data is the majority of the billable time for new Big Data projects.
  • 46. About this Workshop Real-world Cloud Scenarios w/AWS, Azure and GCP 1.  When to use which type of Big Data Solution 2.  The new world of Data Pipelines 3.  ETL and Visualization Practicalities 4.  Bonus…(if time allows)
  • 48. Pattern 2 ✘ How to build optimized cloud-based data pipelines? -- Cloud-based ETL tools and processes -- includes load-testing patterns and security practices -- including connecting between different vendor clouds
  • 49. Key Questions – Ingestion and ETL ✘ Volume – how much and how fast, now and future? ✘ Variety – what type(s) of data, any pre-processing needed? ✘ Velocity – batches or streaming? ✘ Veracity – verification on ingest needed? new data needed?
  • 50. Together How does your data pipeline flow?
  • 51. “Considering… ✘  Initial Load/Transform ✘  Data Quality ✘  Batch vs. Stream
  • 52. Pipeline Phases Phase 0 Eval Current Data - Quality & Quantity Phase 1 Get New Data - Free or Premium Phase 2 Build MVP & Forecast volume and growth Phase 3 Load test at scale Phase 4 Deploy – secure, audit and monitor
  • 53. Cloud Big Data Vendors - ETL AWS ✘  5X market share of next competitor ✘  Notable: Many, strong ETL Partners GCP ✘  Lean, mean and cheap ✘  Fastest player ✘  Notable: DataFlow requires Java or Python developers Azure ✘  Difficulty with scale ✘  Best tooling integration ✘  Notable: Nothing
  • 54. How Best to Ingest and ETL your Data?
                  Complexity   Scalability   Developer Cost
        RDBMS     medium       medium        low
        NoSQL     medium       big           high
        Hadoop    hard         huge          very high
  • 55. “Considering… ✘  Initial Load/Transform ✘  Data Quality ✘  Batch vs. Stream
  • 56. Building a Streaming Pipeline Stream Interval Window
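Slide 56 names the three pieces of a streaming pipeline: stream, interval and window. One common way to combine them is a fixed-interval (tumbling) window, sketched below in plain Python. The slide does not say which windowing strategy was shown, so the tumbling window, the function name `tumbling_windows` and the sample events are all assumptions made for illustration; managed services such as Kinesis or DataFlow provide their own windowing APIs.

```python
from collections import defaultdict


def tumbling_windows(events, interval_s):
    """Group (timestamp_s, value) events into fixed, non-overlapping windows.

    Each window is keyed by its start time; an event with timestamp t falls
    into the window starting at (t // interval_s) * interval_s.
    """
    windows = defaultdict(list)
    for timestamp_s, value in events:
        window_start = (timestamp_s // interval_s) * interval_s
        windows[window_start].append(value)
    return dict(sorted(windows.items()))


# Hypothetical stream of (timestamp in seconds, value) pairs.
events = [(0, 3), (4, 5), (10, 1), (12, 2), (25, 7)]

# Aggregate each 10-second window into a single sum.
sums = {start: sum(values) for start, values in tumbling_windows(events, 10).items()}

if __name__ == "__main__":
    print(sums)
```

Batches are the degenerate case of one large window; shrinking the interval moves the pipeline toward near real-time at the price of more windows to compute and load-test.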
  • 57.
  • 58. “Near Real-time Streams Load Test All The Things
  • 59. Key Questions - Streaming ✘ Volume – how much data now and predicted over next 12 months? ✘ Variety – what types of data now and future? ✘ Velocity – volume of input data / time now and near future? ✘ Veracity – volume of EXISTING data now
  • 60. Cloud Big Data Vendors - Streaming AWS ✘  5X market share of next competitor ✘  Most complete offering ✘  Most mature offering ✘  Notable: Kinesis Firehose GCP ✘  Lean, mean and cheap ✘  Fastest player ✘  Requires top developers ✘  Notable: DataFlow flexible Azure ✘  Catching up ✘  Best tooling integration ✘  Notable: Stream Analytics integration with other products
  • 61. AWS Console 17 Data services
  • 62. GCP Console 8 Data Services
  • 63. Azure Console 15 Data Services
  • 64. Cloud Offerings – Data and Pipelines
                                        AWS                             Google                              Microsoft
        Managed RDBMS                   RDS, Aurora                     Cloud SQL                           Azure SQL
        Data Warehouse                  Redshift                        BigQuery                            Azure SQL Data Warehouse
        NoSQL buckets                   S3, Glacier                     Cloud Storage, Nearline             Azure Blobs, StorSimple
        NoSQL Key-Value / Wide Column   DynamoDB                        Big Table, Cloud Datastore          Azure Tables
        Streaming or ML                 Kinesis, AWS Machine Learning   DataFlow, Google Machine Learning   StreamInsight, Azure ML
        NoSQL Document / Graph          MongoDB on EC2, Neo4j on EC2    MongoDB on GCE, Neo4j on GCE        DocumentDB, Neo4j on Azure
        Hadoop                          Elastic MapReduce               DataProc                            Data Lake, HDInsight
        Cloud ETL                       Data Pipelines                  DataFlow                            Azure Data Pipeline
  • 65. How Best to Stream your Data?
                    Complexity       Scalability   Developer Cost
        Batches     easy             medium        low
        Windows     difficult        big           high
        Real-time   very difficult   huge          high
  • 67. Designing Cloud Data Pipelines Log Files Product Catalogs Social Games Social aggregators Line-of-Business
  • 68. About this Workshop Real-world Cloud Scenarios w/AWS, Azure and GCP 1.  When to use which type of Big Data Solution 2.  The new world of Data Pipelines 3.  ETL and Visualization Practicalities 4.  Bonus…(if time allows)
  • 69. 3. Making Sense of Data Analytics and Presentation
  • 70. Pattern 3 ✘ How best to Query and Visualize -- When to use business analytics vs. predictive analytics (machine learning) -- how best to present data to clients - partner visualization products or roll your own
  • 71. Making Sense of Data Machine Learning Reports Presentation
  • 72. Key Questions - Query ✘ Volume ✘ Variety ✘ Velocity ✘ Veracity
  • 73. Graphs What is nature of your questions?
  • 74.
  • 75. Cloud Big Data Vendors - Query AWS ✘  5X market share of next competitor ✘  Most complete offering ✘  Most mature offering ✘  Notable: Big Relational GCP ✘  Lean, mean and cheap ✘  Fastest player ✘  Notable: Flexible, powerful machine learning Azure ✘  WATCH OUT – Cost! ✘  Notable: Developer Tooling
  • 76. Query Languages SQL Everyone knows it But how well do they know it? NoSQL Vendor Language Too many to list How will you learn it? Cypher Query language for graph databases The future? ORM Good, bad or horrible? Again, how well do they know it? HIVE Shown in too many vendor demos Really hard to make performant Machine Learning Queries SciPy, NumPy or Python R Language Julia Language Many more…
  • 77. Practice Applying Concepts – Understanding D3
  • 78. How Best to Query your Data? Business Analytics Predictive Analytics Developer Cost RDBMS NoSQL Hadoop
  • 79. How Best to Query your Data?
                  Business Analytics   Predictive Analytics   Developer Cost
        RDBMS     easy                 medium                 low
        NoSQL     hard                 very hard              very high
        Hadoop    hard                 hard                   very high
  • 80. Machine Learning aka Predictive Analytics AWS ML for developers GUI-based GCP 3 Flavors of ML Python-based languages Azure ML for Data Scientists R Language
  • 81. Presentation If you can’t see it, it’s not worth it.
  • 82. Dashboards ✘  More than KPIs ✘  Mobile ✘  Alerts ✘  Data Stories Innovation in Data Visualization Reports ✘  Level of Detail ✘  Meaningful Taxonomies ✘  Fast enough ✘  Drill for Data
  • 83. D3 The language of Data Visualization
  • 84.
  • 85. Cloud Big Data Vendors - Visualization AWS ✘  Most complete offering ✘  Notable: Partners & QuickSight GCP ✘  Big Query Partners ✘  Notable: New Dashboards Azure ✘  Integrated ✘  Notable: PowerBI
  • 86. About this Workshop Real-world Cloud Scenarios w/AWS, Azure and GCP 1.  When to use which type of Big Data Solution 2.  The new world of Data Pipelines 3.  ETL and Visualization Practicalities 4.  Bonus…(if time allows)
  • 88. Data Generation Device
  • 90. 235,000,000,000 $: The IoT Market; 2017: By the year; 20 Billion devices: And a lot of users
  • 91. IoT all the Things
  • 92. Cloud Big Data Vendors - IoT AWS ✘  First to market ✘  Most complete offering ✘  Most mature offering ✘  Notable: AWS IoT Rules GCP ✘  Still in Beta ✘  Fastest player ✘  Requires top developers ✘  Notable: Weave Azure ✘  Catching up ✘  Best tooling integration ✘  Notable: Device Mgmt.
  • 95. ‘brigada! Any questions? You can find me at @lynnlangit