Storing, accessing, and analyzing large amounts of data from diverse sources, and making the results easily accessible as actionable insights, can be challenging for data-driven organizations. The answer is to optimize scaling and provide a unified interface that simplifies analysis. Qubole helps customers simplify their Big Data analytics with speed and scalability, while giving data analysts and data scientists self-service access on the AWS Cloud. Join Qubole and AWS to discuss how Auto Scaling and Amazon EC2 Spot pricing can help customers efficiently turn data into insights. We'll also cover best practices for migrating from an on-premises Big Data architecture to the AWS Cloud.
Join us to learn:
• How to more easily create elastic Hadoop, Spark, and other Big Data clusters for dynamic, large-scale workloads
• Best practices for using Auto Scaling and Amazon EC2 Spot Instances to cost-optimize Big Data workloads
• Best practices for deploying or migrating Big Data workloads to the AWS Cloud
Who should attend: IT Administrators, IT Architects, Data Warehouse Developers, Database Administrators, Business Analysts, and Data Architects
3. Data is growing
• 1.7 MB of new data will be created every second for every human being on the planet by 2020 (http://www.whizpr.be/upload/medialab/21/company/Media_Presentation_2012_DigiUniverseFINAL1.pdf)
• 58% compound annual growth rate forecast for the Hadoop market, surpassing $1 billion by 2020 (http://www.ap-institute.com/big-data-articles/big-data-what-is-hadoop-%E2%80%93-an-explanation-for-absolutely-anyone.aspx; http://www.marketanalysis.com/?p=279)
• <0.5% of all data is ever analyzed and used at the moment (http://www.technologyreview.com/news/514346/the-data-made-me-do-it/)
4. Big Data is for everyone
The market for Big Data technologies is growing more than six times faster than the information technology market as a whole, and the companies that use their data well win.
5. Why AWS for Big Data?
• Immediately available
• Broad and deep capabilities
• Trusted and secure
• Scalable
6. Collect, Store, Analyze, and Visualize
It's easy to get data to AWS, store it securely, and analyze it with the engine of your choice, without any long-term commitment or vendor lock-in.
• Collect: AWS Import/Export, Snowball, Direct Connect, VM Import/Export
• Store: Amazon S3, EMR, Amazon Glacier, Amazon Redshift, DynamoDB
• Analyze: Amazon Kinesis, Lambda, EMR, EC2, Aurora
7. AWS provides the most complete platform for Big Data
What can you do with Big Data on AWS? Big Data repositories, clickstream analysis, ETL offload, machine learning, online ad serving, and BI applications.
11. Big Data deployments are difficult
Where Big Data falls short:
• Rigid and inflexible infrastructure
• Non-adaptive software services
• Highly specialized systems
• Difficult to build and operate
12. Big Data deployments are difficult
Qubole Confidential
Where Big Data falls short:
• 6-18 months to implement
• 27% succeed
• 13% achieve full-scale production
• 57% cite a skills gap as a major inhibitor
Sources:
https://www.capgemini-consulting.com/resource-file-access/resource/pdf/cracking_the_data_conundrum-big_data_pov_13-1-15_v2.pdf
http://www.gartner.com/newsroom/id/3051717
22. Data Admins - use Qubole's built-in Ganglia monitoring
23. Scalability on the Cloud
The Qubole advantage: provisioning, management, and autoscaling
24. On-premises HDFS cluster
• Compute and storage live together
• Compute and storage scale together
• Provisioned for peak capacity
• Cluster must be persistently on
[Diagram: grid of nodes, each combining compute and storage (C+S)]
25. Compute and storage separated on the Cloud
[Diagram: two grids of compute-only nodes (C), both backed by Amazon S3 for storage]
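The sizing consequence of this separation can be shown with a little arithmetic. The sketch below uses purely illustrative numbers (200 TB of data, 8 TB usable per node), not Qubole or AWS figures: a coupled HDFS fleet must satisfy both peak compute and total storage, while a cloud cluster backed by Amazon S3 is sized for compute alone.

```python
# Sketch: why separating compute from storage shrinks clusters.
# All numbers are illustrative assumptions, not Qubole or AWS figures.
import math

def coupled_nodes(peak_compute_nodes: int, data_tb: float, tb_per_node: float) -> int:
    """On-prem HDFS: one fleet must satisfy BOTH peak compute and total storage."""
    storage_nodes = math.ceil(data_tb / tb_per_node)
    return max(peak_compute_nodes, storage_nodes)

def decoupled_nodes(current_compute_nodes: int) -> int:
    """Cloud: data lives in Amazon S3, so the cluster is sized for compute only."""
    return current_compute_nodes

# 200 TB of data, 8 TB usable per node, but only 10 nodes of compute needed now:
print(coupled_nodes(10, 200, 8))   # 25 nodes, persistently on
print(decoupled_nodes(10))         # 10 nodes, and none when idle
```

The coupled cluster also cannot be turned off without losing its storage, which is the slide's "persistently on" point.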
26. Take advantage of the scale of the Cloud
Unlimited compute capacity
[Diagram: downscaling begins at 3:30 p.m.; the cluster reaches its minimum size by 7:00 p.m., then auto-scales back up when batch jobs start]
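The sizing decision behind that curve can be sketched as a simple policy: cover the pending work, clamped between a minimum and maximum cluster size. The policy and parameters below are illustrative assumptions, not Qubole's actual autoscaling algorithm.

```python
# Sketch of an autoscaler's sizing decision, in the spirit of the slide above.
# Policy and parameters are illustrative, not Qubole's algorithm.

def target_cluster_size(pending_tasks: int, tasks_per_node: int,
                        min_nodes: int, max_nodes: int) -> int:
    """Scale to cover the pending work, clamped to [min_nodes, max_nodes]."""
    needed = -(-pending_tasks // tasks_per_node)  # ceiling division
    return max(min_nodes, min(needed, max_nodes))

# Afternoon lull: few tasks, so the cluster drains down to its minimum size.
print(target_cluster_size(pending_tasks=5, tasks_per_node=8, min_nodes=2, max_nodes=50))    # 2
# Evening batch jobs arrive: the cluster scales back up.
print(target_cluster_size(pending_tasks=320, tasks_per_node=8, min_nodes=2, max_nodes=50))  # 40
```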
27. Take advantage of the scale of the Cloud
Instance type flexibility:
• 40+ instance types
• Integration with AWS Reserved Instances
• 37 different instance types used
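Spot Instances are where this flexibility pays off financially. The comparison below is a sketch with hypothetical prices and a hypothetical 70% Spot discount (actual EC2 Spot prices vary by instance type, region, and time): running most of a fleet on Spot can cut the hourly bill by more than half.

```python
# Illustrative cost comparison for a mixed On-Demand / Spot fleet.
# Prices and the discount are hypothetical, not current EC2 rates.

def hourly_cost(nodes: int, on_demand_price: float,
                spot_fraction: float, spot_discount: float) -> float:
    """Hourly cost of a fleet where spot_fraction of nodes run as Spot Instances."""
    spot_nodes = nodes * spot_fraction
    od_nodes = nodes - spot_nodes
    spot_price = on_demand_price * (1 - spot_discount)
    return od_nodes * on_demand_price + spot_nodes * spot_price

all_on_demand = hourly_cost(40, 0.50, spot_fraction=0.0, spot_discount=0.7)
mostly_spot   = hourly_cost(40, 0.50, spot_fraction=0.8, spot_discount=0.7)
print(all_on_demand)  # 20.0
print(mostly_spot)    # 8.8
```

In practice the master and a core of the cluster typically stay On-Demand so that Spot interruptions cannot take the whole cluster down.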
28. On-premises to the Cloud
Qubole’s Hadoop Migration Service
29. Cloud migration use cases
Migrate workloads to the cloud
Works with any on-premises Hadoop distro; data consistency and unified data visibility between on-premises and cloud.
Pain: maxed-out on-prem cluster
Requirements: data in sync during migration; decommissioning of the on-prem workload; 24x7 data replication with no data loss; no downtime
30. Cloud migration use cases
Migrate workloads to the cloud (same pains and requirements as above)
Solution: data moves to the cloud; apps and data pipelines run on QDS
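The "24x7 replication, no data loss" requirement boils down to continuously detecting what differs between the on-prem tree and its cloud copy. The sketch below shows that delta detection over checksum manifests; it is illustrative only, not Qubole's migration service, and the paths and checksums are made up.

```python
# Sketch of the delta detection a continuous replication job performs:
# compare checksum manifests of the on-prem tree and the cloud copy,
# then copy only new or changed files. Illustrative, not Qubole's service.

def replication_delta(source: dict, target: dict) -> list:
    """Return paths that must be (re)copied so target matches source."""
    return sorted(path for path, checksum in source.items()
                  if target.get(path) != checksum)

on_prem = {"/logs/01.gz": "aa1", "/logs/02.gz": "bb2", "/logs/03.gz": "cc3"}
cloud   = {"/logs/01.gz": "aa1", "/logs/02.gz": "stale"}
print(replication_delta(on_prem, cloud))  # ['/logs/02.gz', '/logs/03.gz']
```

Run on a schedule (or driven by filesystem events), an empty delta is the signal that it is safe to cut over and decommission the on-prem workload.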
31. Cloud migration use cases
Workload burst-out to the cloud
Works with any on-premises Hadoop distro; data consistency and unified data visibility between on-premises and cloud.
Pain: workload spikes can't be processed on-prem
Requirements: 24x7 data replication with no data loss; no downtime; bi-directional replication
32. Cloud migration use cases
Workload burst-out to the cloud (same pains and requirements as above)
Solution: on-prem data is synced to the cloud, spiked workloads run on QDS, and results are replicated back on-prem
33. Cloud migration use cases
Move test/dev environments to the Cloud
Works with any on-prem Hadoop distro; data consistency and unified data visibility between on-prem and Cloud.
Pain: production and development share one cluster, limiting development
Requirements: periodic replication; no data loss; no downtime
34. Cloud migration use cases
Move test/dev environments to the Cloud (same pains and requirements as above)
Solution: a subset of the data is replicated to the cloud and dev apps/data pipelines run on QDS, freeing on-prem resources for production
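Because test/dev rarely needs the full production dataset, the periodic replication here typically copies only a subset, such as recent partitions of the tables under test. The sketch below illustrates one such selection rule; the `/warehouse/<table>/dt=YYYY-MM-DD` layout and all paths are assumptions for the example, not a real environment.

```python
# Sketch: pick the data subset a test/dev environment needs, e.g. only
# recent partitions of one table. Layout and paths are illustrative
# assumptions (/warehouse/<table>/dt=YYYY-MM-DD), not a real environment.

def dev_subset(paths: list, table: str, since: str) -> list:
    """Keep only partitions of `table` dated on or after `since`."""
    prefix = f"/warehouse/{table}/dt="
    return [p for p in paths
            if p.startswith(prefix) and p[len(prefix):len(prefix) + 10] >= since]

paths = ["/warehouse/events/dt=2016-05-01/part-0",
         "/warehouse/events/dt=2016-06-01/part-0",
         "/warehouse/users/dt=2016-06-01/part-0"]
print(dev_subset(paths, "events", since="2016-06-01"))
# ['/warehouse/events/dt=2016-06-01/part-0']
```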
38. Strength in numbers
Qubole case study:
Each record is a financial transaction.
• 180B impression opportunities a day
• 3+M peak qps
• 3+TB of data a day (compressed)
40. The solution – Qubole
Qubole case study:
"We needed something that was reliable and easy to learn, set up, use, and put into production without the risk and high expectations that come with committing millions of dollars in upfront investment. Qubole was that thing."
– Marc Rosen, Sr. Director, Data Analytics
41. Qubole at MediaMath
Qubole case study:
• Analytics: Spark/Hive (with Amazon Redshift connector)
• Product: Hive
• Engineering: Spark/Hive
• Business Analysts: SmartQuery
• Data Science: Spark (Scala)
42. Qubole case study:
Don't have to worry about this anymore!