© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Vladimir Simek, Solutions Architect @ AWS
22/03/2016
Amazon Elastic MapReduce
How to run your Hadoop Cluster in 10 minutes
Agenda
• Two different companies – 2 stories
• Challenges with Big Data on premises
• Technical introduction to Amazon EMR
• Amazon EMR features and benefits
• Use case of AOL – moving 2 PB on-prem Hadoop
cluster to the AWS cloud
• Short demos
In the beginning – 2 different
stories
• In 2007, The New York Times decided to create a digital
archive on the web – all articles from 1851-1922
• 11 million articles (4 TB of data) composed of:
• 405,000 large TIFF images
• 405,000 XML files
• 3.3 million SGML files
• Used Amazon EC2 and Hadoop to process the data
Time to process?
Less than 24 hours
Costs?
About $240
(Undisclosed international company) –
subsidiary in France
• In 2014, decided to run a proof of concept (POC) on Big Data
analytics
• What was the first step they took?
Invested €7M in server purchases
“Want to increase innovation?
Lower the cost of failure.”
Joi Ito, Director of MIT Media Lab
How many big ticket
technology ideas can
your budget tolerate?
(Big) Data for Competitive Advantage
Customer segmentation
Marketing spend optimization
Financial modeling & forecasting
Ad targeting & real-time bidding
Clickstream analysis
Fraud detection
Security threat detection
Challenges with In-House Infrastructure
• Fixed Cost
• Slow Deployment Cycle
• Always On
• Self Serve
• Static: Not Scalable
• Outages Impact
• Production Upgrade
• Storage
• Compute
What is Amazon EMR and how
does it address these issues?
Amazon EMR
• Managed platform
• MapReduce, Apache Spark, Presto
• Launch a cluster in minutes
• Open source distribution and MapR
distribution
• Leverage the elasticity of the cloud
• Baked in security features
• Pay by the hour and save with Spot
• Flexibility to customize
Make it easy, secure, and
cost-effective to run
data-processing frameworks
on the AWS cloud
What Do I Need to Build a Cluster?
1. Choose instances
2. Choose your software
3. Choose your access method
Choice of Multiple Instances
• CPU: c3 family, cc1.4xlarge, cc2.8xlarge
• Memory: m2 family, r3 family
• Disk/IO: d2 family, i2 family
• General: m1 family, m3 family
Workloads: Machine Learning, Batch Processing, In-memory (Spark & Presto), Large HDFS
Select an Instance
Choose Your Software (Quick Bundles)
Choose Your Software – Custom
Hadoop Applications Available in Amazon EMR
Choose Security and Access Control
You Are Up and Running!
• Master Node DNS
• Information about the software you are running, logs and features
• Infrastructure for this cluster
• Security Groups and Roles
Use the CLI
aws emr create-cluster \
  --release-label emr-4.0.0 \
  --instance-groups \
    InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
    InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge
Or use your favorite SDK
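As a rough SDK counterpart to the CLI call, the sketch below builds the same one-master, two-core request for boto3's `run_job_flow`. The cluster name, the region, the default role names (`EMR_EC2_DefaultRole`, `EMR_DefaultRole`) and both helper names are assumptions for illustration and do not come from the slides. Nothing is launched unless `launch_cluster` is called with valid AWS credentials.

```python
def build_cluster_request(name="demo-cluster", release_label="emr-4.0.0",
                          instance_type="m3.xlarge", core_count=2):
    """Return keyword arguments for the EMR run_job_flow API call.

    The name and role names are illustrative assumptions.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            # Keep the cluster alive after launch so it can be used interactively.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


def launch_cluster(request, region="eu-west-1"):
    """Submit the request; requires the third-party boto3 package and credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed
    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    import json
    print(json.dumps(build_cluster_request(), indent=2))
```

Keeping the request builder separate from the API call makes it possible to inspect or unit-test the configuration before any instances are started.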
Demo – Build EMR cluster
Now that I have a cluster, I need to process
some data
Amazon EMR can process data from multiple sources
Hadoop Distributed File
System (HDFS)
Amazon S3 (EMRFS)
Amazon DynamoDB
Amazon Kinesis
In an On-premises Environment
Tightly coupled
Compute and Storage Grow Together
Tightly coupled
Storage grows along with
compute
Compute requirements vary
Underutilized or Scarce Resources
[Chart: demand over time, labeled: Steady state, Weekly peaks, Re-processing]
Underutilized or Scarce Resources
[Chart: demand over time, labeled: Provisioned capacity, Underutilized capacity]
Contention for Same Resources
Compute
bound
Memory
bound
Separation of Resources Creates Data Silos
Team A
Replication Adds to Cost
3x
Single datacenter
So how does Amazon EMR solve these problems?
Decouple Storage and Compute
Amazon S3 is Your Persistent Data Store
Designed for 99.999999999% (11 9’s) durability
$0.03 / GB / month in Ireland
Lifecycle policies
Versioning
Distributed by default
EMRFS / Amazon S3
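To make the storage argument concrete, here is a back-of-the-envelope sketch comparing the raw disk that HDFS needs under 3x replication with the monthly Amazon S3 bill at the $0.03 / GB / month figure quoted on the slide. It assumes a flat rate and 1 TB = 1024 GB, and ignores request and transfer charges. The function names and the 100 TB example are hypothetical.

```python
S3_PRICE_PER_GB_MONTH = 0.03  # flat rate quoted on the slide (Ireland)
HDFS_REPLICATION = 3          # replication factor quoted on the slide


def hdfs_raw_capacity_tb(logical_tb, replication=HDFS_REPLICATION):
    """Raw disk capacity needed to hold logical_tb of data in HDFS."""
    return logical_tb * replication


def s3_monthly_cost_usd(logical_tb, price_per_gb=S3_PRICE_PER_GB_MONTH):
    """Approximate monthly S3 storage cost, assuming a flat per-GB price."""
    return logical_tb * 1024 * price_per_gb


if __name__ == "__main__":
    data_tb = 100  # hypothetical data set size
    print(f"HDFS raw capacity: {hdfs_raw_capacity_tb(data_tb)} TB")
    print(f"S3 monthly cost:   ${s3_monthly_cost_usd(data_tb):,.2f}")
```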
The Amazon EMR File System (EMRFS)
• Allows you to leverage Amazon S3 as a file-system
• Streams data directly from Amazon S3
• Uses HDFS for intermediate data
• Better read/write performance and error handling than
open source components
• Consistent view – consistency for read after write
• Support for encryption
• Fast listing of objects
Going from HDFS to Amazon S3
CREATE EXTERNAL TABLE serde_regex(
  host STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
-- WITH SERDEPROPERTIES (...) clause not shown on the slide
LOCATION 'samples/pig-apache/input/';
Going from HDFS to Amazon S3
CREATE EXTERNAL TABLE serde_regex(
  host STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
-- WITH SERDEPROPERTIES (...) clause not shown on the slide
LOCATION 's3://elasticmapreduce.samples/pig-apache/input/';
Benefit 1: Switch Off Clusters
Amazon S3
Auto-Terminate Clusters
You Can Build a Pipeline
Run Transient or Long-Running Clusters
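A transient, auto-terminating cluster can be sketched as a `run_job_flow` request that carries its work as a step and sets `KeepJobFlowAliveWhenNoSteps` to false, so the cluster shuts down once the step finishes. The script location, cluster name, instance sizing, and role names below are hypothetical placeholders. The step uses `command-runner.jar`, which assumes an emr-4.x or later release.

```python
def build_transient_request(script_uri, release_label="emr-4.0.0"):
    """Build run_job_flow arguments for a cluster that runs one Hive step, then terminates.

    script_uri is a hypothetical S3 location of a HiveQL script.
    """
    hive_step = {
        "Name": "Run Hive script",
        # Shut the cluster down if the step fails instead of leaving it idle.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hive-script", "--run-hive-script",
                     "--args", "-f", script_uri],
        },
    }
    return {
        "Name": "transient-hive-job",          # assumed name
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": "m3.xlarge",
            "SlaveInstanceType": "m3.xlarge",
            "InstanceCount": 3,
            # False means: terminate automatically when no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [hive_step],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


if __name__ == "__main__":
    import json
    request = build_transient_request("s3://my-bucket/scripts/report.q")
    print(json.dumps(request, indent=2))
```

With boto3 installed and credentials configured, the request could be submitted with `boto3.client("emr").run_job_flow(**request)`. A long-running cluster differs only in setting the keep-alive flag to true.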
Benefit 2: Resize Your Cluster
Resize the Cluster
Scale Up, Scale Down, Stop a resize,
issue a resize on another
How do you scale up and save cost?
Spot Instance
Bid Price vs. On-Demand (OD) Price
Spot Integration
aws emr create-cluster --name "Spot cluster" --ami-version 3.3 \
  --instance-groups \
    InstanceGroupType=MASTER,InstanceType=m3.xlarge,InstanceCount=1 \
    InstanceGroupType=CORE,BidPrice=0.03,InstanceType=m3.xlarge,InstanceCount=2 \
    InstanceGroupType=TASK,BidPrice=0.10,InstanceType=m3.xlarge,InstanceCount=3
Spot Integration with Amazon EMR
• Can provision instances from the Spot market
• Impact of interruption
• Master node – Can lose the cluster
• Core node – Can lose intermediate data
• Task nodes – Jobs will restart on other nodes (application
dependent)
Scale up with Spot Instances
10 node cluster running for 14 hours
Cost = 1.0 * 10 * 14 = $140
Resize Nodes with Spot Instances
Add 10 more nodes on Spot
Resize Nodes with Spot Instances
20 node cluster running for 7 hours
On-Demand cost = 1.0 * 10 * 7 = $70
Spot cost = 0.5 * 10 * 7 = $35
Total $105
Resize Nodes with Spot Instances
50% less run-time (14 → 7)
25% less cost (140 → 105)
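The arithmetic on these slides can be reproduced with a few lines of Python. The prices ($1.00 per node-hour On-Demand, $0.50 per node-hour Spot) and the node counts come from the slides. The helper name is hypothetical, and the sketch assumes, as the slides do, that doubling the nodes halves the run time.

```python
def cluster_cost(groups, hours):
    """Total cost for a list of (node_count, hourly_price) groups run for `hours`."""
    return sum(count * price * hours for count, price in groups)


# 10 On-Demand nodes for 14 hours.
baseline = cluster_cost([(10, 1.0)], hours=14)

# 10 On-Demand nodes plus 10 Spot nodes for 7 hours.
resized = cluster_cost([(10, 1.0), (10, 0.5)], hours=7)

runtime_reduction = 1 - 7 / 14
cost_reduction = 1 - resized / baseline

if __name__ == "__main__":
    print(f"Baseline: ${baseline:.0f}, resized: ${resized:.0f}")
    print(f"Run time down {runtime_reduction:.0%}, cost down {cost_reduction:.0%}")
```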
Intelligent Scale Down
Effectively Utilize Clusters
Benefit 3: Logical Separation of Jobs
• Prod: Hive, Pig, Cascading
• Ad-Hoc: Presto
• Amazon S3
Benefit 4: Disaster Recovery Built In
Cluster 1 Cluster 2
Cluster 3 Cluster 4
Amazon S3
Availability Zone Availability Zone
Demo 2 – Word Count Example
Case study: How AOL moved a
2 PB cluster to the AWS cloud
AOL Data Platforms Architecture 2014
[Architecture diagram: AOL Source Systems, In-house Hadoop Cluster, Database, Reporting Tools, Users]
Data Stats & Insights
• Cluster Size: 2 PB
• In-House Cluster: 100 Nodes
• Raw Data/Day: 2-3 TB
• Data Retention: 13-24 Months
Challenges with In-House Infrastructure
• Fixed Cost
• Slow Deployment Cycle
• Always On
• Self Serve
• Static: Not Scalable
• Outages Impact
• Production Upgrade
• Storage
• Compute
AOL Data Platforms Architecture 2015
[Architecture diagram: AOL Source Systems, AWS Direct Connect, Amazon S3, Amazon EMR Cluster, Watchdog, Amazon SNS, Amazon IAM, Database, Reporting Tools, Users]
EMR Design Options
• Transient vs. Persistent Cluster
• Amazon S3 vs. local HDFS
• Elastic Cluster vs. Static Cluster
• On-Demand vs. Reserved vs. Spot
• Core Nodes vs. Task Nodes
AWS vs. In-House Cost
Service Cost Comparison
• In-House: 4x / Month
• AWS: 1x / Month
Source: AOL & AWS Billing Tool
** In-House cluster includes Storage, Power and Network cost.
[Chart: Cores Nodes Demand - 06/01/2015]
Restatement Use Case
• Restate historical data going back 6 months
• 10 Availability Zones
• 550 EMR Clusters
• 24,000 Spot EC2 Instances
[Chart: Timing Comparison, In-House vs. AWS]
Any questions?
Thank you!

More Related Content

What's hot

HSBC and AWS Day - Database Options on AWS
HSBC and AWS Day - Database Options on AWSHSBC and AWS Day - Database Options on AWS
HSBC and AWS Day - Database Options on AWSAmazon Web Services
 
AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)
AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)
AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)Amazon Web Services
 
Configuration Management with AWS OpsWorks for Chef Automate
Configuration Management with AWS OpsWorks for Chef AutomateConfiguration Management with AWS OpsWorks for Chef Automate
Configuration Management with AWS OpsWorks for Chef AutomateAmazon Web Services
 
Evolution of Geospatial Workloads on AWS - AWS PS Summit Canberra
Evolution of Geospatial Workloads on AWS - AWS PS Summit Canberra Evolution of Geospatial Workloads on AWS - AWS PS Summit Canberra
Evolution of Geospatial Workloads on AWS - AWS PS Summit Canberra Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Aurora
Getting Started with Amazon AuroraGetting Started with Amazon Aurora
Getting Started with Amazon AuroraAmazon Web Services
 
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAmazon Web Services
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRAmazon Web Services
 
Getting Started with Amazon Aurora
Getting Started with Amazon AuroraGetting Started with Amazon Aurora
Getting Started with Amazon AuroraAmazon Web Services
 
WKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn
WKS407 Wild Rydes Takes Off – The Dawn of a New UnicornWKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn
WKS407 Wild Rydes Takes Off – The Dawn of a New UnicornAmazon Web Services
 
This One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You ThousandsThis One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You ThousandsAmazon Web Services
 
Amazon Aurora New Features - September 2016 Webinar Series
Amazon Aurora New Features - September 2016 Webinar SeriesAmazon Aurora New Features - September 2016 Webinar Series
Amazon Aurora New Features - September 2016 Webinar SeriesAmazon Web Services
 
Running Containerised Applications at Scale on AWS
Running Containerised Applications at Scale on AWSRunning Containerised Applications at Scale on AWS
Running Containerised Applications at Scale on AWSAmazon Web Services
 
AWS re:Invent 2016: Busting the Myth of Vendor Lock-In: How D2L Embraced the...
AWS re:Invent 2016: Busting the Myth of Vendor Lock-In:  How D2L Embraced the...AWS re:Invent 2016: Busting the Myth of Vendor Lock-In:  How D2L Embraced the...
AWS re:Invent 2016: Busting the Myth of Vendor Lock-In: How D2L Embraced the...Amazon Web Services
 
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Amazon Web Services
 
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...Amazon Web Services
 
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...Amazon Web Services
 

What's hot (20)

HSBC and AWS Day - Database Options on AWS
HSBC and AWS Day - Database Options on AWSHSBC and AWS Day - Database Options on AWS
HSBC and AWS Day - Database Options on AWS
 
Architecting on The Cloud
Architecting on The CloudArchitecting on The Cloud
Architecting on The Cloud
 
AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)
AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)
AWS re:Invent 2016: Running Batch Jobs on Amazon ECS (CON310)
 
Configuration Management with AWS OpsWorks for Chef Automate
Configuration Management with AWS OpsWorks for Chef AutomateConfiguration Management with AWS OpsWorks for Chef Automate
Configuration Management with AWS OpsWorks for Chef Automate
 
Self-Service Supercomputing
Self-Service SupercomputingSelf-Service Supercomputing
Self-Service Supercomputing
 
Evolution of Geospatial Workloads on AWS - AWS PS Summit Canberra
Evolution of Geospatial Workloads on AWS - AWS PS Summit Canberra Evolution of Geospatial Workloads on AWS - AWS PS Summit Canberra
Evolution of Geospatial Workloads on AWS - AWS PS Summit Canberra
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Aurora
Getting Started with Amazon AuroraGetting Started with Amazon Aurora
Getting Started with Amazon Aurora
 
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
 
Getting Started with Amazon Aurora
Getting Started with Amazon AuroraGetting Started with Amazon Aurora
Getting Started with Amazon Aurora
 
WKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn
WKS407 Wild Rydes Takes Off – The Dawn of a New UnicornWKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn
WKS407 Wild Rydes Takes Off – The Dawn of a New Unicorn
 
Introduction on Amazon EC2
 Introduction on Amazon EC2 Introduction on Amazon EC2
Introduction on Amazon EC2
 
This One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You ThousandsThis One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You Thousands
 
Amazon Aurora New Features - September 2016 Webinar Series
Amazon Aurora New Features - September 2016 Webinar SeriesAmazon Aurora New Features - September 2016 Webinar Series
Amazon Aurora New Features - September 2016 Webinar Series
 
Running Containerised Applications at Scale on AWS
Running Containerised Applications at Scale on AWSRunning Containerised Applications at Scale on AWS
Running Containerised Applications at Scale on AWS
 
AWS re:Invent 2016: Busting the Myth of Vendor Lock-In: How D2L Embraced the...
AWS re:Invent 2016: Busting the Myth of Vendor Lock-In:  How D2L Embraced the...AWS re:Invent 2016: Busting the Myth of Vendor Lock-In:  How D2L Embraced the...
AWS re:Invent 2016: Busting the Myth of Vendor Lock-In: How D2L Embraced the...
 
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
 
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
 
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
AWS re:Invent 2016: Design, Deploy, and Optimize Microsoft SharePoint on AWS ...
 

Viewers also liked

Bringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potentialBringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potentialAdrian Hornsby
 
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduceMatthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReducehuguk
 
AWS Office Hours: Amazon Elastic MapReduce
AWS Office Hours: Amazon Elastic MapReduce AWS Office Hours: Amazon Elastic MapReduce
AWS Office Hours: Amazon Elastic MapReduce Amazon Web Services
 
JNE - Resolución N° 208-2015-JNE - Reglamento del Registro de Organizaciones ...
JNE - Resolución N° 208-2015-JNE - Reglamento del Registro de Organizaciones ...JNE - Resolución N° 208-2015-JNE - Reglamento del Registro de Organizaciones ...
JNE - Resolución N° 208-2015-JNE - Reglamento del Registro de Organizaciones ...Rooswelth Gerardo Zavaleta Benites
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Amazon Web Services
 
Space Invading: an approach to sensing
Space Invading: an approach to sensingSpace Invading: an approach to sensing
Space Invading: an approach to sensingAdrian Hornsby
 
Getting started with Serverless on AWS
Getting started with Serverless on AWSGetting started with Serverless on AWS
Getting started with Serverless on AWSAdrian Hornsby
 
Being Well Architected in the Cloud
Being Well Architected in the CloudBeing Well Architected in the Cloud
Being Well Architected in the CloudAdrian Hornsby
 
8 ways to leverage AWS Lambda in your Big Data workloads
8 ways to leverage AWS Lambda in your Big Data workloads8 ways to leverage AWS Lambda in your Big Data workloads
8 ways to leverage AWS Lambda in your Big Data workloadsAdrian Hornsby
 
Derive Insight from IoT data in minute with AWS
Derive Insight from IoT data in minute with AWSDerive Insight from IoT data in minute with AWS
Derive Insight from IoT data in minute with AWSAdrian Hornsby
 
Θεατροπαιδαγωγική Δράση: Ο Οδυσσέας στον Ανεμόμυλο
Θεατροπαιδαγωγική Δράση: Ο Οδυσσέας στον ΑνεμόμυλοΘεατροπαιδαγωγική Δράση: Ο Οδυσσέας στον Ανεμόμυλο
Θεατροπαιδαγωγική Δράση: Ο Οδυσσέας στον Ανεμόμυλοtheatropaizontas
 
Καλλιέργεια συναισθηματικών δεξιοτήτων και θεατρικο παιχνιδι
Καλλιέργεια συναισθηματικών δεξιοτήτων και θεατρικο παιχνιδιΚαλλιέργεια συναισθηματικών δεξιοτήτων και θεατρικο παιχνιδι
Καλλιέργεια συναισθηματικών δεξιοτήτων και θεατρικο παιχνιδιtheatropaizontas
 
Linfoma y cuidaos de enfermeria
Linfoma  y cuidaos de enfermeria Linfoma  y cuidaos de enfermeria
Linfoma y cuidaos de enfermeria CesarArgus96
 
AWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapAWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapAdrian Hornsby
 
AWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:CapAWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:CapAdrian Hornsby
 
Reproductive system by Tlali
Reproductive system by TlaliReproductive system by Tlali
Reproductive system by TlaliMachitja Tlali
 
Unidad 2. marco funcional de la gestión de información principios rectores de...
Unidad 2. marco funcional de la gestión de información principios rectores de...Unidad 2. marco funcional de la gestión de información principios rectores de...
Unidad 2. marco funcional de la gestión de información principios rectores de...claudia Rojas
 
AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)
AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)
AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)Amazon Web Services
 

Viewers also liked (20)

Bringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potentialBringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potential
 
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduceMatthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
 
AWS Office Hours: Amazon Elastic MapReduce
AWS Office Hours: Amazon Elastic MapReduce AWS Office Hours: Amazon Elastic MapReduce
AWS Office Hours: Amazon Elastic MapReduce
 
JNE - Resolución N° 208-2015-JNE - Reglamento del Registro de Organizaciones ...
JNE - Resolución N° 208-2015-JNE - Reglamento del Registro de Organizaciones ...JNE - Resolución N° 208-2015-JNE - Reglamento del Registro de Organizaciones ...
JNE - Resolución N° 208-2015-JNE - Reglamento del Registro de Organizaciones ...
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
 
Space Invading: an approach to sensing
Space Invading: an approach to sensingSpace Invading: an approach to sensing
Space Invading: an approach to sensing
 
Getting started with Serverless on AWS
Getting started with Serverless on AWSGetting started with Serverless on AWS
Getting started with Serverless on AWS
 
Being Well Architected in the Cloud
Being Well Architected in the CloudBeing Well Architected in the Cloud
Being Well Architected in the Cloud
 
8 ways to leverage AWS Lambda in your Big Data workloads
8 ways to leverage AWS Lambda in your Big Data workloads8 ways to leverage AWS Lambda in your Big Data workloads
8 ways to leverage AWS Lambda in your Big Data workloads
 
Derive Insight from IoT data in minute with AWS
Derive Insight from IoT data in minute with AWSDerive Insight from IoT data in minute with AWS
Derive Insight from IoT data in minute with AWS
 
Θεατροπαιδαγωγική Δράση: Ο Οδυσσέας στον Ανεμόμυλο
Θεατροπαιδαγωγική Δράση: Ο Οδυσσέας στον ΑνεμόμυλοΘεατροπαιδαγωγική Δράση: Ο Οδυσσέας στον Ανεμόμυλο
Θεατροπαιδαγωγική Δράση: Ο Οδυσσέας στον Ανεμόμυλο
 
Καλλιέργεια συναισθηματικών δεξιοτήτων και θεατρικο παιχνιδι
Καλλιέργεια συναισθηματικών δεξιοτήτων και θεατρικο παιχνιδιΚαλλιέργεια συναισθηματικών δεξιοτήτων και θεατρικο παιχνιδι
Καλλιέργεια συναισθηματικών δεξιοτήτων και θεατρικο παιχνιδι
 
Linfoma y cuidaos de enfermeria
Linfoma  y cuidaos de enfermeria Linfoma  y cuidaos de enfermeria
Linfoma y cuidaos de enfermeria
 
AWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapAWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:Cap
 
AWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:CapAWS re:Invent 2016 Day 2 Keynote re:Cap
AWS re:Invent 2016 Day 2 Keynote re:Cap
 
Technical Track
Technical TrackTechnical Track
Technical Track
 
Reproductive system by Tlali
Reproductive system by TlaliReproductive system by Tlali
Reproductive system by Tlali
 
15 años y 1 día
15 años y 1 día15 años y 1 día
15 años y 1 día
 
Unidad 2. marco funcional de la gestión de información principios rectores de...
Unidad 2. marco funcional de la gestión de información principios rectores de...Unidad 2. marco funcional de la gestión de información principios rectores de...
Unidad 2. marco funcional de la gestión de información principios rectores de...
 
AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)
AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)
AWS re:Invent 2016: Scaling Up to Your First 10 Million Users (ARC201)
 

Similar to How to run your Hadoop Cluster in 10 minutes

(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduceAmazon Web Services
 
Big data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel AvivBig data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel AvivAmazon Web Services
 
Amazon Elastic Map Reduce: the concepts
Amazon Elastic Map Reduce: the conceptsAmazon Elastic Map Reduce: the concepts
Amazon Elastic Map Reduce: the concepts Julien SIMON
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best PracticesAWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best PracticesAmazon Web Services
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceAmazon Web Services
 
AWS May Webinar Series - Getting Started with Amazon EMR
AWS May Webinar Series - Getting Started with Amazon EMRAWS May Webinar Series - Getting Started with Amazon EMR
AWS May Webinar Series - Getting Started with Amazon EMRAmazon Web Services
 
Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...
Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...
Tune your Big Data Platform to Work at Scale: Taking Hadoop to the Next Level...Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Web Services
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceAmazon Web Services
 
Scaling your analytics with Amazon EMR
Scaling your analytics with Amazon EMRScaling your analytics with Amazon EMR
Scaling your analytics with Amazon EMRIsrael AWS User Group
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Amazon Web Services
 
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big DataAmazon Web Services
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
(BDT305) Amazon EMR Deep Dive and Best Practices
(BDT305) Amazon EMR Deep Dive and Best Practices(BDT305) Amazon EMR Deep Dive and Best Practices
(BDT305) Amazon EMR Deep Dive and Best PracticesAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 

Similar to How to run your Hadoop Cluster in 10 minutes (20)

(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
 
Big data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel AvivBig data with amazon EMR - Pop-up Loft Tel Aviv
Big data with amazon EMR - Pop-up Loft Tel Aviv
 
Amazon Elastic Map Reduce: the concepts
Amazon Elastic Map Reduce: the conceptsAmazon Elastic Map Reduce: the concepts
Amazon Elastic Map Reduce: the concepts
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best PracticesAWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
 
Deep Dive: Amazon Elastic MapReduce
How to run your Hadoop Cluster in 10 minutes

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Vladimir Simek, Solutions Architect @ AWS 22/03/2016 Amazon Elastic MapReduce How to run your Hadoop Cluster in 10 minutes
  • 2. Agenda • Two different companies – 2 stories • Challenges with Big Data on premises • Technical introduction to Amazon EMR • Amazon EMR features and benefits • Use case of AOL – moving 2 PB on-prem Hadoop cluster to the AWS cloud • Short demos
  • 3. In the beginning – 2 different stories
  • 4. • In 2007 the New York Times decided to create a digital archive on the web – all articles from 1851-1922 • 11 million articles (4 TB of data) composed of: • 405,000 large TIFF images • 405,000 XML files • 3.3 million SGML files • Used Amazon EC2 and Hadoop to process the data
  • 5. Time to process? Less than 24 hours. Costs? About $240.
  • 6.
  • 7. (Undisclosed international company) – subsidiary in France • In 2014, decided to run a POC on Big Data analytics • What was the first step they took? Invested €7M in server purchases
  • 8. “Want to increase innovation? Lower the cost of failure.” Joi Ito, Director of MIT Media Lab
  • 9. How many big ticket technology ideas can your budget tolerate?
  • 10. (Big) Data for Competitive Advantage • Customer segmentation • Marketing spend optimization • Financial modeling & forecasting • Ad targeting & real-time bidding • Clickstream analysis • Fraud detection • Security threat detection
  • 11. Challenges with In-House Infrastructure Fixed Cost Slow Deployment Cycle Always On Self Serve Static : Not Scalable Outages Impact Production Upgrade Storage Compute
  • 12. What is Amazon EMR and how does it address such issues?
  • 13. Amazon EMR • Managed platform • MapReduce, Apache Spark, Presto • Launch a cluster in minutes • Open source distribution and MapR distribution • Leverage the elasticity of the cloud • Baked in security features • Pay by the hour and save with Spot • Flexibility to customize
  • 14. Make it easy, secure, and cost-effective to run data-processing frameworks on the AWS cloud
  • 15. What Do I Need to Build a Cluster? 1. Choose instances 2. Choose your software 3. Choose your access method
  • 16. Choice of Multiple Instances • CPU: c3 family, cc1.4xlarge, cc2.8xlarge • Memory: m2 family, r3 family • Disk/IO: d2 family, i2 family • General: m1 family, m3 family • Workloads shown: Machine Learning, Batch Processing, In-memory (Spark & Presto), Large HDFS
  • 18. Choose Your Software (Quick Bundles)
  • 19. Choose Your Software – Custom
  • 21. Choose Security and Access Control
  • 22. You Are Up and Running!
  • 23. You Are Up and Running! Master Node DNS
  • 24. You Are Up and Running! Information about the software you are running, logs and features
  • 25. You Are Up and Running! Infrastructure for this cluster
  • 26. You Are Up and Running! Security Groups and Roles
  • 27. Use the CLI: aws emr create-cluster --release-label emr-4.0.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge Or use your favorite SDK
  • 28. Demo – Build EMR cluster
  • 29. Now that I have a cluster, I need to process some data
  • 30. Amazon EMR can process data from multiple sources Hadoop Distributed File System (HDFS) Amazon S3 (EMRFS) Amazon DynamoDB Amazon Kinesis
  • 31. Amazon EMR can process data from multiple sources Hadoop Distributed File System (HDFS) Amazon S3 (EMRFS) Amazon DynamoDB Amazon Kinesis
  • 32. On an On-premises Environment Tightly coupled
  • 33. Compute and Storage Grow Together Tightly coupled Storage grows along with compute Compute requirements vary
  • 34. Underutilized or Scarce Resources [Chart labels: Re-processing, Weekly peaks, Steady state]
  • 35. Underutilized or Scarce Resources [Chart labels: Underutilized capacity, Provisioned capacity]
  • 36. Contention for Same Resources Compute bound Memory bound
  • 37. Separation of Resources Creates Data Silos Team A
  • 38. Replication Adds to Cost 3x Single datacenter
  • 39. So how does Amazon EMR solve these problems?
  • 41. Amazon S3 is Your Persistent Data Store • Designed for 11 9's durability • $0.03 / GB / month in Ireland • Lifecycle policies • Versioning • Distributed by default [Diagram labels: EMRFS, Amazon S3]
  • 42. The Amazon EMR File System (EMRFS) • Allows you to leverage Amazon S3 as a file system • Streams data directly from Amazon S3 • Uses HDFS for intermediates • Better read/write performance and error handling than open source components • Consistent view – consistency for read after write • Support for encryption • Fast listing of objects
  • 43. Going from HDFS to Amazon S3 CREATE EXTERNAL TABLE serde_regex( host STRING, referer STRING, agent STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' ) LOCATION 'samples/pig-apache/input/'
  • 44. Going from HDFS to Amazon S3 CREATE EXTERNAL TABLE serde_regex( host STRING, referer STRING, agent STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' ) LOCATION 's3://elasticmapreduce.samples/pig-apache/input/'
  • 45. Benefit 1: Switch Off Clusters [Diagram label: Amazon S3]
  • 47. You Can Build a Pipeline
  • 48. Run Transient or Long-Running Clusters
  • 49. Benefit 2: Resize Your Cluster
  • 50. Resize the Cluster Scale Up, Scale Down, Stop a resize, issue a resize on another
  • 51. How do you scale up and save costs?
  • 53. Spot Integration aws emr create-cluster --name "Spot cluster" --ami-version 3.3 --instance-groups InstanceGroupType=MASTER,InstanceType=m3.xlarge,InstanceCount=1 InstanceGroupType=CORE,BidPrice=0.03,InstanceType=m3.xlarge,InstanceCount=2 InstanceGroupType=TASK,BidPrice=0.10,InstanceType=m3.xlarge,InstanceCount=3
  • 54. Spot Integration with Amazon EMR • Can provision instances from the Spot market • Impact of interruption • Master node – Can lose the cluster • Core node – Can lose intermediate data • Task nodes – Jobs will restart on other nodes (application dependent)
  • 55. Scale up with Spot Instances 10 node cluster running for 14 hours Cost = 1.0 * 10 * 14 = $140
  • 56. Resize Nodes with Spot Instances Add 10 more nodes on Spot
  • 57. Resize Nodes with Spot Instances 20 node cluster running for 7 hours. On-Demand cost = 1.0 * 10 * 7 = $70; Spot cost = 0.5 * 10 * 7 = $35; Total = $105
  • 58. Resize Nodes with Spot Instances 50% less run-time (14 → 7 hours), 25% less cost ($140 → $105)
  • 60. Effectively Utilize Clusters [Chart]
  • 61. Benefit 3: Logical Separation of Jobs Hive, Pig, Cascading Prod Presto Ad-Hoc Amazon S3
  • 62. Benefit 4: Disaster Recovery Built In Cluster 1 Cluster 2 Cluster 3 Cluster 4 Amazon S3 Availability Zone Availability Zone
  • 63. Demo 2 – Word Count Example
  • 64. Case study: How AOL moved a 2 PB cluster to the AWS cloud
  • 65. AOL Data Platforms Architecture 2014 AOL Source Systems In-house Hadoop Cluster Database Reporting Tools Users
  • 66. Data Stats & Insights • Cluster Size: 2 PB • In-House Cluster: 100 Nodes • Raw Data/Day: 2-3 TB • Data Retention: 13-24 Months
  • 67. Challenges with In-House Infrastructure Fixed Cost Slow Deployment Cycle Always On Self Serve Static : Not Scalable Outages Impact Production Upgrade Storage Compute
  • 68. AOL Data Platforms Architecture 2015 1 2 2 3 4 56 Source Systems Amazon S3 Amazon EMR Cluster Watchdog Amazon SNS Amazon IAM AOL AWS Direct Connect Reporting Tools Database Users
  • 69. EMR Design Options (Amazon EMR) • Transient vs. Persistent Cluster • Amazon S3 vs. local HDFS • Elastic Cluster vs. Static Cluster • On-Demand vs. Reserved vs. Spot • Core Nodes vs. Task Nodes
  • 70. AWS vs. In-House Cost [Chart: Service Cost Comparison, AWS vs. In-House] In-House: 4x / month; AWS: 1x / month. Source: AOL & AWS Billing Tool. ** In-House cluster includes storage, power and network cost.
  • 71. Restatement Use Case • Restate historical data going back 6 months • 10 Availability Zones • 550 EMR Clusters • 24,000 Spot EC2 Instances [Chart: Cores Nodes Demand - 06/01/2015] [Chart: Timing Comparison, In-House vs. AWS]
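
The Spot arithmetic on slides 55 to 58 can be reproduced with a short sketch. It assumes, as the slides do, that doubling the node count halves the run-time, and it takes the $1.00 and $0.50 per node-hour rates straight from the slides as illustrative figures rather than real prices. The function and variable names are hypothetical.

```python
def cluster_cost(nodes, hours, rate_per_node_hour):
    """Cost of running `nodes` instances for `hours` at a flat hourly rate."""
    return nodes * hours * rate_per_node_hour


# Illustrative rates from the slides, in dollars per node-hour (not real prices).
ON_DEMAND_RATE = 1.0
SPOT_RATE = 0.5

# Slide 55: 10 On-Demand nodes for 14 hours.
baseline = cluster_cost(10, 14, ON_DEMAND_RATE)

# Slide 57: 10 On-Demand nodes plus 10 Spot nodes, assumed to finish in 7 hours.
resized = cluster_cost(10, 7, ON_DEMAND_RATE) + cluster_cost(10, 7, SPOT_RATE)

# Slide 58: relative savings in run-time and in cost.
runtime_saving = 1 - 7 / 14
cost_saving = 1 - resized / baseline

print(f"baseline ${baseline:.0f}, resized ${resized:.0f}")
print(f"run-time saving {runtime_saving:.0%}, cost saving {cost_saving:.0%}")
```

Under these assumptions the resized cluster costs less even though it has twice as many nodes, because the added nodes are billed at the lower Spot rate and every node runs for half as long.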

Editor's Notes

  1. Joi Ito - Japanese-American activist, entrepreneur, venture capitalist and Director of the MIT Media Lab
  2. Cost of failure is too high. People are afraid to take risks. Innovation suffers.
  3. Here are a few big data use cases
  4. With storage we want persistence, durability, low cost and high availability. With compute we want elasticity and agility.