Automated Hadoop Clusters on EC2
Mark Kerzner
SHMsoft
What is Hadoop? :) :) :)
Everybody knows that

... What is your definition?
What is a cloud?
Everybody knows that, but

1. Elastic resources
2. Internet delivery
3. SaaS
4. Virtualization
5. Device-enabled
6. Only (1) or all of the above
You are the Hadoop programmer
... and you need tools

What are your alternatives?
● IDE - compile and run the code
● Local "cluster" - local file system
● Pseudo-distributed cluster - test outside
● EC2 - test on the cluster, test for scale
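
For the pseudo-distributed option, the whole cluster is one machine whose daemons talk to each other over localhost. A minimal sketch of the three config files involved, assuming Hadoop 0.20-style property names and the ports used in the single-node setup guide; the helper names write_site and write_pseudo_conf are made up for this example:

```shell
# Sketch: write a single-node (pseudo-distributed) Hadoop 0.20-style config.
# Helper names and the target directory are illustrative assumptions.

# write_site FILE NAME VALUE: write a one-property Hadoop site file.
write_site() {
  cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>$2</name><value>$3</value></property>
</configuration>
EOF
}

# write_pseudo_conf DIR: write core-site, hdfs-site and mapred-site into DIR.
write_pseudo_conf() {
  dir=$1
  mkdir -p "$dir"
  write_site "$dir/core-site.xml"   fs.default.name    hdfs://localhost:9000
  write_site "$dir/hdfs-site.xml"   dfs.replication    1
  write_site "$dir/mapred-site.xml" mapred.job.tracker localhost:9001
}
```

Calling write_pseudo_conf with a scratch directory and pointing Hadoop at it keeps the local setup separate from whatever configuration the real cluster uses.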
What are your resources?
● Tom White, "Hadoop: The Definitive Guide"
● www.hadoopilluminated.com
For real play, you need a cluster
Hadoop+ (oh, by the way...)
HBase, Cassandra, MongoDB, NoSQL,
Dynamo, BigTable, Dryad (MS), Azure (MS),
MapReduce, MapR (EMC), Cloudera
distribution, EMC distribution, IBM distribution...
Whirr
Setup

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...


Install
curl -O http://www.apache.org/dist/whirr/whirr-0.7.1/whirr-0.7.1.tar.gz
tar zxf whirr-0.7.1.tar.gz; cd whirr-0.7.1

Generate key

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_whirr

Run
bin/whirr launch-cluster --config recipes/zookeeper-ec2.properties --private-key-file ~/.ssh/id_rsa_whirr
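
The run command above launches the bundled ZooKeeper recipe; a Hadoop cluster uses the same command with a Hadoop properties file. A minimal sketch that writes such a file, assuming the property names found in Whirr's bundled recipes; the cluster name myhadoop, the function names, and the output path are made up for this example:

```shell
# Sketch: write a small Whirr recipe for a Hadoop cluster.
# Cluster name, node count and output path are illustrative assumptions.
write_whirr_recipe() {
  out=$1      # path of the properties file to create
  workers=$2  # number of datanode+tasktracker instances
  cat > "$out" <<EOF
whirr.cluster-name=myhadoop
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,${workers} hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=\${env:AWS_ACCESS_KEY_ID}
whirr.credential=\${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=\${sys:user.home}/.ssh/id_rsa_whirr
whirr.public-key-file=\${sys:user.home}/.ssh/id_rsa_whirr.pub
EOF
}

# Fail early if the credentials from the Setup step are missing.
check_aws_env() {
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}
```

One possible use: write_whirr_recipe hadoop-ec2.properties 3, then check_aws_env, then bin/whirr launch-cluster --config hadoop-ec2.properties. Running bin/whirr destroy-cluster with the same --config shuts the cluster down again, which matters because the instances are billed while they run.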
Whirr limitations
● No EBS
● All or nothing
● Generates configuration artifacts
● Takes over your computer, no more local
  development - uses proxy
● Hard to customize
Amazon EMR
EMR limitations
●   No choice of image
●   Fixed architecture
●   Hard to debug
●   Hard to customize
You do it
Repeat the manual procedure, only automate it

Prepare
AMI, Java, Hadoop

On-the-fly
Start AMI, login, configure, start services,
verify, run test jobs
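
Two pieces of the on-the-fly part can be sketched in a few lines: waiting until a freshly started instance accepts logins, and writing the cluster configuration once the host names are known. This is only an illustration, assuming Hadoop 0.20-style files (core-site.xml, slaves); the function names, host names, and NameNode port 8020 are assumptions, not taken from the talk:

```shell
# Sketch of two helpers for a do-it-yourself cluster script.
# Function names, host names and the port are illustrative assumptions.

# retry ATTEMPTS DELAY CMD...: rerun CMD until it succeeds or attempts run out.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  until "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# write_hadoop_conf DIR MASTER WORKER...: write core-site.xml and slaves.
write_hadoop_conf() {
  conf_dir=$1; master=$2; shift 2
  mkdir -p "$conf_dir"
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${master}:8020</value>
  </property>
</configuration>
EOF
  # One worker host name per line.
  printf '%s\n' "$@" > "$conf_dir/slaves"
}
```

A login check could then look like retry 30 10 ssh -i ~/.ssh/id_rsa_whirr USER@HOST true, where the user name depends on the AMI. The same retry wrapper fits the verify step, for example around a command that lists the HDFS root directory.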
You do it - advanced

On startup

Under-provision, over-provision, progress

On-the-fly

Monitor, run test jobs, watch for cluster
deterioration
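
One reading of under-provision and over-provision is that the cloud does not always hand back exactly the number of working nodes requested, so the startup script has to compare and react. A minimal sketch of that arithmetic, with hypothetical function names and an assumed failure rate supplied by the caller:

```shell
# Sketch: compare the nodes that came up with the number wanted.
# Function names and output wording are illustrative assumptions.

# provision_status WANTED RUNNING: print "under N", "over N" or "ok".
provision_status() {
  wanted=$1; running=$2
  if [ "$running" -lt "$wanted" ]; then
    echo "under $((wanted - running))"
  elif [ "$running" -gt "$wanted" ]; then
    echo "over $((running - wanted))"
  else
    echo "ok"
  fi
}

# instances_to_request WANTED FAIL_PCT: how many instances to ask for so that
# WANTED remain after FAIL_PCT percent fail (rounded up; FAIL_PCT below 100).
instances_to_request() {
  wanted=$1; fail_pct=$2
  echo $(( (wanted * 100 + (100 - fail_pct) - 1) / (100 - fail_pct) ))
}
```

With a 10 percent expected failure rate, instances_to_request 10 10 asks for 12 instances; once the cluster is up, provision_status tells the script whether to start more nodes or terminate the surplus.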
Cloudera Manager
MapR Manager
On the large scale
Hadoop 0.20 - up to 4,000 nodes
Hadoop 0.23 - up to 20,000 nodes
GridGain - 100s of 1,000s
Thank you
Questions?

