SlideShare a Scribd company logo
1 of 46
Autoscaling Spark for Fun and
Profit
Rafal Kwasny
11th Spark London Meetup
2015-11-26
1
Who am I
•DevOPS
•Build a few platforms in my life
•mostly adtech, in-game analytics for Sony
Playstation
•Currently advising Investment Banks
•CTO Entropy Investments
2
How do you run spark?
•Who runs on AWS?
•Who uses EMR?
3
So how to use autoscaling on AWS?
4
Overview
•typical architecture for AWS
•How autoscaling works
•Scripts to make your life easier
5
Typical architecture for AWS
6
Typical architecture for AWS
7
Generate some data
Typical architecture for AWS
8
Store it in S3
Typical architecture for AWS
9
or store it in a message queue
Typical architecture for AWS
10
Use your favourite tool for ETL
Typical architecture for AWS
11
Ship it back to S3
Typical architecture for AWS
12
Or send it somewhere
Typical architecture for AWS
13
- EMR
- spark-ec2
- build cluster from scratch
Map-reduce is about quickly writing very inefficient
code and then running it at massive scale
(C) Someone
14
Problem
•EC2 is a pay-for-what-you-use model
•You just have to decide how much resources
you want to use before starting a cluster
15
Problem
Most common problems while running on EC2
Scaling up
•My team needs a new cluster, how big it
should be?
Scaling down
•Did I shut down the DEV cluster before leaving
the office on Friday evening?
16
How to automate scaling?
17
Types of scaling
Vertical scaling -
„Let’s get a bigger box”
•Change instance type
•Change EBS parameters
18
Horizontal scaling -
„Just add more nodes”
Autoscaling
•Automatic resizing based on demand
•Define minimum/maximum instance count
•Define when scaling should occur
•Use metrics
•Run your jobs and don’t worry about
infrastructure
19
Architecture with autoscaling
20
Using RAM/local SSDs for caching
Only saving output into S3
Fault recovery
Autoscaling components
•AMI - machine image with installed spark
•Launch configuration - defines:
•AMI
•instance type
•instance storage
•public IP
•security groups
23
Autoscaling components
•Autoscaling group
•launch configuration
•availability zones
•VPC details
•min/max servers
•when to scale
•metrics/health checks
24
Putting it all together
Then you can run your job
25
Complicated?
•AWS provides a lot of services
26
spark-cloud
• Better scripts to start spark clusters on EC2
• Alpha version
• https://github.com/entropyltd/spark-cloud
27
What’s inside spark-cloud
Building AMI’s through packer
Packer is a tool for creating machine and
container images for multiple platforms from a
single source configuration.
Supports AWS, DigitalOcean, Docker,
OpenStack, Parallels, QEMU, VirtualBox,
VMware
38
Current functionality
•Start cluster
•Shutdown cluster
•But more to come :)
39
Spot instances
•Spot instances
40
Spot instances
–On-Demand:
$1.400
–Spot: $0.15
–89% cheaper
41
Summary
•Spark and EC2 is a very common combination
•Because it makes your life easier
•And cheaper
•spark-cloud script will help you
•You can just worry about writing good Spark
code!
42
Thank You
rafal@entropy.be
43
44
Amazon S3 Tips
•Don’t use s3n://
•Use s3a:// with hadoop 2.6
–Parallel rename, especially important for committing output
–Supports IAM authentication
–no „xyz_$folder$" files
–input seek
–multipart upload ( no 5GB limit )
–Error recovery and retry
More info https://issues.apache.org/jira/browse/HADOOP-10400
45
Why not EMR?
•Why pay for EMR? It costs more than a spot
instance
•vendor lock-in and proprietary libraries
•netlib-java
46

More Related Content

What's hot

What's hot (20)

(BDT305) Amazon EMR Deep Dive and Best Practices
(BDT305) Amazon EMR Deep Dive and Best Practices(BDT305) Amazon EMR Deep Dive and Best Practices
(BDT305) Amazon EMR Deep Dive and Best Practices
 
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
 
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWS
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
 
Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR - AWS On...
Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR - AWS On...Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR - AWS On...
Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR - AWS On...
 
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
 
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
Spark Integration Architecture for restaurant data
Spark Integration Architecture for restaurant data�Spark Integration Architecture for restaurant data�
Spark Integration Architecture for restaurant data
 
AWS EMR (Elastic Map Reduce) explained
AWS EMR (Elastic Map Reduce) explainedAWS EMR (Elastic Map Reduce) explained
AWS EMR (Elastic Map Reduce) explained
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
 
Amazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto MeetupAmazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto Meetup
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3
 
SF Big Analytics: Machine Learning with Presto by Christopher Berner
SF Big Analytics: Machine Learning with Presto by Christopher BernerSF Big Analytics: Machine Learning with Presto by Christopher Berner
SF Big Analytics: Machine Learning with Presto by Christopher Berner
 
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best PracticesAWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
 

Viewers also liked

基于Aws的持续集成、交付和部署 代闻
基于Aws的持续集成、交付和部署 代闻基于Aws的持续集成、交付和部署 代闻
基于Aws的持续集成、交付和部署 代闻
Mason Mei
 

Viewers also liked (20)

OpsStack--Integrated Operation Platform
OpsStack--Integrated Operation PlatformOpsStack--Integrated Operation Platform
OpsStack--Integrated Operation Platform
 
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
 
Aws summit devops 云端多环境自动化运维和部署
Aws summit devops   云端多环境自动化运维和部署Aws summit devops   云端多环境自动化运维和部署
Aws summit devops 云端多环境自动化运维和部署
 
透過Amazon CloudFront 和AWS WAF來執行安全的內容傳輸
透過Amazon CloudFront 和AWS WAF來執行安全的內容傳輸透過Amazon CloudFront 和AWS WAF來執行安全的內容傳輸
透過Amazon CloudFront 和AWS WAF來執行安全的內容傳輸
 
AWS Solutions Architect 準備心得
AWS Solutions Architect 準備心得AWS Solutions Architect 準備心得
AWS Solutions Architect 準備心得
 
基于Aws的持续集成、交付和部署 代闻
基于Aws的持续集成、交付和部署 代闻基于Aws的持续集成、交付和部署 代闻
基于Aws的持续集成、交付和部署 代闻
 
AwSome day 分享
AwSome day 分享AwSome day 分享
AwSome day 分享
 
使用Amazon Machine Learning 建立即時推薦引擎
使用Amazon Machine Learning 建立即時推薦引擎使用Amazon Machine Learning 建立即時推薦引擎
使用Amazon Machine Learning 建立即時推薦引擎
 
零到千万可扩展架构 AWS Architecture Overview
零到千万可扩展架构 AWS Architecture Overview零到千万可扩展架构 AWS Architecture Overview
零到千万可扩展架构 AWS Architecture Overview
 
Internet Cloud Operations - ChinaNetcloud & AWS Event Beijing
Internet Cloud Operations - ChinaNetcloud & AWS Event BeijingInternet Cloud Operations - ChinaNetcloud & AWS Event Beijing
Internet Cloud Operations - ChinaNetcloud & AWS Event Beijing
 
AWS EC2 and ELB troubleshooting
AWS EC2 and ELB troubleshootingAWS EC2 and ELB troubleshooting
AWS EC2 and ELB troubleshooting
 
Aws容器服务详解
Aws容器服务详解Aws容器服务详解
Aws容器服务详解
 
以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界
 
淺談系統監控與 AWS CloudWatch 的應用
淺談系統監控與 AWS CloudWatch 的應用淺談系統監控與 AWS CloudWatch 的應用
淺談系統監控與 AWS CloudWatch 的應用
 
AWS Summit OaaS Talk by ChinaNetCloud
AWS Summit OaaS Talk by ChinaNetCloudAWS Summit OaaS Talk by ChinaNetCloud
AWS Summit OaaS Talk by ChinaNetCloud
 
1. 利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
1.	利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)1.	利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
1. 利用微服務架構建立雲端影音平台 (Building Media Platform by Microservices Architecture)
 
管理程式對AWS LAMBDA持續交付
管理程式對AWS LAMBDA持續交付管理程式對AWS LAMBDA持續交付
管理程式對AWS LAMBDA持續交付
 
Amazon EC2 and AWS Elastic Beanstalk Introduction
Amazon EC2 and AWS Elastic Beanstalk IntroductionAmazon EC2 and AWS Elastic Beanstalk Introduction
Amazon EC2 and AWS Elastic Beanstalk Introduction
 
Automate Software Deployments on EC2 with AWS CodeDeploy
Automate Software Deployments on EC2 with AWS CodeDeployAutomate Software Deployments on EC2 with AWS CodeDeploy
Automate Software Deployments on EC2 with AWS CodeDeploy
 
如何規劃與執行大型資料中心遷移和案例分享
如何規劃與執行大型資料中心遷移和案例分享如何規劃與執行大型資料中心遷移和案例分享
如何規劃與執行大型資料中心遷移和案例分享
 

Similar to Autoscaling Spark on AWS EC2 - 11th Spark London meetup

Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
DataWorks Summit
 

Similar to Autoscaling Spark on AWS EC2 - 11th Spark London meetup (20)

(CMP404) Cloud Rendering at Walt Disney Animation Studios
(CMP404) Cloud Rendering at Walt Disney Animation Studios(CMP404) Cloud Rendering at Walt Disney Animation Studios
(CMP404) Cloud Rendering at Walt Disney Animation Studios
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
 
Leveraging elastic web scale computing with AWS
 Leveraging elastic web scale computing with AWS Leveraging elastic web scale computing with AWS
Leveraging elastic web scale computing with AWS
 
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
Building HPC Clusters as Code in the (Almost) Infinite Cloud | AWS Public Sec...
 
Business Agility: Taking an App Global (at Speed) - Session Sponsored by ITOC
Business Agility: Taking an App Global (at Speed) - Session Sponsored by ITOCBusiness Agility: Taking an App Global (at Speed) - Session Sponsored by ITOC
Business Agility: Taking an App Global (at Speed) - Session Sponsored by ITOC
 
Application Lifecycle Management on AWS
Application Lifecycle Management on AWSApplication Lifecycle Management on AWS
Application Lifecycle Management on AWS
 
EMR Training
EMR TrainingEMR Training
EMR Training
 
Part 1 of the REAL Webinars on Oracle Cloud Native Application Development
Part 1 of the REAL Webinars on Oracle Cloud Native Application DevelopmentPart 1 of the REAL Webinars on Oracle Cloud Native Application Development
Part 1 of the REAL Webinars on Oracle Cloud Native Application Development
 
Running Oracle EBS in the cloud (UKOUG APPS16 edition)
Running Oracle EBS in the cloud (UKOUG APPS16 edition)Running Oracle EBS in the cloud (UKOUG APPS16 edition)
Running Oracle EBS in the cloud (UKOUG APPS16 edition)
 
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanDay 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
 
AWS Lambda at JUST EAT
AWS Lambda at JUST EATAWS Lambda at JUST EAT
AWS Lambda at JUST EAT
 
NetflixOSS for Triangle Devops Oct 2013
NetflixOSS for Triangle Devops Oct 2013NetflixOSS for Triangle Devops Oct 2013
NetflixOSS for Triangle Devops Oct 2013
 
Workshop : Wild Rydes Takes Off - The Dawn of a New Unicorn
Workshop : Wild Rydes Takes Off - The Dawn of a New UnicornWorkshop : Wild Rydes Takes Off - The Dawn of a New Unicorn
Workshop : Wild Rydes Takes Off - The Dawn of a New Unicorn
 
Aws ec2
Aws ec2Aws ec2
Aws ec2
 
Sitecore 8.2 Update 1 on Azure Web Apps
Sitecore 8.2 Update 1 on Azure Web AppsSitecore 8.2 Update 1 on Azure Web Apps
Sitecore 8.2 Update 1 on Azure Web Apps
 
Wild Rides Takes off - The Dawn of a New Unicorn
Wild Rides Takes off - The Dawn of a New UnicornWild Rides Takes off - The Dawn of a New Unicorn
Wild Rides Takes off - The Dawn of a New Unicorn
 
AWS APAC Webinar Week - Getting The Most From EC2
AWS APAC Webinar Week - Getting The Most From EC2AWS APAC Webinar Week - Getting The Most From EC2
AWS APAC Webinar Week - Getting The Most From EC2
 
Self-Service Supercomputing
Self-Service SupercomputingSelf-Service Supercomputing
Self-Service Supercomputing
 
Matt Chung (Independent) - Serverless application with AWS Lambda
Matt Chung (Independent) - Serverless application with AWS Lambda Matt Chung (Independent) - Serverless application with AWS Lambda
Matt Chung (Independent) - Serverless application with AWS Lambda
 
Containerize all the things!
Containerize all the things!Containerize all the things!
Containerize all the things!
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Autoscaling Spark on AWS EC2 - 11th Spark London meetup

Editor's Notes

  1. How many of you use spark in production?
  2. Single source of data very good durability & availability Offloading storage complexity to AWS
  3. Parquet Columnar store Standard supported by Spark, Hive, Presto, Impala Optimised for: column based aggregations Not optimised for `select *` type queries INSERT/UPDATE’s
  4. When on EC2 you have 2 main options: spark-ec2 scripts EMR (Elastic Map-Reduce)
  5. no HDFS no state on workers