SlideShare a Scribd company logo
1 of 14
Download to read offline
Cloud Capacity Planning
South Bay SRE meetup - August 9th, 2016
● Cloud Capacity Planning..an Oxymoron?
● Santa Cloud: How Netflix Does Holiday Capacity Planning
● The Data Behind the Planning
Presenting...
Cloud Capacity Planning..an Oxymoron?
South Bay SRE Meetup: August 9th, 2016
● > 83M households
● 190 Countries
● 35% of Internet traffic in US at peak
● Entirely on Cloud*, three regions
● Evacuate a region monthly...for 24 hours
● Capacity planning ~ 5 people! (in the room :-)
* Content served from homegrown OpenConnect CDN
Capacity Planning Concerns
● Facility considerations (Space, Power, Network, Cooling)
● Supply Chain Management Constraints and Relationships
● Hardware lifetime contour & failure rates (MTBF)
● Systems management staff
● Seasonal and unexpected burst considerations
● Workload colocation and performance demands
● Over-provisioning for reliability and rate of innovation
● Effective tooling
● Business continuity planning
(Cloud) Capacity Planning Concerns
● Facility considerations (Power, Network, Cooling)
● Supply Chain Management Constraints and Relationships
● Hardware lifetime contour & failure rates (MTBF)
● Systems management staff
● Seasonal and unexpected burst considerations
● Workload colocation and performance demands
● Over-provisioning for reliability and rate of innovation
● Effective tooling
● Business continuity planning
Cloud-specific CP Factors
● Capacity bounds..unknown (-)
● Vendor Decisions (-/+)
○ Hardware/Offering Evolution Timeline
○ Resource Demand (CPU/Mem/Disk/Net) Matrix
● On-Demand Capability (+)
Netflix Model
● Depend on the AWS on-demand pool for elasticity
● Monitor insufficient capacity exceptions (ICEs) for boundaries
● Invest heavily in 3 year reservations
● Maintain relatively few, large reserved pools
● Cloud Capacity Analytics team develops tools for insight
● Leverage cross-account resource borrowing
The Triad Cloud Impact
Innovation
Reliability
Efficiency
Default Preferred
Considerations of Scale
● Capacity required for critical footprint might require “guarantees”
● API-based observability has limits
● All resources have capacity limits/throttles
● Resource limits by default set for lowest common denominator
● Get creative with unused, but paid for capacity
● Billing file size!
Summary
Capacity
Planning
Coburn Watson
● Director of Performance and Reliability at Netflix
○ Site Reliability Engineering, Performance and OS Engineering, Traffic Management, Chaos Engineering,
Capacity Planning, Cloud Network Engineering
● @coburnw, cwatson@netflix.com
● Looking for some great capacity planning-minded folks
● Performance and Reliability Youtube Channel

More Related Content

Viewers also liked

Engineering Tools at Netflix: Enabling Continuous Delivery
Engineering Tools at Netflix: Enabling Continuous DeliveryEngineering Tools at Netflix: Enabling Continuous Delivery
Engineering Tools at Netflix: Enabling Continuous DeliveryMike McGarr
 
OTT & The Future of Connected TV
OTT & The Future of Connected TVOTT & The Future of Connected TV
OTT & The Future of Connected TVClearbridge Mobile
 
Continuous Delivery at Netflix, and beyond
Continuous Delivery at Netflix, and beyondContinuous Delivery at Netflix, and beyond
Continuous Delivery at Netflix, and beyondMike McGarr
 
Implementing DevOps
Implementing DevOpsImplementing DevOps
Implementing DevOpsMike McGarr
 
Splitting the Check on Compliance and Security
Splitting the Check on Compliance and SecuritySplitting the Check on Compliance and Security
Splitting the Check on Compliance and SecurityJason Chan
 
Linux Instrumentation
Linux InstrumentationLinux Instrumentation
Linux InstrumentationDarkStarSword
 
Hadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectMao Geng
 
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s DilemmaAmazon Web Services
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFBrendan Gregg
 

Viewers also liked (10)

Engineering Tools at Netflix: Enabling Continuous Delivery
Engineering Tools at Netflix: Enabling Continuous DeliveryEngineering Tools at Netflix: Enabling Continuous Delivery
Engineering Tools at Netflix: Enabling Continuous Delivery
 
OTT & The Future of Connected TV
OTT & The Future of Connected TVOTT & The Future of Connected TV
OTT & The Future of Connected TV
 
Continuous Delivery at Netflix, and beyond
Continuous Delivery at Netflix, and beyondContinuous Delivery at Netflix, and beyond
Continuous Delivery at Netflix, and beyond
 
Implementing DevOps
Implementing DevOpsImplementing DevOps
Implementing DevOps
 
Splitting the Check on Compliance and Security
Splitting the Check on Compliance and SecuritySplitting the Check on Compliance and Security
Splitting the Check on Compliance and Security
 
Linux Instrumentation
Linux InstrumentationLinux Instrumentation
Linux Instrumentation
 
Hadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log project
 
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma(SPOT302) Availability: The New Kind of Innovator’s Dilemma
(SPOT302) Availability: The New Kind of Innovator’s Dilemma
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
 
Culture
CultureCulture
Culture
 

More from Coburn Watson

Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...
Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...
Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...Coburn Watson
 
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016Coburn Watson
 
goto; London: Keeping your Cloud Footprint in Check
goto; London: Keeping your Cloud Footprint in Checkgoto; London: Keeping your Cloud Footprint in Check
goto; London: Keeping your Cloud Footprint in CheckCoburn Watson
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCoburn Watson
 
#lspe Q1 2013 dynamically scaling netflix in the cloud
#lspe Q1 2013   dynamically scaling netflix in the cloud#lspe Q1 2013   dynamically scaling netflix in the cloud
#lspe Q1 2013 dynamically scaling netflix in the cloudCoburn Watson
 
AWS Re:Invent - Optimizing Costs with AWS
AWS Re:Invent -  Optimizing Costs with AWSAWS Re:Invent -  Optimizing Costs with AWS
AWS Re:Invent - Optimizing Costs with AWSCoburn Watson
 

More from Coburn Watson (6)

Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...
Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...
Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...
 
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
 
goto; London: Keeping your Cloud Footprint in Check
goto; London: Keeping your Cloud Footprint in Checkgoto; London: Keeping your Cloud Footprint in Check
goto; London: Keeping your Cloud Footprint in Check
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
#lspe Q1 2013 dynamically scaling netflix in the cloud
#lspe Q1 2013   dynamically scaling netflix in the cloud#lspe Q1 2013   dynamically scaling netflix in the cloud
#lspe Q1 2013 dynamically scaling netflix in the cloud
 
AWS Re:Invent - Optimizing Costs with AWS
AWS Re:Invent -  Optimizing Costs with AWSAWS Re:Invent -  Optimizing Costs with AWS
AWS Re:Invent - Optimizing Costs with AWS
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Cloud Capacity Planning..an Oxymoron? - South Bay SRE Meetup Aug-09-2016

  • 1. Cloud Capacity Planning South Bay SRE meetup - August 9th, 2016
  • 2. ● Cloud Capacity Planning..an Oxymoron? ● Santa Cloud: How Netflix Does Holiday Capacity Planning ● The Data Behind the Planning Presenting...
  • 3. Cloud Capacity Planning..an Oxymoron? South Bay SRE Meetup: August 9th, 2016
  • 4. ● > 83M households ● 190 Countries ● 35% of Internet traffic in US at peak ● Entirely on Cloud*, three regions ● Evacuate a region monthly...for 24 hours ● Capacity planning ~ 5 people! (in the room :-) * Content served from homegrown OpenConnect CDN
  • 5. Capacity Planning Concerns ● Facility considerations (Space, Power, Network, Cooling) ● Supply Chain Management Constraints and Relationships ● Hardware lifetime contour & failure rates (MTBF) ● Systems management staff ● Seasonal and unexpected burst considerations ● Workload colocation and performance demands ● Over-provisioning for reliability and rate of innovation ● Effective tooling ● Business continuity planning
  • 6. (Cloud) Capacity Planning Concerns ● Facility considerations (Power, Network, Cooling) ● Supply Chain Management Constraints and Relationships ● Hardware lifetime contour & failure rates (MTBF) ● Systems management staff ● Seasonal and unexpected burst considerations ● Workload colocation and performance demands ● Over-provisioning for reliability and rate of innovation ● Effective tooling ● Business continuity planning
  • 7.
  • 8. Cloud-specific CP Factors ● Capacity bounds..unknown (-) ● Vendor Decisions (-/+) ○ Hardware/Offering Evolution Timeline ○ Resource Demand (CPU/Mem/Disk/Net) Matrix ● On-Demand Capability (+)
  • 9. Netflix Model ● Depend on the AWS on-demand pool for elasticity ● Monitor insufficient capacity exceptions (ICEs) for boundaries ● Invest heavily in 3 year reservations ● Maintain relatively few, large reserved pools ● Cloud Capacity Analytics team develops tools for insight ● Leverage cross-account resource borrowing
  • 10. The Triad Cloud Impact Innovation Reliability Efficiency Default Preferred
  • 11.
  • 12. Considerations of Scale ● Capacity required for critical footprint might require “guarantees” ● API-based observability has limits ● All resources have capacity limits/throttles ● Resource limits by default set for lowest common denominator ● Get creative with unused, but paid for capacity ● Billing file size!
  • 14. Coburn Watson ● Director of Performance and Reliability at Netflix ○ Site Reliability Engineering, Performance and OS Engineering, Traffic Management, Chaos Engineering, Capacity Planning, Cloud Network Engineering ● @coburnw, cwatson@netflix.com ● Looking for some great capacity planning-minded folks ● Performance and Reliability Youtube Channel