Learning to Scale OpenStack: A Case Study in Rackspace's Open Cloud Deployment
Rainya Mosher, Dev Manager, Deploy Infrastructure
IRC: rainya on freenode | Twitter: @rainyamosher
April 17, 2013 at 4:30pm
RACKSPACE® HOSTING | WWW.RACKSPACE.COM
It is not the critic who counts; not the man who points out
how the strong man stumbles, or where the doer of deeds
could have done them better. The credit belongs to the man
who is actually in the arena, whose face is marred by dust
and sweat and blood; who strives valiantly; . . . who at best
knows in the end the triumph of high achievement, and who
at worst, if he fails, at least fails while daring greatly.
Theodore Roosevelt
The Man in the Arena, April 1910
In the Arena
What does “At Scale” Mean?
• Hundreds of HVs
• Thousands of HVs
• Tens of Thousands of HVs
• Hundreds of Thousands of HVs
[Diagram: a Global Cloud contains Regions, each Region contains Cells, and each Cell contains hypervisors (HVs)]
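The Global Cloud, Region, Cell, and HV hierarchy on this slide can be modeled as nested containers. The sketch below is illustrative only: the class names, the region and cell names, and the hypervisor counts are invented for the example and do not describe Rackspace's actual topology.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cell:
    """A cell groups a set of hypervisors (HVs)."""
    name: str
    hypervisors: int  # number of HVs in this cell (made-up value)


@dataclass
class Region:
    """A region contains one or more cells."""
    name: str
    cells: List[Cell] = field(default_factory=list)

    def hypervisor_count(self) -> int:
        return sum(cell.hypervisors for cell in self.cells)


@dataclass
class GlobalCloud:
    """The global cloud is the set of all regions."""
    regions: List[Region] = field(default_factory=list)

    def hypervisor_count(self) -> int:
        return sum(region.hypervisor_count() for region in self.regions)


def scale_label(hv_count: int) -> str:
    """Map an HV count onto the order-of-magnitude labels from the slide."""
    if hv_count < 100:
        return "below hundreds of HVs"
    if hv_count < 1_000:
        return "hundreds of HVs"
    if hv_count < 10_000:
        return "thousands of HVs"
    if hv_count < 100_000:
        return "tens of thousands of HVs"
    return "hundreds of thousands of HVs"


# Hypothetical topology: two regions with a few cells each.
cloud = GlobalCloud(regions=[
    Region("region-a", [Cell("cell-1", 400), Cell("cell-2", 350)]),
    Region("region-b", [Cell("cell-1", 500)]),
])
total = cloud.hypervisor_count()
print(total, scale_label(total))
```

Counting from the leaves upward keeps each level responsible only for its own children, which mirrors how the slide nests HVs inside cells and cells inside regions.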
What is the Control Plane Release Strategy?
• Code
• Package
• Deploy
• Verify
First Scaling Hurdle – Deploy Mechanism
• Aug 2012
– Rackspace launches Open Cloud
– Frequent releases to fine-tune
• Sep 2012 through Nov 2012
– Deploying code that is two weeks behind trunk takes about two hours
– Begin designing a new deploy mechanism at the October Summit
• Dec 2012
– Code deploys take 4 to 6 hours
– Deploy team says, bleary-eyed, they aren’t doing it again
• Jan 2013
– Deploy again
– Takes more than 6 hours
– Accept that it is no longer “reasonable” and temporarily stop deploying code releases
– Focus on the deploy mechanism
[Chart: Internal Code Releases and Capacity by month, Aug-12 through Feb-13, with a linear trend line for Internal Code Releases]
Improving the Deploy Mechanism
• Package
– switched from Debian packages to virtual environments
• Distribute
– used torrent for the package, pssh for fact files, and mcollective for actions
• Execute
– moved from a centralized puppet master to decentralized masterless puppet
Second Scaling Hurdle – Catch up to Trunk
• March 2013
– Production code is 2 months behind trunk
– Trunk as of 2/28 becomes our “v152” and bakes in preprod
– Prep for impacting DB migrations in production
– Re-enable our CI process
• April 2013
– Deploy v152 to production
– 10x increase in DB traffic
– Community works to fix
– Re-deploy v152 with Community fixes
– Attend Summit in Portland and share the story
[Chart: DB throughput, annotated at points 1 to 4]
1 – Normal DB throughput; 2 – First installation of v152; 3 – Disabled several periodic tasks; 4 – Re-installed v152 with patches from Community and turned periodic tasks back on
How Can We Adapt for Scale Issues?
• Testing & Environments
– More robust testing coverage
– Deployer-specific testing further upstream
– Production-like dev environments
– Simulate production compute numbers on non-production hardware
• Database & Code Management
– Non-disruptive DB migration patterns
– DB calls with 6 million rows in mind, not just 60
– Code optimization paths for large datasets
• Process & Community
– Stay close to trunk, even though it is hard
– Explore options for a continuously deployable trunk
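One way to read "DB calls with 6 million rows in mind, not just 60" is to fetch a large table in bounded batches instead of loading every row at once. The sketch below shows keyset pagination under stated assumptions: an in-memory SQLite database and a hypothetical `instances` table. Neither the schema nor the batch size comes from the talk.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_batches(conn: sqlite3.Connection,
                 batch_size: int = 1000) -> Iterator[List[Tuple[int, str]]]:
    """Yield rows from the hypothetical `instances` table in id order.

    Each query seeks past the last id seen (keyset pagination), so the
    work per batch stays bounded as the table grows, unlike a single
    SELECT that materializes every row at once.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, state FROM instances WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]


# Build a small stand-in table; the row count is arbitrary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO instances (id, state) VALUES (?, ?)",
    [(i, "active" if i % 2 else "deleted") for i in range(1, 2501)],
)

batch_sizes = []
active = 0
for batch in iter_batches(conn, batch_size=1000):
    batch_sizes.append(len(batch))
    active += sum(1 for _id, state in batch if state == "active")

print(batch_sizes, active)
```

The same loop shape works whether the table holds 60 rows or 6 million; only the number of iterations changes, not the memory held at any one time.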
Backup Slides
Many of these backup slides were first presented on 4/16/2013 during the
OpenStack Summit session “Deploying from OpenStack Trunk” and are
included here for reference.
Merge and Branch Strategy
• The most recent Rackspace release branch took over 50 minor tags to make it work in production
• Rackspace Development branch is about 40 patches on top of OpenStack trunk for internal service compatibility
Package and Distribute Strategy
• Package
– per-project venv
– .tar of project venvs + configs
• Distribute
– seed .torrent
– distribute fact files
– verify completion
• Execute
– switch version
– sync databases
– run puppet
– verify completion
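The Package step and a "verify completion" check can be sketched with the Python standard library: bundle a directory into a .tar, then compare checksums on the receiving side. This is an illustration under loud assumptions, not the Rackspace tooling. The directory layout, the file names, and the use of SHA-256 are invented for the example, and the torrent, fact-file, database sync, and puppet steps are not modeled.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large packages do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_package(source_dir: Path, tar_path: Path) -> str:
    """Bundle source_dir into an uncompressed tar and return its SHA-256."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(source_dir, arcname=source_dir.name)
    return sha256_of(tar_path)


def verify_package(tar_path: Path, expected_sha256: str) -> bool:
    """Hypothetical stand-in for 'verify completion': does the file match?"""
    return sha256_of(tar_path) == expected_sha256


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical release layout: one venv directory plus a config file.
    release = root / "release-example"
    (release / "nova-venv").mkdir(parents=True)
    (release / "nova-venv" / "placeholder.txt").write_text("venv contents\n")
    (release / "nova.conf").write_text("[DEFAULT]\n")

    package = root / "release-example.tar"
    checksum = build_package(release, package)

    with tarfile.open(package) as tar:
        members = sorted(tar.getnames())
    verified = verify_package(package, checksum)
    mismatch = verify_package(package, "0" * 64)

print(members, verified, mismatch)
```

Publishing the checksum alongside the package lets each receiving node confirm it holds the complete file before anything switches versions.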
Deploy and Test Strategy
• Dev
– pre-code check-in validation
• Integration
– smoke tests
– unit tests
• QA
– functional tests
– integration tests
• Pre-Prod
– regression tests
– build tests
• Production
– smoke tests
– build tests
Benefits and Challenges of Trunk Deploys
Why We Do It (Benefits)
• Issue Resolution
– Early detection of issues and conflicts
– Shorter feedback loop within the community
– Faster resolution of issues
• Early Feature Delivery
– Smaller, incremental periodic releases
– More stable release candidates at end of cycle
Why It’s Hard (Challenges)
• Code Management
– Merge conflicts with local patches
– Disruptive DB migrations
– Service restarts
– Temporary version skew
• Testing
– Devstack-based testing vs testing at scale
– Rework when issues found in RAX deploy pipeline
• Process
– CI/CD vs Release methodology
– Time to merge patches
Scale of Deploy Pipeline
[Diagram labels]
Scale: DevStack, 10s of Nodes, 100s of Nodes, 1,000s of Nodes
Stage: Dev, Integration & QA, PreProd, Production
RACKSPACE® HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218
US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM
RACKSPACE® HOSTING | © RACKSPACE US, INC. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM

Learning to Scale OpenStack

  • 1. Rainya Mosher, Dev Manager, Deploy Infrastructure IRC: rainya on freenode Twitter: @rainyamosher Learning to Scale OpenStack: A Case Study in Rackspace's Open Cloud Deployment April 17, 2013 at 4:30pm
  • 2. RACKSPACE® HOSTING | WWW.RACKSPACE.COM It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; . . . who at best knows in the end the triumph of high achievement, and who at worst, if he fails, at least fails while daring greatly. Theodore Roosevelt The Man in the Arena, April 1910 2 In the Arena Learning to Scale OpenStack
  • 3. RACKSPACE® HOSTING | WWW.RACKSPACE.COM Hundreds of HVs Thousands of HVs Tens of Thousand HVs Hundreds of Thousand HVs Global Cloud Region Region Cell Cell Cell HV HV HV HV HV HV Cell Cell Region 3 What does “At Scale” Mean? Learning to Scale OpenStack
  • 4. RACKSPACE® HOSTING | WWW.RACKSPACE.COM Code Package Deploy Verify 4 What is the Control Plane Release Strategy? Learning to Scale OpenStack
  • 5. RACKSPACE® HOSTING | WWW.RACKSPACE.COM First Scaling Hurdle – Deploy Mechanism Learning to Scale OpenStack 5 • Aug 2012 – Rackspace launches Open Cloud – Frequent releases to fine tune • Sep 2012 thru Nov 2012 – Deploying code that is two weeks from trunk takes about two hours – Begin designing new deploy mechanism at October Summit • Dec 2012 – Code deploys take 4 - 6 hours – Deploy team says, bleary-eyed, they aren’t doing it again • Jan 2012 – Deploy again – Takes more than 6 hours – Accept that it is no longer “reasonable” and temporarily stop deploying code releases – Focus on the deploy mechanism 0 1 2 3 4 5 6 0 1 2 3 4 5 6 7 Aug-12 Sep-12 Oct-12 Nov-12 Dec-12 Jan-13 Feb-13 Internal Code Releases Capacity Linear (Internal Code Releases)
  • 6. Improving the Deploy Mechanism • Package: switched from Debian packages to virtual environments • Distribute: used torrent for the package, pssh for fact files, and mcollective for actions • Execute: moved from a centralized puppet master to decentralized masterless puppet
  • 7. Second Scaling Hurdle – Catch up to Trunk • March 2013 – Production code is 2 months behind trunk – Trunk as of 2/28 becomes our “v152” and bakes in preprod – Prep for impacting DB migrations in production – Re-enable our CI process • April 2013 – Deploy v152 to production – 10x increase in DB traffic – Community works to fix – Re-deploy v152 with Community fixes – Attend Summit in Portland and share the story [Chart: DB throughput over time, with four marked points: 1 – Normal DB throughput; 2 – First installation of v152; 3 – Disabled several periodic tasks; 4 – Re-installed v152 with patches from Community and turned periodic tasks back on]
  • 8. How Can We Adapt for Scale Issues? • Testing & Environments – More robust testing coverage – Deployer-specific testing further upstream – Production-like dev environments – Simulate production compute numbers on non-production hardware • Database & Code Management – Non-disruptive DB migration patterns – DB calls with 6 million rows in mind, not just 60 – Code optimization paths for large datasets • Process & Community – Stay close to trunk, even though it is hard – Explore options for a continuously deployable trunk
  • 9. Backup Slides. Many of these backup slides were first presented on 4/16/2013 during the OpenStack Summit session “Deploying from OpenStack Trunk” and are included here for reference.
  • 10. Merge and Branch Strategy • The most recent Rackspace release branch took over 50 minor tags to make it work in production • The Rackspace development branch is about 40 patches on top of OpenStack trunk for internal service compatibility
  • 11. Package and Distribute Strategy • Package: per-project venv; .tar of project venvs + configs • Distribute: seed .torrent; distribute fact files; verify completion • Execute: switch version; sync databases; run puppet; verify completion
  • 12. Deploy and Test Strategy • Dev: pre-code check-in validation • Integration: smoke tests; unit tests • QA: functional tests; integration tests • Pre-Prod: regression tests; build tests • Production: smoke tests; build tests
  • 13. Benefits and Challenges of Trunk Deploys. Why We Do It (Benefits) • Issue Resolution – Early detection of issues and conflicts – Shorter feedback loop within the community – Faster resolution of issues • Early Feature Delivery – Smaller, incremental periodic releases – More stable release candidates at end of cycle. Why It’s Hard (Challenges) • Code Management – Merge conflicts with local patches – Disruptive DB migrations – Service restarts – Temporary version skew • Testing – Devstack-based testing vs testing at scale – Rework when issues found in RAX deploy pipeline • Process – CI/CD vs Release methodology – Time to merge patches
  • 14. Scale of Deploy Pipeline • Dev: DevStack • Integration & QA: 10s of Nodes • PreProd: 100s of Nodes • Production: 1,000s of Nodes
  • 15. RACKSPACE® HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218 | US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM | © RACKSPACE US, INC. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES.
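
Slide 8 asks for non-disruptive DB migration patterns and for DB calls written "with 6 million rows in mind, not just 60". One common pattern that fits this goal is to apply a data change in small batches with a commit after each one, so no single transaction holds locks across the whole table. The sketch below is only an illustration and not the migration Rackspace ran: the `instances` table, the `deleted` flag column, and the batch size are hypothetical, and SQLite stands in for the production database.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Set the hypothetical `deleted` flag on soft-deleted rows, one batch at a time.

    Returns the total number of rows updated.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE instances SET deleted = 1 "
            "WHERE id IN (SELECT id FROM instances "
            "             WHERE deleted_at IS NOT NULL AND deleted = 0 "
            "             LIMIT ?)",
            (batch_size,),
        )
        # Commit per batch so each transaction stays short.
        conn.commit()
        if cur.rowcount == 0:
            break  # nothing left to backfill
        total += cur.rowcount
    return total
```

Because each batch only selects rows that still need the change, the loop can be interrupted and restarted without redoing completed work.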

Editor's Notes

  1. What we as a Community are doing is hard. And a little scary. And did I mention hard? We stumble. A lot. We help each other back up, we dust ourselves off, we say “that was hard” and then we dive back in, hopefully realizing that the stumble wasn’t just failure, but growth and learning and opportunity. Over the last year, there has been lots of opportunity for growth at Rackspace in learning to scale OpenStack within our public Open Cloud deployment.
  2. For Rackspace, “at scale” means deploying to our expanding, multi-region global cloud. A region is made of one or more cells, each of which is made of hundreds of hypervisors. Rackspace is adding more regions in the next year, and each region is trending quickly towards more than a dozen cells.
  3. The basic strategy we use to deploy OpenStack onto our public cloud is simple. We take the OpenStack code, package it up with a few local integration modifications, distribute the package, execute the code in the package, and then verify that it works. Simple in concept, but not necessarily easy in execution.
  4. Our initial deploy mechanism used pssh to push the deployment package, a Debian package file, out to all the nodes. A central puppet master in each region handled configuration management for all the nodes and reported on the status of puppet runs. We prestaged the package earlier in the day, as it could take more than 30 minutes to pssh to all the nodes in the region. Once prestaged, we’d start the deploy scripts. Most nights, we’d be done within 2 hours, including verification through smoke and build tests. We had been working on a new deploy mechanism, knowing that we were going to outgrow our current process eventually, but it was difficult as the core people building the new mechanism were also the experts in the existing process. By January, we accepted we couldn’t keep going at this rate and called a halt to code releases to focus on improving the mechanism.
  5. We completed the deploy mechanism improvement project and implemented it in all production regions by early March. We upgraded the mechanism without changing the code so that we could minimize the changed elements. The new mechanism worked! The virtual environment based packaging reduced the dependency issues we would run into during deploys. We used torrent to seed the package out to all the nodes in a matter of seconds. Masterless puppet removed the central bottleneck that puppet master had become. Mcollective actions kicked everything off and reported on progress. We declared it a success and looked to get our code releases back on track.
  6. We’d resolved our deploy mechanism issues and it was time to catch up to trunk. We were nearly 2 months behind trunk, Grizzly feature freeze had just passed, and we knew it was going to be a challenge getting back into our previous 2-week cycle. We tagged Trunk as of 2/28/2013 as “v152” and deployed it to our internal pipeline. Several instance faults and tracebacks were discovered, fixed in the v152 line, and submitted back up to Trunk. The DB migration for deleted_at was going to be massive due to the size of our databases, so we did some much-needed maintenance on the affected table rows. The migration to include instance type data as key-value pairs in the metadata table was concerning, but we’d been through large migrations before and were confident we could fix whatever issue arose. Once we deployed new code to our first data center, though, we knew we had a whole new hurdle to overcome.
  7. Check out Wednesday’s session at 4:30pm on how Rackspace is “Learning to Scale OpenStack” for the story behind the most recent internal release branch!
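
The Execute stage on slide 11 starts with a "switch version" step, and note 5 describes packages built from virtual environments. The slides do not say how the switch was implemented. A minimal sketch of one common approach, assuming each release is unpacked into its own directory and services run from a `current` symlink (both the directory layout and the link name are assumptions made for illustration):

```python
import os


def switch_version(releases_dir, version, link_name="current"):
    """Repoint releases_dir/link_name at releases_dir/version.

    The new link is built under a temporary name and then renamed over the
    old one, so readers see either the old target or the new one.
    """
    target = os.path.join(releases_dir, version)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)

    link = os.path.join(releases_dir, link_name)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)  # clear a leftover from an interrupted switch

    os.symlink(version, tmp)  # relative link, so the tree can be relocated
    os.replace(tmp, link)     # rename over the old link (atomic on POSIX)
    return os.path.realpath(link)
```

Keeping previous release directories on disk means a rollback is the same call with the earlier version name.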