Video of presentation can be found here: https://www.youtube.com/watch?v=3pc85InNR20
Time Warner Cable has been slowly deploying Dockerized OpenStack services in production since the Juno release. In this talk we'll share our real-world experiences with deploying OpenStack services in production with Docker.
2. Overview
• The Pain of Operating OpenStack
• Possible Solutions
• Why Docker Works
• Why Docker Doesn’t Work
• Docker @ TWC
• Lessons Learned
3. • Docker in production in July 2015
• First service was Designate
• Added Heat, Nova and Keystone
• Nova using Ceph and SolidFire backends
• Neutron in progress
• Glance and Cinder later this year
• Using Docker 1.10 and Docker Registry V2
Docker & OpenStack @ TWC
4. • Started with packages for deployments
• Don’t like big-bang upgrades
• Want to be able to carry local patches
• Want to run mixed versions of services
• Smaller upgrades, more often
How Did We End Up Here?
5. Why Not Packages?
• Built packages for Keystone
• Worked for local patches
• Worked for updating stable branches
• Doesn’t work for mixed releases
• Limited by distro python packaging
• Packaging workflow is a pain
• Packages slow down your workflow
• Package may not exist yet
6. Why Not Python Virtual Envs?
• Deployed Designate with Virtual Envs
• Mirrored Python packages internally
• Built Virtual Envs on servers
• Was slow to deploy
• Still have to install/manage non-Python deps
9. • Reproducible builds
• Easy to distribute artifacts
• Contains all dependencies
• Easy to install multiple versions of an image
Why Docker?
10. • Restarting docker restarts containers
• Intermittent bugginess
• Complex services are hard to fit into Docker
• Requires new tooling for build/deployment/etc
Why Not Docker?
11. Docker @ TWC: Images
• Building base images using debootstrap
• Build openstack-dev image based on that
–Contains all common deps
• Image per OpenStack Service
• Per service base requirements.txt and a frozen one
• Frozen requirements.txt is used for image builds
• Uses upper-constraints.txt for frozen requirements1
1. https://github.com/openstack/requirements/blob/master/upper-constraints.txt
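The two-requirements flow above can be sketched roughly as follows. File names and the throwaway-venv approach are assumptions based on the talk, not TWC's actual tooling; the demo freezes a fresh venv so it runs anywhere:

```shell
# Build a throwaway virtualenv, then freeze it to pin every transitive dep.
set -e
WORKDIR=$(mktemp -d)
cd "$WORKDIR"
python3 -m venv venv
# In the real workflow (assumed) the high-level deps would be installed under
# upstream's tested version caps:
#   venv/bin/pip install -c upper-constraints.txt -r requirements.txt
# Freezing then yields a fully pinned list to commit next to the Dockerfile:
venv/bin/pip freeze --all > requirements-frozen.txt
cat requirements-frozen.txt   # even a fresh venv pins pip itself
```

The image build then installs only from the frozen file, so two builds from the same commit pull in the same library versions.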
12. Docker @ TWC: Image Tags
• Tag should:
–Identify OpenStack service version
–Identify tooling version
–Be automatically generated
–Be unique
14. Docker @ TWC: Image Distribution
• Using Docker Registry V2
• Registry using file backend for local storage
• Publish to master registry via Jenkins
• Replicate to registry mirrors via rsync
• Mirrors provide read-only access to images
• No dependency on production environment
15. Docker @ TWC: Deployments
• Images installed with puppet-docker
• Managed with twc-openstack/os_docker
• Worked with Puppet OpenStack project to add
hooks for software and service management
• The os_docker module uses these to extend
OpenStack Puppet modules
16. Docker Registry Scaling
• Docker recommends (almost requires) TLS for
registry
• We deploy to 20 hypervisors in parallel
• 8 vCPU Docker Registry
• Supports 40 concurrent pulls of 500 MB images
• Size your registry for concurrent pulls * image size
17. Beware Docker Networking
• We use --net host for all containers
• Many services *require* --net host
• Docker always creates bridge and NAT rules
• NAT rules aren’t tied to a specific interface
• Docker picks unused network range
–But can’t see VM IP addresses
• Found this out on first Nova Compute deploy
18. OpenStack Upgrades With Docker
• Allows upgrading single services!
• Allows staging the upgrade images ahead of time
• Not exciting
19. Why Not Kolla?
• At the time didn’t meet our requirements:
–Didn’t support plugins, no source build
–These things are resolved, or being resolved
• Great reference for running OpenStack with Docker
• Recommended
ERIC:
Introductions
This talk is about deploying *OpenStack* with Docker, not deploying docker containers *with* OpenStack
The Pain of Operating OpenStack
Possible Solutions
Why Docker Works
Why Docker Doesn’t Work
Docker @ TWC
Lessons Learned
Just a bit of background
We first started using Docker in production in July of last year
First service we deployed with Docker was Designate
Followed by Heat, Nova, then Keystone
With Nova we did a two stage deploy process for control node services followed by compute a while later
With Nova we’re running Ceph and SolidFire as storage backends
It *is* possible to get nova-compute and iscsi working inside a docker container
We’ll be moving Neutron into Docker next, then coming back to Glance and Cinder
The primary short-term driver for Neutron is the OVS agent restart fixes; agent restarts currently cause small outages
Changes have largely been merged in the last couple of months, but we are trying to use stable release branches for running prod. Seeing these changes get merged into stable branches now
Using Docker 1.10 with Docker Registry V2
So how did we end up deploying OpenStack services with Docker?
We’ve traditionally used packages for deployments
Over time we realized packages really weren’t meeting our requirements very well
Packages tend to lead to a big bang type of upgrade.
We run multiple services on the same set of control servers, and when doing upgrades our API outages are longer and riskier than we wanted them to be
We want to be able to carry local patches and cherry-pick fixes from master branches
Many times we run into a bug, find it on launchpad and see that a fix is committed on master, but not backported. Or a fix is backported, but the package is not ready yet.
We can do some of those backports ourselves
We don’t want to have to run the same version of OpenStack for all services
For example, we’re much more aggressive about upgrading services like Horizon and Heat than Nova and Neutron
We want to upgrade services independently of each other.
We also want to follow stable updates more aggressively than distros do
Only a few stable releases are done over the six month lifetime of an OpenStack release
Distros usually lag behind those by weeks, if not longer
We want to do smaller upgrades, more often, and one or two services at a time.
So you may be thinking: Why can’t you do this with:
Packages? Virtualenvs, etc?
We looked at and tried some different options
We tried packages for Keystone
We took the packages from Canonical, replaced the source in them, left them mostly the same otherwise
This worked reasonably well for carrying patches, worked well for stable updates
This didn’t work well for mixed openstack releases
With normal distro packaging, you can’t have two versions of the same python library installed at the same time
There are significant conflicts in library requirements across OpenStack releases
Because of this we were still dependent on Canonical for packaging the python libraries that the services depended on.
Package workflow on Debian/Ubuntu isn’t rocket science, but it clearly hasn’t changed much in the last ten years.
I hate it.
There are certain times where we want the latest and greatest of some python library, which may not even have a package built for it. If you use pip install to install python libraries in system space, there is no telling what you might end up with - especially installing from git urls
Another option you sometimes hear people using is Python virtual environments
We use virtual environments for Horizon; it probably has the most dependencies
We originally deployed Designate using Python virtual environments, because there were no packages available
We mirrored the python packages internally, built them into wheels, created the virtualenvs on the servers at deploy time
This met most of our requirements, but:
Was slow
Had issues with python modules that required external commands, shared libraries, etc
Still an issue with shared dependencies, such as an oslo library that reads from a shared location on the filesystem like /etc/nova/foo etc
Everyone else is doing it?
I’m only kind of kidding here
Yes, you may have weird problems with Docker in some cases, but nearly every problem we’ve had, other people have had also.
It’s getting better
It’s being actively developed and it’s maturing at an impressive pace.
Packaging tools aren’t improving.
There aren’t lots of mature toolchains deploying python-based virtual environments across dev, staging, and prod.
Don’t discount the value of following the crowd in this case.
Besides, you’re running OpenStack, already right? You’re used to deploying software to production that has what we might call a “quirky personality?”
But aside from that, why docker?
Being able to reproduce builds and deployment are really important for us
When we do a build we’re able to encapsulate everything that is needed to run that service
When we do a deploy, we’re only dependent on our internal Docker registry
It’s easy to automate building and distributing docker images.
And when you do build your images, it solves the issue of needing to manage shared libraries and other dependencies. It’s all inside the image.
It’s also easy to install multiple versions of a Docker image on a given server
When we’ve done upgrades in the past, the majority of the time to do the upgrades is the package download, install and configuration time
With Docker we can prestage the new image.
An upgrade just ends up being running database migrations, making any needed config changes and starting the service with the new image.
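That upgrade flow might look like the sketch below. The image tag, registry host, and commands are hypothetical, and a `run` stub stands in for actually executing them so the sketch works anywhere:

```shell
# Stub executor so this sketch is self-contained; in real life, drop it and
# run the commands directly (or via your config management).
run() { echo "+ $*"; }

# Days ahead of the window: pre-stage the new image (no service impact).
run docker pull registry.example.com/heat:5.0.1-9-gabc1234-16.def5678

# Maintenance window: just migrations, config changes, and a restart
# pointing at the new tag.
run heat-manage db_sync
run service heat-api restart
```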
So why wouldn’t you want to use Docker for deploying OpenStack?
Restarting docker restarts all containers - Fixed in some future version
This can be a major issue for things like the Neutron OVS agent
Docker does have bugs:
We’ve seen intermittent issues with the aufs backend
We’ve also seen intermittent issues on new installs with the docker bridge not being configured correctly
However, we’ve been able to work around these relatively minor issues
Some services like keystone or heat are pretty easy to get into a container
However, more complex services like Neutron require a lot of specific configuration in order to talk to OVS, create network namespaces, etc.
Nova requires special configuration for talking to storage and libvirt, etc
Also, unless you’re already deploying services with Docker, you’re going to need some new tooling
This includes building images, installing them and making sure they run.
This is yet another thing to manage and version
For example, the existing Puppet modules for OpenStack don’t have any direct Docker support.
That’s something we’re maintaining ourselves, but we’ll talk about that more in a bit.
Let’s talk a little bit about how we deploy OpenStack using Docker at Time Warner Cable
CLAYTON:
So we’ve covered some background and reasons why and why not to use Docker, so let’s talk about how we’re deploying services using Docker, starting with how we build our Docker images
We build our base images from an internal Ubuntu mirror using debootstrap
We build an image we call “openstack-dev” on top of that
This is a relatively fat image that all OpenStack services are built on top of
This includes all the shared libraries and command-line tools needed by any service
From there we build per service images (so nova image, keystone image, etc)
One key thing here is that we want to be very explicit about what version of dependencies we’re going to build the image with so that we have reproducible results
To achieve that, we have two requirements.txt files, one is very high level, and the other contains all dependencies pinned to specific versions
For example, the high level requirements.txt for nova pulls in nova itself, the mysql driver, the memcache client and some internal plugins we’ve developed.
From that high level requirements file, we have a tool that builds a Python virtual environment locally
We build that virtualenv using the upper-constraints.txt file from the upstream infra project
That ensures we’re using tested and supported versions of the libraries going into it
From that virtual environment we generate a frozen requirements.txt file that has all the required libraries pinned to a specific version
Both the high level and frozen requirements.txt files are checked in along with the Dockerfile
Make sure you have a plan for updating your base images.
Docker images are another thing to update when new bugs or security issues are announced
You want to make sure you’re only changing things you intend to
Another thing you need to think about for images is, how are you going to tag, or version them?
When we started thinking about how we wanted to tag our images, there were a few things we wanted the tag to do:
It should be obvious which version of the service we were using, ideally to a specific commit
It should also clearly identify the version of the tooling that was used to generate the image
If the Dockerfile changes, then the image tag should change also
It should be automatically generated. We didn’t want to rely on people to update the tags
Lastly, every image generated should have a unique tag. We didn’t want ambiguity about which version of a tag was the “right” one
This is an example of what we came up with
This is the tag we’re using for our Heat image currently:
The first part here is the output of the git-describe command for the Heat commit that we’ve put in this image
This includes the closest tag: 5.0.1
The number of commits since that tag: 9
And the short hash of that git commit
The second part versions the dockerfile and associated scripting that goes along with it
This includes the number of commits we’ve had: 16
And the short hash of the commit containing the Dockerfile and tooling
When we deploy new images, we always pin to a specific tag. We don’t use the “latest” tag convention that is common on DockerHub
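A rough sketch of how a tag like that could be generated automatically. The exact commands are our guess at the scheme described, demonstrated on a toy repo; in the real layout the tooling half would come from the Dockerfile/tooling repo, not the service repo:

```shell
set -e
# Toy repo standing in for the Heat checkout:
REPO=$(mktemp -d)
cd "$REPO"
git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m 'release'
git tag 5.0.1
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m 'local backport'

# Part one: nearest tag, commits since that tag, short hash of the commit.
SERVICE_VER=$(git describe --tags)                # e.g. 5.0.1-1-g<hash>
# Part two: commit count plus short hash (run in the tooling repo for real).
TOOLING_VER=$(git rev-list --count HEAD).$(git rev-parse --short HEAD)

TAG="${SERVICE_VER}-${TOOLING_VER}"
echo "$TAG"
```

Both parts are derived from git state, so the tag is automatic, unique per build, and changes whenever either the service commit or the Dockerfile/tooling commit changes.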
So as I mentioned before, we’re using the Open Source Docker Registry V2
We’ve setup basic auth for this with TLS
We’re using the file backend for this to store image data in local storage
When a change is merged to git, a new image is automatically built by Jenkins, and then pushed into the Docker Registry “master”
Note that Jenkins is the only way to push images into the master registry
After that image is pushed, another job kicks off in our two development sites and mirrors the data from the master to local mirrors via rsync
These mirrors provide read-only access to the images and give us some measure of geographic redundancy
One key thing here is that our docker registries live in our development environments and don’t backend into Swift or anything like that.
We thought about using the Swift backend for our registry and intentionally decided not to have our production deploys depend on production being available.
The scary scenario here is this: We use keystone auth for swift and we deploy keystone using docker. What if we have a bad keystone deploy? How do we fix that?
Deployments: How do we actually get things running using these awesome docker images we’ve worked so hard on?
We’ve always been a Puppet shop, and we’re big fans of the Puppet OpenStack modules
However, these modules only support installing services from packages
When we first started down this path, we forked the Designate module, added support for non-package installs
We tried to contribute this upstream
Got complaints that it was very specific to our use case, wasn’t likely to be useful to other people
These complaints were 100% valid
What we came up with after that was the idea of adding hooks to the upstream Puppet modules to allow making package and service management extensible
We brought a proof of concept implementation to the Puppet OpenStack team and they were receptive to that idea, so we pursued it
We added this “hooks” support to the puppet-designate module, and created a new module that became the os_docker module.
This os_docker module is our “special sauce” for deploying OpenStack services with Docker
It contains the glue needed to pull docker images, setup the init scripts, example config files and the CLI wrappers around docker
Things that packages normally provide
The os_docker module is publicly available in our GitHub org, and we’ve tried to make it relatively unopinionated, if you’re interested in taking a look
Supports keystone, nova, designate and heat today, adding more
Integrates with stock Puppet OpenStack modules
So everything hasn’t been smooth sailing with our Docker adventure, let’s talk about some of the issues we’ve run into
So we have run into some (dumb) problems with scaling our Docker registry
Docker recommends and nearly requires that you use TLS for your registry
When we first deployed nova-compute using Docker, it failed miserably because the registry fell over
When we investigated this, we realized that it wasn’t actually the *registry* that was dying, but that nginx in front of it was.
The issue was that we had only configured nginx to use a single worker, and that worker was running out of CPU doing the necessary encryption
We changed our nginx configuration to use one worker per CPU. We had 4 CPUs, so that improved things a lot
Next deploy we just had a few intermittent failures, so we ran some testing and raised the number of CPUs on the registry
We’ve ended up with an 8 vCPU VM to host our Docker registry, and we can support 40 concurrent pulls of images that are roughly 500 MB each
This is basic “how do I docker” stuff, but something we learned the hard way
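For reference, the single-worker bottleneck comes down to one nginx directive; a fragment like this (server and TLS details omitted) is the kind of change involved:

```nginx
# nginx in front of the Docker registry: one worker per CPU instead of the
# default single worker, which saturates on TLS during parallel pulls.
worker_processes auto;
```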
So we don’t use Docker networking in production; you’ll probably want to avoid it in some cases too
Partially this is because of performance
But mostly we just don’t need it for most services, and for some services they actually require being in the host network namespace
One thing to be aware of here is that Docker sets up its native networking even if you’re not using it.
If you’ve looked into it much, you’ll have noticed that docker creates a docker0 network device and associates a large private network range with that interface
It’s pretty smart about this, it’ll make sure that it picks a network that’s not already in use on your machine.
So we deployed Docker to our hypervisors as prep work for dockerizing nova-compute
We had a customer that was using 172.17.0.0/16 as their private network, which happened to be the same network that Docker had decided to use on all compute hosts
That customer started reporting connectivity issues inside their private network shortly after that deploy
We realized docker installs a NAT rule for all traffic from that network range
It rewrites all the traffic from that range to be sourced from the hypervisor’s public IP, so that containers using Docker networking have network connectivity
This meant that Docker’s NAT rule was rewriting this customer’s traffic to be sourced from the hypervisor’s IP, instead of sending it out a VXLAN tunnel
This broke all networking for that private network
Lesson learned, we turned Docker networking off entirely on control and compute hosts that do any customer network traffic
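On those hosts, turning Docker networking off is a daemon-flag change. A fragment like this would do it on Ubuntu's /etc/default/docker; the flags are real Docker 1.10-era daemon options, but whether they fit your setup is an assumption:

```shell
# /etc/default/docker fragment: no docker0 bridge, no iptables/NAT rules,
# since every container runs with --net host anyway.
DOCKER_OPTS="--bridge=none --iptables=false --ip-masq=false"
```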
As we mentioned before, upgrades is one of the big drivers for us moving services into Docker
So how do those upgrades work out in practice?
Mostly the same as packages
There are two key differences with Docker:
Upgrade time can be shorter, due to ability to pre-stage the new version
You can upgrade just one service on a server that hosts multiple services, without worrying about conflicting dependencies.
This means you can do things like upgrade Heat before Neutron.
I find the idea of upgrading Heat to stable Mitaka next week a lot less scary than upgrading Neutron to Mitaka next week
Overall, upgrading OpenStack in Docker containers isn’t exciting, but that’s *good* because upgrading all services on a node with packages at the same time *is* kind of scary at times.
Those of you familiar with the Kolla project are probably wondering why we didn’t use that instead of doing our own thing
For those of you that aren’t familiar, Kolla is an official OpenStack project for building and deploying OpenStack using Docker.
We’ve looked at Kolla a few times, and have been really impressed by how quickly it’s moving.
There were a few reasons we’ve not yet adopted Kolla
Maturity was definitely an issue when we first looked at it.
Building images from source wasn’t supported initially and that was a requirement for us
A few months later, we looked at it again, and they had added the ability to build from source but didn’t yet support building images that had third party plugins in them
This was a requirement for us for Designate, but also for some internal middleware we use for monitoring
This was discussed at the Tokyo summit, and it was something they had planned for the Mitaka cycle.
However, Kolla has been a great resource for us
We have regularly consulted both the Kolla source, and the Kolla team with questions we’ve had about how to handle various issues
Both the Kolla source and the Kolla team have been really easy to work with
Thanks to them for that
If you’re interested in deploying OpenStack with Docker, we’d recommend putting Kolla on your short list of projects to look at.
We’re definitely keeping an eye on the project, and like the idea of being able to use their images, at least with our existing tooling.
That’s all we’ve got, we appreciate everyone coming
If you want to get in touch with us then here is our contact information
Hopefully have some time for questions