How to build a PostgreSQL server that weighs less than 20 MB - using Docker & Buildroot

Highlights of this article (TL;DR): we'll show how to use buildroot to create a basic but fully
functional container using less than 4 MB of disk space (uncompressed). Then we will apply the
same technique to obtain a PostgreSQL image which fits in less than 20 MB (not including your
databases, of course). You can play with those containers at once if you want. Just run "docker
run jpetazzo/pglite", and within seconds, you will have a PostgreSQL server running on your
machine!
I like containers, because they are lighter than virtual machines. This means that they will
use less disk space, less memory, and ultimately be cheaper and faster than their heavier
counterparts. They also boot much faster. Great.
But how "lightweight" is "lightweight"? I wanted to know. We all wanted to know. We already
have a small image, docker-ut, using a statically compiled busybox (it's built using this script). It
uses about 7 MB of disk space, and is only good to run simple shell scripts; but it is fully
functional—and perfect for Docker unit tests.
How can we build something even smaller? And how can we build something more useful (e.g.,
a PostgreSQL server), but with a ridiculously low footprint?
To build really small systems, you have to look at embedded systems. That's where you find
the experts about everything small-footprint and space efficient. In the world of embedded
systems, sometimes you have to cram a complete system, including Linux kernel, drivers, startup
scripts, essential libraries, web and SSH servers, WiFi access point management code, RADIUS
server, OpenVPN client, BitTorrent downloader -- all in 4 MB of flash. Sounds like what we need,
right?
There are many tools out there to build images for embedded systems. We decided to use
buildroot. Quoting buildroot's project page: "Buildroot is a set of Makefiles and patches that
makes it easy to generate a complete embedded Linux system." Let's put it to the test!
The first step is to download and unpack buildroot:
curl http://buildroot.uclibc.org/downloads/buildroot-2013.05.tar.bz2 | tar jx
Buildroot itself is rather small, because it doesn't include the source of all the things that it compiles. It
will download those later. Now let's dive in:
cd buildroot-2013.05/

The first thing is to tell buildroot what we want to build. If you have ever built your own kernel, this step
will look familiar:
make menuconfig
For now, we will change just one thing: tell buildroot that we want to compile for a 64-bit target. Go to
the "target architecture" menu, and select x86_64. Then exit (save along the way). Now brew a big pot
of coffee, and fire up the build:
make
This will take a while (from 10 minutes to a couple of hours, depending on your local machine
beefiness). This takes so long because it will first compile a toolchain. It means that instead of
using your default compiler and libraries, it will: download and compile a preset version of gcc;
download and compile uclibc (a small-footprint libc); and then it will use those to compile
everything else. This sounds like a lot of extra work, but it brings two huge advantages:
- if you want to build for a different architecture (e.g. that Raspberry Pi), it will work
  exactly the same way;
- it abstracts your local compiler: your version of gcc/clang/other is irrelevant, since your
  image will be built by the versions fixed by buildroot anyway.
At the end of the build, our minimalist container is ready! Let's have a look:
cd output/images
ls -l
You should see a small, lean, rootfs.tar file, containing the image to be imported in Docker.
But it's not quite ready yet. We need to fix a few things.
- Docker sets the DNS configuration by bind-mounting over /etc/resolv.conf. This
  means that /etc/resolv.conf has to be a standard file. By default, buildroot makes it a
  symlink. We have to replace that symlink with a file (an empty file will do).
- Likewise, Docker "injects" itself within containers by bind-mounting over /sbin/init.
  This means that /sbin/init should be a regular file as well. By default, buildroot makes
  it a symlink to busybox. We will change that, too.
- Docker injects itself within containers, and (as I write this) it is dynamically linked.
  This means that it requires a couple of libraries to run correctly. We will need to add
  those libraries to the container.
(Note: Docker will eventually switch to static linkage, which means that the last step won't be
necessary anymore.)
We could unpack the tar file, do our changes, and repack; but that would be boring. So instead,
we will be fancy and update the file on the fly.

Let's create an extra directory, and populate it with those "additions":
mkdir extra extra/etc extra/sbin extra/lib extra/lib64
touch extra/etc/resolv.conf
touch extra/sbin/init
cp /lib/x86_64-linux-gnu/libpthread.so.0 /lib/x86_64-linux-gnu/libc.so.6 extra/lib
cp /lib64/ld-linux-x86-64.so.2 extra/lib64
The paths to the libraries might be different on your machine. When in doubt, you can run ldd
$(which docker) to see which libraries are used by your local Docker install.
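To avoid copying paths by hand, the library list can be pulled out of ldd's output. This is a minimal sketch, assuming the usual glibc ldd output format ("name => /path (address)"). The helper name ldd_paths is made up for illustration, and the sample output is only there so that the snippet runs on a machine without Docker:

```shell
#!/bin/sh
# ldd_paths: read ldd output on stdin and print the absolute path of each
# library. Hypothetical helper, not part of Docker or buildroot.
ldd_paths() {
    awk '
        # Lines such as: libc.so.6 => /lib/.../libc.so.6 (0x...)
        $2 == "=>" && $3 ~ /^\// { print $3; next }
        # Lines such as: /lib64/ld-linux-x86-64.so.2 (0x...)
        $1 ~ /^\// { print $1 }
    '
}

# Real use would be: ldd "$(which docker)" | ldd_paths
# Illustrative sample input (addresses are arbitrary):
sample='    linux-vdso.so.1 =>  (0x00007fff5a3ff000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f8d7c6a4000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8d7c2e4000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8d7c8c2000)'

printf '%s\n' "$sample" | ldd_paths
```

The virtual linux-vdso entry has no file behind it, so it is skipped; the three printed paths are the ones to hand to cp.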
Then, create a new tarball including those extra files:
cp rootfs.tar fixup.tar
tar rvf fixup.tar -C extra .
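To see what the append actually does, here is a self-contained sketch that reproduces the trick on a throwaway archive. The temporary directory and the stand-in rootfs are made up for the demo (nothing here touches the real rootfs.tar), and it assumes a tar that supports appending with "r", such as GNU tar:

```shell
#!/bin/sh
set -e
demo=$(mktemp -d)
cd "$demo"

# Stand-in for buildroot's output: resolv.conf is a symlink.
mkdir -p rootfs/etc
ln -s ../tmp/resolv.conf rootfs/etc/resolv.conf
tar cf rootfs.tar -C rootfs .

# Same steps as in the article: an "extra" tree holding a regular file,
# appended to a copy of the original archive.
mkdir -p extra/etc
touch extra/etc/resolv.conf
cp rootfs.tar fixup.tar
tar rf fixup.tar -C extra .

# The archive now lists the path twice: symlink first, regular file last.
tar tvf fixup.tar | grep 'etc/resolv.conf'
```

The original symlink entry is not removed; the regular file is simply stored after it. The fix relies on the later entry replacing the earlier one when the archive is unpacked.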
Last but not least, the "import" command will bring this image into Docker. We will name it "dietfs":
docker import - dietfs < fixup.tar
We're done! Let's make sure that everything worked properly, by creating a new container with this
image:
docker run -t -i dietfs /bin/sh
For what it's worth, I put together a small fixup script on Gist, to automate those steps, so you can also
execute it like this:
curl https://gist.github.com/jpetazzo/b932fb0c753e69c73d31/raw > fixup.sh
sh fixup.sh

The result is a rather small image; less than 3.5 MB:
REPOSITORY        TAG     ID            CREATED     SIZE
jpetazzo/busybox  latest  0c0468ea37af  5 days ago  3.389 MB (virtual 3.389 MB)
Not Bad!
Now, how do we build something more complex, like a PostgreSQL server?
Why PostgreSQL? Two reasons. One: it's awesome. Two: I didn't find a PostgreSQL package
in buildroot, so it was an excellent opportunity to learn how to include something "from scratch",
as opposed to merely ticking a checkbox and recompiling away.
First, we want to create a directory for our new package. From buildroot's top directory:
mkdir package/postgres
Then, we need to put a couple of files in that directory. For your convenience, I stored them on Gist:
curl https://gist.github.com/jpetazzo/5819538/raw/Config.in > package/postgres/Config.in
curl https://gist.github.com/jpetazzo/5819538/raw/postgres.mk > package/postgres/postgres.mk

Let's have a look at those files now. First, Config.in: it is used by make menuconfig to display a
checkbox for our new package (yay!), but also to define some build dependencies. In this case, we need
IPv6 support.
config BR2_PACKAGE_POSTGRES
    bool "postgres"
    depends on BR2_TOOLCHAIN_BUILDROOT_INET_IPV6
    help
      PostgreSQL server

comment "postgres requires a toolchain with IPV6 support enabled"
    depends on !BR2_TOOLCHAIN_BUILDROOT_INET_IPV6
How does one know which dependencies to use? I confess that I tried first with no dependency
at all. The build failed, so I had a look at the error messages, saw that it complained about
missing IPV6 headers; so I fixed the issue by adding the required dependencies.
The other file, postgres.mk, contains the actual build instructions:
#############################################################
#
# postgresql
#
#############################################################
POSTGRES_VERSION = 9.2.4
POSTGRES_SOURCE = postgresql-$(POSTGRES_VERSION).tar.gz
POSTGRES_SITE = http://ftp.postgresql.org/pub/source/v$(POSTGRES_VERSION)/$(POSTGRES_SOURCE)
POSTGRES_CONF_OPT = --with-system-tzdata=/usr/share/zoneinfo
POSTGRES_DEPENDENCIES = readline zlib
$(eval $(autotools-package))
As you can see, it is pretty straightforward. The main thing is to define some variables to tell
buildroot where it should fetch PostgreSQL source code. We don't have to provide actual build
instructions, because PostgreSQL uses autotools. ("This project uses autotools" means that you
typically compile it with "./configure && make && make install"; this probably rings a bell
if you ever compiled a significant project manually on any kind of UNIX system!)
The build instructions will actually be expanded from the last line. If you want more details
about buildroot's operation, have a look at buildroot's autotools package tutorial.

We can see that postgres.mk also defines more dependencies: readline and zlib. So what's the
difference between the CONF_OPT, DEPENDENCIES, and the "depends" previously seen in
Config.in?
- CONF_OPT provides extra flags which will be passed to ./configure. In this case, the
  compilation was failing, telling me that I should specify the path to timezone data. I
  looked around and figured out the right flag.
- DEPENDENCIES tells buildroot to compile extra libraries before taking care of our
  package. Guess what: when I tried to compile, it failed and complained about missing
  readline and zlib; so I added those dependencies and that's it.
- "depends" in Config.in is a toolchain dependency. It is not really a library; it merely
  tells buildroot "hey, when you will compile uclibc, make sure to include IPV6 support,
  will you?". It has a strong implication: when you change the configuration of the
  toolchain (C library or compiler), you have to recompile everything: the toolchain and
  everything which was compiled with it. This will obviously be longer than just
  recompiling a single package. It is done with the command make clean all.
Last but not least, we need to include our Config.in file in the top-level Config.in. The quick
and dirty way is to do this (from buildroot top directory):
echo 'source "package/postgres/Config.in"' >> Config.in
Note: normally, we should do this in a neat submenu section within e.g. package/Config.in.
But this way will save us some hassle navigating through the menus.
Alright, now run make menuconfig again; go to "Toolchain", enable IPV6 support, go back to
the main menu, and enable "postgres". Now recompile everything with make clean all. This
will take a while.
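Since the full rebuild is slow, it can be worth confirming that both options actually made it into buildroot's .config before launching it. A small sketch, assuming the symbol names from the Config.in shown earlier and the usual "SYMBOL=y" format of .config files; the function name check_br_config is hypothetical:

```shell
#!/bin/sh
# check_br_config FILE: succeed only if FILE enables both the postgres
# package and IPv6 support in the buildroot toolchain.
check_br_config() {
    for sym in BR2_PACKAGE_POSTGRES BR2_TOOLCHAIN_BUILDROOT_INET_IPV6; do
        if ! grep -q "^${sym}=y\$" "$1"; then
            echo "missing: ${sym}" >&2
            return 1
        fi
    done
    echo "config OK"
}

# From buildroot's top directory, the real call could look like:
#   check_br_config .config && make clean all
```

A disabled option shows up in .config as a "# SYMBOL is not set" comment, which the anchored grep pattern does not match.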
Just like before, we need to "fixup" the resulting image:
cd output/images
curl https://gist.github.com/jpetazzo/b932fb0c753e69c73d31/raw | sh
We now have a Docker image with PostgreSQL in it; but it is not enough. We still need to set up
the image to start PostgreSQL automatically, and even before that, PostgreSQL will have to
initialize its data directory (with initdb). We will use a Dockerfile and a custom script for
that.
What's a Dockerfile? A Dockerfile contains basic instructions telling Docker how to build an
image. When you use Docker for the first time, you will probably use "docker run" and "docker
commit" to create new images; but you should quickly move to Dockerfiles and "docker build"
because it automates those operations and makes it easier to share "recipes" to build images.

Let's start with the custom script. We want this script to run automatically within the container
when it starts. Make a new empty directory, and create the following init file in it:
#!/bin/sh
set -e
mkdir /usr/share/zoneinfo /data
chown default /data
head -c 16 /dev/urandom | sha1sum | cut -c1-10 > /pwfile
echo "PG_PASSWORD=$(cat /pwfile)"
su default -s /usr/bin/initdb -- --pgdata=/data --pwfile=/pwfile --username=postgres --auth=trust >/dev/null
echo host all all 0.0.0.0 0.0.0.0 md5 >> /data/pg_hba.conf
exec su default -s /usr/bin/postgres -- -D /data -c 'listen_addresses=*'
PostgreSQL will refuse to run as root, so we use the default user (conveniently provided by
buildroot). We create /data to hold PostgreSQL data files, and assign it to the non-privileged user.
We also generate a random password, save it to /pwfile, and display it (to make it easier to
retrieve later). We can then run initdb to actually create the data files. Then, we extend
pg_hba.conf to authorize connections from the network (by default, only local connections are
allowed). The last step is to actually start the server.
Make sure that the script is executable:
chmod +x init
Now, in the same directory, we will create the following Dockerfile, to actually inject the previous
script in a new image:
from dietfs
add . /
expose 5432
cmd /init
The fixup.sh script has imported our image under the name "dietfs", so our Dockerfile will start
with from dietfs, to tell Docker that we want to use that image as a base. Then, we add all the
files in the current directory to the root of our image. This will also inject the Dockerfile itself,
but we don't care. We expose TCP port 5432, and finally tell Docker that by default, when a
container is created from this image, it should run our /init script. You can read more about the
Dockerfile syntax in Docker's documentation.

The next step is to build the new image using our Dockerfile:
docker build -t pglite .
That's it. You can now start a new PostgreSQL instance:
docker run pglite
The output will include the password, and then the first log messages from the server:
PG_PASSWORD=4e68b1958c
LOG: database system was shut down at 2013-06-20 03:55:50 UTC
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
Weak Password Is Weak! Our password is random, but it only includes hexadecimal digits
(i.e. [0-9a-f]). You can make it better by including base64 in the image, and using base64 instead
of sha1sum. Alternatively, you can use longer passwords.
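As a sketch of that suggestion, the password line of the init script could be swapped for something like the following. It assumes a base64 binary is available in the image (which would have to be enabled in buildroot first), and gen_password is a name made up here:

```shell
#!/bin/sh
# gen_password: print a 32-character random password.
# 24 random bytes encode to exactly 32 base64 characters (no "=" padding);
# "+" and "/" are mapped to "_" and "-" to keep the value shell-friendly.
gen_password() {
    head -c 24 /dev/urandom | base64 | tr -d '\n' | tr '+/' '_-'
    echo
}

gen_password
```

For comparison, the 10 hexadecimal digits used above carry 40 bits of randomness, while this variant keeps all 192 bits read from /dev/urandom. In the init script, the output would be redirected to /pwfile just like before.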
Take note of the password. It's OK to hit "Ctrl-C" now: the container will still run in the
background. Let's check which port was allocated for our container. docker ps will show us all
the containers currently running; but to make things even simpler, we will use docker ps -l,
which only shows the latest container.
$ docker ps -l
ID            IMAGE          COMMAND           CREATED             STATUS             PORTS        SIZE
e21ba744ff09  pglite:latest  /bin/sh -c /init  About a minute ago  Up About a minute  49168->5432  23.53 MB (virtual 39.87 MB)
Alright, that's port 49168. Does it really work? Let's check for ourselves! You can try locally if you have a
PostgreSQL client installed on your Docker machine; or from anywhere else (just replace "localhost"
with the hostname or IP address of your Docker machine).
$ psql postgres --host localhost --port 49168 --username postgres
Password for user postgres: 4e68b1958c
psql (9.1.3, server 9.2.4)
WARNING: psql version 9.1, server version 9.2.
Some psql features might not work.
Type "help" for help.

postgres=# \q
$
A small note about sizes: the image takes about 16 MB, but the data files take almost 24 MB.
So the total footprint is really about 40 MB.
What if we want to automate the creation of our PostgreSQL container, to run our own
PostgreSQL-as-a-Service platform? Easy, with just a tiny bit of shell trickery!
CONTAINERID=$(docker run -d pglite)
while ! docker logs $CONTAINERID 2>/dev/null | grep -q ^PG_PASSWORD= ; do sleep 1 ; done
eval $(docker logs $CONTAINERID 2>/dev/null)
PG_PORT=$(docker port $CONTAINERID 5432)
echo "A new PostgreSQL instance is listening on port $PG_PORT. The admin user is postgres, the admin password is $PG_PASSWORD."
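A note on the eval line: it executes whatever the container printed on its standard output. A more defensive variant, sketched here, extracts just the value with sed instead. The extract_pg_password name is hypothetical, and the sample log text is copied from the output shown earlier so that the snippet runs without Docker:

```shell
#!/bin/sh
# extract_pg_password: read container logs on stdin and print the value of
# the first PG_PASSWORD= line, without evaluating anything.
extract_pg_password() {
    sed -n 's/^PG_PASSWORD=//p' | head -n 1
}

# Real use would be:
#   PG_PASSWORD=$(docker logs "$CONTAINERID" 2>/dev/null | extract_pg_password)
sample_logs='PG_PASSWORD=4e68b1958c
LOG: database system was shut down at 2013-06-20 03:55:50 UTC
LOG: database system is ready to accept connections'

PG_PASSWORD=$(printf '%s\n' "$sample_logs" | extract_pg_password)
echo "The admin password is $PG_PASSWORD."
```

Lines that do not start with PG_PASSWORD= are ignored, so stray log output cannot end up being run as shell code.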
That's it! If you name your image "yourname/pglite" instead of just "pglite", you will be able to "docker
push" it to the Docker Public Registry, and to "docker pull" it from any other Docker host anywhere in
the world. You are one PHP script away from setting up your own PostgreSQL-as-a-Service provider.
About Jérôme Petazzoni
Jérôme is a senior engineer at dotCloud, where he rotates between Ops, Support
and Evangelist duties and has earned the nickname of “master Yoda”. In a
previous life he built and operated large scale Xen hosting back when EC2 was
just the name of a plane, supervised the deployment of fiber interconnects
through the French subway, built a specialized GIS to visualize fiber
infrastructure, specialized in commando deployments of large-scale computer
systems in bandwidth-constrained environments such as conference centers,
and various other feats of technical wizardry. He cares for the servers powering
dotCloud, helps our users feel at home on the platform, and documents the
many ways to use dotCloud in articles, tutorials and sample applications. He’s
also an avid dotCloud power user who has deployed just about anything on
dotCloud – look for one of his many custom services on our GitHub repository.
Connect with Jérôme on Twitter! @jpetazzo
