A talk given at the Google-hosted Container Security Summit on Wednesday, February 12th, 2020 in Seattle, Washington. This talk covered the impact of work done at the lower-level runtimes layer and up through layers like cri-o, containerd, and Docker to bring specific security features to overall platforms like Kubernetes.
1. @estesp
Enabling security via runtimes
A container runtime perspective on security
Phil Estes
Distinguished Engineer & CTO, IBM Cloud Platform
CNCF containerd project maintainer
OCI Technical Oversight Board chair
2. @estesp
Runtimes circa 2014
● No seccomp
● No pids limit
● No user namespaces
● No rootless
● No content-addressable image format
● No sandboxes
● ...
3. @estesp
Runtime Reorganization
● Open Container Initiative (OCI) and CNCF formed: Summer 2015
● libcontainer project donated to become “runc”
● Docker architecturally splits into the engine, containerd, and runc
● Later, containerd and cri-o emerge as standalone runtimes without Docker engine
4. @estesp
Runtime Security Progress
Timeline: 2015, Early 2016, Late 2016, 2017
PIDS controller
SECCOMP support
Docker 1.10
● user namespaces, seccomp profiles, auth plugins
● New content-addressable image format (v2.2)
No new privs support
User NS nesting
Ambient caps
readonly+userns
rootless runc
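As a rough illustration of how several of the features on this timeline surface in a container's configuration, the sketch below builds a fragment of an OCI runtime config.json as a Python dictionary. The field names follow the OCI runtime specification; the helper name hardened_fragment, the chosen values, and the single blocked syscall are assumptions made for this example, not a recommended profile.

```python
import json


def hardened_fragment(pids_limit=100):
    """Return an illustrative fragment of an OCI runtime config.json.

    Values are placeholders for demonstration, not a vetted security profile.
    """
    return {
        "process": {
            # "No new privs": block privilege gain via setuid binaries on exec.
            "noNewPrivileges": True,
            # Ambient capabilities: none granted in this sketch.
            "capabilities": {"ambient": []},
        },
        # Read-only root filesystem.
        "root": {"path": "rootfs", "readonly": True},
        "linux": {
            # PIDs controller: cap the number of processes in the container.
            "resources": {"pids": {"limit": pids_limit}},
            # Seccomp: allow by default, reject one syscall (illustrative only).
            "seccomp": {
                "defaultAction": "SCMP_ACT_ALLOW",
                "syscalls": [
                    {"names": ["keyctl"], "action": "SCMP_ACT_ERRNO"}
                ],
            },
        },
    }


# Render the fragment the way it would appear inside config.json.
rendered = json.dumps(hardened_fragment(), indent=2)
```

A real seccomp profile would more likely deny by default and allow a curated list of syscalls; the inverted form here only keeps the example short.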
7. @estesp
Resources, Attack Surface, ...Privilege!
“Rootless containers” refers to the ability of an unprivileged user to create, run, and otherwise manage containers.
● Docker “Shocker” 2014
● Docker CVE-2014-9357
● Containerd #2001 (2018)
● Runc #1962 (2019)
● K8s CVE-2017-1002101, CVE-2017-1002102
● K8s CVE-2018-1002105
● …?
8. @estesp
Rootless Containers
● Built on Linux kernel user namespaces functionality
○ An unprivileged user manages a user and group range in which containers will run
● Limitations on privileged actions are handled via userspace functionality
○ slirp4netns - required for network creation/interactions
○ fuse-overlayfs - required for root filesystem interactions and handles UID/GID shifts
○ cgroups - cgroups cannot be managed in rootless containers; cgroup v2 is the solution
● The upstream kernel and related projects (overlayfs) continue work to remove limitations and performance impacts
● Available at all levels (runc, container runtimes, builders, Kubernetes***)
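The user and group range mentioned above can be sketched as a simple ID translation. The example assumes a layout commonly used by rootless tooling: container UID 0 maps to the unprivileged user's own UID, and container UIDs from 1 upward map into a subordinate range such as one listed in /etc/subuid. The names IDMap, rootless_uid_maps, and to_host_uid, and the numbers used, are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class IDMap(NamedTuple):
    container_id: int  # first ID inside the user namespace
    host_id: int       # first ID on the host
    size: int          # number of consecutive IDs covered


def rootless_uid_maps(user_uid: int, subuid_start: int, subuid_count: int) -> List[IDMap]:
    """Build the assumed two-entry mapping for a rootless container."""
    return [
        IDMap(0, user_uid, 1),                # container root -> the user
        IDMap(1, subuid_start, subuid_count)  # everything else -> subordinate range
    ]


def to_host_uid(container_uid: int, maps: List[IDMap]) -> int:
    """Translate a UID seen inside the container to the UID the host sees."""
    for m in maps:
        if m.container_id <= container_uid < m.container_id + m.size:
            return m.host_id + (container_uid - m.container_id)
    raise ValueError(f"container UID {container_uid} is not mapped")


# Hypothetical user with UID 1000 and a subordinate range of 65536 IDs at 100000.
maps = rootless_uid_maps(user_uid=1000, subuid_start=100000, subuid_count=65536)
```

With these assumed values, root inside the container is just UID 1000 on the host, so a process that escapes the container holds no host privilege beyond that of the user who started it.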
11. @estesp
Runtime shim v2 API
● Minimal and scoped to the execution lifecycle of a container
● Binary naming convention
○ Type io.containerd.runsc.v1 -> Binary containerd-shim-runsc-v1
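The naming convention on this slide can be sketched as a small string transformation. The function below is a hypothetical helper, not containerd's own code; it assumes the shim name and version are the last two dot-separated fields of the runtime type.

```python
def shim_binary_name(runtime_type: str) -> str:
    """Derive a shim binary name from a runtime type string.

    Assumes the form <prefix>.<name>.<version>, for example
    io.containerd.runsc.v1.
    """
    parts = runtime_type.split(".")
    if len(parts) < 2 or not all(parts[-2:]):
        raise ValueError(f"unexpected runtime type: {runtime_type!r}")
    name, version = parts[-2], parts[-1]
    return f"containerd-shim-{name}-{version}"


# The example from the slide.
binary = shim_binary_name("io.containerd.runsc.v1")
```

Under this convention, a new sandbox runtime only needs to ship a binary with the expected name on the PATH for the runtime type to resolve to it.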