Delve Labs was present during the GoSec 2016 conference, where our lead DevOps engineer presented an overview of the current options available for securing Docker in production environments.
https://www.delve-labs.com
Docker Security in Production Overview
1. Docker Security In Production
#DevOps #Infrastructure #Deployment #Security
2. What this talk is NOT about
➔ CI/CD chain security ( git / notary / registry )
◆ … export DOCKER_CONTENT_TRUST=1
➔ Microservices architecture
◆ … secret management (Vault et al.)
◆ … Orchestration & Deployment Strategies
➔ Keeping binaries & libs. up to date in production
➔ Monitoring / Alerting / Metrics / SOC / SIEM / etc.
4. Type of attack ⇦ Threat “hierarchy”
Infrastructure information leak ⇦ Reconnaissance
Denial of Service ⇦ Loss of Availability
Data corruption ⇦ Loss of Integrity
Software & Crypto exploit ⇦ Loss of Confidentiality
Container escape ⇦ Privilege Escalation to Host
Root / Kernel exploit ⇦ Host Auditability compromised
Hypervisor escape ⇦ Pivot to other Host
Hardware Implant, etc. ⇦ Tin foil hat & Cryptopocalypse !
5. Docker builds on Kernel & Host Security
➔ Grsecurity kernel
Randomization++, Bounds checking,
Fork delay, Hardened seccomp BPF
➔ SELinux / AppArmor
Complex execution profiles, {White,Black}-listing
➔ Sysctl settings
fd limit, IP stack, sysrq, buffers, etc.
➔ Unattended-upgrades
And all the typical hardening
& distro compile flags!
6. Docker Daemon
➔ Limit docker group : docker.sock
Access to socket = root
➔ Authorization plugin API
Docker 1.10+: --authorization-plugin
should help mitigate previous issue soon
➔ docker-machine & TLS
Use --tlsverify (port 2376)
➔ SELinux / AppArmor Profile
apparmor.d/docker + restrictions
limit path, resources, etc.
➔ Export logs outside of host
--log-driver= (syslog, fluentd, ...)
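Since access to docker.sock is equivalent to root, a quick audit of who belongs to the docker group is a reasonable first step. The helper below is a minimal sketch: the function name is hypothetical, and it only reads group(5)-formatted lines, so users whose primary group is docker would not show up.

```shell
# Print the members of the "docker" group, one per line.
# Reads group(5)-formatted lines (name:password:gid:members) on stdin.
docker_group_members() {
  awk -F: '$1 == "docker" && $4 != "" {
    n = split($4, members, ",")
    for (i = 1; i <= n; i++) print members[i]
  }'
}

# Typical use on a host (assumes getent is available):
#   getent group docker | docker_group_members
```

Every name printed should be treated as a root-equivalent account on that host.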
7. cgroups hardware resource limits
➔ Mitigate potential DoS attacks
Limit memory, disk, network I/O & CPU share
➔ cgroups only limit resources share, not access
Not blocking access to:
kcore, modprobe, sysrq, mknod, eth0, ...
➔ You can define your own initial cgroup
--cgroup-parent to inherit a previous context
8. Limiting CPU usage
➔ Limit the total or relative amount of CPU time share
--cpu-shares relative weight (== cpu_shares: 100)
--cpu-period CFS (QoS) period
--cpu-quota CFS (QoS) quota
➔ Limit which CPU or RAM node can be used
--cpuset-cpus CPU affinity (== cpuset: 0,1)
--cpuset-mems Memory NUMA node (e.g. 0-3 or 0,1)
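The CFS flags work as a pair: the container may use at most quota microseconds of CPU time per period, so quota / period is the fraction of one CPU it gets. The sketch below computes the quota for a given percentage; the function names, the 100000 µs period, and the 512 share weight are illustrative assumptions. The command is only printed unless DOCKER=docker is set.

```shell
# Dry run by default: prints the command. Set DOCKER=docker to execute it.
DOCKER="${DOCKER:-echo docker}"

# cpu_quota_for PERCENT [PERIOD_US]: CFS quota (µs) for PERCENT of one CPU.
cpu_quota_for() {
  percent=$1
  period=${2:-100000}
  echo $(( period * percent / 100 ))
}

# run_cpu_limited PERCENT IMAGE [CMD...]: cap CPU time and pin to CPUs 0 and 1.
run_cpu_limited() {
  percent=$1; shift
  period=100000
  quota=$(cpu_quota_for "$percent" "$period")
  $DOCKER run --rm \
    --cpu-period "$period" --cpu-quota "$quota" \
    --cpu-shares 512 --cpuset-cpus 0,1 \
    "$@"
}
```

For example, `run_cpu_limited 25 alpine true` yields a quota of 25000 µs per 100000 µs period, i.e. a quarter of one CPU.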
10. Device I/O & Filesystems
➔ Put docker on its own partition
/var/lib/docker as a ZFS/BTRFS volume (snapshots, quotas)
➔ Minimum rights
“rwm” options, e.g. --device=/dev/zero:/dev/zero:r
➔ Mount root & volumes as read-only
For volumes: /path:ro,z (z / Z = SELinux label)
For root (/): read_only: true
Use with --shm-size & /dev/shm for pid files, scratch, tmp, etc.
--tmpfs /run:rw,noexec,nodev,nosuid,size=8m
➔ Limit allocated I/O bandwidth
--device-read-bps, --device-write-bps
--device-read-iops, --device-write-iops
--blkio-weight-device (weight from 10 to 1000)
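Put together, the filesystem and I/O options above could look like the following sketch. The function name, the host path /srv/app-data, the device /dev/sda, and the 1mb rates are placeholders, not recommendations. The command is only printed unless DOCKER=docker is set.

```shell
# Dry run by default: prints the command. Set DOCKER=docker to execute it.
DOCKER="${DOCKER:-echo docker}"

# run_locked_down_fs IMAGE [CMD...]
# Read-only root, small tmpfs for scratch files, read-only data volume,
# and throttled block I/O on one device (all values are placeholders).
run_locked_down_fs() {
  $DOCKER run --rm \
    --read-only \
    --tmpfs /run:rw,noexec,nodev,nosuid,size=8m \
    -v /srv/app-data:/data:ro \
    --device-read-bps /dev/sda:1mb \
    --device-write-bps /dev/sda:1mb \
    "$@"
}
```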
11. Networking
➔ Create an internal N-Tier architecture
networks: ( docker-compose 1.6+ & version: ‘2’ ) || --net=
➔ Think about inter-container communication
--icc=false + --link= (but deprecated), --ip-forward=
➔ Disable userland-proxy
--userland-proxy=false … saves memory & faster
➔ Use iptables and tc
Limit access and use QoS if necessary.
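One way to sketch the N-tier idea with user-defined networks: the database sits only on a backend network created with --internal (no external connectivity), while the application joins both tiers. Network names, container names, and the example/app image are hypothetical. Commands are only printed unless DOCKER=docker is set.

```shell
# Dry run by default: prints the commands. Set DOCKER=docker to execute them.
DOCKER="${DOCKER:-echo docker}"

# Two tiers: "frontend" is reachable from outside, "backend" is internal only.
setup_tiers() {
  $DOCKER network create frontend
  $DOCKER network create --internal backend
  # The database is attached to the backend tier only.
  $DOCKER run -d --name db --net=backend postgres
  # The app starts on the backend tier, then also joins the frontend tier.
  $DOCKER run -d --name app --net=backend example/app
  $DOCKER network connect frontend app
}
```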
12. System resources & ulimits
➔ Set your typical soft & hard limits
Daemon: --default-ulimit nofile=50:100
Container: --ulimit nofile=50:100
compose 1.6+: ulimits: nofile: soft:50 hard:100
➔ Prevent fork bombs: threads / process limits
compose 1.6+: ulimits: nproc: soft:32 hard:64
Docker 1.11+ & Kernel 4.3+: --pids-limit (cgroup support)
➔ Think about your restart policy
restart: always? no?
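On the command line, the same limits could be combined as in the sketch below. The function name, the --pids-limit value of 64, and the on-failure:5 restart policy are illustrative choices, not values from the talk. The command is only printed unless DOCKER=docker is set.

```shell
# Dry run by default: prints the command. Set DOCKER=docker to execute it.
DOCKER="${DOCKER:-echo docker}"

# run_with_limits IMAGE [CMD...]
# File-descriptor and process ulimits, a pids cgroup limit against fork
# bombs, and a bounded restart policy instead of "always".
run_with_limits() {
  $DOCKER run -d \
    --ulimit nofile=50:100 \
    --ulimit nproc=32:64 \
    --pids-limit 64 \
    --restart=on-failure:5 \
    "$@"
}
```

Note that the nproc ulimit counts processes per user rather than per container, so --pids-limit is the more precise control where the kernel supports it.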
13. Namespaces
➔ Currently namespaced resources
Audit, cgroups, IPC, mount, NET, PID, Syslog, UID, UTS
--userns-remap=default (new in 1.10+), *but*:
Per daemon, not per container (--userns=host not yet in compose)
Volumes UID/GID also remapped...
Incompatible with IPC/PID/NET NS sharing...
e.g. --net=container:app1, --read-only filesystem...
➔ NOT (yet) Namespaced
The Kernel, LSM, UID (by default), keyring,
ring buffer (dmesg), /proc/{sys}, /sys, /dev/{shm} ...
➔ A lot of work & cleanup still required for namespaces
Many holes over the years:
CVE-2010-0006, CVE-2011-2189, CVE-2013-1858, CVE-2013-1956, CVE-2013-4205,
CVE-2014-4014, CVE-2014-5206, CVE-2014-5207, CVE-2014-8989, CVE-2015-8709, (!)
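With --userns-remap=default, the daemon maps container UIDs onto the subordinate range of the dockremap user in /etc/subuid, so container UID n becomes host UID start + n. The helper below is a sketch of that arithmetic: the function name is hypothetical, and it only considers the first matching /etc/subuid line.

```shell
# host_uid_for CONTAINER_UID [REMAP_USER]
# Reads /etc/subuid-formatted lines (user:start:count) on stdin and prints
# the host UID that CONTAINER_UID maps to. Fails if it is outside the range.
host_uid_for() {
  awk -F: -v user="${2:-dockremap}" -v cuid="$1" '
    $1 == user {
      if (cuid < $3) { print $2 + cuid; found = 1 }
      exit
    }
    END { if (!found) exit 1 }
  '
}

# Typical use on a host:
#   host_uid_for 0 < /etc/subuid
```

This also explains the volume caveat above: files written by container root end up owned by the mapped host UID, not by UID 0.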
15. Seccomp (Secure Computing)
➔ Extremely granular filter
BPF filters of syscalls + arguments
Docker default blacklist (whitelist in the future)
➔ Use tools to create profiles
dockersl.im, genSeccomp.sh, etc.
strace -c -f -S name ls 2>&1 >/dev/null | tail -n +3 | head -n -2 | awk '{print $(NF)}'
➔ --security-opt seccomp=/path/profile.json
Disable default Seccomp filtering: --security-opt seccomp=unconfined
➔ Use security_opt: - no-new-privileges
Keeps UID, GID & LSM Labels + can’t gain Capabilities/SUID
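A list of syscall names, such as the one the strace pipeline above prints, can be turned into a whitelist-style profile. The sketch below is an assumption-laden starting point: the function name is hypothetical, and it emits the `names` array form used by recent Docker releases, whereas Docker releases from the time of the talk listed a single `name` per entry. A list traced from one command will also miss syscalls the container runtime itself needs, so expect to extend it.

```shell
# Reads syscall names on stdin (one per line) and prints a seccomp profile
# that denies everything by default and allows only the listed syscalls.
seccomp_whitelist() {
  awk '
    BEGIN { printf "{\"defaultAction\":\"SCMP_ACT_ERRNO\",\"syscalls\":[{\"names\":[" }
    NF    { printf "%s\"%s\"", (n++ ? "," : ""), $1 }
    END   { print "],\"action\":\"SCMP_ACT_ALLOW\"}]}" }
  '
}

# Typical use: pipe a syscall list into it and save the result, e.g.
#   <syscall list command> | seccomp_whitelist > profile.json
```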
16. Swarm Networking [1.12+]
➔ Swarm init / join
Expose master nodes carefully (hold cluster’s secrets)
Mutually auth. TLS, AES-GCM, 12 hours key rotation (Gossip / Raft)
➔ Use overlay network encryption
docker network create -d overlay -o encrypted mynet
- Keys shared with tasks & services, but not «docker run»
➔ Mutually authenticate your microservices too
Microservices should not rely on overlay encryption:
Authenticate & Encrypt [container ↔ container] communications
➔ «docker-compose bundle» - experimental status
Lacks support for most useful runtime security options, maybe in 1.13+?
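The swarm steps could be chained as in this sketch. The function name, the advertise address, the service name web, and the nginx image are placeholders. Commands are only printed unless DOCKER=docker is set.

```shell
# Dry run by default: prints the commands. Set DOCKER=docker to execute them.
DOCKER="${DOCKER:-echo docker}"

# init_encrypted_overlay ADVERTISE_ADDR
# Start a swarm, create an encrypted overlay network, and attach a service.
init_encrypted_overlay() {
  $DOCKER swarm init --advertise-addr "$1"
  $DOCKER network create -d overlay -o encrypted mynet
  # Services can join the overlay; plain "docker run" containers cannot.
  $DOCKER service create --name web --network mynet nginx
}
```

For example, `init_encrypted_overlay 192.0.2.10` prints the three commands with that address. Overlay encryption protects traffic between nodes only, so services still need their own authentication and encryption.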
17. Containers Runtime Security
➔ Never use --privileged
Use granular solutions previously described
➔ Run process as a user
Don’t run inside container as root: use nobody
Remove SUID, strip unused files, etc.
➔ Layer as many security features as possible
Not all of them will apply, work, be enabled, etc.
➔ Don’t forget to harden applications!
NGINX configs, exposed services, databases, etc.
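As a closing sketch, several of these runtime options can be layered in one invocation. The function name is hypothetical, and dropping all capabilities is an extra illustrative layer not listed on the slide; some images will need specific capabilities added back. The command is only printed unless DOCKER=docker is set.

```shell
# Dry run by default: prints the command. Set DOCKER=docker to execute it.
DOCKER="${DOCKER:-echo docker}"

# run_hardened IMAGE [CMD...]
# Unprivileged user, no capabilities, no privilege gain through SUID
# binaries, and a read-only root filesystem.
run_hardened() {
  $DOCKER run --rm \
    --user nobody \
    --cap-drop=ALL \
    --security-opt no-new-privileges \
    --read-only \
    "$@"
}
```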