The "Namespaces, Cgroups and systemd" document discusses:
1. Namespaces and cgroups, which provide isolation and resource management capabilities in Linux.
2. systemd, a system and service manager that aims to boot faster and to handle dependencies between services better.
3. Key components of systemd, including unit files, systemctl, and tools to manage services, devices, mounts and other resources.
First steps on CentOS 7
1. Namespaces, Cgroups and systemd
First steps on
CentOS 7
Marc Cortinas – Production Services -
Webops - March 2015
2. Why?
• Why? Motivations:
• 1. Trying to understand why Lennart Poettering was a “little bit” overbearing
• 2. Know the main changes coming to Linux in the next few years
• 3. Learn CentOS 7 in more depth after it became the default distribution at Odigeo
• FOSDEM conference talk - whats_new_in_systemd,_2015_edition
3. Agenda (why colors?)
----------------------------------------- so far…
Memory Spaces and IPC - dbus
The kernel - udev
Namespaces
Virtualizations
------------------------------------------ closer…
Init systems on Unix
Control groups
• Overview
• Subsystems or resource controllers
• Demo
• Commands
Dbus
AutoFS
------------------------------------------ void main ()
SystemD
• Motivations
• Definition and features
• Overview
• Unit Files, Core components and libraries
• Commands
• Other Components:
1. Udev
2. JournalD
3. NetworkD
4. ConsoleD
5. LoginD
6. TimedateD
7. Systemd-Nspawn
4. Memory Spaces and Inter-Process Communication
User Space – Memory space where user processes run
• Apart from the owning process, only the kernel can access a process's user space
• The system prevents one process from interfering with another process
Kernel Space – Memory space where kernel code runs
• A system call is the only way a user process can access it
• System call arguments are copied from user space to kernel space
• A user process runs in kernel mode while it executes a system call
Inter-process communication, before D-Bus
• Half-duplex UNIX pipes, the sysadmin's best friend
• Named pipes (FIFOs) and UNIX domain sockets (AF_UNIX)
• System V IPC:
– Message queues
– Semaphores
– Shared memory
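A minimal sketch of the half-duplex pipe mentioned above, written in Python for brevity. The function name pipe_roundtrip is an illustration only and does not come from the slides; the sketch assumes the message fits in the kernel pipe buffer.

```python
import os


def pipe_roundtrip(message: bytes) -> bytes:
    """Send a message through an anonymous pipe and return what the reader sees.

    Assumption: the message is small enough to fit in the kernel pipe buffer,
    otherwise os.write() would block because nobody is reading yet.
    """
    # A pipe is half-duplex: data only flows from write_fd to read_fd.
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, message)
    finally:
        # Closing the write end is what lets the reader reach end-of-file.
        os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        return reader.read()


if __name__ == "__main__":
    print(pipe_roundtrip(b"hello from the write end"))
```

For two-way traffic, two pipes (or an AF_UNIX socket pair) would be needed, which is one of the gaps the higher-level IPC mechanisms later in the deck try to cover.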
Linux Kernel Archs - Amir Hossein
http://www.tldp.org/LDP/lpg/node7.html
5. The Kernel
Linux Kernel Archs - Amir Hossein
Kernel: the modules or subsystems that provide operating system functions
Microkernel (µkernel): includes only the code necessary to provide the system's major functionality
– IPC
– Memory management
– Process management
– I/O management
Flexible, modular, easy to implement
Monolithic kernel: https://en.wikipedia.org/wiki/Monolithic_kernel
- the entire operating system works in kernel space and runs alone in supervisor mode
- defines a high-level virtual interface over the computer hardware
- device drivers can be added to the kernel as modules, or not? See udev
Better performance
Hybrid kernel, nanokernel, picokernel, etc.
6. Namespaces
Namespaces – lightweight process virtualization
• Isolation: enables a process (or group of processes) to have a different view of the system than
other processes
• Much like Zones in Solaris
• No hypervisor layer
• Only one system call was added (setns())
• Implemented incrementally, from kernel 2.4.19 (mount namespaces) to 3.8 (user namespaces)
• 6 namespaces
– Mount namespaces (CLONE_NEWNS, Linux 2.4.19) isolate the set of filesystem mount points seen by a
group of processes
– UTS namespaces (CLONE_NEWUTS, Linux 2.6.19) isolate two system identifiers—nodename and
domainname
– IPC namespaces (CLONE_NEWIPC, Linux 2.6.19) isolate certain interprocess communication (IPC)
resources, namely, System V IPC objects and (since Linux 2.6.30) POSIX message queues
– PID namespaces (CLONE_NEWPID, Linux 2.6.24) isolate the process ID number space
– Network namespaces (CLONE_NEWNET, started in Linux 2.6.24-2.6.29) provide isolation of the system
resources associated with networking
– User namespaces (CLONE_NEWUSER, started in Linux 2.6.23 and completed in Linux 3.8) isolate the user
and group ID number spaces
http://lwn.net/Articles/531114/
http://www.haifux.org/lectures/299/netLec7.pdf
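A small sketch, assuming a Linux system with /proc mounted, of how to see which namespaces a process belongs to: each entry under /proc/PID/ns is a symbolic link whose target names the namespace type and its inode. The helper list_namespaces and its proc_root parameter are illustrative names, not part of any library.

```python
import os


def list_namespaces(pid="self", proc_root="/proc"):
    """Return {namespace_name: link_target} for one process.

    Returns an empty dict when the directory does not exist
    (for example on a non-Linux system or for an unknown PID).
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    namespaces = {}
    if not os.path.isdir(ns_dir):
        return namespaces
    for name in sorted(os.listdir(ns_dir)):
        try:
            # The link target looks like "net:[<inode>]"; two processes in the
            # same namespace show the same inode number.
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Entry vanished or is not readable; skip it.
            continue
    return namespaces


if __name__ == "__main__":
    for name, target in list_namespaces().items():
        print(name, "->", target)
```

Comparing the output for two PIDs is a quick way to check whether they share, for instance, the same network or mount namespace.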
8. Init systems on Unix
LINKS:
http://en.wikipedia.org/wiki/Init
OS | Init system | Family
Mac OS X | launchd (from 10.4) | BSD
NetBSD | BSD-style init (rc) | BSD
OpenBSD | BSD-style init (rc) | BSD
FreeBSD | BSD-style init (rc) | BSD
Debian | Upstart/systemd/SysVinit | Linux
Ubuntu | Upstart | Linux
RHEL6/CentOS6 | SysVinit + LSB | Linux
RHEL7/CentOS7 | systemd | Linux
Solaris | SMF | Solaris
9. Cgroups
• The project was born at Google in 2006
• It was originally called "process containers".
• Merged into the kernel in release 2.6.24
The term "cgroups" covers two things:
1) an upstream kernel feature that allows system resources to be
partitioned/divided up amongst different processes, or groups of processes.
2) user-space tools which handle the kernel control groups mechanism
Cgroup - set of tasks with a set of parameters for one or more subsystems
Subsystem - "resource controller" that schedules a resource or applies per-
cgroup limits
Hierarchy - a set of cgroups arranged in a tree
LINKS:
https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
https://www.youtube.com/watch?v=81j1WF5xEZc
http://fedoraproject.org/wiki/Features/ControlGroups
http://docs.fedoraproject.org/en-US/Fedora/16/html-single/Resource_Management_Guide/index.html
10. Cgroups – Subsystems OR Resource Controllers
• blkio — sets limits on input/output access to and from block devices;
• cpu — uses the CPU scheduler to provide cgroup tasks an access to the CPU. It is mounted
together with the cpuacct controller on the same mount;
• cpuacct — creates automatic reports on CPU resources used by tasks in a cgroup. It is
mounted together with the cpu controller on the same mount;
• cpuset — assigns individual CPUs (on a multicore system) and memory nodes to tasks in a
cgroup;
• devices — allows or denies access to devices for tasks in a cgroup;
• freezer — suspends or resumes tasks in a cgroup;
• memory — sets limits on memory use by tasks in a cgroup, and generates automatic reports
on memory resources used by those tasks;
• net_cls — tags network packets with a class identifier (classid) that allows the Linux traffic
controller (the tc command) to identify packets originating from a particular cgroup task;
• perf_event — enables monitoring cgroups with the perf tool;
• hugetlb — allows to use virtual memory pages of large sizes, and to enforce resource limits on
these pages.
# yum install kernel-doc, then read /usr/share/doc/kernel-doc-<kernel_version>/Documentation/cgroups/
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/cgroups
12. Cgroups – Commands in libcgroup tools
Description | Command
Install the packages with the tools that manage the kernel API | yum install libcgroup libcgroup-tools
Create a persistent file by snapshotting the current runtime hierarchy | cgsnapshot -f /etc/cgconfig.conf
List all available hierarchies along with their current mount points | lssubsys -am
Mount the net_prio controller on a virtual file system | mount -t cgroup -o net_prio none /sys/fs/cgroup/net_prio
Unmount a controller from the virtual file system | umount /sys/fs/cgroup/controller_name
Create transient cgroups in hierarchies, alternative 1 | cgcreate -t uid:gid -a uid:gid -g controllers:path
Create transient cgroups in hierarchies, alternative 2 | mkdir /sys/fs/cgroup/net_prio/lab1/group1
Remove cgroups | cgdelete [-r] net_prio:/test-subgroup
Set controller parameters | cgset -r parameter=value path_to_cgroup
Copy the parameters of one cgroup into another | cgset --copy-from path_to_source_cgroup path_to_target_cgroup
Set controller parameters permanently | vi /etc/cgconfig.conf ; systemctl stop cgconfig ; systemctl start cgconfig
Move a process into a cgroup | cgclassify -g controllers:path_to_cgroup pidlist
Launch processes in a manually created cgroup | cgexec -g controllers:path_to_cgroup command arguments
Find the controllers that are available in your kernel | cat /proc/cgroups
Find the mount points of a particular subsystem |
List the cgroups | lscgroup
Restrict the output to a specific hierarchy | lscgroup cpuset:adminusers
Display the parameters of specific cgroups | cgget -r parameter list_of_cgroups
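As a rough illustration of what "cat /proc/cgroups" returns, the sketch below parses that file's four tab-separated columns (subsys_name, hierarchy, num_cgroups, enabled). The function name, the Controller tuple and the sample values are made up for the example; on a real host the text would be read from /proc/cgroups.

```python
from typing import Dict, NamedTuple


class Controller(NamedTuple):
    hierarchy: int     # ID of the hierarchy the controller is attached to
    num_cgroups: int   # number of cgroups using this controller
    enabled: bool      # whether the controller is enabled in the kernel


def parse_proc_cgroups(text: str) -> Dict[str, Controller]:
    """Parse the content of /proc/cgroups into {controller_name: Controller}."""
    controllers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "#subsys_name ..." header
        name, hierarchy, num_cgroups, enabled = line.split()
        controllers[name] = Controller(int(hierarchy), int(num_cgroups), enabled == "1")
    return controllers


# Illustrative sample only; the numbers are invented.
SAMPLE = (
    "#subsys_name\thierarchy\tnum_cgroups\tenabled\n"
    "cpuset\t2\t1\t1\n"
    "cpu\t3\t5\t1\n"
    "cpuacct\t3\t5\t1\n"
    "memory\t4\t7\t1\n"
    "hugetlb\t0\t1\t0\n"
)

if __name__ == "__main__":
    for name, info in parse_proc_cgroups(SAMPLE).items():
        print(name, info)
```

In the sample, cpu and cpuacct share hierarchy 3, which matches the note that both controllers are mounted together.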
13. Now dbus, next kdbus!
Goal: Improvements for Inter Process Communication
Before D-Bus: pipes, named pipes, message queues, semaphores, shared memory.
D-Bus: method call transactions, signals, properties, OO, broadcasting, introspection, policy,
activation, synchronization, type-safe marshalling, security, monitoring, exposes APIs, … A high-level
concept!
D-Bus limitation: 10 copies + 4 complete validations + 4 context switches in a full-duplex transaction;
suitable for control but not for payload.
kdbus improvements: 2 or fewer copies + 2 validations + 2 context switches, and more
Dbus Arch:
• Libdbus - library that allows two applications to connect to each other and exchange messages
• dbus-daemon - a message-bus daemon executable, built on libdbus
• wrapper libraries based on particular application frameworks
Linux Conf - Lennart Poettering - D-Bus
DBus Freedesktop Project
Linux documentation project (tldp) - IPC
14. AutoFS
• What’s autoFS?
automount is a program for automatically mounting directories on an as-needed basis.
• Why autoFS in systemd?
To speed up the boot process: it improves the parallelization of startup, because
requests are queued in the kernel until the target process has been properly loaded.
RHEL 7 - Documentation
GIT repository in kernel code for autofs
Ubuntu help for autoFS
Man page for autofS
15. Motivations for SystemD
• Decrease the time SysV takes to initialize the system while resolving dependencies (as launchd does)
• Daemons were managed with Bash scripts, a slow language whose behaviour can change based
on environment variables (migrate to C)
• The system needs devices mounted before daemons start (autofs)
• Keep track of processes after their parent dies (cgroups)
• Ordered startup with dependency resolution (Requires|Wants)
• Start only the required services, on demand (by default)
PMO systemD: Lennart Poettering
LINKS:
http://0pointer.de/blog/projects/systemd.html
http://0pointer.de/blog/projects/systemd-update.html
http://0pointer.de/blog/projects/systemd-update-2.html
http://0pointer.de/blog/projects/systemd-update-3.html
16. What’s systemD?
1. Boot system designed to start up the system more efficiently
  - Parallelization of the startup process, using sockets (AF_UNIX/AF_INET) and D-Bus.
  - Suite of programs to manage daemons, trying to avoid Bash scripts that
    depend on environment variables.
2. System administration daemon designed exclusively around the Linux kernel API
3. First process started in user space
4. Framework to manage services and daemon dependencies
5. Daemon process running in the background, hence the "d" suffix
6. Uses cgroups and fanotify to manage resources
7. Uses autofs to avoid queueing any “fopen” call request
8. Keeps track of processes thanks to cgroups
18. Unit Files, Core components and libraries
1. Unit file: configuration file intended to replace the traditional Bash startup scripts.
Service: a process or a group of processes based on one configuration file
Scope: group - a group of externally created processes, registered with systemd
Slice: skel - a group of hierarchically organized units. Slices do not contain processes, they
organize a hierarchy in which scopes and services are placed.
(Default slices: -.slice; system.slice; user.slice; machine.slice)
2. Components
• systemd is a system and service manager for Linux operating systems.
• systemctl may be used to introspect and control the state of the systemd system and
service manager.
• systemd-analyze may be used to determine system boot-up performance statistics and
retrieve other state and tracing information from the system and service manager.
Unit types:
service, socket, device, mount,
automount, swap, target, path,
timer, snapshot, slice, scope
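To make the idea of a unit file concrete, here is a hedged sketch that renders a minimal service unit as text. The [Unit], [Service] and [Install] sections and the Description, ExecStart and WantedBy options are standard systemd names; the helper render_service_unit, the unit description and the command are hypothetical, and Python's configparser is only used here as a convenient writer, it is not how systemd itself reads the file.

```python
import configparser
import io


def render_service_unit(description, exec_start, wanted_by="multi-user.target"):
    """Return the text of a minimal .service unit file."""
    # interpolation=None keeps "%" specifiers (such as %i) untouched.
    parser = configparser.ConfigParser(interpolation=None)
    # systemd option names are case-sensitive; stop configparser lowercasing them.
    parser.optionxform = str
    parser["Unit"] = {"Description": description}
    parser["Service"] = {"ExecStart": exec_start}
    parser["Install"] = {"WantedBy": wanted_by}
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical service: the name and the command are examples only.
    print(render_service_unit("Example sleeper", "/usr/bin/sleep 60"))
```

The resulting text would typically be saved as a file such as /etc/systemd/system/example.service (a hypothetical name) and picked up with systemctl daemon-reload.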
19. SystemD commands
SysV command or description (D:) | systemd command
init 3 | systemctl isolate multi-user.target
service httpd [command] | systemctl [command] httpd
ls /etc/rc.d/init.d/ | systemctl list-units --all
chkconfig httpd [on|off] (D: creates/removes the symlinks under /etc/systemd/system/ that start the unit at boot; persistent cgroups) | systemctl [enable|disable] httpd
D: run the top utility in a service unit in a new slice called test (transient) | systemd-run --unit=toptest --slice=test top -b
D: stop the unit non-gracefully with a signal | systemctl kill name.service --kill-who=PID,... --signal=signal
chkconfig frobozz --add | systemctl daemon-reload
runlevel | systemctl list-units --type=target
D: limit the CPU and memory usage of httpd.service | systemctl set-property httpd.service CPUShares=600 MemoryLimit=500M
D: limit the CPU usage of httpd.service, temporarily | systemctl set-property --runtime httpd.service CPUShares=600
D: recursively show control group contents | systemd-cgls
D: show control groups for one resource | systemd-cgls memory
D: add cgroup info to ps | alias psc='ps xawf -eo pid,user,cgroup,args'
D: list the dependencies of a target | systemctl show -p "Wants" multi-user.target
D: analyze system boot-up performance | systemd-analyze
D: show top control groups by their resource usage | systemd-cgtop
D: run programs in transient scope or service units | systemd-run
D: control the systemd machine manager (LXC or VM) | machinectl
D: show the cgroup hierarchies attached to a process | cat /proc/PID/cgroup
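The last row reads /proc/PID/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The sketch below parses that format; the function name and the sample content (an httpd.service placed in system.slice) are illustrative assumptions rather than output captured from a real host.

```python
def parse_pid_cgroup(text):
    """Parse /proc/PID/cgroup content into a list of dicts, one per hierarchy."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only: the cgroup path may contain ":".
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy": int(hierarchy_id),
            "controllers": [c for c in controllers.split(",") if c],
            "path": path,
        })
    return entries


# Illustrative sample only.
SAMPLE = (
    "4:memory:/system.slice/httpd.service\n"
    "3:cpuacct,cpu:/system.slice/httpd.service\n"
    "1:name=systemd:/system.slice/httpd.service\n"
)

if __name__ == "__main__":
    for entry in parse_pid_cgroup(SAMPLE):
        print(entry)
```

The path column is what ties a process back to its unit: here every hierarchy points at the same service inside system.slice.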
20. Other components on systemD
• Udevd: is a device manager for the Linux kernel, which handles the /dev directory
and all user space actions when adding/removing devices
• Journald: systemd-journald is a daemon responsible for event logging
• Consoled: systemd-consoled provides a user console daemon, intending to replace
the Linux kernel's virtual terminal
• Logind: systemd-logind is a daemon that manages user logins and seats in various
ways
• Networkd: networkd allows systemd to perform various networking configurations,
with features such as a DHCP server or VXLAN support
• Timedated: systemd-timedated is a daemon that can be used to control time-
related settings, such as the system time, system time zone, or selection between
UTC and local time zone system clock
• Systemd-nspawn: Spawn a namespace container for debugging, testing and
building
21. Udev – Device Manager
Device manager for the Linux kernel. The project was born in November 2003 as the successor of devfsd.
udev was introduced in Linux 2.5. In April 2012, udev's codebase was merged into the systemd source tree.
In October 2012, Linus Torvalds criticized Kay Sievers' approach to udev maintenance and bugs related to
firmware loading: "Not because firmware loading cannot be done in user space. But simply because udev
maintenance since Greg gave it up has gone downhill."
Goal: manage device nodes in the /dev directory dynamically; /dev used to be a static set of files
Udev arch:
• libudev that allows access to device information; it was incorporated into the systemd software bundle
• User space daemon udevd that manages the virtual /dev
• Administrative command-line utility udevadm for diagnostics.
Udev Features:
• Runs in user space
• Dynamically creates/removes device files
• Provides consistent naming
• Provides a user-space API
• Kernel 2.6 added the sysfs filesystem in /sys, with all information about devices/filesystems
• /etc/udev/rules.d/*.rules define the actions to run after the kernel detects a device and its
information is populated in sysfs
Device Manager Tutorial - udevadm
23. NetworkD – not in CentOS7 – added on CoreOS
Added to systemd in v209 (20 February 2014). DHCP server and VXLAN support were added
in systemd release v215 (July 2014).
• Main goal: allows systemd to perform various networking configurations
• Cfg Path: /etc/systemd/network
• Enable:
– systemctl enable systemd-networkd.service
– systemctl start systemd-networkd.service
• CFG type files:
• .link files: basic settings for network devices (name of the network
interface, MTU, Wake on LAN, modified MAC address); these are configuration files for systemd-udev
• .network files: configuration files for systemd-networkd, same syntax as .link files, with Match and Network sections
• .netdev files: create virtual network devices such as
bridges, bonded interfaces and VLANs
Tip: learn how Linux assigns predictable network interface names
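A hedged sketch of what a small .network file looks like, with a [Match] section selecting the interface and a [Network] section enabling DHCP. The interface name eth0 and the file content are examples; Python's configparser stands in as a reader for illustration only and does not reproduce every rule of the systemd parser (for instance, repeated keys are not preserved).

```python
import configparser

# Example content for a file such as /etc/systemd/network/50-dhcp.network
# (hypothetical file name and interface).
SAMPLE_NETWORK = (
    "[Match]\n"
    "Name=eth0\n"
    "\n"
    "[Network]\n"
    "DHCP=yes\n"
)


def read_network_file(text):
    """Return {section: {option: value}} for a .network-style file."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the case of option names such as "DHCP"
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    print(read_network_file(SAMPLE_NETWORK))
```

The same two-part shape (match first, then settings) applies to .link files, which is why the slide describes their syntax as the same.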
LINKs:
Linux Magazine Example Configurations
CoreOs Documentation Example configurations
Networkd Project Freedesktop
24. ConsoleD 1/2
The current status is….
• Linux console (Linus Torvalds, 1991): the system console internal to the kernel. It is the I/O device for all kernel
messages and allows login in single-user mode. There are 2 implementations:
1. Text mode – compatible with PC systems with CGA, EGA, MDA, VGA = LEGACY (2D array display)
2. Framebuffer – (fbdev) a graphics hardware-independent abstraction layer, used by default in modern Linux
distributions
• Virtual console: multiplexes the Linux console into several (7) consoles using the VT system, running in kernel
space.
• Terminal emulator: runs in user space, within graphical environments such as GNOME, KDE, etc.
What systemd-consoled development aims for…
• Released inside systemd v217, October 2014, git commit here
• Main goal: systemd-consoled provides a user console daemon, intending to replace the Linux
kernel's virtual terminal, running in user space
• Uses kmscon, a project born in Nov 2011. kmscon = KMS (Kernel Mode Setting, the kernel API that performs
mode-setting) + DRM (Direct Rendering Manager, the kernel interface to access graphics devices)
LINKs:
Wikipedia - Linux console
Wiki Freedesktop.org – kmscon
20 years of CONFIG_VT, according linux-kernel VT
26. LoginD
• logind was merged into systemd in v30, released on 1 August 2011
• What logind was built for:
• Keeping track of users and sessions, their processes and their idle state
• Providing PolicyKit-based access for users to operations such as system shutdown or sleep
• Implementing a shutdown/sleep inhibition logic for applications
• Handling of power/sleep hardware keys
• Multi-seat management
• Session switch management
• Device access management for users
• Automatic spawning of text logins (gettys) on virtual console activation and user runtime directory
management
• User sessions are registered in logind via the pam_systemd(8) PAM module. (pam_systemd.so)
– creates/destroys /run/user/$UID
– sets $XDG_SESSION_ID (one ID per session)
– adds/deletes a systemd scope for the session, based on the user.slice skeleton
LINKS:
Wiki freedesktop.org – multiseat
Wiki freedesktop.org – logind
manpage freedesktop.org - pam_systemd
27. TimedateD
• timedated was merged into systemd in v30, released on 1 August
2011
• Goal: daemon that can be used to control time-related settings,
such as the system time, system time zone, or selection between
UTC and local time zone system clock. It is accessible through D-Bus.
• The system time
• The system timezone
• A boolean controlling whether the system RTC is in local or UTC
timezone
• Whether the systemd-timesyncd.service(8) (NTP) service is
enabled/started or disabled/stopped. See
systemd-timedated.service(8) for more information.
• Wiki freedesktop.org - timedated
28. systemd-nspawn and machinectl
• Systemd-nspawn is chroot on steroids
• Goal - Spawn a minimal namespace container for debugging, testing and building
# yum --releasever=20 --nogpg --installroot=/srv/mycontainer --disablerepo='*' --enablerepo=fedora install systemd passwd yum fedora-release vim-minimal
…
# systemd-nspawn -bD /srv/mycontainer/
[root@fedora20 ~]# machinectl
MACHINE CONTAINER SERVICE
mycontainer container nspawn
LINKS
Lennart Poettering, Linux Conf 2013
Wiki - fedoraproject.org – SystemdLightweightContainers
Wiki - freedesktop.org - VirtualizedTesting
29. What’s new in SystemD? 2015…
Main changes announced in FOSDEM
• new tool systemd-hwdb for querying the hardware metadata database , decoupled from the old libudev library
• machinectl gained support for two new commands, "copy-from" and "copy-to", for copying files from and to a running container
• machinectl gained support for a new "bind" command to bind mount host directories into local containers
• Routes configured with networkd may now be assigned a scope in .network files
• networkd may now configure IPv6 link-local addressing in addition to IPv4 link-local addressing
• The IPv6 "token" for use in SLAAC may now be configured for each .network interface in networkd.
• When the user presses Ctrl-Alt-Del more than 7x within 2s an immediate reboot is triggered
• networkd gained support for creating "ipvlan", "gretap","ip6gre", "ip6gretap" and "ip6tnl" network devices. Moreover, gained
support for collecting LLDP network announcements
• systemd-nspawn's --image= option is now capable of dissecting and booting MBR and GPT disk images, This allows running
cloud images from major distributions directly with systemd-nspawn, without modification.
• networkd .network files gained support for configuring per-link IPv4/IPv6 packet forwarding as well as IPv4 masquerading.
• The default TERM variable for units connected to a terminal changes to vt220 rather than vt102
• systemd now provides a way to store file descriptors per-service in PID 1.This is useful for daemons to ensure that fds they
require are not lost during a daemon restart
• The directory /var/lib/containers/ has been deprecated and been replaced by /var/lib/machines
CONCLUSIONS: They are mainly working on improving systemd-nspawn (with Btrfs) and networkd.
Timeline last code releases
Systemd v218 – 11 dec 2014 - http://cgit.freedesktop.org/systemd/systemd/tag/?id=v218
FOSDEM 2015 – 1 Feb 2015 - https://fosdem.org/2015/schedule/event/whats_new_in_systemd,_2015_edition/
Maybe, someday, the FOSDEM video will be available at http://video.fosdem.org/2015/devroom-distributions/
Systemd v219 – 16 Feb 2015 - http://cgit.freedesktop.org/systemd/systemd/tag/?id=v219
Linux 4.0-rc1 – 22 Feb 2015 - http://lkml.iu.edu/hypermail/linux/kernel/1502.2/04059.html
30. Thanks... Questions?
• Tips:
1. Wiki freedesktop.org TipsAndTricks
2. Trick to Know systemd version –
fedora20 ~]# /usr/bin/timedatectl --version
systemd 208