SlideShare a Scribd company logo
1 of 32
parallels.com || openvz.org || criu.org
Seven Problems
of Linux Containers
Kir Kolyshkin
<kir@openvz.org>
28 April 2013 LinuxFest Northwest
parallels.com || openvz.org || criu.org
Seventy Seven Problems
of Linux Containers
Kir Kolyshkin
<kir@openvz.org>
28 April 2013 LinuxFest Northwest
(of which I am going to cover six)
parallels.com || openvz.org || criu.org
Problem 1: Effective virtualization
●
Virtualization is partitioning
●
Historical way: $M mainframes
●
Modern way: virtual machines
●
Problem: performance overhead
●
Partial solution: hardware support
(Intel VT, AMD V)
parallels.com || openvz.org || criu.org
Solution: isolation
●
Run many isolated userspace instances
on top of sone single (Linux) kernel
●
All processes see each other
– files, process information, network,
shared memory, users, etc.
●
Make them unsee it!
parallels.com || openvz.org || criu.org
parallels.com || openvz.org || criu.org
One historical way to unsee
chroot()
parallels.com || openvz.org || criu.org
Namespaces
●
Implemented in the Linux kernel
– PID
– net
– IPC
– UTS
– mnt
– user
●
clone() with CLONE_NEW* flags
parallels.com || openvz.org || criu.org
Problem 2: Shared resources
●
All containers share the same set of resources
(CPU, RAM, disk, various kernel things ...)
●
Need fair distribution of goods so everyone
gets their share
●
Need DoS prevention
●
Need prioritization
– “All animals are equal, but some animals are more
equal than others” -- George Orwell
parallels.com || openvz.org || criu.org
parallels.com || openvz.org || criu.org
Solution: OpenVZ resource controls
●
OpenVZ:
– user beancounters
●
controls 20 parameters
– hierarchical CPU scheduler
– disk quota per containers
– I/O priorities per-container
●
Dynamic control, can “resize” runtime
parallels.com || openvz.org || criu.org
Solution: cgroups
●
Cgroups is a mechanism to control resources
per hierarchical groups of processes
●
Cgroups is nothing without controllers:
– blkio, cpu, cpuacct, cpuset, devices, freezer,
memory, net_cls, net_prio
●
Cgroups are orthogonal to namespaces
●
Still a work in progress (kernel memory)
parallels.com || openvz.org || criu.org
Problem 3: easy resources
●
User Beancounters are complicated:
– http://wiki.openvz.org/UBC_consistency_check
– user has to set all these parameters
– some of which are interdependent
●
We created a collection of valid configs,
●
... wrote a whole book about UBC
●
... and a set of tools to help
parallels.com || openvz.org || criu.org
parallels.com || openvz.org || criu.org
Solution: VSwap
●
Only two primary parameters: RAM and swap
– others still exist, but no longer required to set
●
Swap is virtual, no actual I/O is performed
●
Slow down to emulate real swap
●
Only when actual global RAM shortage
occurs,
virtual swap goes into the real swap
●
Currently only available in OpenVZ kernel
parallels.com || openvz.org || criu.org
Problem 4: fast live migration
●
We can migrate an OpenVZ container
from one physical server to another
without a shutdown
●
We want to do it fast even for huge containers
– huge disk: use shared storage
– huge RAM: ???
parallels.com || openvz.org || criu.org
Normal migration process
●
(Assuming shared storage)
●
1 Freeze the container
●
2 Dump its complete state to a dump file
●
3 Copy dump file to destination server
●
4 Undump
●
5 Unfreeze
●
Problem: huge dump file
parallels.com || openvz.org || criu.org
Solution 1: network swap
●
1 Dump the minimal memory, lock the rest
●
2 Restore the minimal memory,
mark the rest as swapped out
●
3 Set up network swap from the source
●
4 Unfreeze. Missing RAM will be “swapped in”
●
5 Migrate the rest of RAM and kill it on source
parallels.com || openvz.org || criu.org
parallels.com || openvz.org || criu.org
Solution 1: network swap
●
1 Dump the minimal memory, lock the rest
●
2 Copy, undump what we have,
mark the rest as swapped out
●
3 Set up network swap served from the source
●
4 Unfreeze. Missing RAM will be “swapped in”
●
5 Migrate the rest of RAM and kill it on source
●
PROBLEM? Reliability, no way to rollback
parallels.com || openvz.org || criu.org
Solution 2: Iterative RAM migration
●
1 Ask kernel to track modified pages
●
2 Copy all memory to destination system
●
3 Ask kernel for list of modified pages
●
4 Copy those pages
●
5 GOTO 3 until satisfied
●
6 Freeze and do migration as usual
parallels.com || openvz.org || criu.org
Problem 5: upstreaming
●
OpenVZ was developed separately
●
Then we wanted to merge it upstream
(i.e. to vanilla Linux kernel)
●
Problem?
parallels.com || openvz.org || criu.org
parallels.com || openvz.org || criu.org
Problem 5: upstreaming
●
OpenVZ was developed separately
●
Then we wanted to merge it upstream
(i.e. to vanilla Linux kernel)
●
Problem:
●
upstream devs are not accepting our work
parallels.com || openvz.org || criu.org
Solution 1: rewrite from scratch
●
User Beancounters -> CGroups
●
Did 2 rewrites for PID namespace
until it finally got accepted
●
Network namespace redone
●
It works!
●
about 1500 patches got landed to vanilla
●
II Parallels made it to top10 contributors
parallels.com || openvz.org || criu.org
Solution 2: CRIU
●
We tried hard to merge checkpoint/restore
●
Other people tried hard too, no luck
●
Can't make it to the kernel, let's go userspace
●
With minimal kernel intervention when
required
●
Kernel exports most of information already, so
let's just add missing bits and pieces
parallels.com || openvz.org || criu.org
CRIU
●
Checkpoint / Restore (mostly) In Userspace
Tools currently at version 0.4
●
Will do 1.0 release this year
●
Kernel 3.8 has about 120 patches from us
– 95% of needed features are there
●
Memory snapshot recently made it to -mm tree
parallels.com || openvz.org || criu.org
parallels.com || openvz.org || criu.org
Problem 6: common file system
●
Container is just a directory on host,
all CTs reside on the same FS
●
File system journal is a bottleneck
●
Lots of small-size files I/O on CT backup
●
No sub-tree disk quota support in upstream
●
No per-container snapshots
●
Live migration: rsync -- changed inodes
●
File system type and properties are fixed
parallels.com || openvz.org || criu.org
Solution 1: LVM
●
Only works only on top of block device
●
Hard to manage (e.g. how to migrate huge
volume?)
●
No dynamic allocation
●
Complicated management
parallels.com || openvz.org || criu.org
Solution 2: loop device
●
VFS operations leads to double page-caching
– (already fixed in the recent kernels)
●
No dynamic allocation, max space is used
●
Limited feature set
parallels.com || openvz.org || criu.org
Solution 3: ploop
●
Basic idea: same as loop, just better
●
Modular design:
– various image formats (qcow2 in TODO)
– various I/O backends
●
More features:
– live resize
– instant live snapshots
– write tracker to help in live migration
parallels.com || openvz.org || criu.org
Any problems questions?
●
kir@openvz.org
●
Twitter: @kolyshkin

More Related Content

What's hot

Deploying containers and managing them on multiple Docker hosts, Docker Meetu...
Deploying containers and managing them on multiple Docker hosts, Docker Meetu...Deploying containers and managing them on multiple Docker hosts, Docker Meetu...
Deploying containers and managing them on multiple Docker hosts, Docker Meetu...dotCloud
 
Linux Virtualization
Linux VirtualizationLinux Virtualization
Linux VirtualizationOpenVZ
 
Lcna example-2012
Lcna example-2012Lcna example-2012
Lcna example-2012Gluster.org
 
Sdc 2012-challenges
Sdc 2012-challengesSdc 2012-challenges
Sdc 2012-challengesGluster.org
 
Codemotion Rome 2015. GlusterFS
Codemotion Rome 2015. GlusterFSCodemotion Rome 2015. GlusterFS
Codemotion Rome 2015. GlusterFSRoberto Franchini
 
GlusterFs: a scalable file system for today's and tomorrow's big data
GlusterFs: a scalable file system for today's and tomorrow's big dataGlusterFs: a scalable file system for today's and tomorrow's big data
GlusterFs: a scalable file system for today's and tomorrow's big dataRoberto Franchini
 
OpenZFS code repository
OpenZFS code repositoryOpenZFS code repository
OpenZFS code repositoryMatthew Ahrens
 
OpenZFS send and receive
OpenZFS send and receiveOpenZFS send and receive
OpenZFS send and receiveMatthew Ahrens
 
Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016Gluster.org
 
OpenZFS Channel programs
OpenZFS Channel programsOpenZFS Channel programs
OpenZFS Channel programsMatthew Ahrens
 
How Can OpenNebula Fit Your Needs: A European Project Feedback
How Can OpenNebula Fit Your Needs: A European Project FeedbackHow Can OpenNebula Fit Your Needs: A European Project Feedback
How Can OpenNebula Fit Your Needs: A European Project FeedbackNETWAYS
 
How can OpenNebula fit your needs - OpenNebulaConf 2013
How can OpenNebula fit your needs - OpenNebulaConf 2013 How can OpenNebula fit your needs - OpenNebulaConf 2013
How can OpenNebula fit your needs - OpenNebulaConf 2013 Maxence Dunnewind
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013dotCloud
 
Gluster intro-tdose
Gluster intro-tdoseGluster intro-tdose
Gluster intro-tdoseGluster.org
 
Gluster technical overview
Gluster technical overviewGluster technical overview
Gluster technical overviewGluster.org
 
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliKernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliAnne Nicolas
 
OpenWrt From Top to Bottom
OpenWrt From Top to BottomOpenWrt From Top to Bottom
OpenWrt From Top to BottomKernel TLV
 
File Systems: Why, How and Where
File Systems: Why, How and WhereFile Systems: Why, How and Where
File Systems: Why, How and WhereKernel TLV
 
20160401 guster-roadmap
20160401 guster-roadmap20160401 guster-roadmap
20160401 guster-roadmapGluster.org
 

What's hot (20)

Deploying containers and managing them on multiple Docker hosts, Docker Meetu...
Deploying containers and managing them on multiple Docker hosts, Docker Meetu...Deploying containers and managing them on multiple Docker hosts, Docker Meetu...
Deploying containers and managing them on multiple Docker hosts, Docker Meetu...
 
Linux Virtualization
Linux VirtualizationLinux Virtualization
Linux Virtualization
 
Lcna example-2012
Lcna example-2012Lcna example-2012
Lcna example-2012
 
Sdc 2012-challenges
Sdc 2012-challengesSdc 2012-challenges
Sdc 2012-challenges
 
Codemotion Rome 2015. GlusterFS
Codemotion Rome 2015. GlusterFSCodemotion Rome 2015. GlusterFS
Codemotion Rome 2015. GlusterFS
 
GlusterFs: a scalable file system for today's and tomorrow's big data
GlusterFs: a scalable file system for today's and tomorrow's big dataGlusterFs: a scalable file system for today's and tomorrow's big data
GlusterFs: a scalable file system for today's and tomorrow's big data
 
OpenZFS code repository
OpenZFS code repositoryOpenZFS code repository
OpenZFS code repository
 
OpenZFS - AsiaBSDcon
OpenZFS - AsiaBSDconOpenZFS - AsiaBSDcon
OpenZFS - AsiaBSDcon
 
OpenZFS send and receive
OpenZFS send and receiveOpenZFS send and receive
OpenZFS send and receive
 
Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016
 
OpenZFS Channel programs
OpenZFS Channel programsOpenZFS Channel programs
OpenZFS Channel programs
 
How Can OpenNebula Fit Your Needs: A European Project Feedback
How Can OpenNebula Fit Your Needs: A European Project FeedbackHow Can OpenNebula Fit Your Needs: A European Project Feedback
How Can OpenNebula Fit Your Needs: A European Project Feedback
 
How can OpenNebula fit your needs - OpenNebulaConf 2013
How can OpenNebula fit your needs - OpenNebulaConf 2013 How can OpenNebula fit your needs - OpenNebulaConf 2013
How can OpenNebula fit your needs - OpenNebulaConf 2013
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
 
Gluster intro-tdose
Gluster intro-tdoseGluster intro-tdose
Gluster intro-tdose
 
Gluster technical overview
Gluster technical overviewGluster technical overview
Gluster technical overview
 
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliKernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
 
OpenWrt From Top to Bottom
OpenWrt From Top to BottomOpenWrt From Top to Bottom
OpenWrt From Top to Bottom
 
File Systems: Why, How and Where
File Systems: Why, How and WhereFile Systems: Why, How and Where
File Systems: Why, How and Where
 
20160401 guster-roadmap
20160401 guster-roadmap20160401 guster-roadmap
20160401 guster-roadmap
 

Viewers also liked

Viewers also liked (20)

Fidelizacion museos
Fidelizacion museosFidelizacion museos
Fidelizacion museos
 
Portfolio
PortfolioPortfolio
Portfolio
 
What is Study Island?
What is Study Island?What is Study Island?
What is Study Island?
 
0815FINAL.FULLPDFTop200
0815FINAL.FULLPDFTop2000815FINAL.FULLPDFTop200
0815FINAL.FULLPDFTop200
 
Insight Magazine
Insight MagazineInsight Magazine
Insight Magazine
 
Estampas, Independencia y Revolución en el MUNAE
Estampas, Independencia y Revolución en el MUNAEEstampas, Independencia y Revolución en el MUNAE
Estampas, Independencia y Revolución en el MUNAE
 
Panbiogeografia en Haemagogus
Panbiogeografia en HaemagogusPanbiogeografia en Haemagogus
Panbiogeografia en Haemagogus
 
TheSummit_Summer_2013
TheSummit_Summer_2013TheSummit_Summer_2013
TheSummit_Summer_2013
 
Top ten attractions in Portland
Top ten attractions in PortlandTop ten attractions in Portland
Top ten attractions in Portland
 
Air Quality Compliance Affecting Oil and Gas Development
Air Quality Compliance Affecting Oil and Gas DevelopmentAir Quality Compliance Affecting Oil and Gas Development
Air Quality Compliance Affecting Oil and Gas Development
 
Naica crystalcavemexico
Naica crystalcavemexicoNaica crystalcavemexico
Naica crystalcavemexico
 
Media Trifecta For Non-Profit Marketing
Media Trifecta For Non-Profit MarketingMedia Trifecta For Non-Profit Marketing
Media Trifecta For Non-Profit Marketing
 
Ils 499 graphics project barnard
Ils 499 graphics project   barnardIls 499 graphics project   barnard
Ils 499 graphics project barnard
 
Hierba Artificial
Hierba ArtificialHierba Artificial
Hierba Artificial
 
Lcca
LccaLcca
Lcca
 
Quello che resta
Quello che restaQuello che resta
Quello che resta
 
Tobii T60 Xl User Manual S
Tobii T60 Xl User Manual STobii T60 Xl User Manual S
Tobii T60 Xl User Manual S
 
La india alba f 2
La india alba f 2La india alba f 2
La india alba f 2
 
Shakespeares Globe Theatre
Shakespeares Globe TheatreShakespeares Globe Theatre
Shakespeares Globe Theatre
 
How To Drive Webinar Registration | ON24 Infographic
How To Drive Webinar Registration | ON24 InfographicHow To Drive Webinar Registration | ON24 Infographic
How To Drive Webinar Registration | ON24 Infographic
 

Similar to Seven problems of Linux containers

N problems of Linux containers
N problems of Linux containersN problems of Linux containers
N problems of Linux containersOpenVZ
 
OpenVZ, Virtuozzo and Docker
OpenVZ, Virtuozzo and DockerOpenVZ, Virtuozzo and Docker
OpenVZ, Virtuozzo and DockerKirill Kolyshkin
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Docker, Inc.
 
Intro to Kernel Debugging - Just make the crashing stop!
Intro to Kernel Debugging - Just make the crashing stop!Intro to Kernel Debugging - Just make the crashing stop!
Intro to Kernel Debugging - Just make the crashing stop!All Things Open
 
Openvz booth
Openvz boothOpenvz booth
Openvz boothOpenVZ
 
Java in containers
Java in containersJava in containers
Java in containersMartin Baez
 
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingLinux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingChinaNetCloud
 
LXC Containers and AUFs
LXC Containers and AUFsLXC Containers and AUFs
LXC Containers and AUFsDocker, Inc.
 
Systemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to loveSystemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to loveAlison Chaiken
 
Advanced Namespaces and cgroups
Advanced Namespaces and cgroupsAdvanced Namespaces and cgroups
Advanced Namespaces and cgroupsKernel TLV
 
OpenVZ Linux containers
OpenVZ Linux containersOpenVZ Linux containers
OpenVZ Linux containersOpenVZ
 
Tuning systemd for embedded
Tuning systemd for embeddedTuning systemd for embedded
Tuning systemd for embeddedAlison Chaiken
 
Docker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12xDocker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12xrkr10
 
Containers and Namespaces in the Linux Kernel
Containers and Namespaces in the Linux KernelContainers and Namespaces in the Linux Kernel
Containers and Namespaces in the Linux KernelOpenVZ
 
Linux-HA with Pacemaker
Linux-HA with PacemakerLinux-HA with Pacemaker
Linux-HA with PacemakerKris Buytaert
 
Docker 原理與實作
Docker 原理與實作Docker 原理與實作
Docker 原理與實作kao kuo-tung
 
Containers - Cloud Phoenix March Meetup
Containers - Cloud Phoenix March MeetupContainers - Cloud Phoenix March Meetup
Containers - Cloud Phoenix March MeetupMiguel Zuniga
 

Similar to Seven problems of Linux containers (20)

N problems of Linux containers
N problems of Linux containersN problems of Linux containers
N problems of Linux containers
 
OpenVZ Linux Containers
OpenVZ Linux ContainersOpenVZ Linux Containers
OpenVZ Linux Containers
 
OpenVZ, Virtuozzo and Docker
OpenVZ, Virtuozzo and DockerOpenVZ, Virtuozzo and Docker
OpenVZ, Virtuozzo and Docker
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
 
Intro to Kernel Debugging - Just make the crashing stop!
Intro to Kernel Debugging - Just make the crashing stop!Intro to Kernel Debugging - Just make the crashing stop!
Intro to Kernel Debugging - Just make the crashing stop!
 
Openvz booth
Openvz boothOpenvz booth
Openvz booth
 
Java in containers
Java in containersJava in containers
Java in containers
 
Containers > VMs
Containers > VMsContainers > VMs
Containers > VMs
 
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingLinux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
 
LXC Containers and AUFs
LXC Containers and AUFsLXC Containers and AUFs
LXC Containers and AUFs
 
An Introduction To Linux
An Introduction To LinuxAn Introduction To Linux
An Introduction To Linux
 
Systemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to loveSystemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to love
 
Advanced Namespaces and cgroups
Advanced Namespaces and cgroupsAdvanced Namespaces and cgroups
Advanced Namespaces and cgroups
 
OpenVZ Linux containers
OpenVZ Linux containersOpenVZ Linux containers
OpenVZ Linux containers
 
Tuning systemd for embedded
Tuning systemd for embeddedTuning systemd for embedded
Tuning systemd for embedded
 
Docker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12xDocker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12x
 
Containers and Namespaces in the Linux Kernel
Containers and Namespaces in the Linux KernelContainers and Namespaces in the Linux Kernel
Containers and Namespaces in the Linux Kernel
 
Linux-HA with Pacemaker
Linux-HA with PacemakerLinux-HA with Pacemaker
Linux-HA with Pacemaker
 
Docker 原理與實作
Docker 原理與實作Docker 原理與實作
Docker 原理與實作
 
Containers - Cloud Phoenix March Meetup
Containers - Cloud Phoenix March MeetupContainers - Cloud Phoenix March Meetup
Containers - Cloud Phoenix March Meetup
 

More from OpenVZ

PFcache - LinuxCon 2015
PFcache - LinuxCon 2015PFcache - LinuxCon 2015
PFcache - LinuxCon 2015OpenVZ
 
Speeding up ps and top
Speeding up ps and topSpeeding up ps and top
Speeding up ps and topOpenVZ
 
Live migration: pros, cons and gotchas -- Pavel Emelyanov
Live migration: pros, cons and gotchas -- Pavel EmelyanovLive migration: pros, cons and gotchas -- Pavel Emelyanov
Live migration: pros, cons and gotchas -- Pavel EmelyanovOpenVZ
 
Live migrating a container: pros, cons and gotchas -- Pavel Emelyanov
Live migrating a container: pros, cons and gotchas -- Pavel EmelyanovLive migrating a container: pros, cons and gotchas -- Pavel Emelyanov
Live migrating a container: pros, cons and gotchas -- Pavel EmelyanovOpenVZ
 
CRIU: time and space travel for Linux containers -- Kir Kolyshkin
CRIU: time and space travel for Linux containers -- Kir KolyshkinCRIU: time and space travel for Linux containers -- Kir Kolyshkin
CRIU: time and space travel for Linux containers -- Kir KolyshkinOpenVZ
 
Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015
Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015
Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015OpenVZ
 
Живая миграция: плюсы, минусы и подводные камни - Павел Емельянов
Живая миграция: плюсы, минусы и подводные камни - Павел ЕмельяновЖивая миграция: плюсы, минусы и подводные камни - Павел Емельянов
Живая миграция: плюсы, минусы и подводные камни - Павел ЕмельяновOpenVZ
 
What's missing from upstream kernel containers? - Sergey Bronnikov
What's missing from upstream kernel containers? - Sergey BronnikovWhat's missing from upstream kernel containers? - Sergey Bronnikov
What's missing from upstream kernel containers? - Sergey BronnikovOpenVZ
 
Проблема фрагментации виртуальных дисков и способы её решения -- Дмитрий Монахов
Проблема фрагментации виртуальных дисков и способы её решения -- Дмитрий МонаховПроблема фрагментации виртуальных дисков и способы её решения -- Дмитрий Монахов
Проблема фрагментации виртуальных дисков и способы её решения -- Дмитрий МонаховOpenVZ
 
Развёртывание приложений Docker в контейнерах Virtuozzo -- Павел Тихомиров
Развёртывание приложений Docker в контейнерах Virtuozzo -- Павел ТихомировРазвёртывание приложений Docker в контейнерах Virtuozzo -- Павел Тихомиров
Развёртывание приложений Docker в контейнерах Virtuozzo -- Павел ТихомировOpenVZ
 
CRIU: ускорение запуска PHP в CloudLinux OS -- Руслан Купреев
CRIU: ускорение запуска PHP в CloudLinux OS  -- Руслан КупреевCRIU: ускорение запуска PHP в CloudLinux OS  -- Руслан Купреев
CRIU: ускорение запуска PHP в CloudLinux OS -- Руслан КупреевOpenVZ
 
LibCT и контейнеры на уровне приложений -- Александр Бурлука
	LibCT и контейнеры на уровне приложений -- Александр Бурлука	LibCT и контейнеры на уровне приложений -- Александр Бурлука
LibCT и контейнеры на уровне приложений -- Александр БурлукаOpenVZ
 
Управление памятью контейнеров в проекте OpenVZ -- Владимир Давыдов
Управление памятью контейнеров в проекте OpenVZ -- Владимир ДавыдовУправление памятью контейнеров в проекте OpenVZ -- Владимир Давыдов
Управление памятью контейнеров в проекте OpenVZ -- Владимир ДавыдовOpenVZ
 
Живая миграция контейнеров: плюсы, минусы, подводные камни -- Павел Емельянов
Живая миграция контейнеров: плюсы, минусы, подводные камни -- Павел ЕмельяновЖивая миграция контейнеров: плюсы, минусы, подводные камни -- Павел Емельянов
Живая миграция контейнеров: плюсы, минусы, подводные камни -- Павел ЕмельяновOpenVZ
 
LibCT: one lib to rule them all -- Andrey Vagin
LibCT: one lib to rule them all -- Andrey VaginLibCT: one lib to rule them all -- Andrey Vagin
LibCT: one lib to rule them all -- Andrey VaginOpenVZ
 
Denser containers with PF cache - Pavel Emelyanov
Denser containers with PF cache - Pavel EmelyanovDenser containers with PF cache - Pavel Emelyanov
Denser containers with PF cache - Pavel EmelyanovOpenVZ
 
CGroups kernel memory controller -- Pavel Emelyanov
CGroups kernel memory controller -- Pavel EmelyanovCGroups kernel memory controller -- Pavel Emelyanov
CGroups kernel memory controller -- Pavel EmelyanovOpenVZ
 
What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...
What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...
What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...OpenVZ
 
Not so brief history of Linux Containers - Kir Kolyshkin
Not so brief history of Linux Containers - Kir KolyshkinNot so brief history of Linux Containers - Kir Kolyshkin
Not so brief history of Linux Containers - Kir KolyshkinOpenVZ
 
Управление ресурсами в Linux и OpenVZ
Управление ресурсами в Linux и OpenVZ Управление ресурсами в Linux и OpenVZ
Управление ресурсами в Linux и OpenVZ OpenVZ
 

More from OpenVZ (20)

PFcache - LinuxCon 2015
PFcache - LinuxCon 2015PFcache - LinuxCon 2015
PFcache - LinuxCon 2015
 
Speeding up ps and top
Speeding up ps and topSpeeding up ps and top
Speeding up ps and top
 
Live migration: pros, cons and gotchas -- Pavel Emelyanov
Live migration: pros, cons and gotchas -- Pavel EmelyanovLive migration: pros, cons and gotchas -- Pavel Emelyanov
Live migration: pros, cons and gotchas -- Pavel Emelyanov
 
Live migrating a container: pros, cons and gotchas -- Pavel Emelyanov
Live migrating a container: pros, cons and gotchas -- Pavel EmelyanovLive migrating a container: pros, cons and gotchas -- Pavel Emelyanov
Live migrating a container: pros, cons and gotchas -- Pavel Emelyanov
 
CRIU: time and space travel for Linux containers -- Kir Kolyshkin
CRIU: time and space travel for Linux containers -- Kir KolyshkinCRIU: time and space travel for Linux containers -- Kir Kolyshkin
CRIU: time and space travel for Linux containers -- Kir Kolyshkin
 
Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015
Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015
Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015
 
Живая миграция: плюсы, минусы и подводные камни - Павел Емельянов
Живая миграция: плюсы, минусы и подводные камни - Павел ЕмельяновЖивая миграция: плюсы, минусы и подводные камни - Павел Емельянов
Живая миграция: плюсы, минусы и подводные камни - Павел Емельянов
 
What's missing from upstream kernel containers? - Sergey Bronnikov
What's missing from upstream kernel containers? - Sergey BronnikovWhat's missing from upstream kernel containers? - Sergey Bronnikov
What's missing from upstream kernel containers? - Sergey Bronnikov
 
Проблема фрагментации виртуальных дисков и способы её решения -- Дмитрий Монахов
Проблема фрагментации виртуальных дисков и способы её решения -- Дмитрий МонаховПроблема фрагментации виртуальных дисков и способы её решения -- Дмитрий Монахов
Проблема фрагментации виртуальных дисков и способы её решения -- Дмитрий Монахов
 
Развёртывание приложений Docker в контейнерах Virtuozzo -- Павел Тихомиров
Развёртывание приложений Docker в контейнерах Virtuozzo -- Павел ТихомировРазвёртывание приложений Docker в контейнерах Virtuozzo -- Павел Тихомиров
Развёртывание приложений Docker в контейнерах Virtuozzo -- Павел Тихомиров
 
CRIU: ускорение запуска PHP в CloudLinux OS -- Руслан Купреев
CRIU: ускорение запуска PHP в CloudLinux OS  -- Руслан КупреевCRIU: ускорение запуска PHP в CloudLinux OS  -- Руслан Купреев
CRIU: ускорение запуска PHP в CloudLinux OS -- Руслан Купреев
 
LibCT и контейнеры на уровне приложений -- Александр Бурлука
	LibCT и контейнеры на уровне приложений -- Александр Бурлука	LibCT и контейнеры на уровне приложений -- Александр Бурлука
LibCT и контейнеры на уровне приложений -- Александр Бурлука
 
Управление памятью контейнеров в проекте OpenVZ -- Владимир Давыдов
Управление памятью контейнеров в проекте OpenVZ -- Владимир ДавыдовУправление памятью контейнеров в проекте OpenVZ -- Владимир Давыдов
Управление памятью контейнеров в проекте OpenVZ -- Владимир Давыдов
 
Живая миграция контейнеров: плюсы, минусы, подводные камни -- Павел Емельянов
Живая миграция контейнеров: плюсы, минусы, подводные камни -- Павел ЕмельяновЖивая миграция контейнеров: плюсы, минусы, подводные камни -- Павел Емельянов
Живая миграция контейнеров: плюсы, минусы, подводные камни -- Павел Емельянов
 
LibCT: one lib to rule them all -- Andrey Vagin
LibCT: one lib to rule them all -- Andrey VaginLibCT: one lib to rule them all -- Andrey Vagin
LibCT: one lib to rule them all -- Andrey Vagin
 
Denser containers with PF cache - Pavel Emelyanov
Denser containers with PF cache - Pavel EmelyanovDenser containers with PF cache - Pavel Emelyanov
Denser containers with PF cache - Pavel Emelyanov
 
CGroups kernel memory controller -- Pavel Emelyanov
CGroups kernel memory controller -- Pavel EmelyanovCGroups kernel memory controller -- Pavel Emelyanov
CGroups kernel memory controller -- Pavel Emelyanov
 
What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...
What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...
What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...
 
Not so brief history of Linux Containers - Kir Kolyshkin
Not so brief history of Linux Containers - Kir KolyshkinNot so brief history of Linux Containers - Kir Kolyshkin
Not so brief history of Linux Containers - Kir Kolyshkin
 
Управление ресурсами в Linux и OpenVZ
Управление ресурсами в Linux и OpenVZ Управление ресурсами в Linux и OpenVZ
Управление ресурсами в Linux и OpenVZ
 

Recently uploaded

Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 

Recently uploaded (20)

Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 

Seven problems of Linux containers

  • 1. parallels.com || openvz.org || criu.org Seven Problems of Linux Containers Kir Kolyshkin <kir@openvz.org> 28 April 2013 LinuxFest Northwest
  • 2. parallels.com || openvz.org || criu.org Seventy Seven Problems of Linux Containers Kir Kolyshkin <kir@openvz.org> 28 April 2013 LinuxFest Northwest (of which I am going to cover six)
  • 3. parallels.com || openvz.org || criu.org Problem 1: Effective virtualization ● Virtualization is partitioning ● Historical way: $M mainframes ● Modern way: virtual machines ● Problem: performance overhead ● Partial solution: hardware support (Intel VT, AMD V)
  • 4. parallels.com || openvz.org || criu.org Solution: isolation ● Run many isolated userspace instances on top of sone single (Linux) kernel ● All processes see each other – files, process information, network, shared memory, users, etc. ● Make them unsee it!
  • 6. parallels.com || openvz.org || criu.org One historical way to unsee chroot()
  • 7. parallels.com || openvz.org || criu.org Namespaces ● Implemented in the Linux kernel – PID – net – IPC – UTS – mnt – user ● clone() with CLONE_NEW* flags
  • 8. parallels.com || openvz.org || criu.org Problem 2: Shared resources ● All containers share the same set of resources (CPU, RAM, disk, various kernel things ...) ● Need fair distribution of goods so everyone gets their share ● Need DoS prevention ● Need prioritization – “All animals are equal, but some animals are more equal than others” -- George Orwell
  • 10. parallels.com || openvz.org || criu.org Solution: OpenVZ resource controls ● OpenVZ: – user beancounters ● controls 20 parameters – hierarchical CPU scheduler – disk quota per containers – I/O priorities per-container ● Dynamic control, can “resize” runtime
  • 11. parallels.com || openvz.org || criu.org Solution: cgroups ● Cgroups is a mechanism to control resources per hierarchical groups of processes ● Cgroups is nothing without controllers: – blkio, cpu, cpuacct, cpuset, devices, freezer, memory, net_cls, net_prio ● Cgroups are orthogonal to namespaces ● Still a work in progress (kernel memory)
  • 12. parallels.com || openvz.org || criu.org Problem 3: easy resources ● User Beancounters are complicated: – http://wiki.openvz.org/UBC_consistency_check – user has to set all these parameters – some of which are interdependent ● We created a collection of valid configs, ● ... wrote a whole book about UBC ● ... and a set of tools to help
  • 14. parallels.com || openvz.org || criu.org Solution: VSwap ● Only two primary parameters: RAM and swap – others still exist, but no longer required to set ● Swap is virtual, no actual I/O is performed ● Slow down to emulate real swap ● Only when actual global RAM shortage occurs, virtual swap goes into the real swap ● Currently only available in OpenVZ kernel
  • 15. parallels.com || openvz.org || criu.org Problem 4: fast live migration ● We can migrate an OpenVZ container from one physical server to another without a shutdown ● We want to do it fast even for huge containers – huge disk: use shared storage – huge RAM: ???
  • 16. parallels.com || openvz.org || criu.org Normal migration process ● (Assuming shared storage) ● 1 Freeze the container ● 2 Dump its complete state to a dump file ● 3 Copy dump file to destination server ● 4 Undump ● 5 Unfreeze ● Problem: huge dump file
  • 17. parallels.com || openvz.org || criu.org Solution 1: network swap ● 1 Dump the minimal memory, lock the rest ● 2 Restore the minimal memory, mark the rest as swapped out ● 3 Set up network swap from the source ● 4 Unfreeze. Missing RAM will be “swapped in” ● 5 Migrate the rest of RAM and kill it on source
  • 19. parallels.com || openvz.org || criu.org Solution 1: network swap ● 1 Dump the minimal memory, lock the rest ● 2 Copy, undump what we have, mark the rest as swapped out ● 3 Set up network swap served from the source ● 4 Unfreeze. Missing RAM will be “swapped in” ● 5 Migrate the rest of RAM and kill it on source ● PROBLEM? Reliability, no way to rollback
  • 20. parallels.com || openvz.org || criu.org Solution 2: Iterative RAM migration ● 1 Ask kernel to track modified pages ● 2 Copy all memory to destination system ● 3 Ask kernel for list of modified pages ● 4 Copy those pages ● 5 GOTO 3 until satisfied ● 6 Freeze and do migration as usual
  • 21. parallels.com || openvz.org || criu.org Problem 5: upstreaming ● OpenVZ was developed separately ● Then we wanted to merge it upstream (i.e. to vanilla Linux kernel) ● Problem?
  • 23. parallels.com || openvz.org || criu.org Problem 5: upstreaming ● OpenVZ was developed separately ● Then we wanted to merge it upstream (i.e. to vanilla Linux kernel) ● Problem: ● upstream devs are not accepting our work
  • 24. parallels.com || openvz.org || criu.org Solution 1: rewrite from scratch ● User Beancounters -> CGroups ● Did 2 rewrites for PID namespace until it finally got accepted ● Network namespace redone ● It works! ● about 1500 patches got landed to vanilla ● II Parallels made it to top10 contributors
  • 25. parallels.com || openvz.org || criu.org Solution 2: CRIU ● We tried hard to merge checkpoint/restore ● Other people tried hard too, no luck ● Can't make it to the kernel, let's go userspace ● With minimal kernel intervention when required ● Kernel exports most of information already, so let's just add missing bits and pieces
  • 26. parallels.com || openvz.org || criu.org CRIU ● Checkpoint / Restore (mostly) In Userspace Tools currently at version 0.4 ● Will do 1.0 release this year ● Kernel 3.8 has about 120 patches from us – 95% of needed features are there ● Memory snapshot recently made it to -mm tree
  • 28. parallels.com || openvz.org || criu.org Problem 6: common file system ● Container is just a directory on host, all CTs reside on the same FS ● File system journal is a bottleneck ● Lots of small-size files I/O on CT backup ● No sub-tree disk quota support in upstream ● No per-container snapshots ● Live migration: rsync -- changed inodes ● File system type and properties are fixed
  • 29. parallels.com || openvz.org || criu.org Solution 1: LVM ● Only works only on top of block device ● Hard to manage (e.g. how to migrate huge volume?) ● No dynamic allocation ● Complicated management
  • 30. parallels.com || openvz.org || criu.org Solution 2: loop device ● VFS operations leads to double page-caching – (already fixed in the recent kernels) ● No dynamic allocation, max space is used ● Limited feature set
  • 31. parallels.com || openvz.org || criu.org Solution 3: ploop ● Basic idea: same as loop, just better ● Modular design: – various image formats (qcow2 in TODO) – various I/O backends ● More features: – live resize – instant live snapshots – write tracker to help in live migration
  • 32. parallels.com || openvz.org || criu.org Any problems questions? ● kir@openvz.org ● Twitter: @kolyshkin