Speeding up
ps and top
Kirill Kolyshkin, Andrey Vagin
SCALE 14x, 23 Jan 2016
Pasadena, CA
2
Agenda
● Intro {Virtuozzo, OpenVZ, CRIU}
● Limitations of current /proc/PID interface
● Similar problems solved before
● Proposed solutions (bad and good ones)
● Performance results
3
● Leading provider of secure, production-ready
containers, hypervisors, and virtualized storage
● An industry pioneer, first containers in 2001
● Powering some of the world’s largest cloud networks
– over 5 million mission-critical cloud workloads
● 700+ worldwide partners
4
● Founded in 1997,
“spun off” in Dec 2015
● HQ in Seattle, offices in
London, Moscow, Munich
● Over 170 employees, including
100+ engineers, 15 kernel hackers
● Contributor/sponsor of key open source
initiatives
1997
2008
2015
2016
“A rose by any other name…”
5
$ whoami
● Linux user since 1995
– Slackware on floppy disks, kernels 1.0.9 and 1.1.50
● Developing VEs (containers) since 2002
– vzctl and vzpkg
● Leading OpenVZ from 2005 till 2015
● SCALE speaker since SCALE4x (2004)
● Twitter: @kolyshkin
6
OpenVZ
● Full (system) containers for Linux
● Developed since 1999, open source since 2005
● Live migration since 2007
● ~2000 Linux kernel patches
– enabling LXC, Docker, CoreOS…
– biggest contributor to containers
● Now reborn as Virtuozzo 7, more open than ever
7
CRIU: Checkpoint / Restore In Userspace
● About 3 years old, version 1.8 released in Dec 2015
● Replaces OpenVZ in-kernel c/r
● Saves and restores
sets of running processes
● Integrated into Docker, LXC
● Not just for live migration!
– save an HPC job or a game, update kernel or hardware,
balance load, speed up boot, reverse debug, inject
faults
8
Ideas behind CRIU
● We can't merge kernel c/r upstream, so...
hack it! Redo the whole thing in userspace
● Use existing interfaces where available
– /proc, ptrace, netlink, parasite code injection
● Amend the kernel where necessary
– only ~170 kernel patches
– kernel v3.11+ is sufficient
(if CONFIG_CHECKPOINT_RESTORE is set)
9
Current interface: /proc/PID/*
$ ls /proc/self/
attr             cwd      loginuid    numa_maps      schedstat  task
autogroup        environ  map_files   oom_adj        sessionid  timers
auxv             exe      maps        oom_score      setgroups  uid_map
cgroup           fd       mem         oom_score_adj  smaps      wchan
clear_refs       fdinfo   mountinfo   pagemap        stack
cmdline          gid_map  mounts      personality    stat
comm             io       mountstats  projid_map     statm
coredump_filter  latency  net         root           status
cpuset           limits   ns          sched          syscall
10
Limitations of /proc/PID interface
● Requires at least three syscalls for each process
– open(), read(), close()
● Variety of formats, mostly text based
● Not enough information (/proc/PID/fd/*)
● Some formats are not extensible
– e.g. /proc/PID/maps, where the last column is optional
● Sometimes slow due to extra attributes
– /proc/PID/smaps vs /proc/PID/maps
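The open/read/close cost listed above can be illustrated with a minimal Python sketch. It walks a /proc-like directory the way a ps-style tool might and counts only the three per-process syscalls named on the slide (directory listing is not counted). The function name `read_comm_for_all` and the `proc_root` parameter are assumptions made for this illustration, not part of any real tool.

```python
import os

def read_comm_for_all(proc_root="/proc"):
    """Return ({pid: comm}, per_process_syscalls) by reading <pid>/stat files."""
    result = {}
    syscalls = 0
    for name in os.listdir(proc_root):
        if not name.isdigit():                 # only numeric entries are processes
            continue
        path = os.path.join(proc_root, name, "stat")
        try:
            fd = os.open(path, os.O_RDONLY)    # syscall 1: open()
        except OSError:
            continue                           # the process may have exited already
        try:
            data = os.read(fd, 4096)           # syscall 2: read()
        except OSError:
            data = b""
        finally:
            os.close(fd)                       # syscall 3: close()
        syscalls += 3
        # comm sits between the first "(" and the last ")" and may contain spaces
        start, end = data.find(b"("), data.rfind(b")")
        if start != -1 and end > start:
            result[int(name)] = data[start + 1:end].decode(errors="replace")
    return result, syscalls

if __name__ == "__main__" and os.path.isdir("/proc"):
    comms, count = read_comm_for_all()
    print(f"{len(comms)} processes, {count} open/read/close syscalls")
```

Every additional file needed per process (status, statm, and so on) adds another three syscalls, so the total grows with both the number of processes and the number of attributes.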
11
/proc/PID/smaps
7f1cb0afc000-7f1cb0afd000 rw-p 00021000 08:03 656516 /usr/lib64/ld-2.21.so
Size: 4 kB
Rss: 4 kB
Pss: 4 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 4 kB
Referenced: 4 kB
Anonymous: 4 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
VmFlags: rd wr mr mw me dw ac sd
$ time cat /proc/*/maps > /dev/null
real 0m0.061s
user 0m0.002s
sys 0m0.059s
$ time cat /proc/*/smaps > /dev/null
real 0m0.253s
user 0m0.004s
sys 0m0.247s
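To show what a consumer of this text format has to do, here is a small Python sketch that parses smaps-style records into numbers. The function name `parse_smaps` and the dictionary layout are assumptions for illustration; the sample record is abbreviated from the one shown above.

```python
import re

# Mapping header: "start-end perms offset dev inode [path]"
HEADER = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+([0-9a-f]+)\s+(\S+)\s+(\d+)\s*(.*)$"
)

def parse_smaps(text):
    """Parse smaps-style text into a list of mappings with counters in kB."""
    mappings = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            start, end, perms, _offset, _dev, _inode, path = m.groups()
            mappings.append({
                "start": int(start, 16), "end": int(end, 16),
                "perms": perms, "path": path, "kb": {}, "flags": [],
            })
            continue
        if not mappings or ":" not in line:
            continue                      # skip anything before the first header
        key, _, value = line.partition(":")
        fields = value.split()
        if key == "VmFlags":
            mappings[-1]["flags"] = fields
        elif len(fields) == 2 and fields[1] == "kB":
            mappings[-1]["kb"][key] = int(fields[0])
    return mappings

SAMPLE = """\
7f1cb0afc000-7f1cb0afd000 rw-p 00021000 08:03 656516 /usr/lib64/ld-2.21.so
Size: 4 kB
Rss: 4 kB
Pss: 4 kB
Private_Dirty: 4 kB
VmFlags: rd wr mr mw me dw ac sd
"""

if __name__ == "__main__":
    for mapping in parse_smaps(SAMPLE):
        print(mapping["path"], mapping["kb"]["Pss"], "kB Pss")
```

On the timing side, the figures above work out to roughly 0.253 / 0.061 ≈ 4 times more wall-clock time for smaps than for maps on that system.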
12
Similar problem: info about sockets
● /proc
– /proc/net/netlink
– /proc/net/unix
– /proc/net/tcp
– /proc/net/packet
● Problems: not enough info, complex format, all-or-nothing
● Solution: use netlink, generalize tcp_diag as sock_diag
– the extendable binary format
– allows specifying a group of attributes and a set of sockets
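As an illustration of the "complex format" point, the following Python sketch decodes one line of /proc/net/tcp. It assumes IPv4 and a little-endian host (the kernel prints the address as a native 32-bit integer, so the bytes appear reversed there). The function names and the sample line are made up for this example, and only two TCP states are mapped.

```python
import socket
import struct

# Subset of kernel TCP state numbers (TCP_ESTABLISHED = 1, TCP_LISTEN = 10)
TCP_STATES = {0x01: "ESTABLISHED", 0x0A: "LISTEN"}

def parse_hex_endpoint(field):
    """Turn 'AAAAAAAA:PPPP' into (dotted_ip, port); assumes a little-endian host."""
    addr_hex, port_hex = field.split(":")
    addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return addr, int(port_hex, 16)

def parse_proc_net_tcp_line(line):
    """Decode the columns sl, local_address, rem_address, st, ..., uid, ..., inode."""
    fields = line.split()
    state = int(fields[3], 16)
    return {
        "local": parse_hex_endpoint(fields[1]),
        "remote": parse_hex_endpoint(fields[2]),
        "state": TCP_STATES.get(state, hex(state)),
        "uid": int(fields[7]),
        "inode": int(fields[9]),
    }

# Hypothetical sample line: a listener on 127.0.0.1:22
SAMPLE_LINE = ("   0: 0100007F:0016 00000000:0000 0A 00000000:00000000 "
               "00:00000000 00000000     0        0 12345 1 0000000000000000 100 0 0 10 0")

if __name__ == "__main__":
    print(parse_proc_net_tcp_line(SAMPLE_LINE))
```

A reader also has to scan the whole file to find one socket, which is the all-or-nothing problem that a request-based interface such as sock_diag avoids.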
13
[Bad] solution 1: introduce task_diag
● Not obvious where to get pid and user namespaces
● Impossible to restrict netlink sockets
– Credentials are saved when a socket is created
– Process can drop privileges, but netlink doesn't care
– The same socket can be used to get process attributes and to set IP addresses
14
A new interface for processes
● /proc/task_diag is a transaction file
– write request → read response
● Netlink message format:
binary and extendable
● Get information about a specified set of processes
● Optimal grouping of attributes
– No attribute in a group should noticeably affect the response time
● Information about one process can be split
into several messages (16 KB message size)
● Work in progress, anything may change!
15
Netlink message and attributes
Message header:
  nlmsg_len
  nlmsg_type | nlmsg_flags
  nlmsg_seq
  nlmsg_pid
Attributes (repeated):
  nlattr_len | nlattr_type
  payload
● Simple and flexible message-based protocol
● Easy to add a new group
● Easy to add a new attribute
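The header-plus-attributes layout can be sketched in Python with the `struct` module. This is a simplified illustration, not the task_diag wire format: it follows the field order of the kernel's `struct nlmsghdr` (length, type, flags, sequence, port ID) and `struct nlattr` (`nla_len`, `nla_type`), pads attributes to 4 bytes, and uses message and attribute type numbers that are purely hypothetical.

```python
import struct

NLMSG_HDR = struct.Struct("=IHHII")  # nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLA_HDR = struct.Struct("=HH")       # nla_len, nla_type

def align4(n):
    """Round n up to the next multiple of 4 (netlink alignment)."""
    return (n + 3) & ~3

def pack_attr(attr_type, payload):
    length = NLA_HDR.size + len(payload)          # nla_len excludes the padding
    padding = b"\0" * (align4(length) - length)
    return NLA_HDR.pack(length, attr_type) + payload + padding

def pack_message(msg_type, flags, seq, pid, attrs):
    body = b"".join(pack_attr(t, p) for t, p in attrs)
    header = NLMSG_HDR.pack(NLMSG_HDR.size + len(body), msg_type, flags, seq, pid)
    return header + body

def unpack_message(data):
    length, msg_type, flags, seq, pid = NLMSG_HDR.unpack_from(data, 0)
    attrs, offset = [], NLMSG_HDR.size
    while offset + NLA_HDR.size <= length:
        nla_len, nla_type = NLA_HDR.unpack_from(data, offset)
        if nla_len < NLA_HDR.size or offset + nla_len > length:
            break                                  # malformed attribute, stop parsing
        attrs.append((nla_type, data[offset + NLA_HDR.size:offset + nla_len]))
        offset += align4(nla_len)                  # skip payload and its padding
    return {"len": length, "type": msg_type, "flags": flags,
            "seq": seq, "pid": pid, "attrs": attrs}

if __name__ == "__main__":
    # Hypothetical attribute types: 1 = comm string, 2 = pid as a 32-bit integer
    msg = pack_message(0x10, 0x1, 7, 0, [(1, b"bash\0"), (2, struct.pack("=I", 1234))])
    print(len(msg), unpack_message(msg)["attrs"])
```

Because each attribute carries its own length and type, a reader can skip types it does not know, which is what makes adding a new attribute or group cheap.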
16
Ways to specify sets of processes
● TASK_DIAG_DUMP_ALL
– Dump all processes
● TASK_DIAG_DUMP_ALL_THREAD
– Dump all threads
● TASK_DIAG_DUMP_CHILDREN
– Dump children of a specified task
● TASK_DIAG_DUMP_THREAD
– Dump threads of a specified task
● TASK_DIAG_DUMP_ONE
– Dump one task
17
Groups of attributes
● TASK_DIAG_BASE
– PID, PGID, SID, TID, comm
● TASK_DIAG_CRED
– UID, GID, groups, capabilities
● TASK_DIAG_STAT
– per-task and per-process statistics (same as taskstats, not available in /proc)
● TASK_DIAG_VMA
– mapped memory regions and their access permissions (same as maps)
● TASK_DIAG_VMA_STAT
– memory consumption for each mapping (same as smaps)
18
Performance: ps
Get pid, tid, pgid and comm for 50000 processes
$ time ./task_proc_all a
real 0m0.279s
user 0m0.013s
sys 0m0.255s
$ time ./task_diag_all a
real 0m0.051s
user 0m0.001s
sys 0m0.049s
About 5.5 times faster ;)
19
Performance: using perf tool
> Using the fork test command:
> 10,000 processes; 10k proc with 5 threads = 50,000 tasks
> reading /proc: 11.3 sec
> task_diag: 2.2 sec
>
> @7,440 tasks, reading /proc is at 0.77 sec and task_diag at 0.096
>
> 128 instances of specjbb, 80,000+ tasks:
> reading /proc: 32.1 sec
> task_diag: 3.9 sec
>
> So overall much snappier startup times.
// David Ahern
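For a rough sense of scale, the quoted timings can be turned into speedup factors. The short Python sketch below uses only the numbers reported on the two performance slides; the labels and the `speedup` helper are illustrative.

```python
def speedup(proc_seconds, task_diag_seconds):
    """How many times faster task_diag was than reading /proc."""
    return proc_seconds / task_diag_seconds

# (seconds reading /proc, seconds using task_diag), as reported on the slides
measurements = {
    "ps test, 50000 processes": (0.279, 0.051),
    "perf, fork test, 50,000 tasks": (11.3, 2.2),
    "perf, 7,440 tasks": (0.77, 0.096),
    "perf, specjbb, 80,000+ tasks": (32.1, 3.9),
}

for label, (proc_s, diag_s) in measurements.items():
    print(f"{label}: {speedup(proc_s, diag_s):.1f}x")
```

The factors come out between about 5x and 8x for these particular runs.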
20
Thank you!
http://virtuozzo.com/
http://openvz.org/
http://criu.org/
@kolyshkin
@vagin_andrey
https://github.com/avagin/linux-task-diag/

Speeding up ps and top

  • 1. Speeding up ps and top Kirill Kolyshkin, Andrey Vagin SCALE 14x, 23 Jan 2016 Pasadena, CA
  • 2. Agenda
      ● Intro {Virtuozzo, OpenVZ, CRIU}
      ● Limitations of current /proc/PID interface
      ● Similar problems solved before
      ● Proposed solutions (bad and good ones)
      ● Performance results
  • 3.
      ● Leading provider of secure, production-ready containers, hypervisors, and virtualized storage
      ● An industry pioneer, first containers in 2001
      ● Powering some of the world’s largest cloud networks
        – over 5 million mission-critical cloud workloads
      ● 700+ worldwide partners
  • 4.
      ● Founded in 1997, “spun off” in Dec 2015
      ● HQ in Seattle, offices in London, Moscow, Munich
      ● Over 170 employees, including 100+ engineers, 15 kernel hackers
      ● Contributor/sponsor of key open source initiatives
      Timeline: 1997 → 2008 → 2015 → 2016, “A rose by any other name…”
  • 5. $ whoami
      ● Linux user since 1995
        – Slackware on floppy disks, kernels 1.0.9 and 1.1.50
      ● Developing containers since 2002
        – vzctl and vzpkg
      ● Leading OpenVZ from 2005 till 2015
      ● SCALE speaker since SCALE4x (2004)
      ● Twitter: @kolyshkin
  • 6. OpenVZ
      ● Full (system) containers for Linux
      ● Developed since 1999, open source since 2005
      ● Live migration since 2007
      ● ~2000 Linux kernel patches
        – enabling LXC, Docker, CoreOS…
        – biggest contributor to containers
      ● Now reborn as Virtuozzo 7, more open than ever
  • 7. CRIU: Checkpoint / Restore In Userspace
      ● About 3 y.o., ver 1.8 (Dec 2015)
      ● Replaces OpenVZ in-kernel c/r
      ● Saves and restores sets of running processes
      ● Integrated into Docker, LXC
      ● Not just for live migration!
        – save HPC job or game, update kernel or hardware, balance load, speed up boot, reverse debug, inject faults
  • 8. Ideas behind CRIU
      ● We can't merge kernel c/r upstream, so... hack it! Redo the whole thing in userspace
      ● Use existing interfaces where available
        – /proc, ptrace, netlink, parasite code injection
      ● Amend the kernel where necessary
        – only ~170 kernel patches
        – kernel v3.11+ is sufficient (if CONFIG_CHECKPOINT_RESTORE is set)
  • 9. Current interface: /proc/PID/*
      $ ls /proc/self/
      attr             cwd      loginuid    numa_maps      schedstat  task
      autogroup        environ  map_files   oom_adj        sessionid  timers
      auxv             exe      maps        oom_score      setgroups  uid_map
      cgroup           fd       mem         oom_score_adj  smaps      wchan
      clear_refs       fdinfo   mountinfo   pagemap        stack
      cmdline          gid_map  mounts      personality    stat
      comm             io       mountstats  projid_map     statm
      coredump_filter  latency  net         root           status
      cpuset           limits   ns          sched          syscall
  • 10. Limitations of /proc/PID interface
      ● Requires at least three syscalls per process
        – open(), read(), close()
      ● Variety of formats, mostly text-based
      ● Not enough information (/proc/PID/fd/*)
      ● Some formats are non-extendable
        – /proc/PID/maps where the last column is optional
      ● Sometimes slow due to extra attributes
        – /proc/PID/smaps vs /proc/PID/maps
  • 11. /proc/PID/smaps
      7f1cb0afc000-7f1cb0afd000 rw-p 00021000 08:03 656516 /usr/lib64/ld-2.21.so
      Size:             4 kB
      Rss:              4 kB
      Pss:              4 kB
      Shared_Clean:     0 kB
      Shared_Dirty:     0 kB
      Private_Clean:    0 kB
      Private_Dirty:    4 kB
      Referenced:       4 kB
      Anonymous:        4 kB
      AnonHugePages:    0 kB
      Swap:             0 kB
      KernelPageSize:   4 kB
      MMUPageSize:      4 kB
      Locked:           0 kB
      VmFlags: rd wr mr mw me dw ac sd

      $ time cat /proc/*/maps > /dev/null
      real 0m0.061s
      user 0m0.002s
      sys  0m0.059s

      $ time cat /proc/*/smaps > /dev/null
      real 0m0.253s
      user 0m0.004s
      sys  0m0.247s
  • 12. Similar problem: info about sockets
      ● /proc
        – /proc/net/netlink
        – /proc/net/unix
        – /proc/net/tcp
        – /proc/net/packet
      ● Problems: not enough info, complex format, all-or-nothing
      ● Solution: use netlink, generalize tcp_diag as sock_diag
        – an extendable binary format
        – allows specifying a group of attributes and a set of sockets
  • 13. [Bad] solution 1: introduce task_diag
      ● Not obvious where to get pid and user namespaces
      ● Impossible to restrict netlink sockets
        – Credentials are saved when a socket is created
        – Process can drop privileges, but netlink doesn't care
        – The same socket can be used to get process attributes and to set IP addresses
  • 14. A new interface for processes
      ● /proc/task_diag is a transaction file
        – write request → read response
      ● Netlink message format: binary and extendable
      ● Get information about a specified set of processes
      ● Optimal grouping of attributes
        – No attribute in a group should slow down the response for the rest of that group
      ● Information about one process can be split into a few messages (16KB message size)
      ● Work in progress, anything may change!
  • 15. Netlink message and attributes
      Message header: nlmsg_len | nlmsg_type | nlmsg_flags | nlmsg_seq | nlmsg_pid
      Attribute (repeated): nlattr_len | nlattr_type | payload
      ● Simple and flexible message-based protocol
      ● Easy to add a new group
      ● Easy to add a new attribute
  • 16. Ways to specify sets of processes
      ● TASK_DIAG_DUMP_ALL
        – Dump all processes
      ● TASK_DIAG_DUMP_ALL_THREAD
        – Dump all threads
      ● TASK_DIAG_DUMP_CHILDREN
        – Dump children of a specified task
      ● TASK_DIAG_DUMP_THREAD
        – Dump threads of a specified task
      ● TASK_DIAG_DUMP_ONE
        – Dump one task
  • 17. Groups of attributes
      ● TASK_DIAG_BASE
        – PID, PGID, SID, TID, comm
      ● TASK_DIAG_CRED
        – UID, GID, groups, capabilities
      ● TASK_DIAG_STAT
        – per-task and per-process statistics (same as taskstats, not available in /proc)
      ● TASK_DIAG_VMA
        – mapped memory regions and their access permissions (same as maps)
      ● TASK_DIAG_VMA_STAT
        – memory consumption for each mapping (same as smaps)
  • 18. Performance: ps
      Get pid, tid, pgid and comm for 50000 processes

      $ time ./task_proc_all a
      real 0m0.279s
      user 0m0.013s
      sys  0m0.255s

      $ time ./task_diag_all a
      real 0m0.051s
      user 0m0.001s
      sys  0m0.049s

      A few times faster ;)
  • 19. Performance: using perf tool
      > Using the fork test command:
      > 10,000 processes; 10k proc with 5 threads = 50,000 tasks
      > reading /proc: 11.3 sec
      > task_diag: 2.2 sec
      >
      > @7,440 tasks, reading /proc is at 0.77 sec and task_diag at 0.096
      >
      > 128 instances of specjbb, 80,000+ tasks:
      > reading /proc: 32.1 sec
      > task_diag: 3.9 sec
      >
      > So overall much snappier startup times.
      // David Ahern
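
The per-process cost listed under "Limitations of /proc/PID interface" can be made concrete with a small sketch. The Python below is not the authors' task_proc_all benchmark; it is an illustration of the same access pattern, and the names scan_proc and proc_root are made up for this example. It reads /proc/PID/stat for every numeric directory, counts the open(), read() and close() calls it issues per process, and parses pid, comm, pgid and sid from the text. Note that comm is wrapped in parentheses and may itself contain spaces or parentheses, which is one reason text formats like this are awkward to parse.

```python
import os


def scan_proc(proc_root="/proc"):
    """Collect pid, comm, pgid and sid for every process under proc_root.

    Returns (tasks, syscalls), where syscalls counts only the per-process
    open/read/close calls, not the directory listing itself.
    """
    tasks = []
    syscalls = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():              # only numeric names are PIDs
            continue
        path = os.path.join(proc_root, entry, "stat")
        try:
            fd = os.open(path, os.O_RDONLY)  # syscall 1: open
        except OSError:
            continue                         # process exited, or no stat file
        try:
            data = os.read(fd, 4096)         # syscall 2: read
        finally:
            os.close(fd)                     # syscall 3: close
        syscalls += 3

        text = data.decode("utf-8", errors="replace")
        # Layout: "pid (comm) state ppid pgrp session ..."
        # comm may contain spaces and parentheses, so use the outermost pair.
        lparen = text.index("(")
        rparen = text.rindex(")")
        rest = text[rparen + 2:].split()
        tasks.append({
            "pid": int(text[:lparen]),
            "comm": text[lparen + 1:rparen],
            "pgid": int(rest[2]),
            "sid": int(rest[3]),
        })
    return tasks, syscalls
```

Calling scan_proc() with no arguments on a Linux host walks the real /proc. The syscall count grows linearly with the number of processes, and thread IDs would need a further pass over /proc/PID/task/, which is the overhead a single batched request is meant to avoid.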

Editor's Notes

  1. “A rose by any other name” – you know your Shakespeare, right?
  2. Kernel 1.0.9 did not have support for IDE CD-ROM, and it took me a week to compile the 1.1.50 kernel that had it (as each kernel compilation was an overnight job). SCALE speaker in 2004. How many of you were at SCALE4x? What makes it more interesting is that I came all the way from Moscow, Russia, that time, and it was my first time in the U.S.
  3. OpenVZ, my beloved child
  4. We failed to merge in-kernel c/r because that kernel code is very invasive, touching every kernel subsystem, and no kernel maintainer wanted that in their code.
  5. More than 40 files and 10 directories for each process.
  6. Variety of formats – no one wants to spend their life writing parsers for all these formats. An example of a non-extendable format is /proc/*/maps – the last field is the file name, and it is ... optional!
  7. Another bad example of using netlink: taskstats
  8. The structure is pretty generic, this is what makes this format extendable.
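
To illustrate why the generic structure in note 8 makes the format extendable, here is a minimal Python sketch of the netlink framing: a 16-byte message header followed by length/type/payload attributes padded to 4 bytes. The header field order follows the kernel's struct nlmsghdr (whose fifth field is nlmsg_pid). Everything else is an assumption for illustration: the message type and the attribute numbers ATTR_PID, ATTR_COMM and ATTR_FUTURE are hypothetical and are not task_diag's real values.

```python
import struct

NLMSG_HDR = struct.Struct("=IHHII")  # nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLATTR_HDR = struct.Struct("=HH")    # attribute length (header + payload), attribute type
ALIGN = 4                            # netlink pads attributes to 4-byte boundaries

# Hypothetical numbers, chosen only for this example.
MSG_TYPE = 0x10
ATTR_PID, ATTR_COMM, ATTR_FUTURE = 1, 2, 99


def align(n):
    """Round n up to the next multiple of ALIGN."""
    return (n + ALIGN - 1) & ~(ALIGN - 1)


def pack_attr(attr_type, payload):
    """Encode one attribute; the length field excludes trailing padding."""
    length = NLATTR_HDR.size + len(payload)
    return NLATTR_HDR.pack(length, attr_type) + payload + b"\0" * (align(length) - length)


def pack_message(msg_type, flags, seq, pid, attrs):
    """Encode a header followed by a list of (type, payload) attributes."""
    body = b"".join(pack_attr(t, p) for t, p in attrs)
    return NLMSG_HDR.pack(NLMSG_HDR.size + len(body), msg_type, flags, seq, pid) + body


def parse_message(buf):
    """Decode the header and return every attribute as (type, payload)."""
    length, msg_type, flags, seq, pid = NLMSG_HDR.unpack_from(buf, 0)
    attrs = []
    offset = NLMSG_HDR.size
    while offset + NLATTR_HDR.size <= length:
        attr_len, attr_type = NLATTR_HDR.unpack_from(buf, offset)
        if attr_len < NLATTR_HDR.size or offset + attr_len > length:
            break                                  # malformed attribute, stop parsing
        attrs.append((attr_type, buf[offset + NLATTR_HDR.size:offset + attr_len]))
        offset += align(attr_len)                  # skip payload plus padding
    return (msg_type, flags, seq, pid), attrs


msg = pack_message(MSG_TYPE, 0, 1, 0, [
    (ATTR_PID, struct.pack("=I", 4242)),
    (ATTR_COMM, b"bash\0"),
    (ATTR_FUTURE, b"xyz"),                         # an attribute added "later"
])
header, attrs = parse_message(msg)
# A reader that predates ATTR_FUTURE keeps only the types it knows about.
known = {t: p for t, p in attrs if t in (ATTR_PID, ATTR_COMM)}
```

Because every attribute carries its own length, a reader that only knows ATTR_PID and ATTR_COMM can step over ATTR_FUTURE without misparsing the rest of the message. A positional text format such as /proc/PID/maps offers no equivalent way to skip fields it does not understand.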