SlideShare a Scribd company logo
1 of 37
Download to read offline
ProfilingyourApplications
usingtheLinuxPerfTools
/Kevin Funk KDAB
emBO++ Conf 2017, Bochum
Agenda
Perf setup
Benchmarking
Profiling
More Topics?
Intermission:WhoamI?
Software Engineering Consultant at KDAB since 2010
FOSS enthusiast working on Qt/C++ at KDE since 2006
Lead developer of the KDevelop IDE
mainly on the C/C++ support backed by Clang
as well as cross-platform support
Setup
Hardware
Linux Kernel prerequisites
Building user-space perf
Cross-compiling
Permissions
Hardware
Hardware performance counters
Working PMU
LinuxKernelPrerequisites
$ uname -r # should be at least 3.7
4.7.1-1-ARCH
$ zgrep PERF /proc/config.gz
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_EVENTS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_PERF_REGS=y
...
BuildingUser-spaceperf
git clone https://github.com/torvalds/linux.git
cd linux/tools/perf
export CC=gcc # clang is not supported
make
Dependencies
Auto-detecting system features:
... dwarf: [ on ] # for symbol resolution
... dwarf_getlocations: [ on ] # for symbol resolution
... glibc: [ on ]
... gtk2: [ on ]
... libaudit: [ on ] # for syscall tracing
... libbfd: [ on ] # for symbol resolution
... libelf: [ on ] # for symbol resolution
... libnuma: [ on ]
... numa_num_possible_cpus: [ on ]
... libperl: [ on ] # for perl bindings
... libpython: [ on ] # for python bindings
... libslang: [ on ] # for TUI
... libcrypto: [ on ] # for JITed probe points
... libunwind: [ on ] # for unwinding
... libdw-dwarf-unwind: [ on ] # for unwinding
... zlib: [ on ]
... lzma: [ on ]
... get_cpuid: [ on ]
... bpf: [ on ]
Cross-compiling
make prefix=somepath ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
Common pitfalls:
CC must not contain any flags
CFLAGS is ignored, use EXTRA_CFLAGS
prefix path ignored for include and library paths
Dependency issues:
linux/tools/build/feature/test-$FEATURE.make.output
Permissions
#!/bin/bash
sudo mount -o remount,mode=755 /sys/kernel/debug
sudo mount -o remount,mode=755 /sys/kernel/debug/tracing
echo "0" | sudo tee /proc/sys/kernel/kptr_restrict
echo "-1" | sudo tee /proc/sys/kernel/perf_event_paranoid
sudo chown root:tracing /sys/kernel/debug/tracing/uprobe_events
sudo chmod g+rw /sys/kernel/debug/tracing/uprobe_events
Benchmarking
Be scientific!
Take variance into account
Compare before/after measurements
perf stat
$ perf stat -r 5 -o baseline.txt -- ./ex_branches
$ cat baseline.txt
Performance counter stats for './ex_branches' (5 runs):
807.951072 task-clock:u (msec) # 0.999 CPUs utilized ( +- 1.97% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
520 page-faults:u # 0.643 K/sec ( +- 0.15% )
2,487,366,239 cycles:u # 3.079 GHz ( +- 1.97% )
1,484,737,283 instructions:u # 0.60 insn per cycle ( +- 0.00% )
329,602,843 branches:u # 407.949 M/sec ( +- 0.00% )
80,476,858 branch-misses:u # 24.42% of all branches ( +- 0.06% )
0.808952447 seconds time elapsed ( +- 1.97% )
Kernelvs.Userspace
Use event modifiers to separate domains:
$ perf stat -r 5 --event=cycles:{k,u} -- ./ex_qdatetime
Performance counter stats for './ex_qdatetime' (5 runs):
13,337,722 cycles:k ( +- 3.82% )
9,745,474 cycles:u ( +- 1.58% )
0.008018321 seconds time elapsed ( +- 4.02% )
See man perf list for more.
perf list
$ perf list
List of pre-defined events (to be used in -e):
branch-misses [Hardware event]
cache-misses [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
ref-cycles [Hardware event]
...
alignment-faults [Software event]
context-switches OR cs [Software event]
page-faults OR faults [Software event]
...
sched:sched_stat_sleep [Tracepoint event]
sched:sched_stat_iowait [Tracepoint event]
sched:sched_stat_runtime [Tracepoint event]
...
syscalls:sys_enter_futex [Tracepoint event]
syscalls:sys_exit_futex [Tracepoint event]
...
Profiling
CPU profiling
Sleep-time profiling
perf top
System-wide live profiling:
$ perf top
Samples: 12K of event 'cycles:ppp', Event count (approx.): 5456372201
Overhead Shared Object Symbol
13.11% libQt5Core.so.5.7.0 [.] QHashData::nextNode
5.08% libQt5Core.so.5.7.0 [.] operator==
2.90% libQt5Core.so.5.7.0 [.] 0x000000000012f0d1
2.33% libQt5DBus.so.5.7.0 [.] 0x000000000002281f
1.62% libQt5DBus.so.5.7.0 [.] 0x0000000000022810
...
StatisticalProfiling
Sampling the call stack is crucial!
UnwindingandCallStacks
frame pointers (fp)
debug information (dwarf)
Last Branch Record (lbr)
Recommendation
On embedded: enable frame pointers
On the desktop: rely on DWARF
On Intel: play with LBR
perf record
Profile new application and its children:
$ perf record --call-graph dwarf -- ./lab_mandelbrot -b 5
[ perf record: Woken up 256 times to write data ]
[ perf record: Captured and wrote 64.174 MB perf.data (7963 samples) ]
perf record
Attach to running process:
$ perf record --call-graph dwarf --pid $(pidof ...)
# wait for some time, then quit with CTRL + C
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.904 MB perf.data (70 samples) ]
perf record
Profile whole system for some time:
$ perf record -a -- sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.498 MB perf.data (2731 samples) ]
perf report
perf report
Top-down inclusive cost report:
$ perf report
Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769
Children Self Command Shared Object Symbol
- 93.67% 31.76% lab_mandelbrot lab_mandelbrot [.] main
- 72.22% main
+ 28.42% hypot
__hypot_finite
19.87% __muldc3
3.45% __muldc3@plt
2.19% cabs@plt
+ 1.85% QColor::rgb
1.61% QImage::width@plt
1.26% QImage::height@plt
0.97% QColor::fromHsvF
+ 0.90% QApplicationPrivate::init
0.66% QImage::setPixel
+ 21.44% _start
+ 83.34% 0.00% lab_mandelbrot libc-2.24.so [.] __libc_start_main
+ 83.33% 0.00% lab_mandelbrot lab_mandelbrot [.] _start
...
perf report
Bottom-up self cost report:
$ perf report --no-children
Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769
Overhead Command Shared Object Symbol
- 31.76% lab_mandelbrot lab_mandelbrot [.] main
- main
- __libc_start_main
_start
- 23.31% lab_mandelbrot libm-2.24.so [.] __hypot_finite
- __hypot_finite
- 22.56% hypot
main
__libc_start_main
_start
- 23.04% lab_mandelbrot libgcc_s.so.1 [.] __muldc3
- __muldc3
+ main
- 5.90% lab_mandelbrot libm-2.24.so [.] hypot
+ hypot
...
perf report
Show file and line numbers:
$ perf report --no-children -s dso,sym,srcline
Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769
Overhead Shared Object Symbol Source:Line
- 7.82% lab_mandelbrot [.] main mandelbrot.h:41
+ main
- 7.79% libgcc_s.so.1 [.] __muldc3 libgcc2.c:1945
__muldc3
main
__libc_start_main
_start
- 7.46% lab_mandelbrot [.] main complex:1326
- main
+ __libc_start_main
- 6.94% libgcc_s.so.1 [.] __muldc3 libgcc2.c:1944
__muldc3
main
__libc_start_main
_start
...
perf report
Show file and line numbers in backtraces:
$ perf report --no-children -s dso,sym,srcline -g address
Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769
Overhead Shared Object Symbol Source:Line
- 7.82% lab_mandelbrot [.] main mandelbrot.h:41
- 2.84% main mandelbrot.h:41
__libc_start_main +241
_start +4194346
2.58% main mandelbrot.h:41
- 2.01% main mandelbrot.h:41
__libc_start_main +241
_start +4194346
- 7.79% libgcc_s.so.1 [.] __muldc3 libgcc2.c:1945
+ 3.93% __muldc3 libgcc2.c:1945
+ 3.72% __muldc3 libgcc2.c:1945
- 7.46% lab_mandelbrot [.] main complex:1326
- 4.65% main complex:1326
__libc_start_main +241
_start +4194346
2.81% main complex:1326
...
perf config
Configure default output format:
[report]
children = false
sort_order = dso,sym,srcline
[call-graph]
record-mode = dwarf
print-type = graph
order = caller
sort-key = address
man perf config
FlameGraphs
perf script report stackcollapse | flamegraph.pl > graph.svg
Flame Graph Search
_start
__hypot_finite
__muldc3
__libc_start_main
hypot
main
c..
lab_mandelbrot
Moretopics?
Cross-machineReporting
When recording machine has symbols available:
# on first machine:
$ perf record ...
$ perf archive
Now please run:
$ tar xvf perf.data.tar.bz2 -C ~/.debug
wherever you need to run 'perf report' on.
# on second machine:
$ rsync machine1:path/to/perf.data{,tar.bz2} .
$ tar xf perf.data.tar.bz2 -C ~/.debug
$ perf report
Cross-machineReporting
When reporting machine has symbols available:
# on first machine:
$ perf record ...
# on second machine:
$ rsync machine1:path/to/perf.data .
$ perf report --symfs /path/to/sysroot
Sleep-timeProfiling
#!/bin/bash
echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
perf record 
--event sched:sched_stat_sleep/call-graph=fp/ 
--event sched:sched_process_exit/call-graph=fp/ 
--event sched:sched_switch/call-graph=dwarf/ 
--output perf.data.raw $@
echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
perf inject --sched-stat --input perf.data.raw --output perf.data
Sleep-timeProfiling
$ perf-sleep-record ./ex_sleep
$ perf report
Samples: 24 of event 'sched:sched_switch', Event count (approx.): 8883195296
Overhead Trace output
- 100.00% ex_sleep:24938 [120] S ==> swapper/7:0 [120]
- 90.07% main main.cpp:10
QThread::sleep +11
0x1521ed
__nanosleep .:0
entry_SYSCALL_64_fastpath entry_64.o:0
sys_nanosleep +18446744071576748154
hrtimer_nanosleep +18446744071576748225
do_nanosleep hrtimer.c:0
schedule +18446744071576748092
__schedule core.c:0
+ 9.02% main main.cpp:11
+ 0.91% main main.cpp:6
perf script
Convert perf.data to callgrind format:
$ perf record --call-graph dwarf ...
$ perf script report callgrind > perf.callgrind
$ kcachegrind perf.callgrind
github.com/milianw/linux/.../callgrind.py
perf script
Convert perf.data to callgrind format:
Questions?
kevin.funk@kdab.com
https://www.kdab.com/
We offer trainings and workshops!Debugging and Profiling
More perf work from my colleague:
github.com/milianw/linux/tree/milian/perf
git clone -b milian/perf https://github.com/milianw/linux.git

More Related Content

What's hot

10GbE時代のネットワークI/O高速化
10GbE時代のネットワークI/O高速化10GbE時代のネットワークI/O高速化
10GbE時代のネットワークI/O高速化Takuya ASADA
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceBrendan Gregg
 
Linux kernel tracing
Linux kernel tracingLinux kernel tracing
Linux kernel tracingViller Hsiao
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFBrendan Gregg
 
Linux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersLinux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersBrendan Gregg
 
eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceSUSE Labs Taipei
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringGeorg Schönberger
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelDivye Kapoor
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Brendan Gregg
 
Memory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelMemory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelAdrian Huang
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedBrendan Gregg
 
10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤTakashi Hoshino
 
Extreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and TuningExtreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and TuningMilind Koyande
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtAnne Nicolas
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugginglibfetion
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance ToolsBrendan Gregg
 
Achieving the ultimate performance with KVM
Achieving the ultimate performance with KVM Achieving the ultimate performance with KVM
Achieving the ultimate performance with KVM ShapeBlue
 

What's hot (20)

10GbE時代のネットワークI/O高速化
10GbE時代のネットワークI/O高速化10GbE時代のネットワークI/O高速化
10GbE時代のネットワークI/O高速化
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
 
Linux kernel tracing
Linux kernel tracingLinux kernel tracing
Linux kernel tracing
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
 
Linux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersLinux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF Superpowers
 
eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to Userspace
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux Kernel
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
Memory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelMemory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux Kernel
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Systems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting StartedSystems@Scale 2021 BPF Performance Getting Started
Systems@Scale 2021 BPF Performance Getting Started
 
10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ
 
Extreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and TuningExtreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and Tuning
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugging
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
 
Achieving the ultimate performance with KVM
Achieving the ultimate performance with KVM Achieving the ultimate performance with KVM
Achieving the ultimate performance with KVM
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 

Viewers also liked

A possible future of resource constrained software development
A possible future of resource constrained software developmentA possible future of resource constrained software development
A possible future of resource constrained software developmentemBO_Conference
 
Programming at Compile Time
Programming at Compile TimeProgramming at Compile Time
Programming at Compile TimeemBO_Conference
 
standardese - a WIP next-gen Doxygen
standardese - a WIP next-gen Doxygenstandardese - a WIP next-gen Doxygen
standardese - a WIP next-gen DoxygenemBO_Conference
 
'Embedding' a meta state machine
'Embedding' a meta state machine'Embedding' a meta state machine
'Embedding' a meta state machineemBO_Conference
 
Device-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded SystemsDevice-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded SystemsemBO_Conference
 
Data-driven HAL generation
Data-driven HAL generationData-driven HAL generation
Data-driven HAL generationemBO_Conference
 

Viewers also liked (6)

A possible future of resource constrained software development
A possible future of resource constrained software developmentA possible future of resource constrained software development
A possible future of resource constrained software development
 
Programming at Compile Time
Programming at Compile TimeProgramming at Compile Time
Programming at Compile Time
 
standardese - a WIP next-gen Doxygen
standardese - a WIP next-gen Doxygenstandardese - a WIP next-gen Doxygen
standardese - a WIP next-gen Doxygen
 
'Embedding' a meta state machine
'Embedding' a meta state machine'Embedding' a meta state machine
'Embedding' a meta state machine
 
Device-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded SystemsDevice-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded Systems
 
Data-driven HAL generation
Data-driven HAL generationData-driven HAL generation
Data-driven HAL generation
 

Similar to Profiling your Applications using the Linux Perf Tools

Performance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernellcplcp1
 
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...confluent
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
 
Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Ontico
 
Deep learning - the conf br 2018
Deep learning - the conf br 2018Deep learning - the conf br 2018
Deep learning - the conf br 2018Fabio Janiszevski
 
Reproducible Computational Pipelines with Docker and Nextflow
Reproducible Computational Pipelines with Docker and NextflowReproducible Computational Pipelines with Docker and Nextflow
Reproducible Computational Pipelines with Docker and Nextflowinside-BigData.com
 
ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018Brendan Gregg
 
Android Boot Time Optimization
Android Boot Time OptimizationAndroid Boot Time Optimization
Android Boot Time OptimizationKan-Ru Chen
 
Modern Linux Tracing Landscape
Modern Linux Tracing LandscapeModern Linux Tracing Landscape
Modern Linux Tracing LandscapeSasha Goldshtein
 
Crash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_TizenCrash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_TizenLex Yu
 
Node Interactive Debugging Node.js In Production
Node Interactive Debugging Node.js In ProductionNode Interactive Debugging Node.js In Production
Node Interactive Debugging Node.js In ProductionYunong Xiao
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightLinaro
 
Swift profiling middleware and tools
Swift profiling middleware and toolsSwift profiling middleware and tools
Swift profiling middleware and toolszhang hua
 
Debugging linux issues with eBPF
Debugging linux issues with eBPFDebugging linux issues with eBPF
Debugging linux issues with eBPFIvan Babrou
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging RubyAman Gupta
 
DevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDinakar Guniguntala
 
Spark streaming with kafka
Spark streaming with kafkaSpark streaming with kafka
Spark streaming with kafkaDori Waldman
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka Dori Waldman
 
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...Ontico
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerMarina Kolpakova
 

Similar to Profiling your Applications using the Linux Perf Tools (20)

Performance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernel
 
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)
 
Deep learning - the conf br 2018
Deep learning - the conf br 2018Deep learning - the conf br 2018
Deep learning - the conf br 2018
 
Reproducible Computational Pipelines with Docker and Nextflow
Reproducible Computational Pipelines with Docker and NextflowReproducible Computational Pipelines with Docker and Nextflow
Reproducible Computational Pipelines with Docker and Nextflow
 
ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018
 
Android Boot Time Optimization
Android Boot Time OptimizationAndroid Boot Time Optimization
Android Boot Time Optimization
 
Modern Linux Tracing Landscape
Modern Linux Tracing LandscapeModern Linux Tracing Landscape
Modern Linux Tracing Landscape
 
Crash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_TizenCrash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_Tizen
 
Node Interactive Debugging Node.js In Production
Node Interactive Debugging Node.js In ProductionNode Interactive Debugging Node.js In Production
Node Interactive Debugging Node.js In Production
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with Coresight
 
Swift profiling middleware and tools
Swift profiling middleware and toolsSwift profiling middleware and tools
Swift profiling middleware and tools
 
Debugging linux issues with eBPF
Debugging linux issues with eBPFDebugging linux issues with eBPF
Debugging linux issues with eBPF
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging Ruby
 
DevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on Kubernetes
 
Spark streaming with kafka
Spark streaming with kafkaSpark streaming with kafka
Spark streaming with kafka
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka
 
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
 

Recently uploaded

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Recently uploaded (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Profiling your Applications using the Linux Perf Tools

  • 3. Intermission:WhoamI? Software Engineering Consultant at KDAB since 2010 FOSS enthusiast working on Qt/C++ at KDE since 2006 Lead developer of the KDevelop IDE mainly on the C/C++ support backed by Clang as well as cross-platform support
  • 4. Setup Hardware Linux Kernel prerequisites Building user-space perf Cross-compiling Permissions
  • 6. LinuxKernelPrerequisites $ uname -r # should be at least 3.7 4.7.1-1-ARCH $ zgrep PERF /proc/config.gz CONFIG_HAVE_PERF_EVENTS=y CONFIG_PERF_EVENTS=y CONFIG_HAVE_PERF_USER_STACK_DUMP=y CONFIG_HAVE_PERF_REGS=y ...
  • 7. BuildingUser-spaceperf git clone https://github.com/torvalds/linux.git cd linux/tools/perf export CC=gcc # clang is not supported make
  • 8. Dependencies Auto-detecting system features: ... dwarf: [ on ] # for symbol resolution ... dwarf_getlocations: [ on ] # for symbol resolution ... glibc: [ on ] ... gtk2: [ on ] ... libaudit: [ on ] # for syscall tracing ... libbfd: [ on ] # for symbol resolution ... libelf: [ on ] # for symbol resolution ... libnuma: [ on ] ... numa_num_possible_cpus: [ on ] ... libperl: [ on ] # for perl bindings ... libpython: [ on ] # for python bindings ... libslang: [ on ] # for TUI ... libcrypto: [ on ] # for JITed probe points ... libunwind: [ on ] # for unwinding ... libdw-dwarf-unwind: [ on ] # for unwinding ... zlib: [ on ] ... lzma: [ on ] ... get_cpuid: [ on ] ... bpf: [ on ]
  • 9. Cross-compiling make prefix=somepath ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- Common pitfalls: CC must not contain any flags CFLAGS is ignored, use EXTRA_CFLAGS prefix path ignored for include and library paths Dependency issues: linux/tools/build/feature/test-$FEATURE.make.output
  • 10. Permissions #!/bin/bash sudo mount -o remount,mode=755 /sys/kernel/debug sudo mount -o remount,mode=755 /sys/kernel/debug/tracing echo "0" | sudo tee /proc/sys/kernel/kptr_restrict echo "-1" | sudo tee /proc/sys/kernel/perf_event_paranoid sudo chown root:tracing /sys/kernel/debug/tracing/uprobe_events sudo chmod g+rw /sys/kernel/debug/tracing/uprobe_events
  • 11. Benchmarking Be scientific! Take variance into account Compare before/after measurements
  • 12. perf stat $ perf stat -r 5 -o baseline.txt -- ./ex_branches $ cat baseline.txt Performance counter stats for './ex_branches' (5 runs): 807.951072 task-clock:u (msec) # 0.999 CPUs utilized ( +- 1.97% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 520 page-faults:u # 0.643 K/sec ( +- 0.15% ) 2,487,366,239 cycles:u # 3.079 GHz ( +- 1.97% ) 1,484,737,283 instructions:u # 0.60 insn per cycle ( +- 0.00% ) 329,602,843 branches:u # 407.949 M/sec ( +- 0.00% ) 80,476,858 branch-misses:u # 24.42% of all branches ( +- 0.06% ) 0.808952447 seconds time elapsed ( +- 1.97% )
  • 13. Kernelvs.Userspace Use event modifiers to separate domains: $ perf stat -r 5 --event=cycles:{k,u} -- ./ex_qdatetime Performance counter stats for './ex_qdatetime' (5 runs): 13,337,722 cycles:k ( +- 3.82% ) 9,745,474 cycles:u ( +- 1.58% ) 0.008018321 seconds time elapsed ( +- 4.02% ) See man perf list for more.
  • 14. perf list $ perf list List of pre-defined events (to be used in -e): branch-misses [Hardware event] cache-misses [Hardware event] cpu-cycles OR cycles [Hardware event] instructions [Hardware event] ref-cycles [Hardware event] ... alignment-faults [Software event] context-switches OR cs [Software event] page-faults OR faults [Software event] ... sched:sched_stat_sleep [Tracepoint event] sched:sched_stat_iowait [Tracepoint event] sched:sched_stat_runtime [Tracepoint event] ... syscalls:sys_enter_futex [Tracepoint event] syscalls:sys_exit_futex [Tracepoint event] ...
  • 16. perf top System-wide live profiling: $ perf top Samples: 12K of event 'cycles:ppp', Event count (approx.): 5456372201 Overhead Shared Object Symbol 13.11% libQt5Core.so.5.7.0 [.] QHashData::nextNode 5.08% libQt5Core.so.5.7.0 [.] operator== 2.90% libQt5Core.so.5.7.0 [.] 0x000000000012f0d1 2.33% libQt5DBus.so.5.7.0 [.] 0x000000000002281f 1.62% libQt5DBus.so.5.7.0 [.] 0x0000000000022810 ...
  • 18. UnwindingandCallStacks frame pointers (fp) debug information (dwarf) Last Branch Record (lbr)
  • 19. Recommendation On embedded: enable frame pointers On the desktop: rely on DWARF On Intel: play with LBR
  • 20. perf record Profile new application and its children: $ perf record --call-graph dwarf -- ./lab_mandelbrot -b 5 [ perf record: Woken up 256 times to write data ] [ perf record: Captured and wrote 64.174 MB perf.data (7963 samples) ]
  • 21. perf record Attach to running process: $ perf record --call-graph dwarf --pid $(pidof ...) # wait for some time, then quit with CTRL + C [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 3.904 MB perf.data (70 samples) ]
  • 22. perf record Profile whole system for some time: $ perf record -a -- sleep 5 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.498 MB perf.data (2731 samples) ]
  • 24. perf report Top-down inclusive cost report: $ perf report Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769 Children Self Command Shared Object Symbol - 93.67% 31.76% lab_mandelbrot lab_mandelbrot [.] main - 72.22% main + 28.42% hypot __hypot_finite 19.87% __muldc3 3.45% __muldc3@plt 2.19% cabs@plt + 1.85% QColor::rgb 1.61% QImage::width@plt 1.26% QImage::height@plt 0.97% QColor::fromHsvF + 0.90% QApplicationPrivate::init 0.66% QImage::setPixel + 21.44% _start + 83.34% 0.00% lab_mandelbrot libc-2.24.so [.] __libc_start_main + 83.33% 0.00% lab_mandelbrot lab_mandelbrot [.] _start ...
  • 25. perf report Bottom-up self cost report: $ perf report --no-children Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769 Overhead Command Shared Object Symbol - 31.76% lab_mandelbrot lab_mandelbrot [.] main - main - __libc_start_main _start - 23.31% lab_mandelbrot libm-2.24.so [.] __hypot_finite - __hypot_finite - 22.56% hypot main __libc_start_main _start - 23.04% lab_mandelbrot libgcc_s.so.1 [.] __muldc3 - __muldc3 + main - 5.90% lab_mandelbrot libm-2.24.so [.] hypot + hypot ...
  • 26. perf report Show file and line numbers: $ perf report --no-children -s dso,sym,srcline Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769 Overhead Shared Object Symbol Source:Line - 7.82% lab_mandelbrot [.] main mandelbrot.h:41 + main - 7.79% libgcc_s.so.1 [.] __muldc3 libgcc2.c:1945 __muldc3 main __libc_start_main _start - 7.46% lab_mandelbrot [.] main complex:1326 - main + __libc_start_main - 6.94% libgcc_s.so.1 [.] __muldc3 libgcc2.c:1944 __muldc3 main __libc_start_main _start ...
  • 27. perf report Show file and line numbers in backtraces: $ perf report --no-children -s dso,sym,srcline -g address Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769 Overhead Shared Object Symbol Source:Line - 7.82% lab_mandelbrot [.] main mandelbrot.h:41 - 2.84% main mandelbrot.h:41 __libc_start_main +241 _start +4194346 2.58% main mandelbrot.h:41 - 2.01% main mandelbrot.h:41 __libc_start_main +241 _start +4194346 - 7.79% libgcc_s.so.1 [.] __muldc3 libgcc2.c:1945 + 3.93% __muldc3 libgcc2.c:1945 + 3.72% __muldc3 libgcc2.c:1945 - 7.46% lab_mandelbrot [.] main complex:1326 - 4.65% main complex:1326 __libc_start_main +241 _start +4194346 2.81% main complex:1326 ...
  • 28. perf config Configure default output format: [report] children = false sort_order = dso,sym,srcline [call-graph] record-mode = dwarf print-type = graph order = caller sort-key = address man perf config
  • 29. FlameGraphs perf script report stackcollapse | flamegraph.pl > graph.svg Flame Graph Search _start __hypot_finite __muldc3 __libc_start_main hypot main c.. lab_mandelbrot
  • 31. Cross-machineReporting When recording machine has symbols available: # on first machine: $ perf record ... $ perf archive Now please run: $ tar xvf perf.data.tar.bz2 -C ~/.debug wherever you need to run 'perf report' on. # on second machine: $ rsync machine1:path/to/perf.data{,tar.bz2} . $ tar xf perf.data.tar.bz2 -C ~/.debug $ perf report
  • 32. Cross-machineReporting When reporting machine has symbols available: # on first machine: $ perf record ... # on second machine: $ rsync machine1:path/to/perf.data . $ perf report --symfs /path/to/sysroot
  • 33. Sleep-timeProfiling #!/bin/bash echo 1 | sudo tee /proc/sys/kernel/sched_schedstats perf record --event sched:sched_stat_sleep/call-graph=fp/ --event sched:sched_process_exit/call-graph=fp/ --event sched:sched_switch/call-graph=dwarf/ --output perf.data.raw $@ echo 0 | sudo tee /proc/sys/kernel/sched_schedstats perf inject --sched-stat --input perf.data.raw --output perf.data
  • 34. Sleep-timeProfiling $ perf-sleep-record ./ex_sleep $ perf report Samples: 24 of event 'sched:sched_switch', Event count (approx.): 8883195296 Overhead Trace output - 100.00% ex_sleep:24938 [120] S ==> swapper/7:0 [120] - 90.07% main main.cpp:10 QThread::sleep +11 0x1521ed __nanosleep .:0 entry_SYSCALL_64_fastpath entry_64.o:0 sys_nanosleep +18446744071576748154 hrtimer_nanosleep +18446744071576748225 do_nanosleep hrtimer.c:0 schedule +18446744071576748092 __schedule core.c:0 + 9.02% main main.cpp:11 + 0.91% main main.cpp:6
  • 35. perf script Convert perf.data to callgrind format: $ perf record --call-graph dwarf ... $ perf script report callgrind > perf.callgrind $ kcachegrind perf.callgrind github.com/milianw/linux/.../callgrind.py
  • 36. perf script Convert perf.data to callgrind format:
  • 37. Questions? kevin.funk@kdab.com https://www.kdab.com/ We offer trainings and workshops!Debugging and Profiling More perf work from my colleague: github.com/milianw/linux/tree/milian/perf git clone -b milian/perf https://github.com/milianw/linux.git