3. Intermission:WhoamI?
Software Engineering Consultant at KDAB since 2010
FOSS enthusiast working on Qt/C++ at KDE since 2006
Lead developer of the KDevelop IDE
mainly on the C/C++ support backed by Clang
as well as cross-platform support
8. Dependencies
Auto-detecting system features:
... dwarf: [ on ] # for symbol resolution
... dwarf_getlocations: [ on ] # for symbol resolution
... glibc: [ on ]
... gtk2: [ on ]
... libaudit: [ on ] # for syscall tracing
... libbfd: [ on ] # for symbol resolution
... libelf: [ on ] # for symbol resolution
... libnuma: [ on ]
... numa_num_possible_cpus: [ on ]
... libperl: [ on ] # for perl bindings
... libpython: [ on ] # for python bindings
... libslang: [ on ] # for TUI
... libcrypto: [ on ] # for JITed probe points
... libunwind: [ on ] # for unwinding
... libdw-dwarf-unwind: [ on ] # for unwinding
... zlib: [ on ]
... lzma: [ on ]
... get_cpuid: [ on ]
... bpf: [ on ]
9. Cross-compiling
make prefix=somepath ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
Common pitfalls:
CC must not contain any flags
CFLAGS is ignored, use EXTRA_CFLAGS
prefix path ignored for include and library paths
Dependency issues:
linux/tools/build/feature/test-$FEATURE.make.output
10. Permissions
#!/bin/bash
sudo mount -o remount,mode=755 /sys/kernel/debug
sudo mount -o remount,mode=755 /sys/kernel/debug/tracing
echo "0" | sudo tee /proc/sys/kernel/kptr_restrict
echo "-1" | sudo tee /proc/sys/kernel/perf_event_paranoid
sudo chown root:tracing /sys/kernel/debug/tracing/uprobe_events
sudo chmod g+rw /sys/kernel/debug/tracing/uprobe_events
13. Kernelvs.Userspace
Use event modifiers to separate domains:
$ perf stat -r 5 --event=cycles:{k,u} -- ./ex_qdatetime
Performance counter stats for './ex_qdatetime' (5 runs):
13,337,722 cycles:k ( +- 3.82% )
9,745,474 cycles:u ( +- 1.58% )
0.008018321 seconds time elapsed ( +- 4.02% )
See man perf list for more.
14. perf list
$ perf list
List of pre-defined events (to be used in -e):
branch-misses [Hardware event]
cache-misses [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
ref-cycles [Hardware event]
...
alignment-faults [Software event]
context-switches OR cs [Software event]
page-faults OR faults [Software event]
...
sched:sched_stat_sleep [Tracepoint event]
sched:sched_stat_iowait [Tracepoint event]
sched:sched_stat_runtime [Tracepoint event]
...
syscalls:sys_enter_futex [Tracepoint event]
syscalls:sys_exit_futex [Tracepoint event]
...
20. perf record
Profile new application and its children:
$ perf record --call-graph dwarf -- ./lab_mandelbrot -b 5
[ perf record: Woken up 256 times to write data ]
[ perf record: Captured and wrote 64.174 MB perf.data (7963 samples) ]
21. perf record
Attach to running process:
$ perf record --call-graph dwarf --pid $(pidof ...)
# wait for some time, then quit with CTRL + C
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.904 MB perf.data (70 samples) ]
22. perf record
Profile whole system for some time:
$ perf record -a -- sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.498 MB perf.data (2731 samples) ]
31. Cross-machineReporting
When recording machine has symbols available:
# on first machine:
$ perf record ...
$ perf archive
Now please run:
$ tar xvf perf.data.tar.bz2 -C ~/.debug
wherever you need to run 'perf report' on.
# on second machine:
$ rsync machine1:path/to/perf.data{,tar.bz2} .
$ tar xf perf.data.tar.bz2 -C ~/.debug
$ perf report
32. Cross-machineReporting
When reporting machine has symbols available:
# on first machine:
$ perf record ...
# on second machine:
$ rsync machine1:path/to/perf.data .
$ perf report --symfs /path/to/sysroot