eBPF (extended Berkeley Packet Filters) is a modern kernel technology that can be used to introduce dynamic tracing into a system that wasn't prepared or instrumented in any way. The tracing programs run in the kernel, are guaranteed to never crash or hang your system, and can probe every module and function -- from the kernel to user-space frameworks such as Node and Ruby.
In this workshop, you will experiment with Linux dynamic tracing first-hand. First, you will explore BCC, the BPF Compiler Collection, which is a set of tools and libraries for dynamic tracing. Many of your tracing needs will be answered by BCC, and you will experiment with memory leak analysis, generic function tracing, kernel tracepoints, static tracepoints in user-space programs, and the "baked" tools for file I/O, network, and CPU analysis. You'll be able to choose between working on a set of hands-on labs prepared by the instructors, or trying the tools out on your own test system.
Next, you will hack on some of the bleeding edge tools in the BCC toolkit, and build a couple of simple tools of your own. You'll be able to pick from a curated list of GitHub issues for the BCC project, a set of hands-on labs with known "school solutions", and an open-ended list of problems that need tools for effective analysis. At the end of this workshop, you will be equipped with a toolbox for diagnosing issues in the field, as well as a framework for building your own tools when the generic ones do not suffice.
1. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Staring into the eBPF Abyss
Sasha Goldshtein
CTO, Sela Group
@goldshtn
2. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Agenda
• Modern Linux tracing landscape
• BPF
• BCC – BPF Compiler Collection
• Using BCC tools
• Authoring BCC tools
3. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Prerequisites
• You should …
• Have experience developing on or administering a Linux deployment
• Be familiar with C/Python/Lua (a bonus)
• To use your own machines for this workshop …
• You will need Linux 4.6+
• Clone or install some open source tools (perf, bcc)
• You can also use the instructor-provided VirtualBox appliance or
Strigo workspace
• Instructions and labs:
https://github.com/goldshtn/linux-tracing-workshop
4. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Tracing on the Performance Spectrum
Invasiveness
Envelope
System completeness
Metrics &
simulations
Development Testing Production
Profilers
jprof, valgrind
Counters
top, vmstat
Debuggers
Function
tracers
Event tracers
Aggregators
SystemTap, BPF
Load
tools
ab
Light-weight profilers
perf
6. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Linux Tracing Tools, Today
Ease of use
BPF/BCC
SysDig
ktap
SystemTap
LTTng
ftrace
perf
custom .ko
new stable dead
Level of detail, features
7. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Berkeley Packet Filters (BPF)
• Originally designed for, well, packet filtering:
dst port 80 and len >= 100
• Custom instruction set, interpreted/JIT compiled
0: (bf) r6 = r1
1: (85) call 14
2: (67) r0 <<= 32
3: (77) r0 >>= 32
4: (15) if r0 == 0x49f goto pc+40
8. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Extended BPF (3.18 and ongoing)
• Attach BPF programs to kprobes/uprobes (4.1) and tracepoints (4.7)
• Data structures: array, hash (expandable), stack map (4.6)
• Output to trace buffer (4.3) and perf cyclic buffer (4.4)
• Helper functions: get time, get current comm, get current CPU, etc.
9. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
BCC: BPF Compiler Collection
• Library and Python/Lua module for compiling, loading, and executing
BPF programs
• Compile BPF program from C source
• Attach BPF program to kprobe/uprobe/tracepoint/USDT/socket
• Poll data from BPF program using Python/Lua
• Can do in-kernel aggregation and filtering
• Growing collection of tracing, networking, and performance tools
10. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
11. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
BCC
• The BCC repository contains a variety of existing scripts and tools to
get you started
• The BPF module (Python/Lua) can be used to build new tools or one-
off scripts
$ ls *.py
argdist.py
bashreadline.py
biolatency.py
biosnoop.py
biotop.py
bitesize.py
btrfsdist.py
btrfsslower.py
cachestat.py
cpudist.py
dcsnoop.py
dcstat.py
execsnoop.py
ext4dist.py
ext4slower.py
filelife.py
fileslower.py
filetop.py
funccount.py
funclatency.py
gethostlatency.py
hardirqs.py
killsnoop.py
mdflush.py
memleak.py
offcputime.py
offwaketime.py
oomkill.py
opensnoop.py
pidpersec.py
runqlat.py
softirqs.py
solisten.py
stackcount.py
stacksnoop.py
statsnoop.py
syncsnoop.py
tcpaccept.py
tcpconnect.py
tcpconnlat.py
tcpretrans.py
tplist.py
trace.py
vfscount.py
vfsstat.py
wakeuptime.py
xfsdist.py
xfsslower.py
zfsdist.py
zfsslower.py
12. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Specialized Tools
# ./hardirqs.py
Tracing hard irq event time... Hit Ctrl-C to end.
^C
HARDIRQ TOTAL_usecs
virtio0-input.0 959
ahci[0000:00:1f.2] 1290
# ./biolatency.py
Tracing block device I/O... Hit Ctrl-C to end.
^C
usecs : count distribution
64 -> 127 : 7 |********* |
128 -> 255 : 14 |****************** |
256 -> 511 : 5 |****** |
512 -> 1023 : 30 |****************************************|
1024 -> 2047 : 1 |* |
13. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Specialized Tools
# ./filetop.py
01:35:51 loadavg: 0.01 0.04 0.03 2/139 3611
PID COMM READS WRITES R_Kb W_Kb T FILE
2496 sshd 3 1 48 0 O ptmx
2939 screen 4 1 16 0 O ptmx
2496 sshd 1 3 16 0 S TCP
3611 clear 2 0 8 0 R screen
2939 screen 1 3 4 0 O 0
3589 filetop.py 2 0 2 0 R loadavg
3611 clear 1 0 0 0 R libtinfo.so.5.9
3611 clear 1 0 0 0 R libc-2.21.so
3611 filetop.py 3 0 0 0 R clear
3611 filetop.py 2 0 0 0 R ld-2.21.so
3611 clear 0 1 0 0 O 2
3589 filetop.py 0 3 0 0 O 2
# ./cachestat.py
HITS MISSES DIRTIES READ_HIT% WRITE_HIT% BUFFERS_MB CACHED_MB
0 0 0 0.0% 0.0% 54 482
842 0 0 100.0% 0.0% 54 482
889 128 0 87.4% 12.6% 54 482
14. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Specialized Tools
# ./stackcount.py __kmalloc
Tracing 1 functions for "__kmalloc"... Hit Ctrl-C to end.
^C
__kmalloc
alloc_fdtable
dup_fd
copy_process.part.31
_do_fork
sys_clone
do_syscall_64
return_from_SYSCALL_64
4
__kmalloc
create_pipe_files
__do_pipe_flags
sys_pipe
entry_SYSCALL_64_fastpath
6
__kmalloc
htree_dirblock_to_tree
ext4_htree_fill_tree
ext4_readdir
iterate_dir
SyS_getdents
entry_SYSCALL_64_fastpath
14
15. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
BPF Tracing Targets (circa July 2016)
Target Support Overhead
kprobes Native Low
uprobes Native
Medium
handler runs in KM
Kernel tracepoints NativeNEW Low
USDT tracepoints
Temporary
through uprobes
Medium
handler runs in KM
21. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Lab
#3 – Chasing a C++ Memory Leak
#4 – MySQL and Disk Stats and Stacks
#5 – Node and JVM USDT Probes
https://s.sashag.net/sreconlabs
22. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Kernel
Custom Tool Design
BPF program
Tracepoint kprobe
Python/Lua driver
App process
uprobeUSDT
Probe handler
Probe handler
Hash or
histogram
Cyclic buffer
U
K
30. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Example: Stack Map
BPF_HASH(counts, int);
BPF_STACK_TRACE(stacks, 1024);
...
int key = stacks.get_stackid(ctx, BPF_F_REUSE_STACKID);
u64 zero = 0;
u64 *val = counts.lookup_or_init(&key, &zero);
++(*val);
...
counts, stacks = bpf["counts"], bpf["stacks"]
for k, v in counts:
for addr in stacks.walk(k.value):
print(BPF.ksym(addr))
print(v.value)
31. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Custom Tool Design Tips
• Try to perform all aggregations in the BPF program and keep UM
copying to a minimum
• Limit hash/histogram/stackmap sizes, prune, keep only top entries
• Clear cyclic buffer often and quickly
32. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Deployment
• For Python tools, deploy Python + libbcc.so
• For Lua tools, deploy only bcc-lua
• Statically links libbcc.a but allows plugging libbcc.so
• Kernel build flags:
• CONFIG_BPF=y
• CONFIG_BPF_SYSCALL=y
• CONFIG_BPF_JIT=y
• CONFIG_HAVE_BPF_JIT=y
• CONFIG_BPF_EVENTS=y
33. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Lab
#6 – Contention Stats and Stacks
#7 – From BCC GitHub Issues
https://s.sashag.net/sreconlabs
34. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Summary
• Tracing can identify bugs and performance issues that no debugger or
profiler can catch
• Tools make low-overhead, dynamic, production tracing possible
• BPF is the next-generation backend for Linux tracing tools
35. SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
SRECon Europe 2016 @goldshtn https://s.sashag.net/bpfws07
Thank You!
Sasha Goldshtein
@goldshtn
Editor's Notes
Missing: cpudist
This example requires kernel 4.7 for BPF_PROG_TYPE_TRACEPOINT and a version of bcc with PR #602 merged, which introduces TRACEPOINT_PROBE.