This document provides an overview of cBPF and eBPF. It covers the history and implementation of cBPF, including its original use for packet filtering. It then discusses eBPF in more depth: what it is, its history, and its implementation, including program types and maps. It also surveys uses of eBPF in networking, firewalls, DDoS mitigation, profiling, security, and chaos engineering. Finally, it introduces XDP and DPDK and compares XDP's benefits over DPDK.
5. Michael Kehoe
$ WHOAMI
• Sr Staff Site Reliability Engineer @
LinkedIn
• Production-SRE Team
• What I do:
• Disaster Recovery
• (Organizational) Visibility Engineering
• Incident Management
• Reliability Research
7. “BPF is a highly flexible and efficient virtual
machine-like construct in the Linux kernel
allowing to execute bytecode at various hook
points in a safe manner. It is used in a number
of Linux kernel subsystems, most prominently
networking, tracing and security (e.g.
sandboxing).”
Cilium
8. What is cBPF?
• cBPF – Classic BPF
• Also known as “Linux Packet Filtering”
• BPF was first introduced in 1992 by
Steven McCanne and Van Jacobson in
BSD
• Better known as the packet filter
language in tcpdump
9. What is cBPF?
• Network packet filtering, Seccomp
• Filter Expressions → Bytecode → Interpret
• Small, in-kernel VM, Register based,
switch dispatch interpreter, few
instructions
• BPF uses a simple, non-shared buffer
model made possible by today’s larger
address space
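The "small, register-based VM" idea can be made concrete with a toy interpreter. The sketch below is a userspace model, not kernel code: the opcode values match `<linux/filter.h>`, but only five instructions are modelled, and the names `run_cbpf`, `IP_ONLY` and `ethernet_frame` are hypothetical. The sample program mirrors the well-known bytecode for the tcpdump expression `ip`.

```python
import struct

# Opcode values match <linux/filter.h>; the interpreter itself is a toy model.
BPF_LD_W_ABS = 0x20   # A = 32-bit word at packet[k]
BPF_LD_H_ABS = 0x28   # A = 16-bit half-word at packet[k]
BPF_LD_B_ABS = 0x30   # A = byte at packet[k]
BPF_JEQ_K = 0x15      # if A == k skip jt instructions, otherwise skip jf
BPF_RET_K = 0x06      # return k: bytes of the packet to keep, 0 means drop

_LOADS = {BPF_LD_W_ABS: ("!I", 4), BPF_LD_H_ABS: ("!H", 2), BPF_LD_B_ABS: ("!B", 1)}


def run_cbpf(program, packet):
    """Interpret a list of (code, jt, jf, k) tuples against raw packet bytes."""
    a = 0    # accumulator register
    pc = 0   # program counter
    while pc < len(program):
        code, jt, jf, k = program[pc]
        pc += 1
        if code in _LOADS:
            fmt, size = _LOADS[code]
            if k + size > len(packet):
                return 0  # out-of-bounds load: reject the packet
            (a,) = struct.unpack_from(fmt, packet, k)
        elif code == BPF_JEQ_K:
            pc += jt if a == k else jf  # jumps are forward-only offsets
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")
    return 0  # fell off the end of the program: treat as drop


# Accept IPv4 frames only (same shape as the bytecode for the filter "ip").
IP_ONLY = [
    (BPF_LD_H_ABS, 0, 0, 12),    # ldh [12]     load the EtherType field
    (BPF_JEQ_K, 0, 1, 0x0800),   # jeq #0x800   IPv4?
    (BPF_RET_K, 0, 0, 262144),   # ret #262144  accept
    (BPF_RET_K, 0, 0, 0),        # ret #0       drop
]


def ethernet_frame(ethertype, payload=b""):
    """Build a frame with zeroed MAC addresses and the given EtherType."""
    return bytes(12) + struct.pack("!H", ethertype) + payload
```

Calling `run_cbpf(IP_ONLY, ethernet_frame(0x0800))` returns the accept length, while an ARP frame (`0x0806`) returns 0. The kernel interpreter dispatches with a C `switch`; the `if`/`elif` chain here plays the same role.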
11. History of BPF
• Before BPF, each OS (Sun, DEC, SGI
etc) had its own packet filtering API
• In 1993: Steven McCanne & Van
Jacobson released a paper titled “The
BSD Packet Filter” (BPF)
• Implemented as “Linux Socket Filter” in
kernel 2.2
• While maintaining the BPF language (for
describing filters), uses a different
internal architecture
13. BPF (original) implementation
• Open a special-purpose
character-device, namely
/dev/bpfn, for dealing with
raw packets.
• Associate the previous
device with a network
interface by using the
ioctl(2) system call
https://www.tcpdump.org/papers/bpf-usenix93.pdf
14. BPF (original) implementation
• Set various BPF
parameters (e.g. buffer
size, attached BPF
filters). This is done using
the ioctl(2) system call
• Read packets from the
kernel, or send raw packets,
by reading/writing to the
corresponding file descriptor
of /dev/bpf using
read(2)/write(2) system calls
https://www.tcpdump.org/papers/bpf-usenix93.pdf
15. BPF (LSF) implementation
• Utilizes sockets for
passing/receiving packets
to/from the kernel-space
• Filters are attached with the
setsockopt(2) system call
https://www.tcpdump.org/papers/bpf-usenix93.pdf
16. BPF (LSF) implementation
• Create a special-purpose
socket (i.e., PF_PACKET)
• Attach a BPF program to
the socket using the
setsockopt(2) system call
https://www.tcpdump.org/papers/bpf-usenix93.pdf
17. BPF (LSF) implementation
• Set the network interface to
promiscuous mode with
ioctl(2) (optionally)
• Read packets from the
kernel, or send raw
packets, by reading/writing
to the file descriptor of the
socket using
recvfrom(2)/sendto(2)
system calls
https://www.tcpdump.org/papers/bpf-usenix93.pdf
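The LSF steps above (create a PF_PACKET socket, attach a filter with setsockopt(2), read with recvfrom(2)) can be sketched from user space as follows. This is a minimal, Linux-only sketch under stated assumptions: `SO_ATTACH_FILTER = 26` is the value on common architectures and is hard-coded here, opening an `AF_PACKET` socket needs `CAP_NET_RAW`, and the helper names `pack_filter`, `attach_filter` and `capture_one` are hypothetical.

```python
import ctypes
import socket
import struct

SO_ATTACH_FILTER = 26  # assumption: asm-generic value, differs on a few architectures
ETH_P_ALL = 0x0003     # receive frames of every protocol

# (code, jt, jf, k) tuples: accept IPv4 frames, drop everything else.
IP_ONLY = [
    (0x28, 0, 0, 12),        # ldh [12]
    (0x15, 0, 1, 0x0800),    # jeq #0x800
    (0x06, 0, 0, 262144),    # ret #262144
    (0x06, 0, 0, 0),         # ret #0
]


def pack_filter(program):
    """Pack instructions as an array of struct sock_filter (8 bytes each)."""
    return b"".join(struct.pack("HBBI", code, jt, jf, k) for code, jt, jf, k in program)


def attach_filter(sock, program):
    """Attach a cBPF program to a socket via setsockopt(SO_ATTACH_FILTER)."""
    insns = ctypes.create_string_buffer(pack_filter(program))
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    fprog = struct.pack("HP", len(program), ctypes.addressof(insns))
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)


def capture_one(ifname):
    """Open a PF_PACKET socket on ifname, attach the filter, return one frame."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL)) as sock:
        sock.bind((ifname, 0))
        attach_filter(sock, IP_ONLY)
        frame, _addr = sock.recvfrom(65535)
        return frame
```

With sufficient privileges, `capture_one("eth0")` would block until an IPv4 frame arrives on that interface; the interface name is only an example.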
18. BPF (LSF) implementation
TCPDUMP EXAMPLE
https://static.sched.com/hosted_files/kccnceu19/b8/KubeCon-Europe-2019-Beatriz_Martinez_eBPF.pdf
25. What is eBPF?
• eBPF – extended Berkeley Packet Filter
• User-defined, sandboxed bytecode
executed by the kernel
• VM that implements a RISC-like
assembly language in kernel space
• All interactions between kernel/ user
space are done through eBPF “maps”
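• eBPF does not allow unbounded loops; the verifier must be able to
prove that every program terminates (bounded loops are accepted
since Linux 5.3)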
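One way to picture the restriction on loops is as a search for back-edges in the program's control-flow graph. The sketch below is a simplified model and not the kernel verifier: the instruction encoding (`"op"`, `"ja"`, `"jcond"`, `"exit"` tuples) and the function names are hypothetical, and the only detail taken from eBPF is that jump offsets are relative to the next instruction.

```python
def successors(program, i):
    """Indices of the instructions that may execute right after instruction i."""
    kind = program[i][0]
    if kind == "exit":
        return []
    if kind == "ja":                      # unconditional jump
        return [i + 1 + program[i][1]]
    if kind == "jcond":                   # conditional jump: fall through or branch
        return [i + 1, i + 1 + program[i][1]]
    return [i + 1]                        # any other instruction


def has_loop(program):
    """Return True if the code reachable from instruction 0 contains a back-edge."""
    if not program:
        return False
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, finished
    state = [WHITE] * len(program)
    state[0] = GREY
    stack = [(0, iter(successors(program, 0)))]
    while stack:
        node, pending = stack[-1]
        for nxt in pending:
            if not 0 <= nxt < len(program):
                raise ValueError(f"jump out of range at instruction {node}")
            if state[nxt] == GREY:
                return True               # edge back to an ancestor: a loop
            if state[nxt] == WHITE:
                state[nxt] = GREY
                stack.append((nxt, iter(successors(program, nxt))))
                break
        else:
            state[node] = BLACK           # all successors explored
            stack.pop()
    return False
```

For example, `has_loop([("op",), ("jcond", -2), ("exit",)])` is `True` because the conditional jump targets instruction 0 again, while a program with only forward branches yields `False`.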
26. What is eBPF?
• Similar to LSF, but with the following
improvements:
• More registers, JIT compiler (flexible/ faster),
verifier
• Attach on Tracepoint, Kprobe, Uprobe, USDT
• In-kernel trace aggregation & filtering
• Control via bpf()
• Designed for general event processing within
the kernel
• All interactions between kernel/ user space
are done through eBPF “maps”
28. History of BPF
• 3.15: Optimization of BPF Interpreter’s instruction
set
• 3.18: Linux eBPF was released (bpf() syscall)
• 3.19: Socket support, BPF maps
• 4.1: Kprobe support
• 4.4: Perf events
• 4.7: Attach to tracepoints
• 4.8: XDP core
• 4.10: cgroups support
• 4.18: bpfilter released
http://hsdm.dorsal.polymtl.ca/system/files/eBPF-5May2017%20%281%29.pdf
32. (e)BPF Program Types
• prog_type determines the
subset of kernel helper
functions that the program
may call
• Determines the program
input (bpf_context)
https://www.tcpdump.org/papers/bpf-usenix93.pdf
33. (e)BPF Program Types
SOCKET-RELATED
• SOCKET_FILTER: Filtering actions (e.g. drop packets)
• SK_SKB: Access SKB and socket details with a view to redirecting
SKBs
• SOCK_OPS – Catch socket operations
• XDP: Allows access to packet data as early as possible (DDoS
mitigation/ Load-balancing)
https://www.tcpdump.org/papers/bpf-usenix93.pdf
34. (e)BPF Program Types
XDP
• XDP: Allows access to packet data as early as possible (DDoS
mitigation/ Load-balancing)
https://www.tcpdump.org/papers/bpf-usenix93.pdf
35. (e)BPF Program Types
KPROBES, TRACEPOINTS & PERF
• KPROBE – Instrument code in any kernel function
• TRACEPOINT – Instrument tracepoints in kernel code
• PERF_EVENT: Instrument software and hardware perf events
https://www.tcpdump.org/papers/bpf-usenix93.pdf
36. (e)BPF Program Types
CGROUPS
• CGROUP_SKB – Allow or deny network access on IP egress/
ingress
• CGROUP_SOCK – Allow or deny network access at various
socket-related events
• CGROUP_DEVICE – Determine if a device operation should be
permitted
https://www.tcpdump.org/papers/bpf-usenix93.pdf
37. (e)BPF Program Types
LIGHTWEIGHT TUNNELS
• LWT_IN – Examine inbound packets for lightweight tunnel
de-encapsulation
• LWT_OUT – Implement encapsulation tunnels for specific
destination routes
• LWT_XMIT – Allowed to modify content and prepend an L2 header
https://www.tcpdump.org/papers/bpf-usenix93.pdf
38. (e)BPF Program Types
TRAFFIC CONTROL
• SCHED_CLS: A network traffic-control classifier
• SCHED_ACT: A network traffic-control action
https://www.tcpdump.org/papers/bpf-usenix93.pdf
40. (e)BPF Maps
• Generic structure for
storage of different types of
data
• Allow sharing of data
between:
• eBPF kernel programs
• Kernel and user-space applications
https://www.tcpdump.org/papers/bpf-usenix93.pdf
41. (e)BPF Maps
• Each map has the following
attributes:
• Type
• Max number of elements
• Key Size (bytes)
• Value Size (bytes)
http://man7.org/linux/man-pages/man2/bpf.2.html
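The four attributes can be illustrated with a small userspace stand-in for a HASH map. This is a sketch only: the class `HashMapModel` and its methods are hypothetical and merely mimic how fixed key and value sizes and the element limit constrain updates; real maps are created and accessed through the bpf() system call.

```python
from dataclasses import dataclass, field


@dataclass
class HashMapModel:
    """Userspace model of a HASH map with fixed sizes and a capacity limit."""
    key_size: int        # bytes
    value_size: int      # bytes
    max_entries: int
    _data: dict = field(default_factory=dict)

    def update(self, key: bytes, value: bytes) -> None:
        if len(key) != self.key_size or len(value) != self.value_size:
            raise ValueError("key/value size does not match the map definition")
        if key not in self._data and len(self._data) >= self.max_entries:
            raise OverflowError("map is full")  # the kernel reports E2BIG here
        self._data[key] = value

    def lookup(self, key: bytes):
        return self._data.get(key)  # None models a failed lookup (ENOENT)

    def delete(self, key: bytes) -> bool:
        return self._data.pop(key, None) is not None
```

A per-source packet counter keyed by IPv4 address would use `key_size=4` and `value_size=8`, for example `HashMapModel(key_size=4, value_size=8, max_entries=1024)`.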
42. (e)BPF Maps
• HASH - A hash table
• ARRAY- An array map, optimized for fast lookup speeds
• PROG_ARRAY - An array of file descriptors (FDs) corresponding to eBPF
programs
• PERCPU_ARRAY - A per-CPU array, used to implement
histograms
• PERF_EVENT_ARRAY - Stores pointers to struct perf_event
• CGROUP_ARRAY – Stores pointers to control groups
https://lwn.net/Articles/740157/
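To see why a per-CPU array suits histograms, consider the model below: each CPU increments its own private copy of the array, so no locking is needed on the hot path, and the reader sums the copies afterwards. It is a plain userspace sketch; `NUM_CPUS`, `NUM_SLOTS` and the function names are hypothetical, and the power-of-two bucketing is just one common choice.

```python
NUM_CPUS = 4    # hypothetical CPU count for the model
NUM_SLOTS = 8   # slot i counts values in [2**i, 2**(i+1)); the last slot also takes larger values

# One private copy of the array per CPU, as a PERCPU_ARRAY provides.
percpu = [[0] * NUM_SLOTS for _ in range(NUM_CPUS)]


def log2_slot(value):
    """Map a value to its power-of-two bucket (0 and 1 both land in slot 0)."""
    return min(max(value, 1).bit_length() - 1, NUM_SLOTS - 1)


def record(cpu, latency_us):
    """What the program on a given CPU would do: bump its own copy only."""
    percpu[cpu][log2_slot(latency_us)] += 1


def read_histogram():
    """What user space would do: sum the per-CPU copies slot by slot."""
    return [sum(copy[i] for copy in percpu) for i in range(NUM_SLOTS)]
```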
43. (e)BPF Maps
• LRU_HASH - A hash table that only retains the most recently
used items
• LRU_PERCPU_HASH - A per-CPU hash table that only retains
the most recently used items
• LPM_TRIE - A longest-prefix match trie, good for matching IP
addresses
• STACK_TRACE - Stores stack traces
• ARRAY_OF_MAPS - A map-in-map data structure
• HASH_OF_MAPS – A map-in-map data structure
https://lwn.net/Articles/740157/
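Longest-prefix matching, the lookup an LPM_TRIE performs, can be sketched with a small binary trie over IPv4 prefixes. This is a userspace illustration only: the class `LpmTrie` is hypothetical, handles IPv4 only, and says nothing about how the kernel map is implemented.

```python
import ipaddress


class LpmTrie:
    """Binary trie keyed by prefix bits; lookup returns the longest matching prefix's value."""

    def __init__(self):
        self.root = {}  # node layout: {"0": child, "1": child, "value": stored value}

    def insert(self, cidr, value):
        net = ipaddress.ip_network(cidr)
        if net.version != 4:
            raise ValueError("this sketch handles IPv4 only")
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["value"] = value

    def lookup(self, addr):
        bits = format(int(ipaddress.IPv4Address(addr)), "032b")
        node = self.root
        best = node.get("value")          # a /0 entry matches everything
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            best = node.get("value", best)  # remember the deepest match so far
        return best
```

With `10.0.0.0/8` and `10.1.0.0/16` both inserted, a lookup of `10.1.2.3` returns the value stored for the `/16`, since it is the more specific match.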
44. (e)BPF Maps
• DEVICE_MAP (DEVMAP in the kernel headers) - For storing and
looking up network device references
• SOCKET_MAP (SOCKMAP in the kernel headers) – Stores and looks
up sockets and allows redirection
https://lwn.net/Articles/740157/
46. What can BPF be used for?
1 Networking (e.g. load balancing)
2 Firewalls
3 DDoS mitigation
4 Profiling & Tracing
5 Container Security
6 Device Drivers
7 Chaos Engineering
47. What can BPF be used for
NETWORKING
• Load-balancing
• Katran (Facebook)
• General networking
• Cilium
• Extending the TCP stack
• Network Monitoring
• Flowmill
• Weaveworks
48. What can BPF be used for
FIREWALLS
• Bpfilter (Linux 4.18)
49. What can BPF be used for
DDOS MITIGATION
• Use of eBPF & XDP to perform infra-wide
DDoS mitigation
• Facebook
• Cloudflare
50. What can BPF be used for
PROFILING & TRACING
• Sysdig
• bpftrace
51. What can BPF be used for
SECURITY
• Cilium
• Seccomp BPF
52. What can BPF be used for
DEVICE DRIVERS
• eBPF provides a pseudo device driver; it is
possible to extend this in multiple ways
53. What can BPF be used for
CHAOS ENGINEERING
• Use Cilium to inject latency, packet-loss,
L7 HTTP errors (via a Go extension)
55. Introduction to XDP
• XDP – eXpress Data Path
• High performance, programmable
network data path (IO Visor Project)
• The Linux kernel's answer to DPDK
(released in 4.8)
56. Introduction to XDP
• Features:
• Does not require specialized hardware
• Does not require kernel bypass
• Does not replace the TCP/IP stack
• Works with the TCP/IP stack, using eBPF
57. Introduction to XDP
• XDP program runs as soon as the packet
gets to the network driver
• An XDP program must exit with an
action, such as:
• XDP_TX
• XDP_DROP
• XDP_PASS
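The idea of returning an action per packet can be modelled in a few lines. The sketch below is a userspace simulation, not a loadable XDP program (those are written in restricted C and compiled to eBPF): the action values follow `enum xdp_action` in `<linux/bpf.h>`, while `xdp_model`, `make_frame` and the blocklist are hypothetical.

```python
import socket
import struct

# Values follow enum xdp_action in <linux/bpf.h>.
XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT = range(5)

ETH_P_IP = 0x0800
ETH_HLEN = 14
IPV4_MIN_HLEN = 20

BLOCKED_SOURCES = {"203.0.113.7"}  # hypothetical blocklist (documentation address range)


def xdp_model(frame: bytes) -> int:
    """Decide an action for one raw Ethernet frame, as an XDP program would."""
    if len(frame) < ETH_HLEN + IPV4_MIN_HLEN:
        return XDP_PASS                   # too short to parse: leave it to the stack
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS                   # only IPv4 is inspected in this sketch
    src = socket.inet_ntoa(frame[ETH_HLEN + 12: ETH_HLEN + 16])
    return XDP_DROP if src in BLOCKED_SOURCES else XDP_PASS


def make_frame(src_ip, ethertype=ETH_P_IP):
    """Build a minimal frame: zeroed MACs, EtherType, bare 20-byte IPv4 header."""
    eth = bytes(12) + struct.pack("!H", ethertype)
    ip = bytearray(IPV4_MIN_HLEN)
    ip[0] = 0x45                          # version 4, header length 5 words
    ip[12:16] = socket.inet_aton(src_ip)
    return eth + bytes(ip)
```

A frame from a blocklisted source yields `XDP_DROP`; everything else yields `XDP_PASS` and would continue up the normal network stack. `XDP_TX` would instead send the (possibly rewritten) frame back out of the same interface.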
59. Introduction to DPDK
• DPDK – Data Plane Development Kit
• Created in 2010 by Intel
• Collection of data plane libraries & NIC
drivers for fast packet processing
• Open-Source under Linux Foundation
• Support for multiple CPU architectures
62. XDP & DPDK
BENEFITS OF XDP
• No 3rd party code
• Option of busy polling or interrupt driven
networking
• Removes the need to:
• Allocate large pages
• Dedicate CPUs
• Inject packets into the kernel from 3rd
party user space
• Define a new security model
https://www.iovisor.org/technology/xdp