02 Jul 2022

Notes on eBPF & libbpf

A collection of useful tips and interesting finds as I build eBPF tooling

eBPF is powerful but needs a better developer experience

As I start building out bpfdeploy.io, I wanted to publish some of the notes on the developer experience research I did. This is a collection from the month of January 2022, but the date attached reflects when this was last edited.

Week 1

In libbpf, you can still use the BPF_KPROBE and BPF_KRETPROBE macros with uprobes.
Skeleton header generation is handled by bpftool rather than libbpf itself, i.e. bpftool gen skeleton prog.o > prog.skel.h
(Because I keep forgetting) Current way of compiling BPF programs: clang -g -O2 -c -target bpf -D__TARGET_ARCH_x86 -o mybpfobject.o mybpfcode.bpf.c (should be another alias). The arch define (-D) is required.
With vmlinux.h, you don't need to #include kernel headers. vmlinux.h will contain the kernel types. Link
For reliable field access on older kernels, use BPF_CORE_READ. Apparently, for some program types on some kernel versions, native C dereferencing is also possible and reliable?
Alias to add:
alias vmlinux="bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h"
Could there be security implications of publicly releasing vmlinux.h since it contains the layout of kernel structures?
You can use raw tracepoints to attach to the generic syscall handler: raw_tracepoint/sys_enter. Unsure if you can attach a raw tracepoint to a specific syscall? Can't find any examples in the kernel tree.
Talk on BPF Raw Tracepoints: Link
Syscall functions in 4.17 kernels were rewritten to have an architecture prefix. So something like sys_kill was converted into __x64_sys_kill, which needs to be accounted for in kprobe/kretprobe(s).
Auto bumping of RLIMIT_MEMLOCK coming to libbpf: Link
Talk on rough BPF user experience: Link
Context for each bpf program type is defined in kernel src
The current best places to look up latest practices on libbpf use are: bcc's libbpf-tools and bpf-next's samples/testing trees.
In bpftrace, kfunc is the equivalent for fentry probe types. fentry probes are not strictly equivalent to kprobes but there is overlap.
libbpf's BPF_KPROBE does not work well with syscalls. They are considering adding specific BPF_KPROBE_SYSCALL/BPF_KRETPROBE_SYSCALL macros.
Using bpf_printk in a BPF program was breaking libbpf-rs's skeleton generation. This seemed to only affect libbpf-cargo, but there have been no new releases with the fix, so I pinned a specific commit in Cargo.toml.
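
The note above on architecture-prefixed syscall functions is easy to trip over when choosing a kprobe target, so here is a minimal userspace sketch of building the symbol name. The helper names (syscall_prefix, syscall_kprobe_symbol) are hypothetical and not part of libbpf. The table only covers x86_64 and arm64, and it assumes a kernel built with syscall wrappers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prefix used by syscall wrapper symbols.
 * Only two architectures are listed; anything else returns NULL. */
const char *syscall_prefix(const char *arch)
{
    if (strcmp(arch, "x86_64") == 0)
        return "__x64_";
    if (strcmp(arch, "arm64") == 0)
        return "__arm64_";
    return NULL;
}

/* Hypothetical helper: writes the kprobe target for `name`
 * (e.g. "sys_kill") into `out`. Unknown architectures fall back to
 * the bare name. Returns 0 on success, -1 if `out` is too small. */
int syscall_kprobe_symbol(const char *arch, const char *name,
                          char *out, size_t len)
{
    assert(arch && name && out && len > 0);

    const char *prefix = syscall_prefix(arch);
    int n = snprintf(out, len, "%s%s", prefix ? prefix : "", name);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string is what you would place after kprobe/ or kretprobe/ in a section name. Checking the name against /proc/kallsyms on the target machine is still the safest confirmation.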

Week 2

bpftrace allegedly has the ability to look up a program's debug symbol information remotely but this isn't documented anywhere?
Generating PID-to-cgroup mappings is not always straightforward, which can affect BPF-based analysis that treats the container as its unit.
Different BPF program types have different natural lifetimes but lengthening them is possible with BPFFS.
To get a rough look at possible BPF program types in the kernel tree: rg "^SEC\(" ./ | awk -F: '{print $2}' | sort | uniq (Note: rg is just a stand-in for grep).
To look up possible uprobe attachment hooks in a program you can use nm(1) such as in nm /path/to/bin.
There is an open issue in bpftrace for USDTs where the provider specified in the user static tracepoint has to match the binary name. Symlinking the bin to a temporary file looks to be a workaround.
The libbpf version of the bpftrace program bashreadline.bt differs in size by an order of magnitude. Tradeoffs.
BPF helper bpf_get_stackid(7) makes it possible to analyze user or kernel stacks but it looks to be more convenient to try with bpftrace.
Because of how Go manages goroutine stacks, uretprobes need special handling.
You can pass -ex "disassemble {function_symbol_name}" to gdb to get offsets (from bpftrace).
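
On the PID-to-cgroup note above: one userspace starting point is /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The sketch below parses a single such line. The struct and function names are hypothetical, the buffer sizes are arbitrary assumptions, and it does not address the harder parts (PID namespaces, processes moving between cgroups).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one parsed /proc/<pid>/cgroup line. */
struct cgroup_entry {
    int hierarchy_id;
    char controllers[64];  /* empty string for cgroup v2 lines */
    char path[256];
};

/* Parses "hierarchy-ID:controller-list:cgroup-path".
 * Returns 0 on success, -1 on malformed or oversized input. */
int parse_cgroup_line(const char *line, struct cgroup_entry *out)
{
    assert(line && out);

    const char *c1 = strchr(line, ':');
    if (!c1)
        return -1;
    const char *c2 = strchr(c1 + 1, ':');
    if (!c2)
        return -1;

    /* The ID must be numeric and end exactly at the first colon. */
    char *end;
    long id = strtol(line, &end, 10);
    if (end == line || end != c1)
        return -1;
    out->hierarchy_id = (int)id;

    size_t clen = (size_t)(c2 - c1 - 1);
    if (clen >= sizeof out->controllers)
        return -1;
    memcpy(out->controllers, c1 + 1, clen);
    out->controllers[clen] = '\0';

    /* Everything after the second colon is the path, minus a newline. */
    size_t plen = strcspn(c2 + 1, "\n");
    if (plen == 0 || plen >= sizeof out->path)
        return -1;
    memcpy(out->path, c2 + 1, plen);
    out->path[plen] = '\0';

    return 0;
}
```

A caller would read /proc/<pid>/cgroup line by line and feed each line in. On a pure cgroup v2 host there is a single line with hierarchy ID 0 and an empty controller list.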

Week 3

Attaching uprobes and USDT currently requires offset lookup but there is work at Facebook to be able to attach USDT probes by name.
libbpf provides a btf_dump__dump_type_data function that allows you to pretty print data. Doesn't look simple to use but helpful for debugging once wired up. Doesn't seem to exist in libbpf-rs?
CAP_BPF opens up the bpf() syscall (rather than relying on the larger CAP_SYS_ADMIN) but it needs to be paired with other capabilities for most programs.
BPF spin locks (struct bpf_spin_lock) need to be embedded in the map elements themselves to be used, but I'm unsure when exactly I should use them. Won't most BPF applications be accessing BPF maps concurrently?
Certain helpers such as bpf_copy_from_user() may only be called from sleepable BPF programs.
Katran is an L4 load balancer based on BPF & XDP coming out of Facebook.
Work on sleepable BPF iterators coming through with helpers to access user space pointers. User data access, unlike kernel data, can page fault so sleeping is necessary.
Bpftool mirror now available on GitHub
There are many BPF runtimes, including development for FreeBSD, Windows and userspace VMs. This creates a push for portable BPF programs (and possibly a BPF runtime spec?).
BCC collects a list of available BPF features by kernel version
In libbpf, bpf_map__resize() is being deprecated for bpf_map__set_max_entries(). libbpf is not stable yet but I do wonder how much churn these API changes create on higher level tooling such as BCC's libbpf-tools.
The in-kernel snprintf (used by printk) can use BTF type information to dump internal data structures.
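
On the CAP_BPF note above: a quick way to see which capabilities a process actually holds is the CapEff line in /proc/<pid>/status, which is a hex bitmask. The sketch below decodes that line. The capability numbers come from include/uapi/linux/capability.h; the function names are hypothetical, and the "CAP_BPF plus CAP_PERFMON is enough for tracing" rule is my reading of how the capabilities were split, so treat it as an assumption rather than a guarantee for every program type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Capability bit numbers from include/uapi/linux/capability.h. */
enum {
    CAP_NUM_NET_ADMIN = 12,
    CAP_NUM_SYS_ADMIN = 21,
    CAP_NUM_PERFMON   = 38,
    CAP_NUM_BPF       = 39,
};

/* Parses a "CapEff:\t<hex>" line from /proc/<pid>/status.
 * Returns 0 on success, -1 if the line is not a CapEff line. */
int parse_capeff_line(const char *line, uint64_t *mask)
{
    assert(line && mask);

    unsigned long long v;
    if (sscanf(line, "CapEff: %llx", &v) != 1)
        return -1;
    *mask = (uint64_t)v;
    return 0;
}

/* Returns 1 if capability bit `cap` is set in `mask`, else 0. */
int has_cap(uint64_t mask, int cap)
{
    return (int)((mask >> cap) & 1);
}

/* Assumption: tracing programs need CAP_SYS_ADMIN, or
 * CAP_BPF together with CAP_PERFMON. */
int can_trace_with_bpf(uint64_t mask)
{
    if (has_cap(mask, CAP_NUM_SYS_ADMIN))
        return 1;
    return has_cap(mask, CAP_NUM_BPF) && has_cap(mask, CAP_NUM_PERFMON);
}
```

Networking program types would pair CAP_BPF with CAP_NET_ADMIN instead, which is why that bit number is included in the enum.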

Week 4

There's a BTF-enabled alternative to bpf_get_current_task() named bpf_get_current_task_btf() (not currently documented?) which allows direct dereferencing.
Legacy definitions of BPF maps are officially deprecated.
Ability to generate BTF information (what vmlinux.h is based off of) is coming to bpftool itself to help with BTF-enablement of older kernels.
bpfilter still looks mostly empty, but some development is occurring.

Written by @mdaverde