Notes on eBPF & libbpf

A collection of useful tips and interesting finds as I build eBPF tooling

eBPF is powerful but needs a better developer experience

As I start building out bpfdeploy.io, I want to publish some of the notes from my developer experience research. This collection is from January 2022, but the attached date reflects when it was last edited.

Week 1

  • In libbpf, you can still use the BPF_KPROBE and BPF_KRETPROBE macros with uprobes.
  • Skeleton header generation for libbpf lives in bpftool: bpftool gen skeleton prog.o > prog.skel.h
  • (Because I keep forgetting) Current way of compiling BPF programs: clang -g -O2 -c -target bpf -D__TARGET_ARCH_x86 -o mybpfobject.o mybpfcode.bpf.c (should be another alias). The arch define (-D) is required.
  • With vmlinux.h, you don't need to #include kernel headers. vmlinux.h will contain the kernel types. Link
  • For reliable field access on older kernels, use BPF_CORE_READ. Apparently, for some program types on some kernel versions, native C field access is also possible and reliable?
  • Alias to add:
    alias vmlinux="bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h"
  • Could there be security implications of publicly releasing vmlinux.h since it contains the layout of kernel structures?
  • You can use raw tracepoints to attach to a generic syscall handler: raw_tracepoint/sys_enter. Unsure if you can attach a raw tracepoint to a specific syscall? I can't find any examples in the kernel tree.
  • Talk on BPF Raw Tracepoints: Link
  • Syscall functions in 4.17 kernels were rewritten to have an architecture prefix. So something like sys_kill was converted into __x64_sys_kill, which needs to be accounted for in kprobe/kretprobe(s).
  • Auto bumping of RLIMIT_MEMLOCK coming to libbpf: Link
  • Talk on rough BPF user experience: Link
  • Context for each bpf program type is defined in kernel src
  • The current best places to look up latest practices on libbpf use are: bcc's libbpf-tools and bpf-next's samples/testing trees.
  • In bpftrace, kfunc is the equivalent for fentry probe types. fentry probes are not strictly equivalent to kprobes but there is overlap.
  • libbpf's BPF_KPROBE does not work well with syscalls. The libbpf maintainers are considering adding specific BPF_KPROBE_SYSCALL/BPF_KRETPROBE_SYSCALL macros.
  • Using bpf_printk in a BPF program was breaking libbpf-rs's skeleton generation. This seemed to only affect libbpf-cargo, but there have been no new releases with the fix, so I pinned a specific commit in Cargo.toml.
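The architecture-prefix note above can be sketched as a small user-space helper that builds the symbol name to hand to a kprobe attach call. This is only an illustration: the function name syscall_kprobe_symbol and its parameters are hypothetical, and it assumes the __<arch>_sys_<name> pattern (for example __x64_sys_kill on x86_64), falling back to the older sys_<name> form when no prefix is given.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libbpf): build the kernel symbol name
 * for a syscall entry point.
 *
 *   arch_prefix = "x64", syscall = "kill"  ->  "__x64_sys_kill"
 *   arch_prefix = "" or NULL               ->  "sys_kill" (pre-4.17 naming)
 *
 * Returns 0 on success, -1 if the name does not fit in the output buffer.
 */
int syscall_kprobe_symbol(const char *arch_prefix, const char *syscall,
                          char *out, size_t out_len)
{
    int n;

    if (arch_prefix != NULL && arch_prefix[0] != '\0')
        n = snprintf(out, out_len, "__%s_sys_%s", arch_prefix, syscall);
    else
        n = snprintf(out, out_len, "sys_%s", syscall);

    /* snprintf reports the length it wanted; treat truncation as an error. */
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}
```

A caller would pick the prefix from the kernel version and architecture it detects at runtime, then pass the resulting string as the function name when attaching the kprobe.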

Week 2

  • bpftrace allegedly has the ability to look up a program's debug symbol information remotely but this isn't documented anywhere?
  • Generating PID-to-cgroup mappings is not always straightforward, which can complicate BPF-based analysis at the container level.
  • Different BPF program types have different natural lifetimes but lengthening them is possible with BPFFS.
  • To get a rough look at possible BPF program types in the kernel tree: rg "^SEC\(" ./ | awk -F: '{print $2}' | sort | uniq (Note: rg is just a stand-in for grep).
  • To look up possible uprobe attachment hooks in a program you can use nm(1) such as in nm /path/to/bin.
  • There is an open issue in bpftrace for USDTs where the provider specified in the user static tracepoint has to match the binary name. Symlinking the bin to a temporary file looks to be a workaround.
  • The libbpf version of the bpftrace program bashreadline.bt differs in size by an order of magnitude. Tradeoffs.
  • The BPF helper bpf_get_stackid() (see bpf-helpers(7)) makes it possible to analyze user or kernel stacks, but it looks to be more convenient to try with bpftrace.
  • Because the Go runtime resizes and moves goroutine stacks, uretprobes need special handling.
  • You can pass -ex "disassemble {function_symbol_name}" to gdb to get offsets (from bpftrace).
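For the PID-to-cgroup note above, one user-space starting point is /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path and the cgroup v2 entry starts with "0::". The sketch below is a minimal illustration under that assumption; both function names are hypothetical, and it does not handle the harder cases (short-lived processes, PID namespaces, mixed v1/v2 setups).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: extract the cgroup path from one line of
 * /proc/<pid>/cgroup ("hierarchy-ID:controller-list:cgroup-path").
 * Returns 0 on success, -1 on a malformed line or a too-small buffer.
 */
int cgroup_path_from_line(const char *line, char *out, size_t out_len)
{
    const char *first = strchr(line, ':');
    if (first == NULL)
        return -1;

    const char *second = strchr(first + 1, ':');
    if (second == NULL)
        return -1;

    /* Everything after the second colon, minus the trailing newline. */
    const char *path = second + 1;
    size_t n = strcspn(path, "\n");
    if (n == 0 || n >= out_len)
        return -1;

    memcpy(out, path, n);
    out[n] = '\0';
    return 0;
}

/*
 * Hypothetical helper: look up the cgroup v2 path of a PID by scanning
 * /proc/<pid>/cgroup for the "0::" entry. Returns -1 if the file cannot
 * be read or no v2 entry exists.
 */
int cgroup_v2_path_for_pid(pid_t pid, char *out, size_t out_len)
{
    char file[64];
    char line[4352];
    int rc = -1;

    snprintf(file, sizeof(file), "/proc/%d/cgroup", (int)pid);
    FILE *f = fopen(file, "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "0::", 3) == 0) {
            rc = cgroup_path_from_line(line, out, out_len);
            break;
        }
    }
    fclose(f);
    return rc;
}
```

The result is only a snapshot: a process can exit or be moved to another cgroup between the read and the BPF event it is being joined with, which is part of why the mapping is not always straightforward.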

Week 3

  • Attaching uprobes and USDT currently requires offset lookup but there is work at Facebook to be able to attach USDT probes by name.
  • libbpf provides a btf_dump__dump_type_data function that allows you to pretty print data. Doesn't look simple to use but helpful for debugging once wired up. Doesn't seem to exist in libbpf-rs?
  • CAP_BPF opens up the bpf() syscall (rather than relying on the larger CAP_SYS_ADMIN) but it needs to be paired with other capabilities for most programs.
  • BPF spin locks (struct bpf_spin_lock) need to be embedded in the map elements themselves to be used, but I'm unsure when exactly I should use them. Won't most BPF applications be accessing BPF maps concurrently?
  • Certain helpers such as bpf_copy_from_user() can only be called from sleepable BPF programs.
  • Katran is an L4 load balancer based on BPF & XDP coming out of Facebook.
  • Work on sleepable BPF iterators coming through with helpers to access user space pointers. User data access, unlike kernel data, can page fault so sleeping is necessary.
  • Bpftool mirror now available on GitHub
  • There are many BPF runtimes, including development for FreeBSD, Windows and userspace VMs. This creates a push for portable BPF programs (and possibly a BPF runtime spec?).
  • BCC collects a list of available BPF features by kernel version
  • In libbpf, bpf_map__resize() is being deprecated in favor of bpf_map__set_max_entries(). libbpf is not stable yet but I do wonder how much churn these API changes create on higher level tooling such as BCC's libbpf-tools.
  • The in-kernel snprintf (used by printk) can use BTF type information to dump internal data structures.
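The CAP_BPF note above can be illustrated with a small user-space check against the CapEff line of /proc/self/status. This is a rough sketch, not an authoritative rule: the function names and EX_CAP_* macros are hypothetical (the numeric values mirror CAP_NET_ADMIN, CAP_SYS_ADMIN, CAP_PERFMON and CAP_BPF from linux/capability.h), and the "CAP_BPF plus CAP_PERFMON for tracing" test is a simplifying assumption, since the real requirements depend on program type and kernel configuration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Capability bit numbers, copied here so the sketch is self-contained. */
#define EX_CAP_NET_ADMIN 12
#define EX_CAP_SYS_ADMIN 21
#define EX_CAP_PERFMON   38
#define EX_CAP_BPF       39

/*
 * Hypothetical helper: parse a "CapEff:\t<hex mask>" line as found in
 * /proc/<pid>/status. Returns 0 and stores the mask on success, -1 otherwise.
 */
int parse_cap_eff(const char *line, uint64_t *mask)
{
    static const char key[] = "CapEff:";
    const char *p;
    char *end;

    if (strncmp(line, key, sizeof(key) - 1) != 0)
        return -1;

    p = line + sizeof(key) - 1;
    unsigned long long v = strtoull(p, &end, 16);
    if (end == p) /* no hex digits after the key */
        return -1;

    *mask = (uint64_t)v;
    return 0;
}

/* Returns 1 if the capability bit is set in the mask, 0 otherwise. */
int cap_is_set(uint64_t mask, int cap)
{
    return (int)((mask >> cap) & 1);
}

/*
 * Simplified assumption: tracing programs load with either CAP_SYS_ADMIN,
 * or CAP_BPF paired with CAP_PERFMON. CAP_BPF alone is treated as
 * insufficient, matching the note that it must be paired with others.
 */
int can_load_tracing_bpf(uint64_t mask)
{
    if (cap_is_set(mask, EX_CAP_SYS_ADMIN))
        return 1;
    return cap_is_set(mask, EX_CAP_BPF) && cap_is_set(mask, EX_CAP_PERFMON);
}
```

A networking program would presumably swap EX_CAP_PERFMON for EX_CAP_NET_ADMIN in the same check; a tool could run this before loading and print a clearer error than the EPERM returned by bpf().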

Week 4

References
