eBPF is powerful but needs a better developer experience
As I start building out bpfdeploy.io, I wanted to publish some of the notes on the developer experience research I did. This is a collection from the month of January 2022, but the date attached reflects when this was last edited.
Week 1
- In libbpf, you can still use the BPF_KPROBE and BPF_KRETPROBE macros with uprobes.
- The skeleton header generation part of libbpf is part of bpftool. Aka bpftool gen skeleton prog.o > prog.skel.h
- (Because I keep forgetting) Current way of compiling BPF programs: clang -g -O2 -c -target bpf -D__TARGET_ARCH_x86 -o mybpfobject.o mybpfcode.bpf.c (should be another alias). The arch define (-D) is required.
- With vmlinux.h, you don't need to #include kernel headers. vmlinux.h will contain the kernel types. Link
- For reliable field access on older kernels, use BPF_CORE_READ, yet apparently in some program types on some kernel versions, native C syntax is possible and reliable?
- Alias to add: alias vmlinux="bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h"
- Could there be security implications of publicly releasing vmlinux.h since it contains the layout of kernel structures?
- You can use raw tracepoints to attach to a generic syscall handler: raw_tracepoint/sys_enter. Unsure if you can attach a raw tracepoint to a specific syscall? Can't find any examples in the kernel tree.
- Talk on BPF Raw Tracepoints: Link
- Syscall functions in 4.17 kernels were rewritten to have an architecture prefix. So something like sys_kill was converted into __x64_sys_kill, which needs to be accounted for in kprobes/kretprobes.
- Auto bumping of RLIMIT_MEMLOCK coming to libbpf: Link
- Talk on rough BPF user experience: Link
- Context for each bpf program type is defined in kernel src
- The current best places to look up latest practices on libbpf use are: bcc's libbpf-tools and bpf-next's samples/testing trees.
- In bpftrace, kfunc is the equivalent for fentry probe types. fentry probes are not strictly equivalent to kprobes but there is overlap.
- libbpf's BPF_KPROBE does not work well with syscalls. They are considering adding specific BPF_KPROBE_SYSCALL/BPF_KRETPROBE_SYSCALL macros.
- Using bpf_printk in a bpf program was breaking libbpf-rs's skeleton generation. This seemed to only affect libbpf-cargo, but there have been no new releases with the fix, so I pinned a specific commit in Cargo.toml.
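The Week 1 note on architecture-prefixed syscall symbols can be sketched in plain userspace C. This is a minimal sketch, not libbpf's own logic: the helper name syscall_kprobe_symbol is hypothetical, and the caller is assumed to know the right prefix for the running kernel ("__x64_" on x86-64 from 4.17 onward, an empty string on older kernels).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the kernel symbol to kprobe for a syscall.
 * arch_prefix is an assumption supplied by the caller, e.g. "__x64_" on
 * x86-64 kernels >= 4.17, or "" on kernels without the prefix.
 * Returns 0 on success, -1 if the result does not fit in out. */
static int syscall_kprobe_symbol(const char *arch_prefix, const char *name,
                                 char *out, size_t out_len)
{
    assert(arch_prefix != NULL && name != NULL && out != NULL);

    int n = snprintf(out, out_len, "%ssys_%s", arch_prefix, name);
    if (n < 0 || (size_t)n >= out_len)
        return -1; /* encoding error or truncated output */
    return 0;
}
```

The resulting string is what would go into a kprobe attach point; picking the prefix at runtime (for example by probing /proc/kallsyms) is left out of this sketch.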
Week 2
- bpftrace allegedly has the ability to look up a program's debug symbol information remotely but this isn't documented anywhere?
- Generating PID to cgroup mappings is not always straightforward, which can complicate BPF-based analysis at the container level.
- Different BPF program types have different natural lifetimes but lengthening them is possible with BPFFS.
- To get a rough look at possible BPF program types in the kernel tree: rg "^SEC\(" ./ | awk -F: '{print $2}' | sort | uniq (Note: rg is just a stand-in for grep).
- To look up possible uprobe attachment hooks in a program you can use nm(1), such as in nm /path/to/bin.
- There is an open issue in bpftrace for USDTs where the provider specified in the user static tracepoint has to match the binary name. Symlinking the bin to a temporary file looks to be a workaround.
- The libbpf version of the bpftrace program bashreadline.bt differs in size by an order of magnitude. Tradeoffs.
- BPF helper bpf_get_stackid (see bpf-helpers(7)) makes it possible to analyze user or kernel stacks but it looks to be more convenient to try with bpftrace.
- Because of how Go manages goroutine stacks, uretprobes need special handling.
- Can pass in -ex disassemble {function_symbol_name} to gdb to get offsets: From bpftrace
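For the Week 2 note on PID to cgroup mappings, one common starting point is /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The sketch below is only an illustration under that assumption; both function names are hypothetical, and it deliberately ignores the harder parts (cgroup namespaces, short-lived PIDs, choosing among several cgroup v1 hierarchies).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Extract the cgroup path from one line of /proc/<pid>/cgroup.
 * Expected line format: hierarchy-ID:controller-list:cgroup-path
 * Returns 0 on success, -1 on malformed input or if out is too small. */
static int cgroup_path_from_line(const char *line, char *out, size_t out_len)
{
    assert(line != NULL && out != NULL);

    const char *first = strchr(line, ':');
    if (!first)
        return -1;
    const char *second = strchr(first + 1, ':');
    if (!second)
        return -1;

    const char *path = second + 1;
    size_t n = strcspn(path, "\n"); /* length without the trailing newline */
    if (n + 1 > out_len)
        return -1;
    memcpy(out, path, n);
    out[n] = '\0';
    return 0;
}

/* Read only the first entry for a PID. On a pure cgroup v2 system there is
 * a single "0::/path" line; on v1 there is one line per hierarchy, so real
 * code would have to decide which controller it cares about. */
static int first_cgroup_path_of_pid(int pid, char *out, size_t out_len)
{
    char file[64], line[4096];

    snprintf(file, sizeof file, "/proc/%d/cgroup", pid);
    FILE *f = fopen(file, "r");
    if (!f)
        return -1;
    int rc = fgets(line, sizeof line, f)
                 ? cgroup_path_from_line(line, out, out_len)
                 : -1;
    fclose(f);
    return rc;
}
```

A BPF program would more likely key on a cgroup ID from inside the kernel; this userspace lookup is just the simplest way to see why the mapping gets awkward once several hierarchies are involved.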
Week 3
- Attaching uprobes and USDT currently requires offset lookup but there is work at Facebook to be able to attach USDT probes by name.
- libbpf provides a btf_dump__dump_type_data function that allows you to pretty print data. Doesn't look simple to use but helpful for debugging once wired up. Doesn't seem to exist in libbpf-rs?
- CAP_BPF opens up the bpf() syscall (rather than relying on the larger CAP_SYS_ADMIN) but it needs to be paired with other capabilities for most programs.
- BPF spin locks (struct bpf_spin_lock) need to be tucked away in map elements themselves to be used, but I'm unsure when exactly I should use them. Won't most BPF applications be accessing BPF maps concurrently?
- Certain helpers such as bpf_copy_from_user() can only be called from sleepable BPF programs.
- Katran is an L4 lb based on BPF & XDP coming out of Facebook.
- Work on sleepable BPF iterators coming through with helpers to access user space pointers. User data access, unlike kernel data, can page fault so sleeping is necessary.
- Bpftool mirror now available on GitHub
- There are many BPF runtimes, including development for FreeBSD, Windows and userspace VMs. This creates a push for portable BPF programs (and possibly a BPF runtime spec?).
- BCC collects a list of available BPF features by kernel version
- In libbpf, bpf_map__resize() is being deprecated for bpf_map__set_max_entries(). libbpf is not stable yet but I do wonder how much churn these API changes create on higher level tooling such as BCC's libbpf-tools.
- In-kernel snprintf (used by printk) can use BTF type information to dump internal data structures.
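For the Week 3 note on CAP_BPF, a quick way to see which capabilities a loader process actually holds is to decode the CapEff line of /proc/self/status. The following is a rough sketch with hypothetical function names; the bit numbers (CAP_SYS_ADMIN = 21, CAP_PERFMON = 38, CAP_BPF = 39) are taken from linux/capability.h, and which extra capabilities a given program type needs is not modeled here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAP_SYS_ADMIN_BIT 21
#define CAP_PERFMON_BIT   38
#define CAP_BPF_BIT       39 /* available since Linux 5.8 */

/* Given one line of /proc/<pid>/status, report whether the effective set
 * contains cap_bit. Returns 1 or 0 for a "CapEff:" line, -1 for any other. */
static int capeff_has(const char *line, int cap_bit)
{
    assert(line != NULL && cap_bit >= 0 && cap_bit < 64);

    if (strncmp(line, "CapEff:", 7) != 0)
        return -1;
    /* The mask is printed as hex; strtoull skips the leading tab. */
    unsigned long long mask = strtoull(line + 7, NULL, 16);
    return (int)((mask >> cap_bit) & 1ULL);
}

/* Scan /proc/self/status for the CapEff line of the current process.
 * Returns 1 or 0, or -1 if the file or the line could not be read. */
static int self_has_cap(int cap_bit)
{
    char line[256];
    int rc = -1;

    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        rc = capeff_has(line, cap_bit);
        if (rc >= 0)
            break;
    }
    fclose(f);
    return rc;
}
```

Checking self_has_cap(CAP_BPF_BIT) alongside self_has_cap(CAP_PERFMON_BIT) before loading a tracing program gives a clearer error than a bare EPERM from bpf().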
Week 4
- There's a BTF-enabled alternative to bpf_get_current_task() named bpf_get_current_task_btf() (not currently documented?) which allows direct dereferencing.
- Legacy definitions of BPF maps are officially deprecated.
- Ability to generate BTF information (what vmlinux.h is based off of) is coming to bpftool itself to help with BTF-enablement of older kernels.
- bpfilter progress still looks empty but some development is occurring.