Program type `BPF_PROG_TYPE_KPROBE`

v4.1

BPF_PROG_TYPE_KPROBE programs are eBPF programs that can attach to kprobes. Kprobes are not an eBPF-specific feature, but the two work very well together. Traditionally, one had to either write a custom kernel module that could be invoked from a kprobe, or be content with just the trace log output. eBPF makes this process easier.

Usage

Probes come in four flavors: kprobe, kretprobe, uprobe, and uretprobe. kprobe and kretprobe are used to probe the kernel; uprobe and uretprobe are used to probe userspace. The normal probes are invoked when the probed location is executed. The ret variants execute once the probed function returns, which allows the return value to be captured.

All of these probe types work with the kprobe program type. It is the attach method that determines how the program is executed.

The return value of kprobe programs has no effect.

Context

The context passed to kprobe programs is struct pt_regs. This structure is different for each CPU architecture since it contains a copy of the CPU registers at the time the kprobe was invoked.

It is common for kprobe programs to use the macros from libbpf's bpf_tracing.h header file which defines PT_REGS_PARM1 ... PT_REGS_PARM5 as well as a number of others. These macros will translate to the correct field in struct pt_regs depending on the current architecture. Communicating the architecture you are compiling the BPF program for is done by defining one of the __TARGET_ARCH_* values in your program or via the command line while compiling.
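To make the mapping concrete, here is a minimal user-space sketch of what these macros resolve to on x86-64, where the kernel's struct pt_regs holds the first five function arguments in the di, si, dx, cx, and r8 fields. The struct mock_pt_regs, the MOCK_PT_REGS_PARM* macros, and mock_arg are hypothetical stand-ins written for this illustration; real BPF programs should use the macros from bpf_tracing.h instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the x86-64 struct pt_regs. Only the fields that
 * carry the first five function arguments are modelled. */
struct mock_pt_regs {
    uint64_t di, si, dx, cx, r8;
};

/* Illustrative equivalents of PT_REGS_PARM1 ... PT_REGS_PARM5 on x86-64. */
#define MOCK_PT_REGS_PARM1(ctx) ((ctx)->di)
#define MOCK_PT_REGS_PARM2(ctx) ((ctx)->si)
#define MOCK_PT_REGS_PARM3(ctx) ((ctx)->dx)
#define MOCK_PT_REGS_PARM4(ctx) ((ctx)->cx)
#define MOCK_PT_REGS_PARM5(ctx) ((ctx)->r8)

/* Returns the n-th (1-based) argument of the probed function. */
static uint64_t mock_arg(const struct mock_pt_regs *ctx, int n)
{
    assert(ctx != NULL && n >= 1 && n <= 5);
    switch (n) {
    case 1:  return MOCK_PT_REGS_PARM1(ctx);
    case 2:  return MOCK_PT_REGS_PARM2(ctx);
    case 3:  return MOCK_PT_REGS_PARM3(ctx);
    case 4:  return MOCK_PT_REGS_PARM4(ctx);
    default: return MOCK_PT_REGS_PARM5(ctx);
    }
}
```

On other architectures the same macro names resolve to different fields, which is why the target architecture has to be communicated at compile time.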

The same header file also provides the BPF_KPROBE(name, args...) macro, which allows program authors to declare their program with the same signature as the function being traced, including type information. The macro maps each positional argument to the given argument name and type. For example:

SEC("kprobe/proc_sys_write")
int BPF_KPROBE(my_kprobe_example,
           struct file* filp, const char* buf,
           size_t count, loff_t* ppos) {
    ...
}

Similar macros also exist for kprobes intended to attach to syscalls, BPF_KSYSCALL(name, args...), and for kretprobes, BPF_KRETPROBE(name, args...).

Attachment

There are two methods of attaching probe programs, each with variations for uprobes. The "legacy" method involves manually creating a k{ret}probe or u{ret}probe event via DebugFS and then attaching a BPF program to that event via the perf_event_open syscall.

The newer method uses BPF links to do both the probe event creation and attaching in one.

Legacy kprobe attaching

The first step is to create a kprobe or kretprobe trace event. To do so we can use DebugFS, which we will assume is mounted at /sys/kernel/debug for the purposes of this document.

Existing kprobe events can be listed by printing /sys/kernel/debug/tracing/kprobe_events, and new events can be created by writing to this pseudo-file. For example, executing echo 'p:myprobe do_sys_open' > /sys/kernel/debug/tracing/kprobe_events will make a new kprobe (p:) called myprobe at the do_sys_open function in the kernel. For details on the full syntax, see the kernel's kprobe-based event tracing documentation. kretprobes are created by specifying an r: prefix instead.
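The same step can be done from a loader program. The following is a minimal sketch, not a definitive implementation: format_kprobe_event and create_kprobe_event are hypothetical helper names, and the second one needs root privileges and a mounted DebugFS to succeed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats a kprobe_events definition such as "p:myprobe do_sys_open".
 * A non-zero is_return selects the kretprobe prefix "r:" instead of "p:".
 * Returns 0 on success, -1 if the name contains a space or the result
 * does not fit into buf. */
static int format_kprobe_event(char *buf, size_t len, int is_return,
                               const char *name, const char *symbol)
{
    assert(buf != NULL && name != NULL && symbol != NULL);
    if (strchr(name, ' ') != NULL)
        return -1;
    int n = snprintf(buf, len, "%c:%s %s", is_return ? 'r' : 'p', name, symbol);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Appends one definition line to the kprobe_events pseudo-file.
 * Append mode is used so that events which already exist are kept.
 * Returns 0 on success, -1 on failure. */
static int create_kprobe_event(const char *events_path, const char *definition)
{
    FILE *f = fopen(events_path, "a");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", definition) > 0;
    /* Write errors may only surface when the buffer is flushed on close. */
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

A caller would format the definition first and then pass it, together with /sys/kernel/debug/tracing/kprobe_events, to create_kprobe_event.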

After the probe has been created, a new directory will appear in /sys/kernel/debug/tracing/events/kprobes/ with the same name as we have given our probe, /sys/kernel/debug/tracing/events/kprobes/myprobe in this case. This directory contains a few pseudo-files, of which id is the important one for us. The file /sys/kernel/debug/tracing/events/kprobes/myprobe/id contains a unique identifier we will need in the next step.

The next step is to open a new perf event using the perf_event_open syscall:
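Reading the identifier can be sketched as follows. The helper name read_kprobe_event_id is hypothetical, and the sketch assumes the id file holds a single non-negative decimal number.

```c
#include <assert.h>
#include <stdio.h>

/* Reads the numeric event ID from an "id" pseudo-file, for example
 * /sys/kernel/debug/tracing/events/kprobes/myprobe/id.
 * Returns the ID (>= 0) on success or -1 on failure. */
static long read_kprobe_event_id(const char *id_path)
{
    assert(id_path != NULL);
    FILE *f = fopen(id_path, "r");
    if (f == NULL)
        return -1;
    long id = -1;
    if (fscanf(f, "%ld", &id) != 1 || id < 0)
        id = -1;
    fclose(f);
    return id;
}
```

The returned value is what goes into the config field of struct perf_event_attr when opening the perf event.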

struct perf_event_attr attr = {
    .type = PERF_TYPE_TRACEPOINT,
    .size = sizeof(struct perf_event_attr),
    .config = kprobe_id, /* The ID of your kprobe */
    .sample_period = 1,
    .sample_type = PERF_SAMPLE_RAW,
    .wakeup_events = 1,
};

syscall(SYS_perf_event_open,
    &attr,  /* struct perf_event_attr * */
    -1,     /* pid_t pid */
    0,      /* int cpu */
    -1,     /* int group_fd */
    PERF_FLAG_FD_CLOEXEC /* unsigned long flags */
);

This syscall will return a file descriptor on success. The final step consists of two ioctl calls: one to attach our BPF program to the kprobe event and one to enable the kprobe.

ioctl(perf_event_fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd); to attach.

ioctl(perf_event_fd, PERF_EVENT_IOC_ENABLE, 0); to enable.

The kprobe can be temporarily disabled with the PERF_EVENT_IOC_DISABLE ioctl option. Otherwise, the kprobe stays attached until the perf event goes away, either because the perf event FD is closed or because the program exits. The perf event holds a reference to the BPF program, so the program will stay loaded until no more kprobes reference it.
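Put together, the perf event and ioctl steps of the legacy method might look like the sketch below. The function name attach_kprobe_legacy is hypothetical; the sketch assumes the trace event already exists, that kprobe_id was read from its id file, and that bpf_prog_fd refers to an already loaded BPF_PROG_TYPE_KPROBE program.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Opens a perf event for the kprobe trace event with the given ID, attaches
 * the BPF program to it and enables the event.
 * Returns the perf event FD on success or -1 on failure. */
static int attach_kprobe_legacy(uint64_t kprobe_id, int bpf_prog_fd)
{
    assert(bpf_prog_fd >= 0);

    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_TRACEPOINT;
    attr.size = sizeof(attr);
    attr.config = kprobe_id;
    attr.sample_period = 1;
    attr.sample_type = PERF_SAMPLE_RAW;
    attr.wakeup_events = 1;

    /* pid = -1 and cpu = 0: all processes on CPU 0, no group, close-on-exec. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, -1, 0, -1,
                          PERF_FLAG_FD_CLOEXEC);
    if (fd < 0)
        return -1;

    if (ioctl(fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd) < 0 ||
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
        close(fd); /* closing the FD also drops the half-configured event */
        return -1;
    }
    return fd;
}
```

The caller keeps the returned FD open for as long as the program should stay attached and closes it to detach.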

Link kprobe attaching

The more modern and preferred way of attaching is using the link create command of the BPF syscall.

Helper functions

Not all helper functions are available in all program types. These are the helper calls available for kprobe programs:

Supported helper functions

KFuncs

Supported kfuncs
