
Program type BPF_PROG_TYPE_TRACEPOINT

v4.7

BPF_PROG_TYPE_TRACEPOINT programs are eBPF programs that attach to pre-defined tracepoints in the Linux kernel. These tracepoints are often placed at interesting locations, or at locations commonly used to measure performance.

Usage

Tracepoint programs attach to trace events, which are declared in the kernel with the TRACE_EVENT macro. Take, for example, the xdp_exception trace event. Using a combination of TP_* macros, its declaration defines the function prototype of the tracepoint, the structure that is passed to any handlers, and how the arguments are converted into that structure.

The TRACE_EVENT macro makes the tracepoint available as a function named after the event with a trace_ prefix. Calling trace_xdp_exception fires the xdp_exception event, and that call can appear in any number of places in the kernel code. The attached eBPF program is called on every invocation of the tracepoint.

We can use tracefs to list all available trace events. For the rest of this page we assume tracefs is mounted at /sys/kernel/tracing, which is the default on most distributions (older setups may only expose it under /sys/kernel/debug/tracing). The /sys/kernel/tracing/events/ directory contains one subdirectory per event group. Events are grouped by subsystem, which usually matches the first word of their name, so all kvm_* events reside in /sys/kernel/tracing/events/kvm and xdp_exception is located in /sys/kernel/tracing/events/xdp/xdp_exception. We will refer to this directory as the "event directory".
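As a small illustration of this layout, the sketch below builds an event directory path from a group and an event name. The helper name event_dir_path and the hard-coded mount point are assumptions for this example; the group is passed in explicitly rather than derived from the event name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed tracefs mount point (older systems may only have
 * /sys/kernel/debug/tracing). */
#define TRACEFS_EVENTS "/sys/kernel/tracing/events"

/* Hypothetical helper: write "<tracefs>/events/<group>/<event>" into buf.
 * Returns 0 on success, -1 if the path did not fit into buf. */
static int event_dir_path(char *buf, size_t len,
                          const char *group, const char *event)
{
    int n = snprintf(buf, len, "%s/%s/%s", TRACEFS_EVENTS, group, event);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, event_dir_path(buf, sizeof(buf), "xdp", "xdp_exception") yields /sys/kernel/tracing/events/xdp/xdp_exception.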

Context

The context for a tracepoint program is a pointer to a structure whose type is different for each trace event. The event directory contains a pseudo-file called format; for xdp_exception that is /sys/kernel/tracing/events/xdp/xdp_exception/format. Reading this file gives us the layout of the struct type:

$ cat /sys/kernel/tracing/events/xdp/xdp_exception/format

name: xdp_exception
ID: 488
format:
    field:unsigned short common_type;   offset:0;   size:2; signed:0;
    field:unsigned char common_flags;   offset:2;   size:1; signed:0;
    field:unsigned char common_preempt_count;   offset:3;   size:1; signed:0;
    field:int common_pid;   offset:4;   size:4; signed:1;

    field:int prog_id;  offset:8;   size:4; signed:1;
    field:u32 act;  offset:12;  size:4; signed:0;
    field:int ifindex;  offset:16;  size:4; signed:1;

print fmt: "prog_id=%d action=%s ifindex=%d", REC->prog_id, __print_symbolic(REC->act, { 0, "ABORTED" }, { 1, "DROP" }, { 2, "PASS" }, { 3, "TX" }, { 4, "REDIRECT" }, { -1, ((void *)0) }), REC->ifindex
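Each field: line in this output follows the same key:value; pattern, so it can be parsed mechanically when reconstructing the context. A minimal sketch, assuming the line layout shown above; the struct tp_field type and the parse_field_line and field_name helpers are hypothetical names for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One "field:" line from a tracepoint format file. */
struct tp_field {
    char decl[128];  /* C declaration, e.g. "int prog_id" */
    unsigned offset; /* byte offset into the context struct */
    unsigned size;   /* size in bytes */
    int is_signed;   /* 1 if the type is signed */
};

/* Parse a line such as
 *   "field:int prog_id;  offset:8;   size:4; signed:1;"
 * Spaces in the sscanf format match any run of whitespace (or none).
 * Returns 0 on success, -1 if the line is not a field line. */
static int parse_field_line(const char *line, struct tp_field *f)
{
    int n = sscanf(line, " field:%127[^;]; offset:%u; size:%u; signed:%d;",
                   f->decl, &f->offset, &f->size, &f->is_signed);
    return n == 4 ? 0 : -1;
}

/* The field name is the last word of the declaration
 * (array suffixes such as "[16]" are kept). */
static const char *field_name(const struct tp_field *f)
{
    const char *space = strrchr(f->decl, ' ');
    return space ? space + 1 : f->decl;
}
```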

From this output we can reconstruct the context, which as a C struct looks like this:

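struct xdp_exception_ctx {
    /* Common fields, present at the start of every trace event */
    __u16 common_type;
    __u8 common_flags;
    __u8 common_preempt_count;
    __s32 common_pid;

    /* Fields specific to xdp_exception */
    __s32 prog_id;
    __u32 act;
    __s32 ifindex;
};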
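One way to catch mistakes when transcribing the format file is to check the offsets at compile time. This sketch assumes a C11 compiler and uses <stdint.h> types in place of the kernel's __u16/__u8/__s32 so that it builds without kernel headers.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Same layout as the struct above, spelled with <stdint.h> types. */
struct xdp_exception_ctx {
    uint16_t common_type;
    uint8_t  common_flags;
    uint8_t  common_preempt_count;
    int32_t  common_pid;

    int32_t  prog_id;
    uint32_t act;
    int32_t  ifindex;
};

/* Offsets copied from the format file; any mismatch fails the build. */
static_assert(offsetof(struct xdp_exception_ctx, common_type) == 0, "common_type");
static_assert(offsetof(struct xdp_exception_ctx, common_flags) == 2, "common_flags");
static_assert(offsetof(struct xdp_exception_ctx, common_preempt_count) == 3, "common_preempt_count");
static_assert(offsetof(struct xdp_exception_ctx, common_pid) == 4, "common_pid");
static_assert(offsetof(struct xdp_exception_ctx, prog_id) == 8, "prog_id");
static_assert(offsetof(struct xdp_exception_ctx, act) == 12, "act");
static_assert(offsetof(struct xdp_exception_ctx, ifindex) == 16, "ifindex");
static_assert(sizeof(struct xdp_exception_ctx) == 20, "total size");
```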

Attachment

There are three methods of attaching tracepoint programs, described below from oldest and least recommended to newest and most recommended. All of them start with the same first steps.

We start by looking up the event ID in tracefs. The event directory contains a pseudo-file called id; for xdp_exception that is /sys/kernel/tracing/events/xdp/xdp_exception/id. Reading the file returns the ID as a decimal number (the same value as the ID: line in the format file).
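A minimal sketch of reading the ID file from user space. The helpers parse_event_id and read_event_id are hypothetical names, and reading tracefs usually requires root privileges. The ID is returned as a long long so it can be stored directly in the 64-bit config field of struct perf_event_attr.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the decimal text of an "id" file. Returns the ID, or -1 on error. */
static long long parse_event_id(const char *text)
{
    char *end;
    errno = 0;
    long long id = strtoll(text, &end, 10);
    /* Reject empty input, trailing garbage, overflow and negative values. */
    if (errno != 0 || end == text || (*end != '\0' && *end != '\n') || id < 0)
        return -1;
    return id;
}

/* Read e.g. /sys/kernel/tracing/events/xdp/xdp_exception/id.
 * Returns the event ID, or -1 if the file cannot be read or parsed. */
static long long read_event_id(const char *path)
{
    char buf[32];
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *line = fgets(buf, sizeof(buf), f);
    fclose(f);
    return line ? parse_event_id(buf) : -1;
}
```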

The next step is to open a new perf event using the perf_event_open syscall:

#include <linux/perf_event.h> /* struct perf_event_attr, PERF_* constants */
#include <sys/syscall.h>      /* SYS_perf_event_open */
#include <unistd.h>           /* syscall() */

struct perf_event_attr attr = {
    .type = PERF_TYPE_TRACEPOINT,
    .size = sizeof(struct perf_event_attr),
    .config = event_id, /* The ID of your trace event */
    .sample_period = 1,
    .sample_type = PERF_SAMPLE_RAW,
    .wakeup_events = 1,
};

int perf_event_fd = syscall(SYS_perf_event_open,
    &attr,  /* struct perf_event_attr * */
    -1,     /* pid_t pid */
    0,      /* int cpu */
    -1,     /* int group_fd */
    PERF_FLAG_FD_CLOEXEC /* unsigned long flags */
);

On success, this syscall returns a file descriptor for the perf event; on failure it returns -1 and sets errno.

ioctl method

This is the oldest and least recommended method. Once we have the perf event file descriptor, we make two ioctl syscalls: one to attach our BPF program to the trace event and one to enable the event.

ioctl(perf_event_fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd); to attach.

ioctl(perf_event_fd, PERF_EVENT_IOC_ENABLE, 0); to enable.

The tracepoint can be temporarily disabled with the PERF_EVENT_IOC_DISABLE ioctl. Otherwise the program stays attached until the perf event goes away, either because the perf event FD is closed or because the user space program exits. The perf event holds a reference to the BPF program, so the program stays loaded as long as any perf event still references it.
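The two ioctl calls can be wrapped in a small helper. A sketch, assuming perf_event_fd came from the perf_event_open call above and bpf_prog_fd from loading a BPF_PROG_TYPE_TRACEPOINT program; attach_tp_ioctl is a hypothetical name.

```c
#include <assert.h>
#include <linux/perf_event.h> /* PERF_EVENT_IOC_SET_BPF, PERF_EVENT_IOC_ENABLE */
#include <sys/ioctl.h>

/* Attach bpf_prog_fd to the tracepoint perf event, then enable the event.
 * Returns 0 on success, -1 with errno set by the failing ioctl. */
static int attach_tp_ioctl(int perf_event_fd, int bpf_prog_fd)
{
    if (ioctl(perf_event_fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd) < 0)
        return -1;
    if (ioctl(perf_event_fd, PERF_EVENT_IOC_ENABLE, 0) < 0)
        return -1;
    return 0;
}
```

The caller keeps perf_event_fd open for as long as the program should stay attached.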

perf_event_open PMU

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

BPF link

This is the newest and most recommended method of attaching tracepoint programs.

Once we have the perf event file descriptor, we attach the program by creating a BPF link with the BPF_LINK_CREATE command of the bpf syscall.

We issue the command with attach_type set to BPF_PERF_EVENT, target_fd set to the perf event FD, prog_fd set to the file descriptor of the tracepoint program, and, optionally, a cookie.
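A minimal sketch of this call using the raw bpf syscall. The helper name link_create_perf_event is hypothetical, and the code assumes <linux/bpf.h> headers new enough to define BPF_PERF_EVENT and link_create.perf_event.bpf_cookie (kernel v5.15 or later).

```c
#include <assert.h>
#include <linux/bpf.h>   /* union bpf_attr, BPF_LINK_CREATE, BPF_PERF_EVENT */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h> /* SYS_bpf */
#include <unistd.h>      /* syscall() */

/* Create a BPF link between a tracepoint program and a perf event.
 * cookie is optional (pass 0); the program can read it back with the
 * bpf_get_attach_cookie() helper. Returns the link FD, or -1 with errno set. */
static int link_create_perf_event(int prog_fd, int perf_event_fd, uint64_t cookie)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr)); /* unused fields must be zero */
    attr.link_create.prog_fd = prog_fd;
    attr.link_create.target_fd = perf_event_fd;
    attr.link_create.attach_type = BPF_PERF_EVENT;
    attr.link_create.perf_event.bpf_cookie = cookie;

    return (int)syscall(SYS_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}
```

Unlike the ioctl method, the attachment is tied to the returned link FD: closing it (when the link is not pinned) detaches the program. libbpf users can get a similar result from bpf_program__attach_tracepoint(), which opens the perf event and, on kernels that support it, attaches through a perf link.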

Helper functions

Supported helper functions

KFuncs

There are currently no kfuncs supported for this program type.