
Program type BPF_PROG_TYPE_CGROUP_SOCKOPT

v5.3

cGroup socket ops programs are executed when a process in the cGroup to which the program is attached calls the getsockopt or setsockopt syscall, depending on the attach type. They can modify or block the operation.

Usage

cGroup socket ops programs are typically located in the cgroup/getsockopt or cgroup/setsockopt ELF section to indicate the BPF_CGROUP_GETSOCKOPT and BPF_CGROUP_SETSOCKOPT attach types respectively.

BPF_CGROUP_SETSOCKOPT

BPF_CGROUP_SETSOCKOPT is triggered before the kernel handling of sockopt and has a writable context: it can modify the supplied arguments before passing them down to the kernel. This hook has access to the cgroup and socket local storage.

If a BPF program sets optlen to -1, control is returned to userspace after all other BPF programs in the cgroup chain finish (i.e. the kernel setsockopt handling is not executed).

Note

optlen cannot be increased beyond the user-supplied value. It can only be decreased or set to -1. Any other value will trigger EFAULT.

Return Type:

  • 0 - reject the syscall, EPERM will be returned to the userspace.
  • 1 - success, continue with next BPF program in the cgroup chain.
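
The rules above can be summarised as a small decision function. This is a plain C model of the behaviour as described in this section, not kernel code: the enum, the function name `classify_setsockopt` and its parameters are hypothetical, and the special handling of optval larger than PAGE_SIZE is deliberately left out.

```c
/* Possible outcomes once the BPF_CGROUP_SETSOCKOPT programs have run. */
enum setsockopt_action {
    RUN_KERNEL_HANDLER,  /* continue into the kernel's setsockopt handling */
    SKIP_KERNEL_HANDLER, /* optlen == -1: return to userspace directly */
    FAIL_EFAULT,         /* optlen outside the permitted range */
    FAIL_EPERM           /* the program returned 0 */
};

/*
 * Hypothetical helper: classify the outcome from the program's return
 * value, the optlen supplied by userspace, and the optlen the program
 * left in the context.
 */
static enum setsockopt_action
classify_setsockopt(int prog_ret, int user_optlen, int bpf_optlen)
{
    if (prog_ret == 0)
        return FAIL_EPERM;
    if (bpf_optlen == -1)
        return SKIP_KERNEL_HANDLER;
    if (bpf_optlen < -1 || bpf_optlen > user_optlen)
        return FAIL_EFAULT;
    return RUN_KERNEL_HANDLER;
}
```

For instance, with a 4-byte user buffer, trimming optlen to 2 still reaches the kernel handler, while raising it to 8 is classified as EFAULT.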

BPF_CGROUP_GETSOCKOPT

BPF_CGROUP_GETSOCKOPT is triggered after the kernel handling of sockopt. The BPF hook can observe optval, optlen and retval if it is interested in what the kernel has returned. The hook can override these values, adjust optlen and reset retval to 0. If optlen has been increased above the initial getsockopt value (i.e. the userspace buffer is too small), EFAULT is returned.

This hook has access to the cgroup and socket local storage.

Note

The only acceptable values for retval are 0 and the original value that the kernel returned. Any other value will trigger EFAULT.

Return Type:

  • 0 - reject the syscall, EPERM will be returned to the userspace.
  • 1 - success: copy optval and optlen to userspace, return retval from the syscall (note that this can be overwritten by the BPF program from the parent cgroup).
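
As a rough illustration of how these checks combine, the following plain C model computes the value the syscall would yield. It only encodes the rules stated in this section; the function name `getsockopt_result` and its parameters are hypothetical and do not correspond to a kernel API. Errors are expressed kernel-style as negative errno values.

```c
#include <errno.h>

/*
 * Hypothetical helper modelling the outcome of getsockopt after the
 * BPF_CGROUP_GETSOCKOPT program has run.
 *   prog_ret       - return value of the BPF program (0 or 1)
 *   kernel_retval  - value the kernel's getsockopt handling returned
 *   initial_optlen - buffer length passed in by userspace
 *   bpf_optlen     - optlen left in the context by the program
 *   bpf_retval     - retval left in the context by the program
 */
static int getsockopt_result(int prog_ret, int kernel_retval,
                             int initial_optlen, int bpf_optlen,
                             int bpf_retval)
{
    if (prog_ret == 0)
        return -EPERM;  /* program rejected the syscall */
    if (bpf_optlen > initial_optlen)
        return -EFAULT; /* userspace buffer would be too small */
    if (bpf_retval != 0 && bpf_retval != kernel_retval)
        return -EFAULT; /* retval may only be 0 or the original value */
    return bpf_retval;
}
```

A program that handles a custom option the kernel rejected with ENOPROTOOPT can therefore reset retval to 0, but it cannot replace the error with, say, EINVAL.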

Cgroup Inheritance

Suppose the following cgroup hierarchy, where a BPF_CGROUP_GETSOCKOPT program is attached at each level with BPF_F_ALLOW_MULTI:

  A (root, parent)
   \
    B (child)

When an application in cgroup B calls the getsockopt syscall, the programs are executed from the bottom up: B, then A. The first program (B) sees the result of the kernel's getsockopt. It can optionally adjust optval and optlen and reset retval to 0. Control then passes to the second program (A), which sees the same context as B, including any modifications B made.

The same applies to BPF_CGROUP_SETSOCKOPT: if a program is attached to both A and B, the trigger order is B, then A. If B makes any changes to the input arguments (level, optname, optval, optlen), the next program in the chain (A) sees those changes, not the original setsockopt arguments. The potentially modified values are then passed down to the kernel.
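
The shared-context behaviour can be pictured with a small user-space simulation. Everything here is an assumption made for illustration: `struct mock_sockopt` is a simplified stand-in for struct bpf_sockopt, and `prog_a`, `prog_b` and `run_chain` are hypothetical names. The sketch only shows how modifications propagate through the chain and does not model rejection.

```c
#include <stddef.h>

/* Simplified stand-in for struct bpf_sockopt (scalar fields only). */
struct mock_sockopt {
    int level;
    int optname;
    int optlen;
    int retval;
};

typedef int (*sockopt_prog)(struct mock_sockopt *ctx);

/* Records the optlen that the program attached to A observed. */
static int optlen_seen_by_a = -2;

/* Program attached to child cgroup B: trims optlen and resets retval. */
static int prog_b(struct mock_sockopt *ctx)
{
    ctx->optlen = 1;
    ctx->retval = 0;
    return 1;
}

/* Program attached to parent cgroup A: only observes the context. */
static int prog_a(struct mock_sockopt *ctx)
{
    optlen_seen_by_a = ctx->optlen;
    return 1;
}

/* Run the programs in order (child first) over one shared context. */
static void run_chain(sockopt_prog *progs, size_t n, struct mock_sockopt *ctx)
{
    for (size_t i = 0; i < n; i++)
        progs[i](ctx);
}
```

Running the chain as { prog_b, prog_a } over a context with optlen 4 leaves `optlen_seen_by_a` at 1, because A observes the value B wrote rather than the original.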

Large optval

When optval is larger than PAGE_SIZE, the BPF program can access only the first PAGE_SIZE bytes of that data. It therefore has two options:

  • Set optlen to zero, which indicates that the kernel should use the original buffer from the userspace. Any modifications done by the BPF program to the optval are ignored.
  • Set optlen to a value less than PAGE_SIZE, which indicates that the kernel should use BPF's trimmed optval.

When the BPF program returns with the optlen greater than PAGE_SIZE, the userspace will receive original kernel buffers without any modifications that the BPF program might have applied.

Context

struct bpf_sockopt

C structure
struct bpf_sockopt {
    __bpf_md_ptr(struct bpf_sock *, sk);
    __bpf_md_ptr(void *, optval);
    __bpf_md_ptr(void *, optval_end);

    __s32   level;
    __s32   optname;
    __s32   optlen;
    __s32   retval;
};

sk

Pointer to the socket for which the syscall is invoked.

optval

Pointer to the start of the option value; optval_end marks its end. The program must perform a bounds check against optval_end before accessing the memory.

For BPF_CGROUP_SETSOCKOPT the opt value contains the option the process wants to set. For BPF_CGROUP_GETSOCKOPT the opt value contains the option the syscall returned.
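
A minimal sketch of the bounds-check pattern, written as ordinary C so it can be compiled and tried outside the kernel. `struct mock_sockopt` is a hypothetical stand-in that carries only the fields used here, and `read_int_option` is an illustrative name. In a real program the same comparison against optval_end is what allows the verifier to accept the memory access.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the relevant part of struct bpf_sockopt. */
struct mock_sockopt {
    void *optval;
    void *optval_end;
};

/*
 * Copy an int-sized option value into *out.
 * Returns false, leaving *out untouched, if fewer than sizeof(int)
 * bytes are available between optval and optval_end.
 */
static bool read_int_option(const struct mock_sockopt *ctx, int *out)
{
    const unsigned char *p = ctx->optval;
    const unsigned char *end = ctx->optval_end;

    /* Bounds check: the whole int must lie before optval_end. */
    if (p + sizeof(int) > end)
        return false;

    memcpy(out, p, sizeof(int));
    return true;
}
```

Using memcpy rather than a pointer cast avoids alignment assumptions about the option buffer.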

optval_end

This is the end pointer of the option value.

level

This field indicates the socket level for which the syscall is invoked. Values are one of SOL_* constants. Typically SOL_SOCKET, SOL_IP, SOL_IPV6, SOL_TCP, or SOL_UDP unless dealing with more specialized protocols. Only BPF_CGROUP_SETSOCKOPT programs are allowed to modify this field.

optname

This field indicates the name of the socket option. Valid options depend on the socket level. More info can be found in the man pages such as socket(7), ip(7), tcp(7), udp(7), etc. Only BPF_CGROUP_SETSOCKOPT programs are allowed to modify this field.

optlen

This field indicates the length of the socket option, which should be less than or equal to optval_end - optval. The program can modify this value to trim the option value. Both BPF_CGROUP_SETSOCKOPT and BPF_CGROUP_GETSOCKOPT programs are allowed to modify this field.

retval

This field indicates the return value of the syscall. Only BPF_CGROUP_GETSOCKOPT programs can read and/or modify this value to override the return value of the syscall.

Attachment

cGroup socket ops programs are attached to cgroups via the BPF_PROG_ATTACH syscall command or via a BPF link.
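
A minimal sketch of the BPF_PROG_ATTACH route using the raw bpf syscall, assuming Linux UAPI headers from v5.3 or later are installed. The wrapper name `attach_sockopt_prog` is hypothetical, and choosing BPF_F_ALLOW_MULTI is just one option. How the program is loaded and how the cgroup file descriptor is obtained are outside the scope of the sketch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for the syscall() declaration */
#endif
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical wrapper around the BPF_PROG_ATTACH command.
 *   cgroup_fd - file descriptor of the target cgroup directory
 *   prog_fd   - file descriptor of a loaded BPF_PROG_TYPE_CGROUP_SOCKOPT program
 *   type      - BPF_CGROUP_GETSOCKOPT or BPF_CGROUP_SETSOCKOPT
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int attach_sockopt_prog(int cgroup_fd, int prog_fd,
                               enum bpf_attach_type type)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.target_fd = cgroup_fd;
    attr.attach_bpf_fd = prog_fd;
    attr.attach_type = type;
    attr.attach_flags = BPF_F_ALLOW_MULTI; /* allow several programs per cgroup */

    return (int)syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
}
```

Loader libraries such as libbpf wrap this command, so most programs do not need to issue the syscall by hand.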

Example

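#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* MY_SOL, MY_OPTNAME and PAGE_SIZE are placeholders the program must define. */

SEC("cgroup/getsockopt")
int getsockopt(struct bpf_sockopt *ctx)
{
    __u8 *optval = ctx->optval;
    __u8 *optval_end = ctx->optval_end;

    /* Custom socket option. */
    if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
        if (optval + 1 > optval_end)
            return 0; /* bounds check failed: reject with EPERM */
        ctx->retval = 0;
        optval[0] = ...;
        ctx->optlen = 1;
        return 1;
    }

    /* Modify kernel's socket option. */
    if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
        if (optval + 1 > optval_end)
            return 0; /* bounds check failed: reject with EPERM */
        ctx->retval = 0;
        optval[0] = ...;
        ctx->optlen = 1;
        return 1;
    }

    /* optval larger than PAGE_SIZE: use kernel's buffer. */
    if (ctx->optlen > PAGE_SIZE)
        ctx->optlen = 0;

    return 1;
}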

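SEC("cgroup/setsockopt")
int setsockopt(struct bpf_sockopt *ctx)
{
    __u8 *optval = ctx->optval;
    __u8 *optval_end = ctx->optval_end;

    /* Custom socket option (MY_SOL and MY_OPTNAME are placeholders). */
    if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
        /* do something */
        ctx->optlen = -1; /* skip the kernel's setsockopt handling */
        return 1;
    }

    /* Modify kernel's socket option. */
    if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
        if (optval + 1 > optval_end)
            return 0; /* bounds check failed: reject with EPERM */
        optval[0] = ...;
        return 1;
    }

    /* optval larger than PAGE_SIZE: use kernel's buffer. */
    if (ctx->optlen > PAGE_SIZE)
        ctx->optlen = 0;

    return 1;
}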

Helper functions

Supported helper functions

KFuncs

There are currently no kfuncs supported for this program type.