Program type BPF_PROG_TYPE_SK_SKB
Socket SKB programs are called on L4 data streams to parse L7 messages and/or to determine if the L4/L7 messages should be allowed, blocked or redirected.
Usage
Socket SKB programs are attached to BPF_MAP_TYPE_SOCKMAP or BPF_MAP_TYPE_SOCKHASH maps and are invoked when messages are received on the sockets that are part of the map the program is attached to. The exact purpose of the program depends on its attach type.
As BPF_SK_SKB_STREAM_PARSER program
When this attach type is used, the program acts as a stream parser. The idea behind a stream parser is to parse message-based application layer protocols (OSI Layer 7) which are implemented on top of data streams such as TCP.
The job of the program is to parse the L7 data and tell the kernel how long the L7 message is. This allows the kernel to combine multiple data stream packets and return a complete L7 message for every recv call, instead of returning individual TCP segments which might only contain part of an L7 message.
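To make the reassembly idea concrete, here is a small userspace model in plain C of what happens on the receive side: bytes from successive TCP segments are appended to a buffer, a parser is asked how long the next message is, and only complete messages are handed on. The newline-delimited framing, the names newline_parser, struct reassembler and feed_segment, and the fixed 256-byte buffer are hypothetical choices for this sketch; in the kernel this work is done by the strparser, which runs the BPF program as its parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parser: a message ends with '\n'. Returns the message length
 * (including the newline) when complete, or 0 if more data is needed. */
static int newline_parser(const char *buf, size_t len)
{
    const char *nl = memchr(buf, '\n', len);
    return nl ? (int)(nl - buf + 1) : 0;
}

/* Userspace model of stream reassembly: segments are appended to buf and
 * complete messages are cut off the front as the parser reports them. */
struct reassembler {
    char buf[256];
    size_t len;       /* bytes currently buffered */
    int messages;     /* complete messages delivered so far */
    char last[257];   /* most recently delivered message, NUL-terminated */
};

/* Feeds one segment; returns how many complete messages it produced,
 * or -1 if the segment does not fit into the buffer. */
static int feed_segment(struct reassembler *r, const char *seg, size_t seg_len)
{
    int produced = 0;
    int msg_len;

    if (r->len + seg_len > sizeof(r->buf))
        return -1;
    memcpy(r->buf + r->len, seg, seg_len);
    r->len += seg_len;

    /* Deliver every complete message currently in the buffer */
    while (r->len > 0 && (msg_len = newline_parser(r->buf, r->len)) > 0) {
        memcpy(r->last, r->buf, (size_t)msg_len);
        r->last[msg_len] = '\0';
        memmove(r->buf, r->buf + msg_len, r->len - (size_t)msg_len);
        r->len -= (size_t)msg_len;
        r->messages++;
        produced++;
    }
    return produced;
}
```

Feeding "GET /a" and then "\nGET /b\nGE" yields no message for the first segment and two messages for the second, leaving "GE" buffered until more data arrives.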
The return value is interpreted as follows:
>0
- indicates the length of a successfully parsed message
0
- indicates more data must be received to parse the message
-ESTRPIPE
- the current message should not be processed by the kernel; control of the socket is returned to userspace, which can proceed to read the messages itself
other < 0
- error in parsing; control is given back to userspace assuming that synchronization is lost and the stream is unrecoverable (the application is expected to close the TCP socket)
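As an illustration of these return values, the sketch below models the decision a parser could make for a hypothetical protocol in which every message begins with a 4-byte big-endian payload length. It is plain userspace C operating on a byte array; in an actual BPF_SK_SKB_STREAM_PARSER program the header bytes would instead be read from the skb, for example with bpf_skb_load_bytes. The framing, the function name length_prefix_parse and the MAX_PAYLOAD limit are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing: 4-byte big-endian payload length, then the payload. */
#define HDR_LEN     4
#define MAX_PAYLOAD 65536   /* arbitrary limit chosen for this sketch */

/* Mirrors the stream parser return convention:
 *   > 0  length of the complete message (header + payload)
 *     0  more data must be received
 *   < 0  parse error, the stream is treated as unrecoverable */
static int length_prefix_parse(const uint8_t *data, size_t len)
{
    uint32_t payload;

    if (len < HDR_LEN)
        return 0;   /* header not complete yet */

    payload = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
              ((uint32_t)data[2] << 8) | (uint32_t)data[3];
    if (payload > MAX_PAYLOAD)
        return -1;  /* nonsensical length: report an error */

    /* Length is known once the header is readable, even if the
     * payload has not fully arrived yet */
    return (int)(HDR_LEN + payload);
}
```

As we understand the strparser, returning the full length before the whole payload is present is fine: the kernel waits until that many bytes have arrived before treating the message as complete.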
Note
Before v5.10, a stream parser had to be attached to a sockmap in order to use a stream verdict program. On newer versions this is no longer required.
On older kernels, a no-op parser that simply returns the length of the current skb can be used to retain the default behavior and get a verdict per TCP packet.
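#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
/* Report the whole skb as one complete message */
SEC("sk_skb/stream_parser")
int noop_parser(struct __sk_buff *skb)
{
    return skb->len;
}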
As BPF_SK_SKB_STREAM_VERDICT program
When this attach type is used, the program acts as a filter, comparable to TC or XDP programs. The program is called for every message delimited by the parser (or for every TCP packet if no parser is attached) and returns a verdict.
The return value is interpreted as follows:
SK_PASS
- The message may pass to the socket, or it has been redirected with a helper.
SK_DROP
- The message should be dropped.
Unlike TC or XDP programs, there is no special redirect return code; helpers such as bpf_sk_redirect_map return SK_PASS on success.
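The point about redirects can be shown with a small userspace model of a verdict function. fake_sk_redirect_map is a hypothetical stand-in for bpf_sk_redirect_map that only records the target index and returns SK_PASS, as the real helper does on success. SK_DROP and SK_PASS are given the values of enum sk_action from <linux/bpf.h> so the block compiles without kernel headers, and the port-based policy is invented for the example.

```c
#include <stdint.h>

/* Values of enum sk_action in <linux/bpf.h>, repeated to stay self-contained */
enum { SK_DROP = 0, SK_PASS = 1 };

/* Hypothetical stand-in for bpf_sk_redirect_map(): remembers the target
 * index and, like the real helper on success, returns SK_PASS. */
static uint32_t redirect_target = UINT32_MAX;

static int fake_sk_redirect_map(uint32_t key)
{
    redirect_target = key;
    return SK_PASS;
}

/* Example policy: drop port 9999, redirect port 10000 to socket 0, pass the rest.
 * A redirect is expressed by returning the helper's result, not a special code. */
static int verdict(uint32_t local_port)
{
    if (local_port == 9999)
        return SK_DROP;
    if (local_port == 10000)
        return fake_sk_redirect_map(0);
    return SK_PASS;
}
```

Both the plain pass and the successful redirect return SK_PASS; only the side effect recorded by the helper distinguishes them.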
As BPF_SK_SKB_VERDICT program
The non-stream verdict attach type is a replacement for the BPF_SK_SKB_STREAM_VERDICT attach type. The program has the same job and uses the same return values. The difference is that the stream verdict variant only supports TCP data streams, while BPF_SK_SKB_VERDICT also supports UDP.
Context
Socket SKB programs are called by the kernel with a __sk_buff context.
Programs of this type are not allowed to read or write every field of the context, since doing so might break assumptions in the kernel, or because the data isn't available at the point where the program is hooked into the kernel.
Context fields
Attachment
Socket SKB programs are attached to a BPF_MAP_TYPE_SOCKMAP or BPF_MAP_TYPE_SOCKHASH map using the BPF_PROG_ATTACH syscall command (the bpf_prog_attach libbpf function).
The programs should be loaded with the same expected attach type as the one used when attaching.
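libbpf derives a program's expected attach type from its SEC() name, which is how the examples on this page end up loaded with the attach type they are later attached with. The table below reflects our understanding of the relevant libbpf section names, written as a plain C lookup so it can be checked in userspace. It records attach types as strings rather than kernel enum values, and the helper name expected_attach_type is hypothetical; section names may differ between libbpf versions, so treat this as a sketch rather than a reference.

```c
#include <stddef.h>
#include <string.h>

/* Assumed libbpf section names for sk_skb programs and the expected
 * attach type libbpf sets for each (names only, not enum values). */
static const struct {
    const char *sec;
    const char *attach_type;
} sk_skb_sections[] = {
    { "sk_skb/stream_parser",  "BPF_SK_SKB_STREAM_PARSER" },
    { "sk_skb/stream_verdict", "BPF_SK_SKB_STREAM_VERDICT" },
    { "sk_skb/verdict",        "BPF_SK_SKB_VERDICT" },
};

/* Returns the expected attach type for a section name, or NULL if unknown. */
static const char *expected_attach_type(const char *sec)
{
    for (size_t i = 0; i < sizeof(sk_skb_sections) / sizeof(sk_skb_sections[0]); i++)
        if (strcmp(sk_skb_sections[i].sec, sec) == 0)
            return sk_skb_sections[i].attach_type;
    return NULL;
}
```

A program placed in SEC("sk_skb/stream_verdict") should therefore be attached with BPF_SK_SKB_STREAM_VERDICT, as the loader example on this page does.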
Note
BPF_SK_SKB_STREAM_VERDICT and BPF_SK_SKB_VERDICT are mutually exclusive per map; only one or the other program type can be used.
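The exclusivity rule can be pictured as a map holding one program slot per attach type, where filling one verdict slot blocks the other. This plain C model is hypothetical: struct sockmap_progs, model_attach and the use of plain integers as stand-ins for program file descriptors are invented for the sketch, and we assume the kernel reports the conflict as EBUSY, which is what the model returns.

```c
#include <errno.h>

/* Hypothetical attach types mirrored from the text */
enum attach_type { STREAM_PARSER, STREAM_VERDICT, SKB_VERDICT };

/* One slot per attach type; 0 means the slot is empty */
struct sockmap_progs {
    int stream_parser;
    int stream_verdict;
    int skb_verdict;
};

/* Stores prog_fd in the slot for `type`, refusing to mix the two
 * verdict flavours on the same map. */
static int model_attach(struct sockmap_progs *p, enum attach_type type, int prog_fd)
{
    switch (type) {
    case STREAM_PARSER:
        p->stream_parser = prog_fd;
        return 0;
    case STREAM_VERDICT:
        if (p->skb_verdict)
            return -EBUSY;
        p->stream_verdict = prog_fd;
        return 0;
    case SKB_VERDICT:
        if (p->stream_verdict)
            return -EBUSY;
        p->skb_verdict = prog_fd;
        return 0;
    }
    return -EINVAL;
}
```

Attaching a parser is unaffected by the rule; only the second verdict flavour is rejected.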
Example
Example BPF program:
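// Copyright Red Hat
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
/* Sockmap holding the socket that traffic is redirected to */
struct {
    __uint(type, BPF_MAP_TYPE_SOCKMAP);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u32);
} sock_map_rx SEC(".maps");
SEC("sk_skb/stream_verdict")
int bpf_prog_verdict(struct __sk_buff *skb)
{
    __u32 lport = skb->local_port;   /* host byte order */
    __u32 idx = 0;
    /* Redirect data received on local port 10000 to the socket at index 0 */
    if (lport == 10000)
        return bpf_sk_redirect_map(skb, &sock_map_rx, idx, 0);
    return SK_PASS;
}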
Example userspace loader code:
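// Copyright Red Hat
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <bpf/bpf.h>
/* Creates a sockmap, attaches the parser and verdict programs to it and adds
 * sock at index 0. Returns the map fd on success; the caller must keep it open,
 * because the map is freed once its last reference is dropped. Returns -1 on error. */
int create_sample_sockmap(int sock, int parse_prog_fd, int verdict_prog_fd)
{
    int index = 0;
    int map, err;

    map = bpf_map_create(BPF_MAP_TYPE_SOCKMAP, NULL, sizeof(int), sizeof(int), 1, NULL);
    if (map < 0) {
        fprintf(stderr, "Failed to create sockmap: %s\n", strerror(errno));
        return -1;
    }
    err = bpf_prog_attach(parse_prog_fd, map, BPF_SK_SKB_STREAM_PARSER, 0);
    if (err) {
        fprintf(stderr, "Failed to attach_parser_prog_to_map: %s\n", strerror(errno));
        goto err_close;
    }
    err = bpf_prog_attach(verdict_prog_fd, map, BPF_SK_SKB_STREAM_VERDICT, 0);
    if (err) {
        fprintf(stderr, "Failed to attach_verdict_prog_to_map: %s\n", strerror(errno));
        goto err_close;
    }
    /* Adding the socket to the map makes the attached programs run for it */
    err = bpf_map_update_elem(map, &index, &sock, BPF_NOEXIST);
    if (err) {
        fprintf(stderr, "Failed to update sockmap: %s\n", strerror(errno));
        goto err_close;
    }
    return map;

err_close:
    close(map);
    return -1;
}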
Helper functions
Supported helper functions
- bpf_skb_store_bytes
- bpf_skb_load_bytes
- bpf_skb_pull_data
- bpf_skb_change_tail
- bpf_skb_change_head
- bpf_skb_adjust_room
- bpf_get_socket_cookie
- bpf_get_socket_uid
- bpf_sk_redirect_map
- bpf_sk_redirect_hash
- bpf_perf_event_output
- bpf_sk_lookup_tcp
- bpf_sk_lookup_udp
- bpf_sk_release
- bpf_skc_lookup_tcp
- bpf_skc_to_tcp6_sock
- bpf_skc_to_tcp_sock
- bpf_skc_to_tcp_timewait_sock
- bpf_skc_to_tcp_request_sock
- bpf_skc_to_udp6_sock
- bpf_skc_to_unix_sock
- bpf_ktime_get_coarse_ns
- bpf_map_lookup_elem
- bpf_map_update_elem
- bpf_map_delete_elem
- bpf_map_push_elem
- bpf_map_pop_elem
- bpf_map_peek_elem
- bpf_map_lookup_percpu_elem
- bpf_get_prandom_u32
- bpf_get_smp_processor_id
- bpf_get_numa_node_id
- bpf_tail_call
- bpf_ktime_get_ns
- bpf_ktime_get_boot_ns
- bpf_ringbuf_output
- bpf_ringbuf_reserve
- bpf_ringbuf_submit
- bpf_ringbuf_discard
- bpf_ringbuf_query
- bpf_for_each_map_elem
- bpf_loop
- bpf_strncmp
- bpf_spin_lock
- bpf_spin_unlock
- bpf_jiffies64
- bpf_per_cpu_ptr
- bpf_this_cpu_ptr
- bpf_timer_init
- bpf_timer_set_callback
- bpf_timer_start
- bpf_timer_cancel
- bpf_trace_printk
- bpf_get_current_task
- bpf_get_current_task_btf
- bpf_probe_read_user
- bpf_probe_read_kernel
- bpf_probe_read_user_str
- bpf_probe_read_kernel_str
- bpf_snprintf_btf
- bpf_snprintf
- bpf_task_pt_regs
- bpf_trace_vprintk
KFuncs
Supported kfuncs
- bpf_cast_to_kern_ctx
- bpf_dynptr_adjust
- bpf_dynptr_clone
- bpf_dynptr_from_skb
- bpf_dynptr_is_null
- bpf_dynptr_is_rdonly
- bpf_dynptr_size
- bpf_dynptr_slice
- bpf_dynptr_slice_rdwr
- bpf_iter_css_destroy
- bpf_iter_css_new
- bpf_iter_css_next
- bpf_iter_css_task_destroy
- bpf_iter_css_task_new
- bpf_iter_css_task_next
- bpf_iter_num_destroy
- bpf_iter_num_new
- bpf_iter_num_next
- bpf_iter_task_destroy
- bpf_iter_task_new
- bpf_iter_task_next
- bpf_iter_task_vma_destroy
- bpf_iter_task_vma_new
- bpf_iter_task_vma_next
- bpf_map_sum_elem_count
- bpf_rcu_read_lock
- bpf_rcu_read_unlock
- bpf_rdonly_cast