Program context __sk_buff

The socket buffer context is provided to program types that deal with network packets once a socket buffer has already been allocated. The struct __sk_buff is a "mirror" of the struct sk_buff which is actually used by the kernel.

Accesses to the struct __sk_buff pointer are seamlessly transformed into accesses into the real socket buffer. This indirection exists to provide a stable ABI for programs, since the struct sk_buff may change between kernel versions, and to provide a layer of checks. For a number of reasons, some program types are not allowed to read and/or write certain fields.

Direct packet access

Fields

len

v4.1

This field holds the total length of the packet. Note that this does not indicate how much data is available via direct packet access. When a packet is larger than a single memory page, part of its data lives in non-linear memory. In that case len may be larger than data_end - data, and specialized helpers are needed to access the rest of the data.

pkt_type

v4.1

This field indicates the type of the packet which informs "who" the packet is for. Possible values of this field are the PACKET_* values defined in include/uapi/linux/if_packet.h.

  • PACKET_HOST - indicates the packet is addressed to the MAC address of this host.
  • PACKET_BROADCAST - indicates the packet is addressed to a broadcast address.
  • PACKET_MULTICAST - indicates the packet is addressed to a multicast address.
  • PACKET_OTHERHOST - indicates the packet is addressed to some other host and has been caught by a device driver in promiscuous mode.
  • PACKET_OUTGOING - indicates the packet originated from the local host and is looped back to a packet socket.

Note

This is not an exhaustive list of possible values.
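As a rough illustration of how a program might act on pkt_type, the user-space sketch below decides whether a packet is addressed to this host. The EX_ constants are local copies of the PACKET_* values from include/uapi/linux/if_packet.h, and the function name addressed_to_this_host is hypothetical; a real eBPF program would read the value from the pkt_type field of its context instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the PACKET_* values from include/uapi/linux/if_packet.h,
 * renamed with an EX_ prefix so this sketch compiles without kernel headers. */
enum {
    EX_PACKET_HOST      = 0,
    EX_PACKET_BROADCAST = 1,
    EX_PACKET_MULTICAST = 2,
    EX_PACKET_OTHERHOST = 3,
    EX_PACKET_OUTGOING  = 4,
};

/* Hypothetical helper: true if the packet is addressed to this host,
 * either directly or through a broadcast or multicast address. */
bool addressed_to_this_host(uint32_t pkt_type)
{
    assert(pkt_type <= 7); /* the kernel stores pkt_type in 3 bits */

    switch (pkt_type) {
    case EX_PACKET_HOST:
    case EX_PACKET_BROADCAST:
    case EX_PACKET_MULTICAST:
        return true;
    default:
        /* OTHERHOST, OUTGOING and any value not listed here */
        return false;
    }
}
```

Treating every unlisted value as "not for this host" is a design choice of this sketch, made because the list of values is not exhaustive.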

mark

v4.1

This field is a general purpose 32 bit tag used in the network subsystem to carry metadata with global implications across network subsystems. As an example, a driver could mark an incoming packet so that the ingress traffic control classifier-action subsystem, netfilter, and IPsec can all use the mark to execute provisioned policies.1

queue_mapping

v4.1

This field indicates via which TX queue on the NIC this packet should be sent. Typically this field is set by TC but can be overwritten by certain eBPF programs to implement custom balancing logic. 2

protocol

v4.1

This field indicates the Layer 3 protocol of the packet and is one of the ETH_P_* values defined in include/uapi/linux/if_ether.h.

vlan_present

v4.1

This field is a boolean 0 or 1 and indicates if the packet has a VLAN header.

vlan_tci

v4.1

This field contains the VLAN TCI (Tag Control Information), if the packet included a VLAN header.
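The TCI packs three values into 16 bits, following IEEE 802.1Q: a 3 bit priority code point (PCP), a 1 bit drop eligible indicator (DEI) and a 12 bit VLAN ID. The sketch below shows one way to split them apart. The struct and function names are hypothetical, and the code assumes the value is in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three parts of a VLAN TCI. */
struct vlan_tci_fields {
    uint8_t  pcp; /* priority code point, bits 15..13 */
    uint8_t  dei; /* drop eligible indicator, bit 12 */
    uint16_t vid; /* VLAN identifier, bits 11..0 */
};

/* Split a TCI value (host byte order) into its 802.1Q parts. */
struct vlan_tci_fields decode_vlan_tci(uint32_t vlan_tci)
{
    assert(vlan_tci <= 0xFFFF); /* only the low 16 bits carry the TCI */

    struct vlan_tci_fields f;
    f.pcp = (uint8_t)((vlan_tci >> 13) & 0x7);
    f.dei = (uint8_t)((vlan_tci >> 12) & 0x1);
    f.vid = (uint16_t)(vlan_tci & 0x0FFF);
    return f;
}
```

For example, a TCI of 0xA064 decodes to PCP 5, DEI 0 and VLAN ID 100.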

vlan_proto

v4.1

This field contains the protocol ID of the used VLAN protocol which will be one of the ETH_P_* values defined in include/uapi/linux/if_ether.h.

priority

v4.1

This field indicates the queuing priority of the packet. Packets with higher priority will be sent out first. Only values between 0 and 63 are effective; values of 64 and above will be converted to 63. This field only takes effect if the skbprio queueing discipline has been configured in TC. 3

This only affects egress traffic since ingress traffic is never queued.
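The clamping rule described above can be modelled in a few lines of user-space C. This is only a sketch of the arithmetic, not the qdisc's code; the constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Highest priority value that still has a distinct effect, per the text above. */
#define EX_MAX_EFFECTIVE_PRIORITY 63u

/* Return the priority that actually takes effect for a given field value. */
uint32_t effective_priority(uint32_t priority)
{
    uint32_t p = priority > EX_MAX_EFFECTIVE_PRIORITY
                     ? EX_MAX_EFFECTIVE_PRIORITY
                     : priority;

    assert(p <= EX_MAX_EFFECTIVE_PRIORITY); /* result is always in 0..63 */
    return p;
}
```

A program that wants several distinct priority levels therefore has to keep all of them within the 0 to 63 range.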

ingress_ifindex

v4.2

This field contains the interface index of the network device this packet arrived on. It may be 0 if a process on the host originated the packet.

ifindex

v4.2

This field contains the interface index of the network device the packet is currently "on". If a packet has been redirected to another device and an eBPF program is invoked on it again, this field should reflect the new device.

On egress this will be the device picked for sending the packet.

tc_index

v4.2

This field is used to carry Type of Service (TOS) information. This field is populated by the dsmark qdisc and can subsequently be used with tcindex filters to classify packets based on their TOS value.

The dsmark qdisc uses the differentiated services (DS) fields in IPv4 (aka DSCP) and IPv6 (aka traffic class) headers.

BPF_PROG_TYPE_SCHED_CLS programs can also modify this value to implement a custom TOS value extraction from packets.

cb

v4.2

This field is an array of 5 u32 values with no pre-defined meaning. Network subsystems and eBPF programs can read from and write to this field to share information associated with the socket buffer across programs and subsystem boundaries.
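Since the array has no pre-defined meaning, programs that share it need to agree on a layout. The user-space sketch below models one such convention: a small struct is copied into and out of a 5 word array, with a compile-time check that it fits. The struct ex_flow_note, its members and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_CB_WORDS 5 /* cb holds five 32 bit values */

/* Hypothetical layout agreed on by two cooperating programs. */
struct ex_flow_note {
    uint32_t verdict;
    uint32_t tenant_id;
    uint32_t flags;
};

/* Fail the build if the layout ever outgrows the scratch area. */
static_assert(sizeof(struct ex_flow_note) <= EX_CB_WORDS * sizeof(uint32_t),
              "ex_flow_note must fit in cb");

/* Writer side: copy the note into the first words of cb. */
void cb_store(uint32_t cb[EX_CB_WORDS], const struct ex_flow_note *note)
{
    memcpy(cb, note, sizeof(*note));
}

/* Reader side: copy the note back out of cb. */
void cb_load(const uint32_t cb[EX_CB_WORDS], struct ex_flow_note *note)
{
    memcpy(note, cb, sizeof(*note));
}
```

Words not covered by the agreed layout are left untouched, so the remaining two words stay free for other uses in this sketch.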

hash

v4.3

This field contains the hash calculated from the flow information of the packet. The fields used to calculate the hash can differ depending on the protocol. This hash is optionally calculated by network interface devices that support it. 4

tc_classid

v4.4

This field can be used by BPF_PROG_TYPE_SCHED_CLS programs in direct action mode to set the class id. This value is only useful if the program returns TC_ACT_OK and the qdisc has classes.
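A class id is a 32 bit value whose upper 16 bits are the major number and whose lower 16 bits are the minor number, which is what the tc tool prints as major:minor in hexadecimal. The sketch below composes and splits such a value in plain C. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build a class id from its major and minor halves. */
uint32_t make_classid(uint32_t major, uint32_t minor)
{
    assert(major <= 0xFFFF && minor <= 0xFFFF); /* each half is 16 bits wide */
    return (major << 16) | minor;
}

/* Upper 16 bits: the major number (the qdisc handle). */
uint32_t classid_major(uint32_t classid)
{
    return classid >> 16;
}

/* Lower 16 bits: the minor number (the class within the qdisc). */
uint32_t classid_minor(uint32_t classid)
{
    return classid & 0xFFFF;
}
```

Under this encoding, the class written as 1:10 in tc notation corresponds to make_classid(0x1, 0x10), which is 0x00010010.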

data

v4.4

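This field contains the pointer to the start of the linear packet data. Which header the first byte belongs to depends on the program type and hook: traffic control programs typically see the packet starting at the link-layer header, while other program types, such as cgroup socket buffer programs, see it starting at the layer 3 header, the type of which is indicated by protocol.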

data_end

v4.4

This field contains the pointer to the end of the linear packet data, that is, the address just past its last byte. This pointer is used in combination with data to indicate the accessible data.
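Every direct access has to be preceded by a check that the bytes about to be read lie between data and data_end. The user-space sketch below models that check with ordinary pointers; the function name linear_range_ok is hypothetical. In an eBPF program the same idea is usually written as a comparison of the form data + n > data_end before the access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if the byte range [offset, offset + size) lies entirely
 * inside the linear area delimited by data (inclusive) and data_end
 * (exclusive). Written to avoid overflow in offset + size. */
bool linear_range_ok(const uint8_t *data, const uint8_t *data_end,
                     size_t offset, size_t size)
{
    assert(data <= data_end); /* data_end never precedes data */

    size_t avail = (size_t)(data_end - data);
    return offset <= avail && size <= avail - offset;
}
```

With a 64 byte linear area, reading 50 bytes at offset 14 is allowed, while reading 51 bytes at the same offset is not.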

napi_id

v4.12

This field contains the id of the NAPI struct this socket buffer came from.

family

v4.14

This field contains the address family of the socket associated with this socket buffer. Its value is one of the AF_* values defined in include/linux/socket.h.

remote_ip4

v4.14

The IPv4 address of the remote end of the socket.

local_ip4

v4.14

The locally bound IPv4 address of the socket.

remote_ip6

v4.14

The IPv6 address of the remote end of the socket.

local_ip6

v4.14

The locally bound IPv6 address of the socket.

remote_port

v4.14

The L4 port number of the remote side of the socket.

local_port

v4.14

The L4 port number of the local side of the socket.

data_meta

v4.15

This field contains a pointer to the start of a metadata region in the socket buffer. If no metadata room is set, data_meta and data will have the same value. An XDP program can request metadata to be allocated with the bpf_xdp_adjust_meta helper, after which it can write arbitrary data into it.

If the packet with metadata is passed to the kernel, that metadata will be available in the __sk_buff via this pointer. The metadata occupies the region between data_meta and data.

This means that XDP programs can communicate information to, for example, BPF_PROG_TYPE_SCHED_CLS programs, which can then manipulate the socket buffer to change __sk_buff->mark or __sk_buff->priority on behalf of an XDP program.
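Reading the metadata follows the same bounds-check pattern as packet data, except that the region ends at data rather than data_end. The user-space sketch below models a consumer that only reads the metadata when the region is large enough. The struct ex_xdp_meta, its members and the function name read_meta are hypothetical stand-ins for whatever layout the XDP program and its consumer agree on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata layout written by the XDP program. */
struct ex_xdp_meta {
    uint32_t mark;
    uint32_t priority;
};

/* Copy the metadata out of the region [data_meta, data).
 * Returns false if the region is too small, for example when no
 * metadata room was set and data_meta equals data. */
bool read_meta(const uint8_t *data_meta, const uint8_t *data,
               struct ex_xdp_meta *out)
{
    assert(data_meta <= data); /* metadata sits in front of the packet data */

    if ((size_t)(data - data_meta) < sizeof(*out))
        return false;

    memcpy(out, data_meta, sizeof(*out));
    return true;
}
```

A consumer could then copy out.mark and out.priority into the corresponding socket buffer fields, as described above.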

flow_keys

v4.20

This field is a pointer to a struct bpf_flow_keys which, as the name implies, holds the keys that identify the network flow of the socket buffer. This field is only accessible from within BPF_PROG_TYPE_FLOW_DISSECTOR programs. More details can be found in its context section.

tstamp

v5.0

This field indicates the time when this packet should be transmitted in nanoseconds since boot. BPF_PROG_TYPE_SCHED_CLS programs can set this time to some time in the future to add delay to packets for the purposes of bandwidth limiting or simulating latency. Setting this value only works on egress if the fq (Fair Queue) qdisc is used.

Note

The fq qdisc has a "drop horizon": if packets are set to transmit too far into the future, they will be dropped to avoid queueing too many packets.

Note

Since v5.18 1 / 2, the meaning of this field can also be "received time"; the tstamp_type field indicates which of the two applies.
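One common way to use a departure time for bandwidth limiting is to space packets so that each one leaves only after the previous one would have finished transmitting at the target rate. The user-space sketch below models that arithmetic. The function name, its parameters and the next_free_ns state variable are hypothetical; in an eBPF program the current time would come from a helper such as bpf_ktime_get_ns and the state would typically live in a map.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NSEC_PER_SEC 1000000000ull

/* Compute the departure time for a packet of wire_len bytes under a
 * rate limit, and advance *next_free_ns to the time the link becomes
 * free again. All times are in nanoseconds on the same clock. */
uint64_t next_departure_ns(uint64_t now_ns, uint64_t *next_free_ns,
                           uint32_t wire_len, uint64_t rate_bytes_per_sec)
{
    assert(rate_bytes_per_sec > 0); /* avoid dividing by zero */

    /* Never schedule in the past: start at whichever is later. */
    uint64_t start = *next_free_ns > now_ns ? *next_free_ns : now_ns;

    /* Time this packet occupies the link at the target rate. */
    uint64_t tx_time = (uint64_t)wire_len * EX_NSEC_PER_SEC / rate_bytes_per_sec;

    *next_free_ns = start + tx_time;
    return start;
}
```

The returned value is what a program would store in tstamp. A real program would also need to keep the value within the drop horizon mentioned in the note above.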

wire_len

v5.0

This field contains the length of the data as it will appear on the wire.

gso_segs

v5.1

This field indicates the number of GSO segments that are contained within the current socket buffer.

sk

v5.1

This field is a pointer to a struct bpf_sock which holds information about the socket associated with this socket buffer. More details can be found in the dedicated section.

This field is always read-only.

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

gso_size

v5.7

This field indicates the size of GSO segments that are contained within the current socket buffer.

tstamp_type

v5.18

This field indicates what the meaning of tstamp is. The field can have the following values:

  • BPF_SKB_TSTAMP_UNSPEC - The tstamp field contains the receive (rcv) timestamp at ingress and the delivery time at egress.
  • BPF_SKB_TSTAMP_DELIVERY_MONO - The tstamp field contains the requested time to deliver the packet, see tstamp for details.

hwtstamp

v5.16

This field contains the time at which the packet was received, as reported by the NIC, if the NIC supports this feature.

Socket

This section describes the fields of the struct bpf_sock type, which is a mirror of the kernel's struct sock type.

bound_dev_if

v4.10

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

family

v4.10

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

type

v4.10

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

protocol

v4.10

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

mark

v4.14

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

priority

v4.14

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

src_ip4

v5.1

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

src_ip6

v5.1

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

src_port

v5.1

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

dst_port

v5.1

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

dst_ip4

v5.1

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

dst_ip6

v5.1

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

state

v5.1

Docs could be improved

This part of the docs is incomplete, contributions are very welcome

rx_queue_mapping

v5.8

Docs could be improved

This part of the docs is incomplete, contributions are very welcome