Skip to content

Map type BPF_MAP_TYPE_LPM_TRIE

v4.11

The LPM (Largest Prefix Match) map is a generic map type which does prefix matching on the key upon lookup.

Usage

One of the main use cases for this map type is to implement routing tables or policies for IP ranges. Take the following key-value pairs:

  • 10.0.0.0/8 -> 1
  • 10.0.10.0/24 -> 2
  • 10.0.10.123/32 -> 3

A lookup for 10.0.10.123 will return value 3 because we have a specific entry for it in the map. A lookup for 10.0.10.200 will return value 2, because the /24 key is more specific than the /8 key. A lookup for 10.12.0.1 would return 1. And a lookup for 12.0.0.1 will not return any entry.

Attributes

The value_size is unrestricted. The key_size is variable, but must adhere to a specific structure. The first 4 bytes of the key indicate the prefix length of the key (the number after the / in a CIDR), the rest of the bytes are the value.

The bpf.h header file provides the following struct as example:

struct bpf_lpm_trie_key {
    __u32   prefixlen;  /* up to 32 for AF_INET, 128 for AF_INET6 */
    __u8    data[0];    /* Arbitrary size */
};

A common trick is to embed this struct in a custom key type:

struct lpm_key {
    struct bpf_lpm_trie_key trie_key;
    __u32 data;
};

But this is not required, as long as the first field is a __u32:

struct lpm_key {
    __u32 prefixlen;
    __be32 addr;
};

LPM tries may be created with a maximum prefix length that is a multiple of 8, in the range from 8 to 2048. So the data part of the key can't be larger than 256 bytes (4 or 16 is normal for IPv4 and IPv6 addresses).

Data stored in @data of struct bpf_lpm_key and struct lpm_trie_node is interpreted as big endian, so data[0] stores the most significant byte. [...] one single element that matches 192.168.0.0/16. The data array would hence contain [0xc0, 0xa8, 0x00, 0x00] in big-endian notation.

Note

The BPF_F_NO_PREALLOC flag must always be set when creating this map type since the implementation cannot pre-allocate the map.

Syscall commands

The following syscall commands work with this map type:

Helper functions

The following helper functions work with this map type:

Flags

The following flags are supported by this map type.

BPF_F_NO_PREALLOC

v4.6

Hash maps are pre-allocated by default, this means that even a completely empty hash map will use the same amount of kernel memory as a full map.

If this flag is set, pre-allocation is disabled. Users might consider this for large maps since allocating large amounts of memory takes a lot of time during creation and might be undesirable.

Note

Setting this flag during map creation is mandatory for the LPM map type.

BPF_F_NUMA_NODE

v4.14

When set, the numa_node attribute is respected during map creation.

BPF_F_RDONLY

v4.15

Setting this flag will make it so the map can only be read via the syscall interface, but not written to.

For details please check the generic description.

BPF_F_WRONLY

v4.15

Setting this flag will make it so the map can only be written to via the syscall interface, but not read from.

For details please check the generic description.

BPF_F_RDONLY_PROG

v5.2

Setting this flag will make it so the map can only be read via helper functions, but not written to.

For details please check the generic description.

BPF_F_WRONLY_PROG

v5.2

Setting this flag will make it so the map can only be written to via helper functions, but not read from.

For details please check the generic description.

Example

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct ipv4_lpm_key {
        __u32 prefixlen;
        __u32 data;
};

struct {
        __uint(type, BPF_MAP_TYPE_LPM_TRIE);
        __type(key, struct ipv4_lpm_key);
        __type(value, __u32);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __uint(max_entries, 255);
} ipv4_lpm_map SEC(".maps");

void *lookup(__u32 ipaddr)
{
        struct ipv4_lpm_key key = {
                .prefixlen = 32,
                .data = ipaddr
        };

        return bpf_map_lookup_elem(&ipv4_lpm_map, &key);
}