BPF Syscall BPF_MAP_CREATE command

v3.18

The BPF_MAP_CREATE command is used to create a new BPF map.

Return value

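This command returns a file descriptor for the created map on success (a non-negative integer) or a negative error number if something went wrong. When the command is issued through the libc syscall() wrapper, a failure is reported as -1 with the error number stored in errno.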
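The return convention can be folded into a small wrapper. The following is a minimal sketch, assuming the libc syscall() wrapper is used to reach the kernel; the helper name map_create is hypothetical and not part of any library.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns a map FD (>= 0) on success or -errno on failure. */
static int map_create(__u32 type, __u32 key_size, __u32 value_size,
                      __u32 max_entries)
{
    union bpf_attr attr;
    long fd;

    /* Zero the whole union so that fields we do not set are 0. */
    memset(&attr, 0, sizeof(attr));
    attr.map_type = type;
    attr.key_size = key_size;
    attr.value_size = value_size;
    attr.max_entries = max_entries;

    /* syscall() returns -1 and sets errno when the kernel reports an error. */
    fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
    return fd < 0 ? -errno : (int)fd;
}
```

A caller can then test the result with a single comparison, for example `if (fd < 0) fprintf(stderr, "%s\n", strerror(-fd));`.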

Attributes

map_type

This attribute specifies which type of map should be created. It should be one of the pre-defined map types.

key_size

This attribute specifies the size of the key in bytes.

Info

Some map types restrict which values are allowed; check the documentation of the specific map type for more details.

value_size

This attribute specifies the size of the value in bytes.

Info

Some map types restrict which values are allowed; check the documentation of the specific map type for more details.

max_entries

This attribute specifies the maximum number of entries the map can hold.

Info

Some map types restrict which values are allowed; check the documentation of the specific map type for more details.

map_flags

This attribute is a bitmask of flags. See the Flags section below for details.

inner_map_fd

v4.12

This attribute should be set to the FD of another map when creating map-in-map type maps. Doing so doesn't link the specified inner map to the new map being created. Rather, it informs the kernel of the inner map's attributes, such as type, key size, and value size. When map references are written as values to this map, the kernel verifies that those maps are compatible with the attributes of the map given via this field.

A known technique is to create a temporary map solely to populate this field and then release all references to it.
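The temporary-map technique can be sketched as follows. This is an illustrative sketch only: the helper names (bpf_map_create_raw, fill_outer_attr, create_array_of_maps) and the chosen sizes are assumptions, and an array-of-maps is used as the outer map type.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper: returns a map FD on success or -errno on failure. */
static int bpf_map_create_raw(union bpf_attr *attr)
{
    long fd = syscall(__NR_bpf, BPF_MAP_CREATE, attr, sizeof(*attr));
    return fd < 0 ? -errno : (int)fd;
}

/* Fill the attributes of an outer array-of-maps that uses template_fd
 * to describe what its inner maps look like. */
static void fill_outer_attr(union bpf_attr *attr, int template_fd, __u32 slots)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_ARRAY_OF_MAPS;
    attr->key_size = sizeof(__u32);
    attr->value_size = sizeof(__u32);  /* user space writes map FDs as values */
    attr->max_entries = slots;
    attr->inner_map_fd = template_fd;
}

static int create_array_of_maps(__u32 slots)
{
    union bpf_attr inner, outer;
    int template_fd, outer_fd;

    /* Temporary map that only serves as a template for inner maps. */
    memset(&inner, 0, sizeof(inner));
    inner.map_type = BPF_MAP_TYPE_ARRAY;
    inner.key_size = sizeof(__u32);
    inner.value_size = sizeof(__u64);
    inner.max_entries = 1;
    template_fd = bpf_map_create_raw(&inner);
    if (template_fd < 0)
        return template_fd;

    fill_outer_attr(&outer, template_fd, slots);
    outer_fd = bpf_map_create_raw(&outer);

    /* The template is not linked to the outer map, so release it now. */
    close(template_fd);
    return outer_fd;
}
```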

numa_node

v4.14

This attribute specifies on which NUMA node the map should be located. Memory access within the same node is typically faster, which can lead to optimization if applied correctly.

map_name

v4.15

This attribute allows the map creator to give it a human readable name. The attribute is an array of 16 bytes in which a null terminated string can be placed (thus limiting the name to 15 actual characters). This name will stay associated with the map and is reported back in the results of BPF_OBJ_GET_INFO_BY_* syscall commands.
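Because the field is a fixed 16-byte array, a loader typically checks the name length before copying. A minimal sketch, in which set_map_name is a hypothetical helper:

```c
#include <linux/bpf.h>
#include <string.h>

/* Hypothetical helper: copy name into attr->map_name.
 * Returns 0 on success, -1 if the name does not fit with its terminator. */
static int set_map_name(union bpf_attr *attr, const char *name)
{
    size_t len = strlen(name);

    /* 16-byte field, so at most 15 characters plus the null terminator. */
    if (len >= sizeof(attr->map_name))
        return -1;

    memset(attr->map_name, 0, sizeof(attr->map_name));
    memcpy(attr->map_name, name, len);
    return 0;
}
```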

map_ifindex

v4.16

This attribute can be set to the index of a network interface to request that the map be offloaded to that network device. The network interface must support eBPF offloading.
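An interface index is usually obtained from the interface name. The sketch below uses the standard if_nametoindex() function for that; the helper name fill_offload_attr, the hash map type, and the sizes are assumptions for illustration, and whether creation succeeds depends on the device and its driver.

```c
#include <linux/bpf.h>
#include <net/if.h>
#include <string.h>

/* Fill attributes for a hash map to be offloaded to the device named ifname.
 * Returns 0 on success, -1 if no such interface exists. */
static int fill_offload_attr(union bpf_attr *attr, const char *ifname)
{
    unsigned int ifindex = if_nametoindex(ifname);  /* 0 means "not found" */

    if (ifindex == 0)
        return -1;

    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_HASH;
    attr->key_size = sizeof(__u32);
    attr->value_size = sizeof(__u64);
    attr->max_entries = 64;
    attr->map_ifindex = ifindex;
    return 0;
}
```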

btf_fd

v4.18

This attribute specifies the file descriptor of the BTF object that contains the key and value type info referenced by btf_key_type_id and btf_value_type_id.

Adding BTF information about the key and value types of the map allows tools like bpftool to pretty-print the map keys and values instead of just the binary blobs.

btf_key_type_id

v4.18

This attribute specifies the BTF type ID of the map key within the BTF object indicated by btf_fd.

btf_value_type_id

v4.18

This attribute specifies the BTF type ID of the map value within the BTF object indicated by btf_fd.

btf_vmlinux_value_type_id

v5.6

This attribute is specifically used for the BPF_MAP_TYPE_STRUCT_OPS map type to indicate which structure in the kernel we wish to replicate using eBPF. For more details please check the struct ops map page.

map_extra

v5.16

This attribute specifies additional settings, the meaning of which is map type specific.

It has the following meanings per map type:

  • BPF_MAP_TYPE_BLOOM_FILTER - The lowest 4 bits indicate the number of hash functions (if 0, the bloom filter will default to using 5 hash functions).

Flags

BPF_F_NO_PREALLOC

v4.6

Before kernel version v4.6, BPF_MAP_TYPE_HASH and BPF_MAP_TYPE_PERCPU_HASH hash maps were lazily allocated. To improve performance, the default has been switched to pre-allocation of such map types. However, this means that for large max_entries values a lot of unused memory is kept in reserve. Setting this flag will not pre-allocate these maps.

Some map types require the loader to set this flag when creating maps to explicitly make clear that memory for such map types is always lazily allocated (also to guarantee stable behavior in case pre-allocation for those maps is ever added).
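As an illustration of a map type that needs the flag, the sketch below fills the attributes for a BPF_MAP_TYPE_LPM_TRIE, which to the best of our knowledge is rejected by the kernel when BPF_F_NO_PREALLOC is missing. The helper name and the IPv4-sized key are assumptions.

```c
#include <linux/bpf.h>
#include <string.h>

/* Fill attributes for an LPM trie keyed by IPv4 prefixes. */
static void fill_lpm_trie_attr(union bpf_attr *attr, __u32 max_entries)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_LPM_TRIE;
    attr->key_size = sizeof(__u32) + 4;   /* prefix length + 4 address bytes */
    attr->value_size = sizeof(__u64);
    attr->max_entries = max_entries;
    attr->map_flags = BPF_F_NO_PREALLOC;  /* required for this map type */
}
```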

BPF_F_NO_COMMON_LRU

v4.10

By default, LRU maps have a single LRU list (even per-CPU LRU maps). When this flag is set, an LRU map will use a per-CPU LRU list, which can scale and perform better.

Note

The LRU nodes (including free nodes) cannot be moved across different LRU lists.

BPF_F_NUMA_NODE

v4.14

When set, the numa_node attribute is respected during map creation.
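Since the numa_node attribute only takes effect together with this flag, the two are best set as a pair. A minimal sketch, where fill_numa_attr and the map sizes are hypothetical:

```c
#include <linux/bpf.h>
#include <string.h>

/* Fill attributes for a hash map whose memory should live on NUMA node `node`. */
static void fill_numa_attr(union bpf_attr *attr, __u32 node)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_HASH;
    attr->key_size = sizeof(__u32);
    attr->value_size = sizeof(__u64);
    attr->max_entries = 1024;
    attr->map_flags = BPF_F_NUMA_NODE;  /* tells the kernel to honor numa_node */
    attr->numa_node = node;
}
```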

BPF_F_RDONLY

v4.15

Setting this flag will make it so the map can only be read via the syscall interface, but not written to.

This flag is mutually exclusive with BPF_F_WRONLY; only one of the two can be set.

BPF_F_WRONLY

v4.15

Setting this flag will make it so the map can only be written to via the syscall interface, but not read from.

This flag is mutually exclusive with BPF_F_RDONLY; only one of the two can be set.

BPF_F_STACK_BUILD_ID

v4.17

By default, BPF_MAP_TYPE_STACK_TRACE maps store an address for each entry in the call trace. To map these addresses to user space files, it is necessary to maintain the mapping from these virtual addresses to symbols in the binary.

When this flag is set, the stack trace map will instead store the ELF file build_id and an offset for each entry.

For more details, check the stack trace map page.

BPF_F_ZERO_SEED

v5.0

This flag can be used in the following map types:

BPF_F_RDONLY_PROG

v5.2

Setting this flag will make it so the map can only be read via helper functions, but not written to.

This flag is mutually exclusive with BPF_F_WRONLY_PROG; only one of the two can be set.

BPF_F_WRONLY_PROG

v5.2

Setting this flag will make it so the map can only be written to via helper functions, but not read from.

This flag is mutually exclusive with BPF_F_RDONLY_PROG; only one of the two can be set.
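A loader can reject invalid combinations of the syscall-side pair (BPF_F_RDONLY, BPF_F_WRONLY) and the program-side pair (BPF_F_RDONLY_PROG, BPF_F_WRONLY_PROG) before calling into the kernel. The following sketch shows one way to do that; access_flags_valid is a hypothetical helper, not a kernel or library function.

```c
#include <linux/bpf.h>
#include <stdbool.h>

/* Returns true if flags does not set both members of either access pair. */
static bool access_flags_valid(__u32 flags)
{
    const __u32 syscall_pair = BPF_F_RDONLY | BPF_F_WRONLY;
    const __u32 prog_pair = BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG;

    if ((flags & syscall_pair) == syscall_pair)
        return false;
    if ((flags & prog_pair) == prog_pair)
        return false;
    return true;
}
```

Flags from different pairs can be combined. For example, `BPF_F_RDONLY | BPF_F_WRONLY_PROG` passes this check.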

BPF_F_CLONE

v5.4

This flag specifically applies to BPF_MAP_TYPE_SK_STORAGE maps. Sockets can be cloned. Setting this flag on the socket storage allows it to be cloned along with the socket itself when this happens. By default the storage is not cloned and the socket storage on the cloned socket will stay empty.

BPF_F_MMAPABLE

v5.5

Setting this flag on a BPF_MAP_TYPE_ARRAY will allow userspace programs to mmap the array values into the userspace process, effectively making a shared memory region between eBPF programs and a userspace program.

This can significantly improve read and write performance since there is no syscall overhead to access the map.

This flag is only supported on BPF_MAP_TYPE_ARRAY maps. For more details, check the array map page.
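The following sketch shows how a userspace process might create such an array and map it into its address space. The helper name create_shared_counters and the __u64 value layout are assumptions; the mapping length is rounded up to whole pages, and the call may fail without sufficient privileges.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create an mmap-able array of n __u64 counters and map it into this process.
 * On success returns the mapping and stores the map FD and mapping length.
 * On failure returns NULL and leaves *fd_out and *len_out untouched. */
static __u64 *create_shared_counters(__u32 n, int *fd_out, size_t *len_out)
{
    union bpf_attr attr;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len;
    void *mem;
    long fd;

    memset(&attr, 0, sizeof(attr));
    attr.map_type = BPF_MAP_TYPE_ARRAY;
    attr.key_size = sizeof(__u32);
    attr.value_size = sizeof(__u64);
    attr.max_entries = n;
    attr.map_flags = BPF_F_MMAPABLE;

    fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
    if (fd < 0)
        return NULL;

    /* Round the data size up to a whole number of pages for mmap. */
    len = ((size_t)n * sizeof(__u64) + page - 1) / page * page;
    mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, (int)fd, 0);
    if (mem == MAP_FAILED) {
        close((int)fd);
        return NULL;
    }

    *fd_out = (int)fd;
    *len_out = len;
    return mem;
}
```

After a successful call, `counters[i]` reads or writes element i directly, without a BPF_MAP_LOOKUP_ELEM or BPF_MAP_UPDATE_ELEM syscall.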

BPF_F_PRESERVE_ELEMS

v5.10

Maps of type BPF_MAP_TYPE_PERF_EVENT_ARRAY by default will clear all unread perf events when the original map file descriptor is closed, even if the map still exists. Setting this flag will make it so any pending elements will stay until explicitly removed or the map is freed. This makes sharing the perf event array between userspace programs easier.

BPF_F_INNER_MAP

v5.10

Map-in-map maps normally require that all inner maps have the same max_entries value and that this value matches the max_entries of the map specified by inner_map_fd. Setting this flag on an inner map when creating it allows that map to be assigned to the outer map even if it has a different max_entries value. This comes at the cost of a slight performance hit during lookups.

Example

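#include <linux/bpf.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    union bpf_attr my_map = {
        .map_type = BPF_MAP_TYPE_HASH,
        .key_size = sizeof(int),
        .value_size = sizeof(int),
        .max_entries = 100,
        .map_flags = BPF_F_NO_PREALLOC,
    };
    /* glibc has no bpf() wrapper, so invoke the syscall directly. */
    int fd = syscall(__NR_bpf, BPF_MAP_CREATE, &my_map, sizeof(my_map));
    return fd < 0; /* exit status 0 if the map was created */
}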