Commit

Polishing up a few pages
This commit polishes up a few pages, adding missing links, rewording,
etc.

Signed-off-by: Dylan Reimerink <[email protected]>
dylandreimerink committed Dec 20, 2024
1 parent 2bd6f50 commit 2b3a49a
Showing 13 changed files with 57 additions and 57 deletions.
4 changes: 2 additions & 2 deletions docs/ebpf-library/libbpf/ebpf/index.md
Expand Up @@ -6,15 +6,15 @@ Libbpf contains a number of C header files containing mostly pre-processor macro

The `bpf_helper_defs.h` file is automatically generated from the kernel sources. It contains forward declarations for every type that is used by [eBPF helper functions](../../../linux/helper-function/index.md) and somewhat special forward declarations for the helper functions themselves.

For example, the `bpf_map_lookup_elem` function is declared as:
For example, the [`bpf_map_lookup_elem`](../../../linux/helper-function/bpf_map_lookup_elem.md) function is declared as:

`#!c static void *(* const bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1;`

The normal forward declaration of this function would be

`#!c void *bpf_map_lookup_elem(void *map, const void *key);`.

But what the special declaration does is it casts a pointer of value `1` to a const static function pointer. This causes the compiler to emit a `call 1` instruction which the kernel recognizes as a call to the `bpf_map_lookup_elem` function.
But what the special declaration does is it casts a pointer of value `1` to a const static function pointer. This causes the compiler to emit a `call 1` instruction which the kernel recognizes as a call to the [`bpf_map_lookup_elem`](../../../linux/helper-function/bpf_map_lookup_elem.md) function.

It is entirely possible to copy parts of this file if you are only interested in specific helper functions and their types and even modify their definitions to suit your needs. Though for most people it will be best to include the whole file.
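
As a rough illustration of the declaration pattern described above, the following user-space sketch reuses the same declaration shape under a hypothetical name (`demo_map_lookup_elem`, which is not part of libbpf) and reads the pointer back as an integer. It only shows that the "address" stored in the pointer is the helper ID. It assumes a GCC or Clang style compiler that accepts the `void *` to function pointer conversion, and the pointer must never be called outside of an eBPF program.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in with the same shape as the declaration in
 * bpf_helper_defs.h: a const function pointer whose value is the helper ID. */
static void *(* const demo_map_lookup_elem)(void *map, const void *key) = (void *) 1;

/* Read the pointer value back as an integer. The pointer is never called:
 * in user space, address 1 is not a valid function. */
static uintptr_t demo_helper_id(void)
{
	return (uintptr_t) demo_map_lookup_elem;
}

/* Small usage check: the pointer value equals the helper ID (1). */
static void demo_helper_id_check(void)
{
	assert(demo_helper_id() == 1);
}
```

When the same declaration is compiled for the BPF target, a call through such a pointer is what ends up as the `call 1` instruction mentioned above.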

4 changes: 2 additions & 2 deletions docs/linux/concepts/SUMMARY.md
Expand Up @@ -10,5 +10,5 @@
* [Resource Limit](resource-limit.md)
* [AF_XDP](af_xdp.md)
* [KFuncs](kfuncs.md)
* [dynptrs](dynptrs.md)
* [token](token.md)
* [Dynptrs](dynptrs.md)
* [Token](token.md)
14 changes: 7 additions & 7 deletions docs/linux/concepts/af_xdp.md
Expand Up @@ -4,15 +4,15 @@ description: "This page explains the concept of AF_XDP in depth, AF_XDP being a
---
# AF_XDP

The kernel allows processes to create sockets under the Address Family Express Data Path (AF_XDP) address family. This is a special socket type which in combination with an XDP program can perform full or partial kernel bypass. Bypassing the kernel network stack can increase performance in certain use cases. A socket created under the AF_XDP address family is also referred to as an XSK (XDP Socket).
The kernel allows processes to create sockets under the Address Family Express Data Path (AF_XDP) address family. This is a special socket type which in combination with an [XDP program](../program-type/BPF_PROG_TYPE_XDP.md) can perform full or partial kernel bypass. Bypassing the kernel network stack can increase performance in certain use cases. A socket created under the AF_XDP address family is also referred to as an XSK (XDP Socket).

Examples of such use cases are:

* Custom protocol implementations - If a kernel does not understand a custom protocol, it will do a lot of unnecessary work, bypassing the kernel and giving the traffic to a process which handles it correctly avoids overhead.
* DDoS protection - If complex processing across multiple packets is required, eBPF programs can't keep up, thus forwarding traffic to user space for analysis might be needed.
* Application specific optimization - The Linux network stack by necessity needs to handle a lot of protocols and edge cases which are not applicable to workloads you are running. This means paying performance cost for features you are not using. While not easy, one can implement a custom network stack specific to their needs, to eke out every drop of performance.

All ingress traffic is first processed by an XDP program, which can decide which traffic to pass to the stack and which to bypass. This is powerful since it allows a user to bypass traffic for very specific applications, ports and/or protocols without disrupting the normal packet processing. This is unlike other kernel bypass techniques such as `PACKET_MMAP` or `PF_RING`, which require you to handle all traffic and re-implement every protocol needed for the host to function.
All ingress traffic is first processed by an [XDP program](../program-type/BPF_PROG_TYPE_XDP.md), which can decide which traffic to pass to the stack and which to bypass. This is powerful since it allows a user to bypass traffic for very specific applications, ports and/or protocols without disrupting the normal packet processing. This is unlike other kernel bypass techniques such as `PACKET_MMAP` or `PF_RING`, which require you to handle all traffic and re-implement every protocol needed for the host to function.

## Usage

Expand Down Expand Up @@ -47,7 +47,7 @@ static const int umem_len = chunk_size * chunk_count;
// UMEM is one contiguous buffer; chunk i starts at umem + i * chunk_size.
unsigned char *umem = malloc(umem_len);
```

Now that we have a UMEM, link it to the socket via the `setsockopt` syscall:
Now that we have a UMEM, link it to the socket via the [`setsockopt`](https://man7.org/linux/man-pages/man3/setsockopt.3p.html) syscall:

```c
struct xdp_umem_reg {
Expand All @@ -69,21 +69,21 @@ if (setsockopt(fd, SOL_XDP, XDP_UMEM_REG, &umem_reg, sizeof(umem_reg)) != 0)
// handle error
```

Next up are our ring buffers. These are allocated by the kernel when we tell the kernel how large we want each ring buffer to be via a `setsockopt` syscall. After allocation, we can map the ring buffer into the memory of our process via the `mmap` syscall.
Next up are our ring buffers. These are allocated by the kernel when we tell the kernel how large we want each ring buffer to be via a [`setsockopt`](https://man7.org/linux/man-pages/man3/setsockopt.3p.html) syscall. After allocation, we can map the ring buffer into the memory of our process via the [`mmap`](https://man7.org/linux/man-pages/man2/mmap.2.html) syscall.

The following process should be repeated for each ring buffer (with different options, which will be pointed out):

We have to determine the desired ring buffer size, which must be a power of 2 for example `128`, `256`, `512`, `1024` etc. The sizes of the ring buffers can be tweaked and can differ from ring buffer to ring buffer, we will pick `512` for this example.

We inform the kernel of our chosen size via a `setsockopt` syscall:
We inform the kernel of our chosen size via a [`setsockopt`](https://man7.org/linux/man-pages/man3/setsockopt.3p.html) syscall:

```c
static const int ring_size = 512;
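// Pseudocode: make one call per ring, passing one of the listed option names each time.
// setsockopt returns 0 on success and -1 on error.
if (setsockopt(fd, SOL_XDP, {XDP_RX_RING,XDP_TX_RING,XDP_UMEM_FILL_RING,XDP_UMEM_COMPLETION_RING}, &ring_size, sizeof(ring_size)) != 0)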
// handle error
```
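
The power-of-two requirement on the ring size mentioned above can be checked before making the `setsockopt` call. The following is a minimal sketch; the helper name `demo_ring_size_is_valid` is hypothetical and not part of any kernel or libbpf API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A non-zero value is a power of two when exactly one bit is set,
 * in which case clearing the lowest set bit yields zero. */
static bool demo_ring_size_is_valid(uint32_t entries)
{
	return entries != 0 && (entries & (entries - 1)) == 0;
}

/* Small usage check with the size picked in the example and a bad size. */
static void demo_ring_size_check(void)
{
	assert(demo_ring_size_is_valid(512));
	assert(!demo_ring_size_is_valid(500));
}
```
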
After we have set the sizes for all ring buffers we can request the `mmap` offsets with a `getsockopt` syscall:
After we have set the sizes for all ring buffers we can request the [`mmap`](https://man7.org/linux/man-pages/man2/mmap.2.html) offsets with a [`getsockopt`](https://man7.org/linux/man-pages/man3/getsockopt.3p.html) syscall:
```c
struct xdp_ring_offset {
Expand Down Expand Up @@ -308,7 +308,7 @@ The process of transferring data between the NIC and UMEM can work in copy or ze

You can request an explicit mode by specifying the `XDP_COPY` or `XDP_ZEROCOPY` flags when performing the bind syscall. If zero-copy mode is requested but not available, the bind syscall will result in an error.

Additionally, a bound socket can be queried with `getsockopt` and the `XDP_OPTIONS` option and `struct xdp_options` value. If the flag `XDP_OPTIONS_ZEROCOPY` is set, then the socket operates in zero-copy mode.
Additionally, a bound socket can be queried with [`getsockopt`](https://man7.org/linux/man-pages/man3/getsockopt.3p.html) and the `XDP_OPTIONS` option and `struct xdp_options` value. If the flag `XDP_OPTIONS_ZEROCOPY` is set, then the socket operates in zero-copy mode.
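
The query described above could look roughly like the following sketch. It assumes the kernel UAPI header `linux/if_xdp.h` is available; the function name `demo_xsk_is_zerocopy` is hypothetical, and the fallback definition of `SOL_XDP` is only there for older C libraries that do not define it.

```c
#include <assert.h>
#include <linux/if_xdp.h>
#include <sys/socket.h>

#ifndef SOL_XDP
#define SOL_XDP 283 /* fallback for C libraries that lack the definition */
#endif

/* Returns 1 if the bound socket runs in zero-copy mode, 0 if it runs in
 * copy mode, and -1 if the getsockopt call itself fails. */
static int demo_xsk_is_zerocopy(int fd)
{
	struct xdp_options opts = {0};
	socklen_t len = sizeof(opts);

	if (getsockopt(fd, SOL_XDP, XDP_OPTIONS, &opts, &len) != 0)
		return -1;

	return (opts.flags & XDP_OPTIONS_ZEROCOPY) ? 1 : 0;
}

/* Small usage check: an invalid file descriptor is reported as an error. */
static void demo_xsk_is_zerocopy_check(void)
{
	assert(demo_xsk_is_zerocopy(-1) == -1);
}
```
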

#### Headroom

12 changes: 6 additions & 6 deletions docs/linux/concepts/functions.md
Expand Up @@ -16,7 +16,7 @@ In [:octicons-tag-24: v4.16](https://github.com/torvalds/linux/commit/cc8b0b92a1

### Function inlining

By default, the compiler will choose whether to inline a function or to keep it as a separate function. Compilers can be encouraged to inline or not inline a function with attributes like `__attribute__((always_inline))` or `__attribute__((noinline))`. Inlined functions do not incur the overhead of a function call as they will become part of the calling function. Inlined functions can also be optimized per call site since arguments are known.
By default, the compiler will choose whether to inline a function or to keep it as a separate function. Compilers can be encouraged to inline or not inline a function with attributes like `__attribute__((always_inline))`/[`__always_inline`](../../ebpf-library/libbpf/ebpf/__always_inline.md) or `__attribute__((noinline))`/[`__noinline`](../../ebpf-library/libbpf/ebpf/__noinline.md). Inlined functions do not incur the overhead of a function call as they will become part of the calling function. Inlined functions can also be optimized per call site since arguments are known.
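
A minimal sketch of the two attributes, written as plain C so it can be compiled outside of an eBPF toolchain. The function names and the header length constant are made up for illustration.

```c
#include <assert.h>

/* Ask the compiler to always inline this function into its callers. */
static inline __attribute__((always_inline)) int demo_add_header_len(int payload_len)
{
	return payload_len + 14; /* 14 bytes, the length of an Ethernet header */
}

/* Ask the compiler to keep this as a separate function with a real call. */
static __attribute__((noinline)) int demo_frame_len(int payload_len)
{
	return demo_add_header_len(payload_len);
}

/* Small usage check: both attributes leave the result unchanged. */
static void demo_inline_check(void)
{
	assert(demo_frame_len(100) == 114);
}
```

The attributes only influence code generation; the observable behavior of the functions stays the same either way.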

### Tail calls

Expand All @@ -42,8 +42,8 @@ In [:octicons-tag-24: v5.13](https://github.com/torvalds/linux/commit/69c087ba62

In [:octicons-tag-24: v6.8](https://github.com/torvalds/linux/commit/94e1c70a34523b5e1529e4ec508316acc6a26a2b) global function argument annotations were added. These are a set of annotations (in practice these are BTF decl tags), which if added to an argument, tell the verifier to restrict the input values to the function. Possible tags are:

* `__arg_ctx` - The argument is a pointer to a program context.
* `__arg_nonnull` - The argument can not be NULL.
* `__arg_nullable` - The argument can be NULL.
* `__arg_trusted` - The argument must be a [trusted value](kfuncs.md#kf_trusted_args).
* `__arg_arena` - The argument must be a pointer to a [memory arena](../map-type/BPF_MAP_TYPE_ARENA.md).
* [`__arg_ctx`](../../ebpf-library/libbpf/ebpf/__arg_ctx.md) - The argument is a pointer to a program context.
* [`__arg_nonnull`](../../ebpf-library/libbpf/ebpf/__arg_nonnull.md) - The argument can not be NULL.
* [`__arg_nullable`](../../ebpf-library/libbpf/ebpf/__arg_nullable.md) - The argument can be NULL.
* [`__arg_trusted`](../../ebpf-library/libbpf/ebpf/__arg_trusted.md) - The argument must be a [trusted value](kfuncs.md#kf_trusted_args).
* [`__arg_arena`](../../ebpf-library/libbpf/ebpf/__arg_arena.md) - The argument must be a pointer to a [memory arena](../map-type/BPF_MAP_TYPE_ARENA.md).
2 changes: 1 addition & 1 deletion docs/linux/concepts/kfuncs.md
Expand Up @@ -45,7 +45,7 @@ char _license[] SEC("license") = "GPL";
```
!!! note
The definition of `__ksym` is `#define __ksym __attribute__((section(".ksyms")))`
The definition of [`__ksym`](../../ebpf-library/libbpf/ebpf/__ksym.md) is `#define __ksym __attribute__((section(".ksyms")))`
### Kernel modules
2 changes: 1 addition & 1 deletion docs/linux/concepts/loops.md
Expand Up @@ -73,7 +73,7 @@ In [:octicons-tag-24: v6.4](https://github.com/torvalds/linux/commit/06accc8779c

The advantage of this method is that the verifier only has to check two states as opposed to the amount of iterations like with a bounded loop and we don't require a callback function like with the loop helper.

Every iterator type has a `bpf_iter_<type>_new` function to initialize the iterator, a `bpf_iter_<type>_next` function to get the next element, and a `bpf_iter_<type>_destroy` function to clean up the iterator. In the case of the numeric iterator, the `bpf_iter_num_new`, `bpf_iter_num_next` and `bpf_iter_num_destroy` functions are used.
Every iterator type has a `bpf_iter_<type>_new` function to initialize the iterator, a `bpf_iter_<type>_next` function to get the next element, and a `bpf_iter_<type>_destroy` function to clean up the iterator. In the case of the numeric iterator, the [`bpf_iter_num_new`](../kfuncs/bpf_iter_num_new.md), [`bpf_iter_num_next`](../kfuncs/bpf_iter_num_next.md) and [`bpf_iter_num_destroy`](../kfuncs/bpf_iter_num_destroy.md) functions are used.

The most basic example of a numeric iterator is:

30 changes: 15 additions & 15 deletions docs/linux/concepts/maps.md
Expand Up @@ -47,36 +47,36 @@ struct {

```
The `__uint` and `__type` macros used in the above example are typically used to make the type definition easier to read.
They are defined in [`tools/lib/bpf/bpf_helpers.h`](https://elixir.bootlin.com/linux/v6.2.2/source/tools/lib/bpf/bpf_helpers.h).
The [`__uint`](../../ebpf-library/libbpf/ebpf/__uint.md), [`__type`](../../ebpf-library/libbpf/ebpf/__type.md), [`__array`](../../ebpf-library/libbpf/ebpf/__array.md) and [`__ulong`](../../ebpf-library/libbpf/ebpf/__ulong.md) macros used in the above example are typically used to make the type definition easier to read.
```c
#define __uint(name, val) int (*name)[val]
#define __type(name, val) typeof(val) *name
#define __array(name, val) typeof(val) *name[]
#define __ulong(name, val) enum { ___bpf_concat(__unique_value, __COUNTER__) = val } name
```

The `name` part of these macros refers to field names of the to be created structure. Not all names are recognized by libbpf and compatible libraries. However, the following are:

* `type` (`__uint`) - enum, see the [map types](../map-type/index.md) index for all valid options.
* `max_entries` (`__uint`) - int indicating the maximum amount of entries.
* `map_flags` (`__uint`) - a bitfield of flags, see [flags section](../syscall/BPF_MAP_CREATE.md#flags) in map load syscall command for valid options.
* `numa_node` (`__uint`) - the ID of the NUMA node on which to place the map.
* `key_size` (`__uint`) - the size of the key in bytes. This field is mutually exclusive with the `key` field.
* `key` (`__type`) - the type of the key. This field is mutually exclusive with the `key_size` field.
* `value_size` (`__uint`) - the size of the value in bytes. This field is mutually exclusive with the `value` and `values` fields.
* `value` (`__type`) - the type of the value. This field is mutually exclusive with the `values` and `value_size` fields.
* `values` (`__array`) - see [static values section](#static-values). This field is mutually exclusive with the `value` and `value_size` fields.
* `pinning` (`__uint`) - `LIBBPF_PIN_BY_NAME` or `LIBBPF_PIN_NONE`, see [pinning page](pinning.md) for details.
* `map_extra` (`__uint`) - Additional settings, currently only used by bloom filters which use the lowest 4 bits to indicate the amount of hashes used in the bloom filter.
* `type` ([`__uint`](../../ebpf-library/libbpf/ebpf/__uint.md)) - enum, see the [map types](../map-type/index.md) index for all valid options.
* `max_entries` ([`__uint`](../../ebpf-library/libbpf/ebpf/__uint.md)) - int indicating the maximum amount of entries.
* `map_flags` ([`__uint`](../../ebpf-library/libbpf/ebpf/__uint.md)) - a bitfield of flags, see [flags section](../syscall/BPF_MAP_CREATE.md#flags) in map load syscall command for valid options.
* `numa_node` ([`__uint`](../../ebpf-library/libbpf/ebpf/__uint.md)) - the ID of the NUMA node on which to place the map.
* `key_size` ([`__uint`](../../ebpf-library/libbpf/ebpf/__uint.md)) - the size of the key in bytes. This field is mutually exclusive with the `key` field.
* `key` ([`__type`](../../ebpf-library/libbpf/ebpf/__type.md)) - the type of the key. This field is mutually exclusive with the `key_size` field.
* `value_size` ([`__uint`](../../ebpf-library/libbpf/ebpf/__uint.md)) - the size of the value in bytes. This field is mutually exclusive with the `value` and `values` fields.
* `value` ([`__type`](../../ebpf-library/libbpf/ebpf/__type.md)) - the type of the value. This field is mutually exclusive with the `values` and `value_size` fields.
* `values` ([`__array`](../../ebpf-library/libbpf/ebpf/__array.md)) - see [static values section](#static-values). This field is mutually exclusive with the `value` and `value_size` fields.
* `pinning` ([`__uint`](../../ebpf-library/libbpf/ebpf/__uint.md)) - `LIBBPF_PIN_BY_NAME` or `LIBBPF_PIN_NONE`, see [pinning page](pinning.md) for details.
* `map_extra` ([`__uint`](../../ebpf-library/libbpf/ebpf/__uint.md)) - Additional settings, currently only used by bloom filters which use the lowest 4 bits to indicate the amount of hashes used in the bloom filter.

Typically, only the `type`, `key`/`key_size`, `value`/`values`/`value_size`, and `max_entries` fields are required.
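
To make the encoding trick behind these macros more concrete, here is a user-space sketch. It uses local copies of the two simplest macros under hypothetical names (`DEMO_UINT` and `DEMO_TYPE`) so that it does not clash with any real header, and it uses `__typeof__` instead of `typeof` so that it also compiles in strict ISO C modes. The integer value is stored as the length of a pointed-to array type, and the key or value type is stored as the pointed-to type, so both can be recovered with `sizeof` without the pointers ever being dereferenced at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the __uint and __type definitions, under hypothetical names. */
#define DEMO_UINT(name, val) int (*name)[val]
#define DEMO_TYPE(name, val) __typeof__(val) *name

/* Hypothetical value type, only used for this illustration. */
struct demo_value {
	long packets;
	long bytes;
};

/* Same layout idea as a BTF-style map definition. */
struct demo_map_def {
	DEMO_UINT(type, 1);          /* 1 is the value of BPF_MAP_TYPE_HASH */
	DEMO_UINT(max_entries, 1024);
	DEMO_TYPE(key, int);
	DEMO_TYPE(value, struct demo_value);
};

/* The integer is the element count of the array type the field points to.
 * sizeof does not evaluate its operand, so nothing is dereferenced. */
static size_t demo_max_entries(void)
{
	struct demo_map_def def = {0};
	return sizeof(*def.max_entries) / sizeof(int);
}

/* The value size is the size of the type the field points to. */
static size_t demo_value_size(void)
{
	struct demo_map_def def = {0};
	return sizeof(*def.value);
}

/* Small usage check of both recovered properties. */
static void demo_map_def_check(void)
{
	assert(demo_max_entries() == 1024);
	assert(demo_value_size() == sizeof(struct demo_value));
}
```

Loaders such as libbpf read the same information from the BTF type data rather than with `sizeof`, but the encoding is the same idea.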

#### Static values

The `values` map field has its own syntax: it is the only field to use the `__array` macro, and it requires us to initialize our map constant with a value. Its purpose is to populate the contents of the map during loading without having to do so manually via a userspace application. This is especially handy for users who use `ip`, `tc`, or `bpftool` to load their programs.
The `values` map field has its own syntax: it is the only field to use the [`__array`](../../ebpf-library/libbpf/ebpf/__array.md) macro, and it requires us to initialize our map constant with a value. Its purpose is to populate the contents of the map during loading without having to do so manually via a userspace application. This is especially handy for users who use `ip`, `tc`, or `bpftool` to load their programs.

The `val` part of the `__array` parameter should contain a type describing the individual array elements. The values we would like to pre-populate should go into the value part of the struct initialization.
The `val` part of the [`__array`](../../ebpf-library/libbpf/ebpf/__array.md) parameter should contain a type describing the individual array elements. The values we would like to pre-populate should go into the value part of the struct initialization.

The following examples show how to pre-populate a map-in-map:

2 changes: 1 addition & 1 deletion docs/linux/concepts/resource-limit.md
Expand Up @@ -8,7 +8,7 @@ The Linux kernel has protection mechanisms that prevent processes from taking up

## Rlimit

rlimit or "resource limit" is a system to track and limit the amount of certain resources you are allowed to use. One of the things it limits is the amount of "locked memory" https://man7.org/linux/man-pages/man2/getrlimit.2.html
rlimit or "resource limit" is a system to track and limit the amount of certain resources you are allowed to use. One of the things it limits is the amount of "locked memory" [https://man7.org/linux/man-pages/man2/getrlimit.2.html](https://man7.org/linux/man-pages/man2/getrlimit.2.html)

Until kernel version v5.11 this mechanism was used to track and limit the memory usage of BPF maps which count towards the locked memory limit, so you commonly would have to increase or disable this rlimit which requires an additional capability `CAP_SYS_RESOURCE`.
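
On such older kernels, a loader would typically inspect or raise the locked memory limit before creating maps. The following is a minimal sketch using the standard `getrlimit` and `setrlimit` calls; the function names are hypothetical, and raising the limit is expected to fail for processes without sufficient privileges.

```c
#include <assert.h>
#include <sys/resource.h>

/* Read the current locked memory limit. Returns 0 on success, -1 on error. */
static int demo_read_memlock_limit(struct rlimit *out)
{
	return getrlimit(RLIMIT_MEMLOCK, out);
}

/* Try to remove the locked memory limit. Returns 0 on success, -1 on error
 * (for example when the process lacks CAP_SYS_RESOURCE). */
static int demo_raise_memlock_limit(void)
{
	const struct rlimit unlimited = {
		.rlim_cur = RLIM_INFINITY,
		.rlim_max = RLIM_INFINITY,
	};

	return setrlimit(RLIMIT_MEMLOCK, &unlimited);
}

/* Small usage check: reading the limit works for any process. */
static void demo_memlock_check(void)
{
	struct rlimit current;

	assert(demo_read_memlock_limit(&current) == 0);
}
```
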

2 changes: 1 addition & 1 deletion docs/linux/concepts/verifier.md
Expand Up @@ -87,4 +87,4 @@ Additionally, global functions can be replaced by [`freplace`](../program-type/B

The [`bpf_for_each_map_elem`](../helper-function/bpf_for_each_map_elem.md) helper also introduced the concept of callbacks. This allows users to declare a static function that is not directly called by the BPF program but is passed as function pointer to a helper to be called.

In later versions this mechanism is also used for [timers](timers.md), `bpf_find_vma`, and [loops](loops.md).
In later versions this mechanism is also used for [timers](timers.md), [`bpf_find_vma`](../helper-function/bpf_find_vma.md), and [loops](loops.md).