Alexei Starovoitov 49b5300f1f Merge branch 'Support stashing local kptrs with bpf_kptr_xchg'
Dave Marchevsky says:

====================

Local kptrs are kptrs allocated via bpf_obj_new with a type specified in program
BTF. A BPF program which creates a local kptr has exclusive control of the
lifetime of the kptr and, prior to terminating, must do one of the following:

  * free the kptr via bpf_obj_drop
  * if the kptr is a {list,rbtree} node, add the node to a {list,rbtree},
    thereby passing control of the lifetime to the collection

This series adds a third option:

  * stash the kptr in a map value using bpf_kptr_xchg

As the word "stash" suggests, the intended use of this feature is temporary
storage of local kptrs. For example, a sched_ext ([0]) scheduler may want to
create an rbtree node for each new cgroup on cgroup init, but add that node to
the rbtree in a separate program which runs on enqueue. Stashing the node in a
map_value allows it to outlive the execution of the cgroup_init program.
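
The handoff between the two programs can be modeled in plain userspace C. This
is only a sketch under stated assumptions: struct cg_node, struct cg_slot and
every model_* function are hypothetical names invented for illustration,
malloc/free stand in for bpf_obj_new/bpf_obj_drop, and none of it is BPF
program code or subject to verifier checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the node type a scheduler would allocate. */
struct cg_node {
        long cgid;
};

/* Toy model of a map value holding one stashed pointer. */
struct cg_slot {
        struct cg_node *node;
};

/*
 * Toy model of bpf_kptr_xchg: store new_node in the slot and hand the
 * previous occupant (possibly NULL) back to the caller. This model is
 * single-threaded and uses a plain swap.
 */
static struct cg_node *model_kptr_xchg(struct cg_slot *slot,
                                       struct cg_node *new_node)
{
        struct cg_node *old = slot->node;

        slot->node = new_node;
        return old;
}

/* Models the program that runs on cgroup init: allocate, then stash. */
static int model_cgroup_init(struct cg_slot *slot, long cgid)
{
        struct cg_node *n = malloc(sizeof(*n)); /* stands in for bpf_obj_new */
        struct cg_node *old;

        if (!n)
                return -1;
        n->cgid = cgid;

        old = model_kptr_xchg(slot, n);
        if (old)
                free(old);                      /* stands in for bpf_obj_drop */
        return 0;
}

/* Models the program that runs on enqueue: take the node back out. */
static struct cg_node *model_enqueue(struct cg_slot *slot)
{
        return model_kptr_xchg(slot, NULL);
}

/* Usage: stash on init, unstash on enqueue; returns the recovered cgid. */
static long model_demo(void)
{
        struct cg_slot slot = { NULL };
        struct cg_node *n;
        long cgid;

        if (model_cgroup_init(&slot, 42))
                return -1;
        n = model_enqueue(&slot);
        assert(n && !slot.node);        /* ownership is back with the caller */
        cgid = n->cgid;
        free(n);
        return cgid;
}
```

In the model, the slot is the only thing that keeps the node reachable between
the two calls, which is the role the map value plays for the stashed kptr.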

Behavior:

There is no semantic difference between adding a kptr to a graph collection and
"stashing" it in a map. In both cases exclusive ownership of the kptr's lifetime
is passed to some containing data structure, which is responsible for
bpf_obj_drop'ing it when the container goes away.
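
The container's responsibility at teardown can be illustrated with a small
userspace C model. It is a sketch, not the kernel's map-free path: struct
toy_map, toy_xchg, toy_map_free and toy_obj_drop are hypothetical names, and
toy_obj_drop merely counts calls and frees memory in place of bpf_obj_drop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_ENTRIES 4

struct toy_obj {
        long key;
};

typedef void (*toy_dtor_t)(void *obj);

/* Toy container: each slot owns whatever pointer is stashed in it. */
struct toy_map {
        void *slots[TOY_MAX_ENTRIES];
        toy_dtor_t dtor;        /* run on every still-stashed object at free */
};

static int toy_drop_count;

/* Stands in for bpf_obj_drop; counts calls so the demo can report them. */
static void toy_obj_drop(void *obj)
{
        toy_drop_count++;
        free(obj);
}

/* Swap a pointer into a slot and return the previous occupant. */
static void *toy_xchg(struct toy_map *map, int idx, void *new_obj)
{
        void *old = map->slots[idx];

        map->slots[idx] = new_obj;
        return old;
}

/* Container teardown: drop everything that is still stashed. */
static void toy_map_free(struct toy_map *map)
{
        for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
                if (map->slots[i]) {
                        map->dtor(map->slots[i]);
                        map->slots[i] = NULL;
                }
        }
}

/* Stash two objects, unstash one, free the map; returns the dtor count. */
static int toy_teardown_demo(void)
{
        struct toy_map map = { .dtor = toy_obj_drop };
        struct toy_obj *a = malloc(sizeof(*a));
        struct toy_obj *b = malloc(sizeof(*b));
        void *old, *taken;

        if (!a || !b) {
                free(a);
                free(b);
                return -1;
        }
        toy_drop_count = 0;

        old = toy_xchg(&map, 0, a);
        assert(old == NULL);
        old = toy_xchg(&map, 1, b);
        assert(old == NULL);

        taken = toy_xchg(&map, 0, NULL);        /* caller owns a again */
        toy_map_free(&map);                     /* only b is still stashed */
        free(taken);                            /* caller releases a itself */
        return toy_drop_count;
}
```

Only the object still held by the container at teardown goes through the
container's dtor; the unstashed one is the caller's to release.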

Since graph collections also expect exclusive ownership of the nodes they
contain, graph nodes cannot be both stashed in a map_value and contained by
their corresponding collection.

Implementation:

Two observations simplify the verifier changes for this feature. First, kptrs
("referenced kptrs" until a recent renaming) require registration of a
dtor function as part of their acquire/release semantics, so that a referenced
kptr which is placed in a map_value is properly released when the map goes away.
We want this exact behavior for local kptrs, but with bpf_obj_drop as the dtor
instead of a per-btf_id dtor.

The second observation is that, in terms of identification, "referenced kptr"
and "local kptr" already don't interfere with one another. Consider the
following example:

  struct node_data {
          long key;
          long data;
          struct bpf_rb_node node;
  };

  struct map_value {
          struct node_data __kptr *node;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __type(key, int);
          __type(value, struct map_value);
          __uint(max_entries, 1);
  } some_nodes SEC(".maps");

  struct map_value *mapval;
  struct node_data *res;
  int key = 0;

  res = bpf_obj_new(typeof(*res));
  if (!res) { /* err handling */ }

  mapval = bpf_map_lookup_elem(&some_nodes, &key);
  if (!mapval) { /* err handling */ }

  res = bpf_kptr_xchg(&mapval->node, res);
  if (res)
          bpf_obj_drop(res);

The __kptr tag identifies map_value's node as a referenced kptr, while the
PTR_TO_BTF_ID which bpf_obj_new returns - a type in some non-vmlinux,
non-module BTF - identifies res as a local kptr. The type tag on the pointer
indicates referenced kptr, while the type of the pointee indicates local kptr.
So using existing facilities we can tell the verifier about a "referenced kptr"
pointer to a "local kptr" pointee.

When kptr_xchg'ing a kptr into a map_value, the verifier can recognize local
kptr types and treat them like referenced kptrs with a properly-typed
bpf_obj_drop as a dtor.
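
The dtor choice described above can be pictured with a toy userspace C model.
This is a loose sketch and not the verifier's or the BTF code's actual logic:
every sel_* name is hypothetical, the origin enum is an invented simplification
of "kernel BTF versus program BTF", and the generic drop here ignores the
per-type information a properly-typed bpf_obj_drop would need.

```c
#include <stddef.h>

typedef void (*sel_dtor_t)(void *obj);

/* Where the pointee type is described (invented for this sketch). */
enum sel_btf_origin {
        SEL_BTF_KERNEL,         /* vmlinux or module BTF */
        SEL_BTF_PROGRAM,        /* program BTF: a local kptr pointee */
};

/* Toy description of one kptr field in a map value. */
struct sel_kptr_field {
        enum sel_btf_origin origin;
        sel_dtor_t registered_dtor;     /* per-type dtor, kernel types only */
};

static int sel_generic_drops;

/* Stands in for bpf_obj_drop acting as the dtor for every local type. */
static void sel_generic_drop(void *obj)
{
        (void)obj;
        sel_generic_drops++;
}

/* Hypothetical per-type release function for some kernel object. */
static void sel_example_release(void *obj)
{
        (void)obj;
}

/* Pick the dtor to run when a map holding this field goes away. */
static sel_dtor_t sel_pick_dtor(const struct sel_kptr_field *f)
{
        if (f->origin == SEL_BTF_PROGRAM)
                return sel_generic_drop;
        return f->registered_dtor;      /* NULL if nothing was registered */
}
```

The point of the model is that a local type needs no per-type registration:
one generic drop function serves every type defined in program BTF.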

Other implementation notes:
  * We don't need to do anything special to enforce "graph nodes cannot be
    both stashed in a map_value and contained by their corresponding collection":
    * bpf_kptr_xchg both returns and takes as input a (possibly-null) owning
      reference. It does not accept non-owning references as input by virtue
      of requiring a ref_obj_id. By definition, if a program has an owning
      ref to a node, the node isn't in a collection, so it's safe to pass
      ownership via bpf_kptr_xchg.

Summary of patches:

  * Patch 1 modifies BTF plumbing to support using bpf_obj_drop as a dtor
  * Patch 2 adds verifier plumbing to support MEM_ALLOC-flagged param for
    bpf_kptr_xchg
  * Patch 3 adds selftests exercising the new behavior

Changelog:

v1 -> v2: https://lore.kernel.org/bpf/20230309180111.1618459-1-davemarchevsky@fb.com/

Patch #s used below refer to the patch's position in v1 unless otherwise
specified.

Patches 1-3 were applied and are not included in v2.
Rebase onto latest bpf-next: "libbpf: Revert poisoning of strlcpy"

Patch 4: "bpf: Support __kptr to local kptrs"
  * Remove !btf_is_kernel(btf) check, WARN_ON_ONCE instead (Alexei)

Patch 6: "selftests/bpf: Add local kptr stashing test"
  * Add test which stashes 2 nodes and later unstashes one of them using a
    separate BPF program (Alexei)
  * Fix incorrect runner subtest name for original test (was
    "rbtree_add_nodes")
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 16:38:06 -08:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.