Commit Graph

5219 Commits

Author SHA1 Message Date
Saket Kumar Bhaskar
590699d858 selftests/bpf: Fix htab_update/reenter_update selftest failure
Since commit 31158ad02d ("rqspinlock: Add deadlock detection
and recovery") the updated path on re-entrancy now reports deadlock
via -EDEADLK instead of the previous -EBUSY.

Also, the way reentrancy was exercised (via fentry/lookup_elem_raw)
has been fragile because lookup_elem_raw may be inlined
(find_kernel_btf_id() will return -ESRCH).

To fix this fentry is attached to bpf_obj_free_fields() instead of
lookup_elem_raw() and:

- The htab map is made to use a BTF-described struct val with a
  struct bpf_timer so that check_and_free_fields() reliably calls
  bpf_obj_free_fields() on element replacement.

- The selftest is updated to do two updates to the same key (insert +
  replace) in prog_test.

- The selftest is updated to align with expected errno with the
  kernel’s current behavior.

Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/r/20251117060752.129648-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-24 16:48:28 -08:00
Alan Maguire
ad93ba0267 selftests/bpf: Allow selftests to build with older xxd
Currently selftests require xxd with the "-n <name>" option
which allows the user to specify a name not derived from
the input object path.  Instead of relying on this newer
feature, older xxd can be used if we link our desired name
("test_progs_verification_cert") to the input object.

Many distros ship xxd in vim-common package and do not have
the latest xxd with -n support.

Fixes: b720903e2b ("selftests/bpf: Enable signature verification for some lskel tests")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20251120084754.640405-3-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-24 10:00:16 -08:00
Puranjay Mohan
cf49ec5705 selftests: bpf: Add tests for unbalanced rcu_read_lock
As verifier now supports nested rcu critical sections, add new test
cases to make sure unbalanced usage of rcu_read_lock()/unlock() is
rejected.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251117200411.25563-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 18:34:59 -08:00
Puranjay Mohan
4167096cb9 bpf: support nested rcu critical sections
Currently, nested rcu critical sections are rejected by the verifier and
rcu_lock state is managed by a boolean variable. Add support for nested
rcu critical sections by make active_rcu_locks a counter similar to
active_preempt_locks. bpf_rcu_read_lock() increments this counter and
bpf_rcu_read_unlock() decrements it, MEM_RCU -> PTR_UNTRUSTED transition
happens when active_rcu_locks drops to 0.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251117200411.25563-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 18:34:59 -08:00
Eduard Zingerman
8f7cf305a1 bpf: test the correct stack liveness of tail calls
A new test is added: caller_stack_write_tail_call tests that the live
stack is correctly tracked for a tail call.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu>
Link: https://lore.kernel.org/r/20251119160355.1160932-5-martin.teichmann@xfel.eu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 17:45:30 -08:00
Martin Teichmann
978da762ea bpf: test the proper verification of tail calls
Three tests are added:

- invalidate_pkt_pointers_by_tail_call checks that one can use the
  packet pointer after a tail call. This was originally possible
  and also poses not problems, but was made impossible by 1a4607ffba.

- invalidate_pkt_pointers_by_static_tail_call tests a corner case
  found by Eduard Zingerman during the discussion of the original fix,
  which was broken in that fix.

- subprog_result_tail_call tests that precision propagation works
  correctly across tail calls. This did not work before.

Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251119160355.1160932-3-martin.teichmann@xfel.eu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 17:45:30 -08:00
Xing Guo
b7f7d76d6e selftests/bpf: Update test_tag to use sha256
commit 603b441623 ("bpf: Update the bpf_prog_calc_tag to use SHA256")
changed digest of prog_tag to SHA256 but forgot to update tests
correspondingly. Fix it.

Fixes: 603b441623 ("bpf: Update the bpf_prog_calc_tag to use SHA256")
Signed-off-by: Xing Guo <higuoxing@gmail.com>
Link: https://lore.kernel.org/r/20251121061458.3145167-1-higuoxing@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 16:56:08 -08:00
Matt Bobrowski
ae24fc8a16 selftests/bpf: Improve reliability of test_perf_branches_no_hw()
Currently, test_perf_branches_no_hw() relies on the busy loop within
test_perf_branches_common() being slow enough to allow at least one
perf event sample tick to occur before starting to tear down the
backing perf event BPF program. With a relatively small fixed
iteration count of 1,000,000, this is not guaranteed on modern fast
CPUs, resulting in the test run to subsequently fail with the
following:

bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_perf_branches_common:PASS:test_perf_branches_load 0 nsec
test_perf_branches_common:PASS:attach_perf_event 0 nsec
test_perf_branches_common:PASS:set_affinity 0 nsec
check_good_sample:PASS:output not valid 0 nsec
check_good_sample:PASS:read_branches_size 0 nsec
check_good_sample:PASS:read_branches_stack 0 nsec
check_good_sample:PASS:read_branches_stack 0 nsec
check_good_sample:PASS:read_branches_global 0 nsec
check_good_sample:PASS:read_branches_global 0 nsec
check_good_sample:PASS:read_branches_size 0 nsec
test_perf_branches_no_hw:PASS:perf_event_open 0 nsec
test_perf_branches_common:PASS:test_perf_branches_load 0 nsec
test_perf_branches_common:PASS:attach_perf_event 0 nsec
test_perf_branches_common:PASS:set_affinity 0 nsec
check_bad_sample:FAIL:output not valid no valid sample from prog
Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED
Successfully unloaded bpf_testmod.ko.

On a modern CPU (i.e. one with a 3.5 GHz clock rate), executing 1
million increments of a volatile integer can take significantly less
than 1 millisecond. If the spin loop and detachment of the perf event
BPF program elapses before the first 1 ms sampling interval elapses,
the perf event will never end up firing. Fix this by bumping the loop
iteration counter a little within test_perf_branches_common(), along
with ensuring adding another loop termination condition which is
directly influenced by the backing perf event BPF program
executing. Notably, a concious decision was made to not adjust the
sample_freq value as that is just not a reliable way to go about
fixing the problem. It effectively still leaves the race window open.

Fixes: 67306f84ca ("selftests/bpf: Add bpf_read_branch_records() selftest")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251119143540.2911424-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 16:49:16 -08:00
Matt Bobrowski
27746aaf1b selftests/bpf: skip test_perf_branches_hw() on unsupported platforms
Gracefully skip the test_perf_branches_hw subtest on platforms that
do not support LBR or require specialized perf event attributes
to enable branch sampling.

For example, AMD's Milan (Zen 3) supports BRS rather than traditional
LBR. This requires specific configurations (attr.type = PERF_TYPE_RAW,
attr.config = RETIRED_TAKEN_BRANCH_INSTRUCTIONS) that differ from the
generic setup used within this test. Notably, it also probably doesn't
hold much value to special case perf event configurations for selected
micro architectures.

Fixes: 67306f84ca ("selftests/bpf: Add bpf_read_branch_records() selftest")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20251120142059.2836181-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 16:47:58 -08:00
Puranjay Mohan
d8774a3623 selftests: bpf: Enable gotox tests from arm64
arm64 JIT now supports gotox instruction and jumptables, so run tests in
verifier_gotox.c for arm64.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251117130732.11107-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21 16:40:21 -08:00
Hoyeon Lee
db354a1577 selftests/bpf: Use sockaddr_storage instead of sa46 in select_reuseport test
The select_reuseport selftest uses a custom sa46 union to represent
IPv4 and IPv6 addresses. This custom wrapper requires extra manual
handling for address family and field extraction.

Replace sa46 with sockaddr_storage and update the helper functions to
operate on native socket structures. This simplifies the code and
removes unnecessary custom address-handling logic. No functional
changes intended.

Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251121081332.2309838-3-hoyeon.lee@suse.com
2025-11-21 10:46:31 -08:00
Hoyeon Lee
fd6ed07a05 selftests/bpf: Use sockaddr_storage directly in cls_redirect test
The cls_redirect test uses a custom addr_port/tuple wrapper to represent
IPv4/IPv6 addresses and ports. This custom wrapper requires extra
conversion logic and specific helpers such as fill_addr_port(), which
are no longer necessary when using standard socket address structures.

This commit replaces addr_port/tuple with the standard sockaddr_storage
so test handles address families and ports using native socket types.
It removes the custom helper, eliminates redundant casts, and simplifies
the setup helpers without functional changes. set_up_conn() and
build_input() now take src/dst sockaddr_storage directly.

Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251121081332.2309838-2-hoyeon.lee@suse.com
2025-11-21 10:46:01 -08:00
Matt Bobrowski
d088da9042 selftests/bpf: Use ASSERT_STRNEQ to factor in long slab cache names
subtest_kmem_cache_iter_check_slabinfo() fundamentally compares slab
cache names parsed out from /proc/slabinfo against those stored within
struct kmem_cache_result. The current problem is that the slab cache
name within struct kmem_cache_result is stored within a bounded
fixed-length array (sized to SLAB_NAME_MAX(32)), whereas the name
parsed out from /proc/slabinfo is not. Meaning, using ASSERT_STREQ()
can certainly lead to test failures, particularly when dealing with
slab cache names that are longer than SLAB_NAME_MAX(32)
bytes. Notably, kmem_cache_create() allows callers to create slab
caches with somewhat arbitrarily sized names via its __name identifier
argument, so exceeding the SLAB_NAME_MAX(32) limit that is in place
now can certainly happen.

Make subtest_kmem_cache_iter_check_slabinfo() more reliable by only
checking up to sizeof(struct kmem_cache_result.name) - 1 using
ASSERT_STRNEQ().

Fixes: a496d0cdc8 ("selftests/bpf: Add a test for kmem_cache_iter")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://patch.msgid.link/20251118073734.4188710-1-mattbobrowski@google.com
2025-11-20 09:26:06 -08:00
Jakub Kicinski
9e203721ec Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.18-rc7).

No conflicts, adjacent changes:

tools/testing/selftests/net/af_unix/Makefile
  e1bb28bf13 ("selftest: af_unix: Add test for SO_PEEK_OFF.")
  45a1cd8346 ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20 09:13:26 -08:00
Hoyeon Lee
ec12ab2cda selftests/bpf: Replace TCP CC string comparisons with bpf_strncmp
The connect4_prog and bpf_iter_setsockopt tests duplicate the same
open-coded TCP congestion control string comparison logic. Since
bpf_strncmp() provides the same functionality, use it instead to
avoid repeated open-coded loops.

This change applies only to functional BPF tests and does not affect
the verifier performance benchmarks (veristat.cfg). No functional
changes intended.

Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251115225550.1086693-5-hoyeon.lee@suse.com
2025-11-18 14:57:53 -08:00
Hoyeon Lee
f700b37314 selftests/bpf: Move common TCP helpers into bpf_tracing_net.h
Some BPF selftests contain identical copies of the min(), max(),
before(), and after() helpers. These repeated snippets are the same
across the tests and do not need to be defined separately.

Move these helpers into bpf_tracing_net.h so they can be shared by
TCP related BPF programs. This removes repeated code and keeps the
helpers in a single place.

Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251115225550.1086693-4-hoyeon.lee@suse.com
2025-11-18 14:57:45 -08:00
Martin KaFai Lau
6cc73f3540 selftests/bpf: Test bpf_skb_check_mtu(BPF_MTU_CHK_SEGS) when transport_header is not set
Add a test to check that bpf_skb_check_mtu(BPF_MTU_CHK_SEGS) is
rejected (-EINVAL) if skb->transport_header is not set. The test
needs to lower the MTU of the loopback device. Thus, take this
opportunity to run the test in a netns by adding "ns_" to the test
name. The "serial_" prefix can then be removed.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20251112232331.1566074-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14 18:49:18 -08:00
Mykyta Yatsenko
a4d31f451d selftests/bpf: Align kfuncs renamed in bpf tree
bpf_task_work_schedule_resume() and bpf_task_work_schedule_signal() have
been renamed in bpf tree to bpf_task_work_schedule_resume_impl() and
bpf_task_work_schedule_signal_impl() accordingly.
There are few uses of these kfuncs in selftests that are not in bpf
tree, so that when we port [1] into bpf-next, those BPF programs will
not compile.
This patch aligns those remaining callsites with the kfunc renaming.
It should go on top of [1] when applying on bpf-next.

1: https://lore.kernel.org/all/20251104-implv2-v3-0-4772b9ae0e06@meta.com/

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20251105132105.597344-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14 17:45:43 -08:00
Alexei Starovoitov
e47b68bda4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.18-rc5+
Cross-merge BPF and other fixes after downstream PR.

Minor conflict in kernel/bpf/helpers.c

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14 17:43:41 -08:00
Paul Houssel
a69e09823e selftests/bpf: Add BTF dedup tests for recursive typedef definitions
Add several ./test_progs tests:
    1.  btf/dedup:recursive typedef ensures that deduplication no
	longer fails on recursive typedefs.
    2.  btf/dedup:typedef ensures that typedefs are deduplicated correctly
	just as they were before this patch.

Signed-off-by: Paul Houssel <paul.houssel@orange.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/9fac2f744089f6090257d4c881914b79f6cd6c6a.1763037045.git.paul.houssel@orange.com
2025-11-14 17:07:20 -08:00
Alexei Starovoitov
c133390398 selftests/bpf: Fix failure paths in send_signal test
When test_send_signal_kern__open_and_load() fails parent closes the
pipe which cases ASSERT_EQ(read(pipe_p2c...)) to fail, but child
continues and enters infinite loop, while parent is stuck in wait(NULL).
Other error paths have similar issue, so kill the child before waiting on it.

The bug was discovered while compiling all of selftests with -O1 instead of -O2
which caused progs/test_send_signal_kern.c to fail to load.

Fixes: ab8b7f0cb3 ("tools/bpf: Add self tests for bpf_send_signal_thread()")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251113171153.2583-1-alexei.starovoitov@gmail.com
2025-11-14 17:02:25 -08:00
Alexei Starovoitov
63066b7a8e selftests/bpf: Convert glob_match() to bpf arena
Increase arena test coverage.
Convert glob_match() to bpf arena in two steps:
1.
Copy paste lib/glob.c into bpf_arena_strsearch.h
Copy paste lib/globtests.c into progs/arena_strsearch.c

2.
Add __arena to pointers
Add __arg_arena to global functions that accept arena pointers
Add cond_break to loops

The test also serves as a good example of what's possible
with bpf arena and how existing algorithms can be converted.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251111032931.21430-1-alexei.starovoitov@gmail.com
2025-11-14 13:57:28 -08:00
Eduard Zingerman
6c762611fe selftests/bpf: Test widen_imprecise_scalars() with different stack depth
A test case for a situation when widen_imprecise_scalars() is called
with old->allocated_stack > cur->allocated_stack. Test structure:

    def widening_stack_size_bug():
      r1 = 0
      for r6 in 0..1:
        iterator_with_diff_stack_depth(r1)
        r1 = 42

    def iterator_with_diff_stack_depth(r1):
      if r1 != 42:
        use 128 bytes of stack
      iterator based loop

iterator_with_diff_stack_depth() is verified with r1 == 0 first and
r1 == 42 next. Causing stack usage of 128 bytes on a first visit and 8
bytes on a second. Such arrangement triggered a KASAN error in
widen_imprecise_scalars().

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251114025730.772723-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14 09:26:28 -08:00
Matt Bobrowski
93ce3bee31 selftests/bpf: retry bpf_map_update_elem() when E2BIG is returned
Executing the test_maps binary on platforms with extremely high core
counts may cause intermittent assertion failures in
test_update_delete() (called via test_map_parallel()). This can occur
because bpf_map_update_elem() under some circumstances (specifically
in this case while performing bpf_map_update_elem() with BPF_NOEXIST
on a BPF_MAP_TYPE_HASH with its map_flags set to BPF_F_NO_PREALLOC)
can return an E2BIG error code i.e.

error -7 7 tools/testing/selftests/bpf/test_maps.c:#: void
test_update_delete(unsigned int, void *): Assertion `err == 0' failed.
tools/testing/selftests/bpf/test_maps.c:#: void
__run_parallel(unsigned int, void (*)(unsigned int, void *), void *):
Assertion `status == 0' failed.

As it turns out, is_map_full() which is called from alloc_htab_elem()
can take on a conservative approach when htab->use_percpu_counter is
true (which is the case here because the percpu_counter is used when a
BPF_MAP_TYPE_HASH is created with its map_flags set to
BPF_F_NO_PREALLOC). This conservative approach prioritizes preventing
over-allocation and potential issues that could arise from possibly
exceeding htab->map.max_entries in highly concurrent environments,
even if it means slightly under-utilizing the htab map's capacity.

Given that bpf_map_update_elem() from test_update_delete() can return
E2BIG, update can_retry() such that it also accounts for the E2BIG
error code (specifically only when running with map_flags being set to
BPF_F_NO_PREALLOC). The retry loop will allow the global count
belonging to the percpu_counter to become synchronized and better
reflect the current htab map's capacity.

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20251113092519.2632079-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-13 14:36:28 -08:00
Jiayuan Chen
cb730e4ac1 selftests/bpf: Add mptcp test with sockmap
Add test cases to verify that when MPTCP falls back to plain TCP sockets,
they can properly work with sockmap.

Additionally, add test cases to ensure that sockmap correctly rejects
MPTCP sockets as expected.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251111060307.194196-4-jiayuan.chen@linux.dev
2025-11-13 13:18:25 -08:00
Leon Hwang
c1cbf0d21c selftests/bpf: Add test to verify freeing the special fields in pcpu maps
Add test to verify that updating [lru_,]percpu_hash maps decrements
refcount when BPF_KPTR_REF objects are involved.

The tests perform the following steps:
. Call update_elem() to insert an initial value.
. Use bpf_refcount_acquire() to increment the refcount.
. Store the node pointer in the map value.
. Add the node to a linked list.
. Probe-read the refcount and verify it is *2*.
. Call update_elem() again to trigger refcount decrement.
. Probe-read the refcount and verify it is *1*.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251105151407.12723-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-13 09:15:33 -08:00
D. Wythe
beb3c67297 bpf/selftests: Add selftest for bpf_smc_hs_ctrl
This tests introduces a tiny smc_hs_ctrl for filtering SMC connections
based on IP pairs, and also adds a realistic topology model to verify it.

Also, we can only use SMC loopback under CI test, so an additional
configuration needs to be enabled.

Follow the steps below to run this test.

make -C tools/testing/selftests/bpf
cd tools/testing/selftests/bpf
sudo ./test_progs -t smc

Results shows:
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Tested-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20251107035632.115950-4-alibuda@linux.alibaba.com
2025-11-10 12:00:45 -08:00
Jakub Sitnicki
d2c5cca3fb selftests/bpf: Cover skb metadata access after bpf_skb_change_proto
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_change_proto(), which modifies packet headroom to accommodate
different IP header sizes.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-16-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:33 -08:00
Jakub Sitnicki
85d454afef selftests/bpf: Cover skb metadata access after change_head/tail helper
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_change_head() and bpf_skb_change_tail(), which modify packet
headroom/tailroom and can trigger head reallocation.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-15-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:33 -08:00
Jakub Sitnicki
29960e635b selftests/bpf: Cover skb metadata access after bpf_skb_adjust_room
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_adjust_room(), which modifies the packet headroom and can trigger
head reallocation.

The helper expects an Ethernet frame carrying an IP packet so switch test
packet identification by source MAC address since we can no longer rely on
Ethernet proto being set to zero.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-14-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:33 -08:00
Jakub Sitnicki
354d020c29 selftests/bpf: Cover skb metadata access after vlan push/pop helper
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_vlan_push() and bpf_skb_vlan_pop(), which modify the packet
headroom.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-13-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:32 -08:00
Jakub Sitnicki
1e1357fde8 selftests/bpf: Expect unclone to preserve skb metadata
Since pskb_expand_head() no longer clears metadata on unclone, update tests
for cloned packets to expect metadata to remain intact.

Also simplify the clone_dynptr_kept_on_{data,meta}_slice_write tests.
Creating an r/w dynptr slice is sufficient to trigger an unclone in the
prologue, so remove the extraneous writes to the data/meta slice.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-12-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:32 -08:00
Jakub Sitnicki
9ef9ac15a5 selftests/bpf: Dump skb metadata on verification failure
Add diagnostic output when metadata verification fails to help with
troubleshooting test failures. Introduce a check_metadata() helper that
prints both expected and received metadata to the BPF program's stderr
stream on mismatch. The userspace test reads and dumps this stream on
failure.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-11-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:32 -08:00
Jakub Sitnicki
967534e57c selftests/bpf: Verify skb metadata in BPF instead of userspace
Move metadata verification into the BPF TC programs. Previously,
userspace read metadata from a map and verified it once at test end.

Now TC programs compare metadata directly using __builtin_memcmp() and
set a test_pass flag. This enables verification at multiple points during
test execution rather than a single final check.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-10-5ceb08a9b37b@cloudflare.com
2025-11-10 10:52:32 -08:00
Alexis Lothoré (eBPF Foundation)
5b7d6c9198 selftests/bpf: Use start_server_str rather than start_reuseport_server in tc_tunnel
Now that start_server_str enforces SO_REUSEADDR, there's no need to keep
using start_reusport_server in tc_tunnel, especially since it only uses
one server at a time.

Replace start_reuseport_server with start_server_str in tc_tunnel test.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-start-server-soreuseaddr-v1-2-1bbd9c1f8d65@bootlin.com
2025-11-06 15:23:04 -08:00
Alexis Lothoré (eBPF Foundation)
38e36514fc selftests/bpf: Systematically add SO_REUSEADDR in start_server_addr
Some tests have to stop/start a server multiple time with the same
listening address. Doing so without SO_REUSADDR leads to failures due to
the socket still being in TIME_WAIT right after the first instance
stop/before the second instance start. Instead of letting each test
manually set SO_REUSEADDR on their servers, it can be done automatically
by start_server_addr for all tests (and without any major downside).

Enforce SO_REUSEADDR in start_server_addr for all tests.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-start-server-soreuseaddr-v1-1-1bbd9c1f8d65@bootlin.com
2025-11-06 15:23:00 -08:00
Anton Protopopov
ac4d838ce1 selftests/bpf: add C-level selftests for indirect jumps
Add C-level selftests for indirect jumps to validate LLVM and libbpf
functionality. The tests are intentionally disabled, to be run
locally by developers, but will not make the CI red.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251105090410.1250500-13-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-05 17:53:56 -08:00
Anton Protopopov
ccbdb48ce5 selftests/bpf: add new verifier_gotox test
Add a set of tests to validate core gotox functionality
without need to rely on compilers.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251105090410.1250500-12-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-05 17:53:23 -08:00
Anton Protopopov
ae48162a66 selftests/bpf: test instructions arrays with blinding
Add a specific test for instructions arrays with blinding enabled.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251105090410.1250500-7-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-05 17:53:22 -08:00
Anton Protopopov
218edd6db6 selftests/bpf: add selftests for new insn_array map
Add the following selftests for new insn_array map:

  * Incorrect instruction indexes are rejected
  * Two programs can't use the same map
  * BPF progs can't operate the map
  * no changes to code => map is the same
  * expected changes when instructions are added
  * expected changes when instructions are deleted
  * expected changes when multiple functions are present

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251105090410.1250500-5-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-05 17:53:18 -08:00
Jiri Olsa
3490d29964 selftests/bpf: Add stacktrace ips test for raw_tp
Adding test that verifies we get expected initial 2 entries from
stacktrace for rawtp probe via ORC unwind.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-05 17:05:19 -08:00
Jiri Olsa
c9e208fa93 selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi
Adding test that attaches kprobe/kretprobe multi and verifies the
ORC stacktrace matches expected functions.

Adding bpf_testmod_stacktrace_test function to bpf_testmod kernel
module which is called through several functions so we get reliable
call path for stacktrace.

The test is only for ORC unwinder to keep it simple.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251104215405.168643-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-05 17:05:19 -08:00
Kees Cook
85cb0757d7 net: Convert proto_ops connect() callbacks to use sockaddr_unsized
Update all struct proto_ops connect() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.

No binary changes expected.

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 19:10:32 -08:00
Kees Cook
0e50474fa5 net: Convert proto_ops bind() callbacks to use sockaddr_unsized
Update all struct proto_ops bind() callback function prototypes from
"struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the
compiler about object sizes. Calls into struct proto handlers gain casts
that will be removed in the struct proto conversion patch.

No binary changes expected.

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04 19:10:32 -08:00
Mykyta Yatsenko
137cc92ffe bpf: add _impl suffix for bpf_stream_vprintk() kfunc
Rename bpf_stream_vprintk() to bpf_stream_vprintk_impl().

This makes bpf_stream_vprintk() follow the already established "_impl"
suffix-based naming convention for kfuncs with the bpf_prog_aux
argument provided by the verifier implicitly. This convention will be
taken advantage of with the upcoming KF_IMPLICIT_ARGS feature to
preserve backwards compatibility to BPF programs.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20251104-implv2-v3-2-4772b9ae0e06@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
2025-11-04 17:50:25 -08:00
Mykyta Yatsenko
ea0714d61d bpf:add _impl suffix for bpf_task_work_schedule* kfuncs
Rename:
bpf_task_work_schedule_resume()->bpf_task_work_schedule_resume_impl()
bpf_task_work_schedule_signal()->bpf_task_work_schedule_signal_impl()

This aligns task work scheduling kfuncs with the established naming
scheme for kfuncs with the bpf_prog_aux argument provided by the
verifier implicitly. This convention will be taken advantage of with the
upcoming KF_IMPLICIT_ARGS feature to preserve backwards compatibility to
BPF programs.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20251104-implv2-v3-1-4772b9ae0e06@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
2025-11-04 17:50:25 -08:00
Alan Maguire
cc77a20389 selftests/bpf: Test parsing of (multi-)split BTF
Write raw BTF to files, parse it and compare to original;
this allows us to test parsing of (multi-)split BTF code.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251104203309.318429-3-alan.maguire@oracle.com
2025-11-04 13:44:12 -08:00
KaFai Wan
9f32bfec54 selftests/bpf: Add test for conditional jumps on same scalar register
Add test cases to verify the correctness of the BPF verifier's branch analysis
when conditional jumps are performed on the same scalar register. And make sure
that JGT does not trigger verifier BUG.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251103063108.1111764-3-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-03 17:44:53 -08:00
Song Liu
62d2d0a338 selftests/bpf: Add tests for livepatch + bpf trampoline
Both livepatch and BPF trampoline use ftrace. Special attention is needed
when livepatch and fexit program touch the same function at the same
time, because livepatch updates a kernel function and the BPF trampoline
need to call into the right version of the kernel function.

Use samples/livepatch/livepatch-sample.ko for the test.

The test covers two cases:
  1) When a fentry program is loaded first. This exercises the
     modify_ftrace_direct code path.
  2) When a fentry program is loaded first. This exercises the
     register_ftrace_direct code path.

Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20251027175023.1521602-4-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-03 17:22:06 -08:00
Alexis Lothoré (eBPF Foundation)
e6e10c51fb selftests/bpf: Add checks in tc_tunnel when entering net namespaces
test_tc_tunnel is missing checks on any open_netns. Add those checks
anytime we try to enter a net namespace, and skip the related operations
if we fail. While at it, reduce the number of open_netns/close_netns for
cases involving operations in two distinct namespaces: the test
currently does the following:

  nstoken = open_netns("foo")
  do_operation();
  close(nstoken);
  nstoken = open_netns("bar")
  do_another_operation();
  close(nstoken);

As already stated in reviews for the initial test, we don't need to go
back to the root net namespace to enter a second namespace, so just do:

  ntoken_client = open_netns("foo")
  do_operation();
  nstoken_server = open_netns("bar")
  do_another_operation();
  close(nstoken_server);
  close(nstoken_client);

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251031-tc_tunnel_improv-v1-2-0ffe44d27eda@bootlin.com
2025-10-31 15:00:30 -07:00
Alexis Lothoré (eBPF Foundation)
c076fd5bb4 selftests/bpf: Skip tc_tunnel subtest if its setup fails
A subtest setup can fail in a wide variety of ways, so make sure not to
run it if an issue occurs during its setup. The return value is
already representing whether the setup succeeds or fails, it is just
about wiring it.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251031-tc_tunnel_improv-v1-1-0ffe44d27eda@bootlin.com
2025-10-31 15:00:30 -07:00
Bastien Curutchet (eBPF Foundation)
d1aec26fce selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework
test_xsk.c isn't part of the test_progs framework.

Integrate the tests defined by test_xsk.c into the test_progs framework
through a new file : prog_tests/xsk.c. ZeroCopy mode isn't tested in it
as veth peers don't support it.

Move test_xsk{.c/.h} to prog_tests/.

Add the find_bit library to test_progs sources in the Makefile as it is
is used by test_xsk.c

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-15-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:39 -07:00
Bastien Curutchet (eBPF Foundation)
75fc630867 selftests/bpf: test_xsk: Isolate non-CI tests
Following tests won't fit in the CI:
- XDP_ADJUST_TAIL_* and SEND_RECEIVE_9K_PACKETS because of their
  flakyness
- UNALIGNED_* because they depend on huge page allocations
- *_RING_SIZE because they depend on HW rings
- TEARDOWN because it's too long

Remove these tests from the nominal tests table so they won't be
run by the CI in upcoming patch.
Create a skip_ci_tests table to hold them.
Use this skip_ci table in xskxceiver.c to keep all the tests available
from the test_xsk.sh script.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-14-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:39 -07:00
Bastien Curutchet (eBPF Foundation)
7a96615f2e selftests/bpf: test_xsk: Don't exit immediately on allocation failures
If any allocation in the pkt_stream_*() helpers fail, exit_with_error() is
called. This terminates the program immediately. It prevents the following
tests from running and isn't compliant with the CI.

Return NULL in case of allocation failure.
Return TEST_FAILURE when something goes wrong in the packet generation.
Clean up the resources if a failure happens between two steps of a test.

Move exit_with_error()'s definition into xskxceiver.c as it isn't used
anywhere else now.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-13-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:39 -07:00
Bastien Curutchet (eBPF Foundation)
844b13a9ff selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
__testapp_validate_traffic() calls exit_with_error() on failures. This
exits the program immediately. It prevents the following tests from
running and isn't compliant with the CI.

Return TEST_FAILURE instead of calling exit_with_error().
Release the resource of the 1st thread if a failure happens between its
creation and the creation of the second thread.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-12-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:39 -07:00
Bastien Curutchet (eBPF Foundation)
5b2a757a16 selftests/bpf: test_xsk: Don't exit immediately when workers fail
TX and RX workers can fail in many places. These failures trigger a call
to exit_with_error() which exits the program immediately. It prevents the
following tests from running and isn't compliant with the CI.

Add return value to functions that can fail.
Handle failures more smoothly through report_failure().

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-11-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:39 -07:00
Bastien Curutchet (eBPF Foundation)
3f09728f90 selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
exit_with_error() is called when gettimeofday() fails. This exits the
program immediately. It prevents the following tests from being run and
isn't compliant with the CI.

Return TEST_FAILURE instead of calling exit_on_error().

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-10-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
f12f1b5d14 selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
xsk_reattach_xdp calls exit_with_error() on failures. This exits the
program immediately. It prevents the following tests from being run and
isn't compliant with the CI.

Add a return value to the functions handling XDP attachments to handle
errors more smoothly.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-9-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
e645bcfb16 selftests/bpf: test_xsk: Add return value to init_iface()
init_iface() doesn't have any return value while it can fail. In case of
failure it calls exit_on_error() which exits the application
immediately. This prevents the following tests from being run and isn't
compliant with the CI

Add a return value to init_iface() so errors can be handled more
smoothly.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-8-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
f477b0fd75 selftests/bpf: test_xsk: Release resources when swap fails
testapp_validate_traffic() doesn't release the sockets and the umem
created by the threads if the test isn't currently in its last step.
Thus, if the swap_xsk_resources() fails before the last step, the
created resources aren't cleaned up.

Clean the sockets and the umem in case of swap_xsk_resources() failure.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-7-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
e3dfa0faf1 selftests/bpf: test_xsk: Wrap test clean-up in functions
The clean-up done at the end of a test in __testapp_validate_traffic()
isn't wrapped in a function. It isn't convenient if we want to use it
somewhere else in the code.

Wrap the clean-up in two new functions : the first deletes the sockets,
the second releases the umem.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-6-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
bea4f03897 selftests/bpf: test_xsk: fix memory leak in testapp_xdp_shared_umem()
testapp_xdp_shared_umem() generates pkt_stream on each xsk from xsk_arr,
where normally xsk_arr[0] gets pkt_streams and xsk_arr[1] have them NULLed.
At the end of the test pkt_stream_restore_default() only releases
xsk_arr[0] which leads to memory leaks.

Release the missing pkt_stream at the end of testapp_xdp_shared_umem()

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-5-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
d66e49ffa0 selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped()
testapp_stats_rx_dropped() generates pkt_stream twice. The last
generated is released by pkt_stream_restore_default() at the end of the
test but we lose the pointer of the first pkt_stream.

Release the 'middle' pkt_stream when it's getting replaced to prevent
memory leaks.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-4-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
cadc0c1fd7 selftests/bpf: test_xsk: Fix __testapp_validate_traffic()'s return value
__testapp_validate_traffic is supposed to return an integer value that
tells if the test passed (0), failed (-1) or was skiped (2). It actually
returns a boolean in the end. This doesn't harm when the test is
successful but can lead to misinterpretation in case of failure as 1
will be returned instead of -1.

Return TEST_FAILURE (-1) in case of failure, TEST_PASS (0) otherwise.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-3-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
2233ef8bba selftests/bpf: test_xsk: Initialize bitmap before use
bitmap is used before being initialized.

Initialize it to zero before using it.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-2-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Bastien Curutchet (eBPF Foundation)
3ab77f35a7 selftests/bpf: test_xsk: Split xskxceiver
AF_XDP features are tested by the test_xsk.sh script but not by the
test_progs framework. The tests used by the script are defined in
xksxceiver.c which can't be integrated in the test_progs framework as is.

Extract these test definitions from xskxceiver{.c/.h} to put them in new
test_xsk{.c/.h} files.
Keep the main() function and its unshared dependencies in xksxceiver to
avoid impacting the test_xsk.sh script which is often used to test real
hardware.
Move ksft_test_result_*() calls to xskxceiver.c to keep the kselftest's
report valid

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Link: https://lore.kernel.org/r/20251031-xsk-v7-1-39fe486593a3@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-31 09:24:38 -07:00
Kumar Kartikeya Dwivedi
a8a0abf097 selftests/bpf: Add ABBCCA case for rqspinlock stress test
Introduce a new mode for the rqspinlock stress test that exercises a
deadlock that won't be detected by the AA and ABBA checks, such that we
always reliably trigger the timeout fallback. We need 4 CPUs for this
particular case, as CPU 0 is untouched, and three participant CPUs for
triggering the ABBCCA case.

Refactor the lock acquisition paths in the module to better reflect the
three modes and choose the right lock depending on the context.

Also drop ABBA case from running by default as part of test progs, since
the stress test can consume a significant amount of time.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251029181828.231529-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-29 18:17:56 -07:00
Mykyta Yatsenko
5913e936f6 selftests/bpf: Fix intermittent failures in file_reader test
file_reader/on_open_expect_fault intermittently fails when test_progs
runs tests in parallel, because it expects a page fault on first read.
Another file_reader test running concurrently may have already pulled
the same pages into the page cache, eliminating the fault and causing a
spurious failure.

Make file_reader/on_open_expect_fault read from a file region that does
not overlap with other file_reader tests, so the initial access still
faults even under parallel execution.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20251029195907.858217-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-29 18:15:30 -07:00
Alexis Lothoré (eBPF Foundation)
5d3591607d selftests/bpf: Remove test_tc_tunnel.sh
Now that test_tc_tunnel.sh scope has been ported to the test_progs
framework, remove it.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251027-tc_tunnel-v3-4-505c12019f9d@bootlin.com
2025-10-29 12:17:24 -07:00
Alexis Lothoré (eBPF Foundation)
8517b1abe5 selftests/bpf: Integrate test_tc_tunnel.sh tests into test_progs
The test_tc_tunnel.sh script checks that a large variety of tunneling
mechanisms handled by the kernel can be handled as well by eBPF
programs. While this test shares similarities with test_tunnel.c (which
is already integrated in test_progs), those are testing slightly
different things:
- test_tunnel.c creates a tunnel interface, and then get and set tunnel
  keys in packet metadata, from BPF programs.
- test_tc_tunnels.sh manually parses/crafts packets content

Bring the tests covered by test_tc_tunnel.sh into the test_progs
framework, by creating a dedicated test_tc_tunnel.sh. This new test
defines a "generic" runner which, for each test configuration:
- will configure the relevant veth pair, each of those isolated in a
  dedicated namespace
- will check that traffic will fail if there is only an encapsulating
  program attached to one veth egress
- will check that traffic succeed if we enable some decapsulation module
  on kernel side
- will check that traffic still succeeds if we replace the kernel
  decapsulation with some eBPF ingress decapsulation.

Example of the new test execution:

  # ./test_progs -a tc_tunnel
  #447/1   tc_tunnel/ipip_none:OK
  #447/2   tc_tunnel/ipip6_none:OK
  #447/3   tc_tunnel/ip6tnl_none:OK
  #447/4   tc_tunnel/sit_none:OK
  #447/5   tc_tunnel/vxlan_eth:OK
  #447/6   tc_tunnel/ip6vxlan_eth:OK
  #447/7   tc_tunnel/gre_none:OK
  #447/8   tc_tunnel/gre_eth:OK
  #447/9   tc_tunnel/gre_mpls:OK
  #447/10  tc_tunnel/ip6gre_none:OK
  #447/11  tc_tunnel/ip6gre_eth:OK
  #447/12  tc_tunnel/ip6gre_mpls:OK
  #447/13  tc_tunnel/udp_none:OK
  #447/14  tc_tunnel/udp_eth:OK
  #447/15  tc_tunnel/udp_mpls:OK
  #447/16  tc_tunnel/ip6udp_none:OK
  #447/17  tc_tunnel/ip6udp_eth:OK
  #447/18  tc_tunnel/ip6udp_mpls:OK
  #447     tc_tunnel:OK
  Summary: 1/18 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251027-tc_tunnel-v3-3-505c12019f9d@bootlin.com
2025-10-29 12:17:22 -07:00
Alexis Lothoré (eBPF Foundation)
86433db932 selftests/bpf: Make test_tc_tunnel.bpf.c compatible with big endian platforms
When trying to run bpf-based encapsulation in a s390x environment, some
parts of test_tc_tunnel.bpf.o do not encapsulate correctly the traffic,
leading to tests failures. Adding some logs shows for example that
packets about to be sent on an interface with the ip6vxlan_eth program
attached do not have the expected value 5 in the ip header ihl field,
and so are ignored by the program.

This phenomenon appears when trying to cross-compile the selftests,
rather than compiling it from a virtualized host: the selftests build
system may then wrongly pick some host headers. If <asm/byteorder.h>
ends up being picked on the host (and if the host has a endianness
different from the target one), it will then expose wrong endianness
defines (e.g __LITTLE_ENDIAN_BITFIELD instead of __BIT_ENDIAN_BITFIELD),
and it will for example mess up the iphdr structure layout used in the
ebpf program.

To prevent this, directly use the vmlinux.h header generated by the
selftests build system rather than including directly specific kernel
headers. As a consequence, add some missing definitions that are not
exposed by vmlinux.h, and adapt the bitfield manipulations to allow
building and using the program on both types of platforms.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251027-tc_tunnel-v3-2-505c12019f9d@bootlin.com
2025-10-29 11:07:26 -07:00
Alexis Lothoré (eBPF Foundation)
1d5137c8d1 selftests/bpf: Add tc helpers
The test_tunnel.c file defines small fonctions to easily attach eBPF
programs to tc hooks, either on egress, ingress or both.

Create a shared helper in network_helpers.c so that other tests can
benefit from it.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251027-tc_tunnel-v3-1-505c12019f9d@bootlin.com
2025-10-29 11:07:24 -07:00
Xu Kuohai
f9db3a3822 selftests/bpf/benchs: Add overwrite mode benchmark for BPF ring buffer
Add --rb-overwrite option to benchmark BPF ring buffer in overwrite mode.
Since overwrite mode is not yet supported by libbpf for consumer, also add
--rb-bench-producer option to benchmark producer directly without a consumer.

Benchmarks on an x86_64 and an arm64 CPU are shown below for reference.

- AMD EPYC 9654 (x86_64)

Ringbuf, multi-producer contention in overwrite mode, no consumer
=================================================================
rb-prod nr_prod 1    32.180 ± 0.033M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 2    9.617 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 3    8.810 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 4    9.272 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 8    9.173 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 12   3.086 ± 0.032M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 16   2.945 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 20   2.519 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 24   2.545 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 28   2.363 ± 0.024M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 32   2.357 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 36   2.267 ± 0.011M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 40   2.284 ± 0.020M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 44   2.215 ± 0.025M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 48   2.193 ± 0.023M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 52   2.208 ± 0.024M/s (drops 0.000 ± 0.000M/s)

- HiSilicon Kunpeng 920 (arm64)

Ringbuf, multi-producer contention in overwrite mode, no consumer
=================================================================
rb-prod nr_prod 1    14.478 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 2    21.787 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 3    6.045 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 4    5.352 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 8    4.850 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 12   3.542 ± 0.016M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 16   3.509 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 20   3.171 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 24   3.154 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 28   2.974 ± 0.015M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 32   3.167 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 36   2.903 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 40   2.866 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 44   2.914 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 48   2.806 ± 0.012M/s (drops 0.000 ± 0.000M/s)
Rb-prod nr_prod 52   2.840 ± 0.012M/s (drops 0.000 ± 0.000M/s)

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251018035738.4039621-4-xukuohai@huaweicloud.com
2025-10-27 19:47:32 -07:00
Xu Kuohai
8f7a86ecde selftests/bpf: Add overwrite mode test for BPF ring buffer
Add overwrite mode test for BPF ring buffer. The test creates a BPF ring
buffer in overwrite mode, then repeatedly reserves and commits records
to check if the ring buffer works as expected both before and after
overwriting occurs.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251018035738.4039621-3-xukuohai@huaweicloud.com
2025-10-27 19:46:32 -07:00
Mykyta Yatsenko
784cdf9315 selftests/bpf: add file dynptr tests
Introducing selftests for validating file-backed dynptr works as
expected.
 * validate implementation supports dynptr slice and read operations
 * validate destructors should be paired with initializers
 * validate sleepable progs can page in.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-11-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-27 09:56:27 -07:00
Mykyta Yatsenko
531b87d865 bpf: widen dynptr size/offset to 64 bit
Dynptr currently caps size and offset at 24 bits, which isn’t sufficient
for file-backed use cases; even 32 bits can be limiting. Refactor dynptr
helpers/kfuncs to use 64-bit size and offset, ensuring consistency
across the APIs.

This change does not affect internals of xdp, skb or other dynptrs,
which continue to behave as before. Also it does not break binary
compatibility.

The widening enables large-file access support via dynptr, implemented
in the next patches.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-3-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-27 09:56:26 -07:00
Mykyta Yatsenko
a61a257ff5 selftests/bpf: remove unnecessary kfunc prototypes
Remove unnecessary kfunc prototypes from test programs, these are
provided by vmlinux.h

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-2-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-27 09:56:26 -07:00
Jakub Kicinski
2b7553db91 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.18-rc3).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-23 10:53:08 -07:00
Puranjay Mohan
7361c86485 selftests/bpf: Fix list_del() in arena list
The __list_del fuction doesn't set the previous node's next pointer to
the next node of the node to be deleted. It just updates the local variable
and not the actual pointer in the previous node.

The test was passing up till now because the bpf code is doing bpf_free()
after list_del and therfore reading head->first from the userspace will
read all zeroes. But after arena_list_del() is finished, head->first should
point to NULL;

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251017141727.51355-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-18 19:27:26 -07:00
Yonghong Song
4f8543b5f2 selftests/bpf: Fix selftest verif_scale_strobemeta failure with llvm22
With latest llvm22, I hit the verif_scale_strobemeta selftest failure
below:
  $ ./test_progs -n 618
  libbpf: prog 'on_event': BPF program load failed: -E2BIG
  libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG --
  BPF program is too large. Processed 1000001 insn
  verification time 7019091 usec
  stack depth 488
  processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0
  -- END PROG LOAD LOG --
  libbpf: prog 'on_event': failed to load: -E2BIG
  libbpf: failed to load object 'strobemeta.bpf.o'
  scale_test:FAIL:expect_success unexpected error: -7 (errno 7)
  #618     verif_scale_strobemeta:FAIL

But if I increase the verificaiton insn limit from 1M to 10M, the above
test_progs run actually will succeed. The below is the result from veristat:
  $ ./veristat strobemeta.bpf.o
  Processing 'strobemeta.bpf.o'...
  File              Program   Verdict  Duration (us)    Insns  States  Program size  Jited size
  ----------------  --------  -------  -------------  -------  ------  ------------  ----------
  strobemeta.bpf.o  on_event  success       90250893  9777685  358230         15954       80794
  ----------------  --------  -------  -------------  -------  ------  ------------  ----------
  Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs.

Further debugging shows the llvm commit [1] is responsible for the verificaiton
failure as it tries to convert certain switch statement to if-condition. Such
change may cause different transformation compared to original switch statement.

In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is
  define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2,
      ptr noundef %3, ptr noundef %4) #2 !dbg !535 {
    %6 = alloca ptr, align 8
    %7 = alloca i64, align 8
    %8 = alloca ptr, align 8
    %9 = alloca ptr, align 8
    %10 = alloca ptr, align 8
    %11 = alloca ptr, align 8
    %12 = alloca i32, align 4
    ...
    %20 = icmp ne ptr %19, null, !dbg !561
    br i1 %20, label %22, label %21, !dbg !562

  21:                                               ; preds = %5
    store i32 1, ptr %12, align 4
    br label %48, !dbg !563

  22:
    %23 = load ptr, ptr %9, align 8, !dbg !564
    ...

  47:                                               ; preds = %38, %22
    store i32 0, ptr %12, align 4, !dbg !588
    br label %48, !dbg !588

  48:                                               ; preds = %47, %21
    call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588
    %49 = load i32, ptr %12, align 4
    switch i32 %49, label %51 [
      i32 0, label %50
      i32 1, label %50
    ]

  50:                                               ; preds = %48, %48
    ret void, !dbg !589

  51:                                               ; preds = %48
    unreachable
  }

Note that the above 'switch' statement is added by clang frontend.
Without [1], the switch statement will survive until SelectionDag,
so the switch statement acts like a 'barrier' and prevents some
transformation involved with both 'before' and 'after' the switch statement.

But with [1], the switch statement will be removed during middle end
optimization and later middle end passes (esp. after inlining) have more
freedom to reorder the code.

The following is the related source code:

  static void *calc_location(struct strobe_value_loc *loc, void *tls_base):
        bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv);
        /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */
        return tls_ptr && tls_ptr != (void *)-1
                ? tls_ptr + tls_index.offset
                : NULL;

  In read_int_var() func, we have:
        void *location = calc_location(&cfg->int_locs[idx], tls_base);
        if (!location)
                return;

        bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location);
        ...

The static func calc_location() is called inside read_int_var(). The asm code
without [1]:
     77: .123....89 (85) call bpf_probe_read_user#112
     78: ........89 (79) r1 = *(u64 *)(r10 -368)
     79: .1......89 (79) r2 = *(u64 *)(r10 -8)
     80: .12.....89 (bf) r3 = r2
     81: .123....89 (0f) r3 += r1
     82: ..23....89 (07) r2 += 1
     83: ..23....89 (79) r4 = *(u64 *)(r10 -464)
     84: ..234...89 (a5) if r2 < 0x2 goto pc+13
     85: ...34...89 (15) if r3 == 0x0 goto pc+12
     86: ...3....89 (bf) r1 = r10
     87: .1.3....89 (07) r1 += -400
     88: .1.3....89 (b4) w2 = 16
In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place,
so the verifier actually prefers to do verification first at 'r1 = r10' etc.

The asm code with [1]:
    119: .123....89 (85) call bpf_probe_read_user#112
    120: ........89 (79) r1 = *(u64 *)(r10 -368)
    121: .1......89 (79) r2 = *(u64 *)(r10 -8)
    122: .12.....89 (bf) r3 = r2
    123: .123....89 (0f) r3 += r1
    124: ..23....89 (07) r2 += -1
    125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6
    126: ........89 (05) goto pc+17
    ...
    144: ........89 (b4) w1 = 0
    145: .1......89 (6b) *(u16 *)(r8 +80) = r1
In this case, if 'r2 < 0xfffffffe' is true, the control will go to
non-null 'location' branch, so 'goto pc+17' will actually go to
null 'location' branch. This seems causing tremendous amount of
verificaiton state.

To fix the issue, rewrite the following code
  return tls_ptr && tls_ptr != (void *)-1
                ? tls_ptr + tls_index.offset
                : NULL;
to if/then statement and hopefully these explicit if/then statements
are sticky during middle-end optimizations.

Test with llvm20 and llvm21 as well and all strobemeta related selftests
are passed.

  [1] https://github.com/llvm/llvm-project/pull/161000

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251014051639.1996331-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-18 19:25:03 -07:00
Yafang Shao
7484e7cd8a bpf: mark vma->{vm_mm,vm_file} as __safe_trusted_or_null
The vma->vm_mm might be NULL and it can be accessed outside of RCU. Thus,
we can mark it as trusted_or_null. With this change, BPF helpers can safely
access vma->vm_mm to retrieve the associated mm_struct from the VMA.
Then we can make policy decision from the VMA.

The "trusted" annotation enables direct access to vma->vm_mm within kfuncs
marked with KF_TRUSTED_ARGS or KF_RCU, such as bpf_task_get_cgroup1() and
bpf_task_under_cgroup(). Conversely, "null" enforcement requires all
callsites using vma->vm_mm to perform NULL checks.

The lsm selftest must be modified because it directly accesses vma->vm_mm
without a NULL pointer check; otherwise it will break due to this
change.

For the VMA based THP policy, the use case is as follows,

  @mm = @vma->vm_mm; // vm_area_struct::vm_mm is trusted or null
  if (!@mm)
      return;
  bpf_rcu_read_lock(); // rcu lock must be held to dereference the owner
  @owner = @mm->owner; // mm_struct::owner is rcu trusted or null
  if (!@owner)
    goto out;
  @cgroup1 = bpf_task_get_cgroup1(@owner, MEMCG_HIERARCHY_ID);

  /* make the decision based on the @cgroup1 attribute */

  bpf_cgroup_release(@cgroup1); // release the associated cgroup
out:
  bpf_rcu_read_unlock();

PSI memory information can be obtained from the associated cgroup to inform
policy decisions. Since upstream PSI support is currently limited to cgroup
v2, the following example demonstrates cgroup v2 implementation:

  @owner = @mm->owner;
  if (@owner) {
      // @ancestor_cgid is user-configured
      @ancestor = bpf_cgroup_from_id(@ancestor_cgid);
      if (bpf_task_under_cgroup(@owner, @ancestor)) {
          @psi_group = @ancestor->psi;

          /* Extract PSI metrics from @psi_group and
           * implement policy logic based on the values
           */

      }
  }

The vma::vm_file can also be marked with __safe_trusted_or_null.

No additional selftests are required since vma->vm_file and vma->vm_mm are
already validated in the existing selftest suite.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/r/20251016063929.13830-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-18 19:23:08 -07:00
Tiezhu Yang
c67f4ae737 selftests/bpf: Silence unused-but-set build warnings
There are some set but not used build errors when compiling bpf selftests
with the latest upstream mainline GCC, at the beginning add the attribute
__maybe_unused for the variables, but it is better to just add the option
-Wno-unused-but-set-variable to CFLAGS in Makefile to disable the errors
instead of hacking the tests.

  tools/testing/selftests/bpf/map_tests/lpm_trie_map_basic_ops.c:229:36:
  error: variable ‘n_matches_after_delete’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/map_tests/lpm_trie_map_basic_ops.c:229:25:
  error: variable ‘n_matches’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/prog_tests/bpf_cookie.c:426:22:
  error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/prog_tests/find_vma.c:52:22:
  error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/prog_tests/perf_branches.c:67:22:
  error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/prog_tests/perf_link.c:15:22:
  error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/20251018082815.20622-1-yangtiezhu@loongson.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-18 19:21:29 -07:00
Alexei Starovoitov
50de48a4dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf at 6.18-rc2
Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-18 18:20:57 -07:00
Brahmajit Das
a1e83d4c03 selftests/bpf: Fix redefinition of 'off' as different kind of symbol
This fixes the following build error

   CLNG-BPF [test_progs] verifier_global_ptr_args.bpf.o
progs/verifier_global_ptr_args.c:228:5: error: redefinition of 'off' as
different kind of symbol
   228 | u32 off;
       |     ^

The symbol 'off' was previously defined in
tools/testing/selftests/bpf/tools/include/vmlinux.h, which includes an
enum i40e_ptp_gpio_pin_state from
drivers/net/ethernet/intel/i40e/i40e_ptp.c:

	enum i40e_ptp_gpio_pin_state {
		end = -2,
		invalid = -1,
		off = 0,
		in_A = 1,
		in_B = 2,
		out_A = 3,
		out_B = 4,
	};

This enum is included when CONFIG_I40E is enabled. As of commit
032676ff82 ("LoongArch: Update Loongson-3 default config file"),
CONFIG_I40E is set in the defconfig, which leads to the conflict.

Renaming the local variable avoids the redefinition and allows the
build to succeed.

Suggested-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Brahmajit Das <listout@listout.xyz>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251017171551.53142-1-listout@listout.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-17 11:33:23 -07:00
Kuniyuki Iwashima
5f941dd87b selftests/bpf: Add test for sk->sk_bypass_prot_mem.
The test does the following for IPv4/IPv6 x TCP/UDP sockets
with/without sk->sk_bypass_prot_mem, which can be turned on by
net.core.bypass_prot_mem or bpf_setsockopt(SK_BPF_BYPASS_PROT_MEM).

  1. Create socket pairs
  2. Send NR_PAGES (32) of data (TCP consumes around 35 pages,
     and UDP consuems 66 pages due to skb overhead)
  3. Read memory_allocated from sk->sk_prot->memory_allocated and
     sk->sk_prot->memory_per_cpu_fw_alloc
  4. Check if unread data is charged to memory_allocated

If sk->sk_bypass_prot_mem is set, memory_allocated should not be
changed, but we allow a small error (up to 10 pages) in case
other processes on the host use some amounts of TCP/UDP memory.

The amount of allocated pages are buffered to per-cpu variable
{tcp,udp}_memory_per_cpu_fw_alloc up to +/- net.core.mem_pcpu_rsv
before reported to {tcp,udp}_memory_allocated.

At 3., memory_allocated is calculated from the 2 variables at
fentry of socket create function.

We drain the receive queue only for UDP before close() because UDP
recv queue is destroyed after RCU grace period.  When I printed
memory_allocated, UDP bypass cases sometimes saw the no-bypass
case's leftover, but it's still in the small error range (<10 pages).

  bpf_trace_printk: memory_allocated: 0   <-- TCP no-bypass
  bpf_trace_printk: memory_allocated: 35
  bpf_trace_printk: memory_allocated: 0   <-- TCP w/ sysctl
  bpf_trace_printk: memory_allocated: 0
  bpf_trace_printk: memory_allocated: 0   <-- TCP w/ bpf
  bpf_trace_printk: memory_allocated: 0
  bpf_trace_printk: memory_allocated: 0   <-- UDP no-bypass
  bpf_trace_printk: memory_allocated: 66
  bpf_trace_printk: memory_allocated: 2   <-- UDP w/ sysctl (2 pages leftover)
  bpf_trace_printk: memory_allocated: 2
  bpf_trace_printk: memory_allocated: 2   <-- UDP w/ bpf (2 pages leftover)
  bpf_trace_printk: memory_allocated: 2

We prefer finishing tests faster than oversleeping for call_rcu()
 + sk_destruct().

The test completes within 2s on QEMU (64 CPUs) w/ KVM.

  # time ./test_progs -t sk_bypass
  #371/1   sk_bypass_prot_mem/TCP  :OK
  #371/2   sk_bypass_prot_mem/UDP  :OK
  #371/3   sk_bypass_prot_mem/TCPv6:OK
  #371/4   sk_bypass_prot_mem/UDPv6:OK
  #371     sk_bypass_prot_mem:OK
  Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED

  real	0m1.481s
  user	0m0.181s
  sys	0m0.441s

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://patch.msgid.link/20251014235604.3057003-7-kuniyu@google.com
2025-10-16 12:04:47 -07:00
Xing Guo
0c1999ed33 selftests: arg_parsing: Ensure data is flushed to disk before reading.
test_parse_test_list_file writes some data to
/tmp/bpf_arg_parsing_test.XXXXXX and parse_test_list_file() will read
the data back.  However, after writing data to that file, we forget to
call fsync() and it's causing testing failure in my laptop.  This patch
helps fix it by adding the missing fsync() call.

Fixes: 64276f01dc ("selftests/bpf: Test_progs can read test lists from file")
Signed-off-by: Xing Guo <higuoxing@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251016035330.3217145-1-higuoxing@gmail.com
2025-10-16 09:34:39 -07:00
Andrii Nakryiko
e603a342cf selftests/bpf: make arg_parsing.c more robust to crashes
We started getting a crash in BPF CI, which seems to originate from
test_parse_test_list_file() test and is happening at this line:

  ASSERT_OK(strcmp("test_with_spaces", set.tests[0].name), "test 0 name");

One way we can crash there is if set.cnt zero, which is checked for with
ASSERT_EQ() above, but we proceed after this regardless of the outcome.
Instead of crashing, we should bail out with test failure early.

Similarly, if parse_test_list_file() fails, we shouldn't be even looking
at set, so bail even earlier if ASSERT_OK() fails.

Fixes: 64276f01dc ("selftests/bpf: Test_progs can read test lists from file")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20251014202037.72922-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-14 16:39:33 -07:00
Alexei Starovoitov
39e9d5f630 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf before 6.18-rc1
Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-11 18:27:47 -07:00
Mykyta Yatsenko
bca2b74ea9 selftests/bpf: Add more bpf_wq tests
Add bpf_wq selftests to verify:
 * BPF program using non-constant offset of struct bpf_wq is rejected
 * BPF program using map with no BTF for storing struct bpf_wq is rejected

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251010164606.147298-2-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-10 11:12:51 -07:00
Paul Chaignon
bc3eeb4259 selftests/bpf: Test direct packet access on non-linear skbs
This patch adds new selftests in the direct packet access suite, to
cover the non-linear case. The first six tests cover the behavior of
the bounds check with a non-linear skb. The last test adds a call to
bpf_skb_pull_data() to be able to access the packet.

Note that the size of the linear area includes the L2 header, but for
some program types like cgroup_skb, ctx->data points to the L3 header.
Therefore, a linear area of 22 bytes will have only 8 bytes accessible
to the BPF program (22 - ETH_HLEN). For that reason, the cgroup_skb test
cases access the packet at an offset of 8 bytes.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/ceedbfd719e58f0d49dcceb8592f5e6bd38ce5fe.1760037899.git.paul.chaignon@gmail.com
2025-10-10 10:43:04 -07:00
Paul Chaignon
8d45d0398d selftests/bpf: Support non-linear flag in test loader
This patch adds support for a new tag __linear_size in the test loader,
to specify the size of the linear area in case of non-linear skbs. If
the tag is absent or null, a linear skb is crafted.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/7ad928ec7591daef4f1b84032aeb86c918b3e5a7.1760037899.git.paul.chaignon@gmail.com
2025-10-10 10:43:04 -07:00
KaFai Wan
accb9a7e87 selftests/bpf: Add test for unpinning htab with internal timer struct
Add test to verify that unpinning hash tables containing internal timer
structures does not trigger context warnings.

Each subtest (timer_prealloc and timer_no_prealloc) can trigger the
context warning when unpinning, but the warning cannot be triggered
twice within a short time interval (a HZ), which is expected behavior.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251008102628.808045-3-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-10 10:10:08 -07:00
Rong Tao
eca0b643ef selftests/bpf: Test bpf_strcasestr,bpf_strncasestr kfuncs
Add tests for new kfuncs bpf_strcasestr() and bpf_strncasestr().

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Link: https://lore.kernel.org/r/tencent_4F1A340A8966155C52AA9CBDB68FD221FE0A@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-10 10:05:32 -07:00
Kumar Kartikeya Dwivedi
5b1b5d380a selftests/bpf: Add tests for async cb context
Add tests to verify that async callback's sleepable attribute is
correctly determined by the callback type, not the arming program's
context, reflecting its true execution context.

Introduce verifier_async_cb_context.c with tests for all three async
callback primitives: bpf_timer, bpf_wq, and bpf_task_work. Each
primitive is tested when armed from both sleepable (lsm.s/file_open) and
non-sleepable (fentry) programs.

Test coverage:
- bpf_timer callbacks: Verify they are never sleepable, even when armed
  from sleepable programs. Both tests should fail when attempting to use
  sleepable helper bpf_copy_from_user() in the callback.

- bpf_wq callbacks: Verify they are always sleepable, even when armed
  from non-sleepable programs. Both tests should succeed when using
  sleepable helpers in the callback.

- bpf_task_work callbacks: Verify they are always sleepable, even when
  armed from non-sleepable programs. Both tests should succeed when
  using sleepable helpers in the callback.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251007220349.3852807-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-10 10:04:51 -07:00
Rong Tao
de7342228b bpf: Finish constification of 1st parameter of bpf_d_path()
The commit 1b8abbb121 ("bpf...d_path(): constify path argument")
constified the first parameter of the bpf_d_path(), but failed to
update it in all places. Finish constification.

Otherwise the selftest fail to build:
.../selftests/bpf/bpf_experimental.h:222:12: error: conflicting types for 'bpf_path_d_path'
  222 | extern int bpf_path_d_path(const struct path *path, char *buf, size_t buf__sz) __ksym;
      |            ^
.../selftests/bpf/tools/include/vmlinux.h:153922:12: note: previous declaration is here
 153922 | extern int bpf_path_d_path(struct path *path, char *buf, size_t buf__sz) __weak __ksym;

Fixes: 1b8abbb121 ("bpf...d_path(): constify path argument")
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-04 09:05:23 -07:00
Linus Torvalds
cbf33b8e0b bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjfBtMACgkQ6rmadz2v
 bToTuA/+JizieMIW6eqzCDrrhj45DmuKu6gUMrkmM0xte3QCoLly5FDGJ9HMmrqf
 L5xfz/TAExg/Nh9a0zpCGE7/fMr9P10Z5vVHMe/I06rcW+/fDrAERXDurlfpfliv
 /7KqF/lq1D6kp7bcBzWX4dzkT66Nh/Ax6ZJKIXF/K3iwFXBb1A/95Wwgqxuc3xNM
 n0kthEhnX6x5cojy6fla2zYNoebNLVxPvafTcAtnE0QM8vxdOl0Q7KWrdQpVE5V/
 z6WF2M822eTQeUIy6k/ZRckFlmEwa++I+w5EnygBFP8INUHARdYNBQpQVFkg31jz
 GYGlbTs7jaDbMNWkEhKyDbuhL5Xq9/5FGOaLI9CvDtpPjbfYTZMGtIn9L902XtqE
 VEfBcTA/g/qd8Urx5622sfPwau64LrSAKBRE/4CJ7/dsnPx4BdKavsA5DMVuyRIm
 Au6tOxlRFhPWhultnmmbFBjwn33xTleYNUF9wpWQnijNq+Ph29gKZk8Ac5eKKS6A
 dTSCxv7LGafezN/evr779vXohZyF9vWfqHdITVzTo40tsSbi5v4uyfJ2qzoaS7Gf
 UQ4F2nkjqddhG/O1W2TY8aFzmWXP2q92V6yqkcq5zjA3D1Ue6JOxRzG/3kNzgsLX
 Gf1DmPosUyeLNpwdDYRD83Dk7tJ7rAI+x3i6Iwh8qivIhC4CzrU=
 =Kzf9
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix selftests/bpf (typo, conflicts) and unbreak BPF CI (Jiri Olsa)

 - Remove linux/unaligned.h dependency for libbpf_sha256 (Andrii
   Nakryiko) and add a test (Eric Biggers)

 - Reject negative offsets for ALU operations in the verifier (Yazhou
   Tang) and add a test (Eduard Zingerman)

 - Skip scalar adjustment for BPF_NEG operation if destination register
   is a pointer (Brahmajit Das) and add a test (KaFai Wan)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  libbpf: Fix missing #pragma in libbpf_utils.c
  selftests/bpf: Add tests for rejection of ALU ops with negative offsets
  selftests/bpf: Add test for libbpf_sha256()
  bpf: Reject negative offsets for ALU ops
  libbpf: remove linux/unaligned.h dependency for libbpf_sha256()
  libbpf: move libbpf_sha256() implementation into libbpf_utils.c
  libbpf: move libbpf_errstr() into libbpf_utils.c
  libbpf: remove unused libbpf_strerror_r and STRERR_BUFSIZE
  libbpf: make libbpf_errno.c into more generic libbpf_utils.c
  selftests/bpf: Add test for BPF_NEG alu on CONST_PTR_TO_MAP
  bpf: Skip scalar adjustment for BPF_NEG if dst is a pointer
  selftests/bpf: Fix realloc size in bpf_get_addrs
  selftests/bpf: Fix typo in subtest_basic_usdt after merge conflict
  selftests/bpf: Fix open-coded gettid syscall in uprobe syscall tests
2025-10-03 19:38:19 -07:00
Linus Torvalds
50647a1176 file->f_path constification
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaN3daAAKCRBZ7Krx/gZQ
 6zNWAP9kD6rOJRNqDgea4pibDPa47Tps/WM5tsDv3dsLliY29gEA6sveOWZ3guAj
 4oY3ts/NtHLWXvhI7Vd/1mr2aTKEZQk=
 =YNK+
 -----END PGP SIGNATURE-----

Merge tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull file->f_path constification from Al Viro:
 "Only one thing was modifying ->f_path of an opened file - acct(2).

  Massaging that away and constifying a bunch of struct path * arguments
  in functions that might be given &file->f_path ends up with the
  situation where we can turn ->f_path into an anon union of const
  struct path f_path and struct path __f_path, the latter modified only
  in a few places in fs/{file_table,open,namei}.c, all for struct file
  instances that are yet to be opened"

* tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits)
  Have cc(1) catch attempts to modify ->f_path
  kernel/acct.c: saner struct file treatment
  configfs:get_target() - release path as soon as we grab configfs_item reference
  apparmor/af_unix: constify struct path * arguments
  ovl_is_real_file: constify realpath argument
  ovl_sync_file(): constify path argument
  ovl_lower_dir(): constify path argument
  ovl_get_verity_digest(): constify path argument
  ovl_validate_verity(): constify {meta,data}path arguments
  ovl_ensure_verity_loaded(): constify datapath argument
  ksmbd_vfs_set_init_posix_acl(): constify path argument
  ksmbd_vfs_inherit_posix_acl(): constify path argument
  ksmbd_vfs_kern_path_unlock(): constify path argument
  ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path *
  check_export(): constify path argument
  export_operations->open(): constify path argument
  rqst_exp_get_by_name(): constify path argument
  nfs: constify path argument of __vfs_getattr()
  bpf...d_path(): constify path argument
  done_path_create(): constify path argument
  ...
2025-10-03 16:32:36 -07:00
Linus Torvalds
07fdad3a93 Networking changes for 6.18.
Core & protocols
 ----------------
 
  - Improve drop account scalability on NUMA hosts for RAW and UDP sockets
    and the backlog, almost doubling the Pps capacity under DoS.
 
  - Optimize the UDP RX performance under stress, reducing contention,
    revisiting the binary layout of the involved data structs and
    implementing NUMA-aware locking. This improves UDP RX performance by
    an additional 50%, even more under extreme conditions.
 
  - Add support for PSP encryption of TCP connections; this mechanism has
    some similarities with IPsec and TLS, but offers superior HW offloads
    capabilities.
 
  - Ongoing work to support Accurate ECN for TCP. AccECN allows more than
    one congestion notification signal per RTT and is a building block for
    Low Latency, Low Loss, and Scalable Throughput (L4S).
 
  - Reorganize the TCP socket binary layout for data locality, reducing
    the number of touched cachelines in the fastpath.
 
  - Refactor skb deferral free to better scale on large multi-NUMA hosts,
    this improves TCP and UDP RX performances significantly on such HW.
 
  - Increase the default socket memory buffer limits from 256K to 4M to
    better fit modern link speeds.
 
  - Improve handling of setups with a large number of nexthop, making dump
    operating scaling linearly and avoiding unneeded synchronize_rcu() on
    delete.
 
  - Improve bridge handling of VLAN FDB, storing a single entry per bridge
    instead of one entry per port; this makes the dump order of magnitude
    faster on large switches.
 
  - Restore IP ID correctly for encapsulated packets at GSO segmentation
    time, allowing GRO to merge packets in more scenarios.
 
  - Improve netfilter matching performance on large sets.
 
  - Improve MPTCP receive path performance by leveraging recently
    introduced core infrastructure (skb deferral free) and adopting recent
    TCP autotuning changes.
 
  - Allow bridges to redirect to a backup port when the bridge port is
    administratively down.
 
  - Introduce MPTCP 'laminar' endpoint that con be used only once per
    connection and simplify common MPTCP setups.
 
  - Add RCU safety to dst->dev, closing a lot of possible races.
 
  - A significant crypto library API for SCTP, MPTCP and IPv6 SR, reducing
    code duplication.
 
  - Supports pulling data from an skb frag into the linear area of an XDP
    buffer.
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Generate netlink documentation from YAML using an integrated
    YAML parser.
 
 Driver API
 ----------
 
  - Support using IPv6 Flow Label in Rx hash computation and RSS queue
    selection.
 
  - Introduce API for fetching the DMA device for a given queue, allowing
    TCP zerocopy RX on more H/W setups.
 
  - Make XDP helpers compatible with unreadable memory, allowing more
    easily building DevMem-enabled drivers with a unified XDP/skbs
    datapath.
 
  - Add a new dedicated ethtool callback enabling drivers to provide the
    number of RX rings directly, improving efficiency and clarity in RX
    ring queries and RSS configuration.
 
  - Introduce a burst period for the health reporter, allowing better
    handling of multiple errors due to the same root cause.
 
  - Support for DPLL phase offset exponential moving average, controlling
    the average smoothing factor.
 
 Device drivers
 --------------
 
  - Add a new Huawei driver for 3rd gen NIC (hinic3).
 
  - Add a new SpacemiT driver for K1 ethernet MAC.
 
  - Add a generic abstraction for shared memory communication devices
    (dibps)
 
  - Ethernet high-speed NICs:
    - nVidia/Mellanox:
      - Use multiple per-queue doorbell, to avoid MMIO contention issues
      - support adjacent functions, allowing them to delegate their
        SR-IOV VFs to sibling PFs
      - support RSS for IPSec offload
      - support exposing raw cycle counters in PTP and mlx5
      - support for disabling host PFs.
    - Intel (100G, ice, idpf):
      - ice: support for SRIOV VFs over an Active-Active link aggregate
      - ice: support for firmware logging via debugfs
      - ice: support for Earliest TxTime First (ETF) hardware offload
      - idpf: support basic XDP functionalities and XSk
    - Broadcom (bnxt):
      - support Hyper-V VF ID
      - dynamic SRIOV resource allocations for RoCE
    - Meta (fbnic):
      - support queue API, zero-copy Rx and Tx
      - support basic XDP functionalities
      - devlink health support for FW crashes and OTP mem corruptions
      - expand hardware stats coverage to FEC, PHY, and Pause
    - Wangxun:
      - support ethtool coalesce options
      - support for multiple RSS contexts
 
  - Ethernet virtual:
    - Macsec:
      - replace custom netlink attribute checks with policy-level checks
    - Bonding:
      - support aggregator selection based on port priority
    - Microsoft vNIC:
      - use page pool fragments for RX buffers instead of full pages to
        improve memory efficiency
 
  - Ethernet NICs consumer, and embedded:
    - Qualcomm: support Ethernet function for IPQ9574 SoC
    - Airoha: implement wlan offloading via NPU
    - Freescale
      - enetc: add NETC timer PTP driver and add PTP support
      - fec: enable the Jumbo frame support for i.MX8QM
    - Renesas (R-Car S4): support HW offloading for layer 2 switching
      - support for RZ/{T2H, N2H} SoCs
    - Cadence (macb): support TAPRIO traffic scheduling
    - TI:
      - support for Gigabit ICSS ethernet SoC (icssm-prueth)
    - Synopsys (stmmac): a lot of cleanups
 
  - Ethernet PHYs:
    - Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS
      driver
    - Support bcm63268 GPHY power control
    - Support for Micrel lan8842 PHY and PTP
    - Support for Aquantia AQR412 and AQR115
 
  - CAN:
    - a large CAN-XL preparation work
    - reorganize raw_sock and uniqframe struct to minimize memory usage
    - rcar_canfd: update the CAN-FD handling
 
  - WiFi:
    - extended Neighbor Awareness Networking (NAN) support
    - S1G channel representation cleanup
    - improve S1G support
 
  - WiFi drivers:
    - Intel (iwlwifi):
      - major refactor and cleanup
    - Broadcom (brcm80211):
      - support for AP isolation
    - RealTek (rtw88/89) rtw88/89:
      - preparation work for RTL8922DE support
    - MediaTek (mt76):
      - HW restart improvements
      - MLO support
    - Qualcomm/Atheros (ath10k_
      - GTK rekey fixes
 
  - Bluetooth drivers:
    - btusb: support for several new IDs for MT7925
    - btintel: support for BlazarIW core
    - btintel_pcie: support for _suspend() / _resume()
    - btintel_pcie: support for Scorpious, Panther Lake-H484 IDs
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmjdBdsSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkWnAP/2Jz0ExnYMDxLxkkKqMte+3JNeF/KNjB
 zVIslYN/5ekkd1TMXyDLFlWHqUOUMSF9+cK5ZRMHG6/lzOueSzFuuuDFsWs6Kf2f
 q7Q9MzlXhR9++YCsdES1uS5x3PPjMInzo2ZivMFa6fUDLLFSzeAOKL9k+RS0EggU
 VlXv2Wkm73R0O6KAssgDsHke9cnNz+F0DzhQ0S3qkyZF9tS5NrDeUl7fZ47XZgwb
 ZuXdEzKmTTepo2XvZGxJoF2D7nekWFlHhLjEPpDJtET19nwhukCry41/FplJwlKR
 dWsYkqLkrOEQKFQ+q++/5c3BgFOG+IrQLbP5bGQF1tZRMl4Dqp9PDxK5fKJfccbS
 0VY3Y2qWOSYejT2Ji2kEHR5E5rPyFm3Y5A4Rnpz0yEHj14vL2v4zf7CZRkCyyDfD
 doIZXFGkM0+N7QeXLEN833cje9zjaXuP9GAE+2bb+wAWAZAqof4KX8JgHh+y5Rwm
 pvUtvFxmEtntlMwNBap8aT3FquGtfZncU8pzAE5kvWjuMvyF0NVWiE5rB2kSQM/X
 NLmUdvDyjwwJqthXAJG0fl+a0mNJ/kOAqSOKJDJrfKDGWa+ovwY0iY06SpK0wIbO
 Wz7tpMk5MSlYXW8xWMlbyhvvU6T9xuoQ2KV4QTdMxc6Ir3sNX6YkQr+gjQjxB0gx
 ST2QF6lZeWFh
 =w2Kz
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "Core & protocols:

   - Improve drop account scalability on NUMA hosts for RAW and UDP
     sockets and the backlog, almost doubling the Pps capacity under DoS

   - Optimize the UDP RX performance under stress, reducing contention,
     revisiting the binary layout of the involved data structs and
     implementing NUMA-aware locking. This improves UDP RX performance
     by an additional 50%, even more under extreme conditions

   - Add support for PSP encryption of TCP connections; this mechanism
     has some similarities with IPsec and TLS, but offers superior HW
     offloads capabilities

   - Ongoing work to support Accurate ECN for TCP. AccECN allows more
     than one congestion notification signal per RTT and is a building
     block for Low Latency, Low Loss, and Scalable Throughput (L4S)

   - Reorganize the TCP socket binary layout for data locality, reducing
     the number of touched cachelines in the fastpath

   - Refactor skb deferral free to better scale on large multi-NUMA
     hosts, this improves TCP and UDP RX performances significantly on
     such HW

   - Increase the default socket memory buffer limits from 256K to 4M to
     better fit modern link speeds

   - Improve handling of setups with a large number of nexthop, making
     dump operating scaling linearly and avoiding unneeded
     synchronize_rcu() on delete

   - Improve bridge handling of VLAN FDB, storing a single entry per
     bridge instead of one entry per port; this makes the dump order of
     magnitude faster on large switches

   - Restore IP ID correctly for encapsulated packets at GSO
     segmentation time, allowing GRO to merge packets in more scenarios

   - Improve netfilter matching performance on large sets

   - Improve MPTCP receive path performance by leveraging recently
     introduced core infrastructure (skb deferral free) and adopting
     recent TCP autotuning changes

   - Allow bridges to redirect to a backup port when the bridge port is
     administratively down

   - Introduce MPTCP 'laminar' endpoint that con be used only once per
     connection and simplify common MPTCP setups

   - Add RCU safety to dst->dev, closing a lot of possible races

   - A significant crypto library API for SCTP, MPTCP and IPv6 SR,
     reducing code duplication

   - Supports pulling data from an skb frag into the linear area of an
     XDP buffer

  Things we sprinkled into general kernel code:

   - Generate netlink documentation from YAML using an integrated YAML
     parser

  Driver API:

   - Support using IPv6 Flow Label in Rx hash computation and RSS queue
     selection

   - Introduce API for fetching the DMA device for a given queue,
     allowing TCP zerocopy RX on more H/W setups

   - Make XDP helpers compatible with unreadable memory, allowing more
     easily building DevMem-enabled drivers with a unified XDP/skbs
     datapath

   - Add a new dedicated ethtool callback enabling drivers to provide
     the number of RX rings directly, improving efficiency and clarity
     in RX ring queries and RSS configuration

   - Introduce a burst period for the health reporter, allowing better
     handling of multiple errors due to the same root cause

   - Support for DPLL phase offset exponential moving average,
     controlling the average smoothing factor

  Device drivers:

   - Add a new Huawei driver for 3rd gen NIC (hinic3)

   - Add a new SpacemiT driver for K1 ethernet MAC

   - Add a generic abstraction for shared memory communication
     devices (dibps)

   - Ethernet high-speed NICs:
      - nVidia/Mellanox:
         - Use multiple per-queue doorbell, to avoid MMIO contention
           issues
         - support adjacent functions, allowing them to delegate their
           SR-IOV VFs to sibling PFs
         - support RSS for IPSec offload
         - support exposing raw cycle counters in PTP and mlx5
         - support for disabling host PFs.
      - Intel (100G, ice, idpf):
         - ice: support for SRIOV VFs over an Active-Active link
           aggregate
         - ice: support for firmware logging via debugfs
         - ice: support for Earliest TxTime First (ETF) hardware offload
         - idpf: support basic XDP functionalities and XSk
      - Broadcom (bnxt):
         - support Hyper-V VF ID
         - dynamic SRIOV resource allocations for RoCE
      - Meta (fbnic):
         - support queue API, zero-copy Rx and Tx
         - support basic XDP functionalities
         - devlink health support for FW crashes and OTP mem corruptions
         - expand hardware stats coverage to FEC, PHY, and Pause
      - Wangxun:
         - support ethtool coalesce options
         - support for multiple RSS contexts

   - Ethernet virtual:
      - Macsec:
         - replace custom netlink attribute checks with policy-level
           checks
      - Bonding:
         - support aggregator selection based on port priority
      - Microsoft vNIC:
         - use page pool fragments for RX buffers instead of full pages
           to improve memory efficiency

   - Ethernet NICs consumer, and embedded:
      - Qualcomm: support Ethernet function for IPQ9574 SoC
      - Airoha: implement wlan offloading via NPU
      - Freescale
         - enetc: add NETC timer PTP driver and add PTP support
         - fec: enable the Jumbo frame support for i.MX8QM
      - Renesas (R-Car S4):
         - support HW offloading for layer 2 switching
         - support for RZ/{T2H, N2H} SoCs
      - Cadence (macb): support TAPRIO traffic scheduling
      - TI:
         - support for Gigabit ICSS ethernet SoC (icssm-prueth)
      - Synopsys (stmmac): a lot of cleanups

   - Ethernet PHYs:
      - Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS
        driver
      - Support bcm63268 GPHY power control
      - Support for Micrel lan8842 PHY and PTP
      - Support for Aquantia AQR412 and AQR115

   - CAN:
      - a large CAN-XL preparation work
      - reorganize raw_sock and uniqframe struct to minimize memory
        usage
      - rcar_canfd: update the CAN-FD handling

   - WiFi:
      - extended Neighbor Awareness Networking (NAN) support
      - S1G channel representation cleanup
      - improve S1G support

   - WiFi drivers:
      - Intel (iwlwifi):
         - major refactor and cleanup
      - Broadcom (brcm80211):
         - support for AP isolation
      - RealTek (rtw88/89) rtw88/89:
         - preparation work for RTL8922DE support
      - MediaTek (mt76):
         - HW restart improvements
         - MLO support
      - Qualcomm/Atheros (ath10k):
         - GTK rekey fixes

   - Bluetooth drivers:
      - btusb: support for several new IDs for MT7925
      - btintel: support for BlazarIW core
      - btintel_pcie: support for _suspend() / _resume()
      - btintel_pcie: support for Scorpious, Panther Lake-H484 IDs"

* tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1536 commits)
  net: stmmac: Add support for Allwinner A523 GMAC200
  dt-bindings: net: sun8i-emac: Add A523 GMAC200 compatible
  Revert "Documentation: net: add flow control guide and document ethtool API"
  octeontx2-pf: fix bitmap leak
  octeontx2-vf: fix bitmap leak
  net/mlx5e: Use extack in set rxfh callback
  net/mlx5e: Introduce mlx5e_rss_params for RSS configuration
  net/mlx5e: Introduce mlx5e_rss_init_params
  net/mlx5e: Remove unused mdev param from RSS indir init
  net/mlx5: Improve QoS error messages with actual depth values
  net/mlx5e: Prevent entering switchdev mode with inconsistent netns
  net/mlx5: HWS, Generalize complex matchers
  net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
  selftests/net: add tcp_port_share to .gitignore
  Revert "net/mlx5e: Update and set Xon/Xoff upon MTU set"
  net: add NUMA awareness to skb_attempt_defer_free()
  net: use llist for sd->defer_list
  net: make softnet_data.defer_count an atomic
  selftests: drv-net: psp: add tests for destroying devices
  selftests: drv-net: psp: add test for auto-adjusting TCP MSS
  ...
2025-10-02 15:17:01 -07:00
Eduard Zingerman
2ce61c63e7 selftests/bpf: Add tests for rejection of ALU ops with negative offsets
Define a simple program using MOD, DIV, ADD instructions,
make sure that the program is rejected if invalid offset
field is used for instruction.

These are test cases for
commit 55c0ced59f ("bpf: Reject negative offsets for ALU ops")
Link: https://lore.kernel.org/all/tencent_70D024BAE70A0A309A4781694C7B764B0608@qq.com/
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 13:34:08 -07:00
Eric Biggers
f09f57c746 selftests/bpf: Add test for libbpf_sha256()
Test that libbpf_sha256() calculates SHA-256 digests correctly.

Tested with:
    make -C tools/testing/selftests/bpf/
    ./tools/testing/selftests/bpf/test_progs -t sha256 -v

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-01 16:00:25 -07:00
KaFai Wan
8709c16852 selftests/bpf: Add test for BPF_NEG alu on CONST_PTR_TO_MAP
Add a test case for BPF_NEG operation on CONST_PTR_TO_MAP. Tests if
BPF_NEG operation on map_ptr is rejected in unprivileged mode and is a
scalar value and do not trigger Oops in privileged mode.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Signed-off-by: Brahmajit Das <listout@listout.xyz>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251001191739.2323644-3-listout@listout.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-01 13:55:00 -07:00
Jiri Olsa
0c342bfc99 selftests/bpf: Fix realloc size in bpf_get_addrs
We will segfault once we call realloc in bpf_get_addrs due to
wrong size argument.

Fixes: 6302bdeb91 ("selftests/bpf: Add a kprobe_multi subtest to use addrs instead of syms")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-01 13:37:30 -07:00
Jiri Olsa
4b2b38ea20 selftests/bpf: Fix typo in subtest_basic_usdt after merge conflict
Use proper 'called' variable name.

Fixes: ae28ed4578 ("Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/aN0JVRynHxqKy4lw@krava/
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-01 13:37:30 -07:00
Jiri Olsa
f83fcec784 selftests/bpf: Fix open-coded gettid syscall in uprobe syscall tests
Commit 0e2fb011a0 ("selftests/bpf: Clean up open-coded gettid syscall
invocations") addressed the issue that older libc may not have a gettid()
function call wrapper for the associated syscall.

The uprobe syscall tests got in from tip tree, using sys_gettid in there.

Fixes: 0e2fb011a0 ("selftests/bpf: Clean up open-coded gettid syscall invocations")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-01 13:37:29 -07:00
Linus Torvalds
ae28ed4578 bpf-next-6.18
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjZH40ACgkQ6rmadz2v
 bTrG7w//X/5CyDoKIYJCqynYRdMtfqYuCe8Jhud4p5++iBVqkDyS6Y8EFLqZVyg/
 UHTqaSE4Nz8/pma0WSjhUYn6Chs1AeH+Rw/g109SovE/YGkek2KNwY3o2hDrtPMX
 +oD0my8qF2HLKgEyteXXyZ5Ju+AaF92JFiGko4/wNTX8O99F9nyz2pTkrctS9Vl9
 VwuTxrEXpmhqrhP3WCxkfNfcbs9HP+AALpgOXZKdMI6T4KI0N1gnJ0ZWJbiXZ8oT
 tug0MTPkNRidYMl0wHY2LZ6ZG8Q3a7Sgc+M0xFzaHGvGlJbBg1HjsDMtT6j34CrG
 TIVJ/O8F6EJzAnQ5Hio0FJk8IIgMRgvng5Kd5GXidU+mE6zokTyHIHOXitYkBQNH
 Hk+lGA7+E2cYqUqKvB5PFoyo+jlucuIH7YwrQlyGfqz+98n65xCgZKcmdVXr0hdB
 9v3WmwJFtVIoPErUvBC3KRANQYhFk4eVk1eiGV/20+eIVyUuNbX6wqSWSA9uEXLy
 n5fm/vlk4RjZmrPZHxcJ0dsl9LTF1VvQQHkgoC1Sz/Cc+jA6k4I+ECVHAqEbk36p
 1TUF52yPOD2ViaJKkj+962JaaaXlUn6+Dq7f1GMP6VuyHjz4gsI3mOo4XarqNdWd
 c7TnYmlGO/cGwqd4DdbmWiF1DDsrBcBzdbC8+FgffxQHLPXGzUg=
 =LeQi
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc
   (Amery Hung)

   Applied as a stable branch in bpf-next and net-next trees.

 - Support reading skb metadata via bpf_dynptr (Jakub Sitnicki)

   Also a stable branch in bpf-next and net-next trees.

 - Enforce expected_attach_type for tailcall compatibility (Daniel
   Borkmann)

 - Replace path-sensitive with path-insensitive live stack analysis in
   the verifier (Eduard Zingerman)

   This is a significant change in the verification logic. More details,
   motivation, long term plans are in the cover letter/merge commit.

 - Support signed BPF programs (KP Singh)

   This is another major feature that took years to materialize.

   Algorithm details are in the cover letter/marge commit

 - Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich)

 - Add support for may_goto instruction to arm64 JIT (Puranjay Mohan)

 - Fix USDT SIB argument handling in libbpf (Jiawei Zhao)

 - Allow uprobe-bpf program to change context registers (Jiri Olsa)

 - Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and
   Puranjay Mohan)

 - Allow access to union arguments in tracing programs (Leon Hwang)

 - Optimize rcu_read_lock() + migrate_disable() combination where it's
   used in BPF subsystem (Menglong Dong)

 - Introduce bpf_task_work_schedule*() kfuncs to schedule deferred
   execution of BPF callback in the context of a specific task using the
   kernel’s task_work infrastructure (Mykyta Yatsenko)

 - Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya
   Dwivedi)

 - Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi)

 - Improve the precision of tnum multiplier verifier operation
   (Nandakumar Edamana)

 - Use tnums to improve is_branch_taken() logic (Paul Chaignon)

 - Add support for atomic operations in arena in riscv JIT (Pu Lehui)

 - Report arena faults to BPF error stream (Puranjay Mohan)

 - Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin
   Monnet)

 - Add bpf_strcasecmp() kfunc (Rong Tao)

 - Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao
   Chen)

* tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits)
  libbpf: Replace AF_ALG with open coded SHA-256
  selftests/bpf: Add stress test for rqspinlock in NMI
  selftests/bpf: Add test case for different expected_attach_type
  bpf: Enforce expected_attach_type for tailcall compatibility
  bpftool: Remove duplicate string.h header
  bpf: Remove duplicate crypto/sha2.h header
  libbpf: Fix error when st-prefix_ops and ops from differ btf
  selftests/bpf: Test changing packet data from kfunc
  selftests/bpf: Add stacktrace map lookup_and_delete_elem test case
  selftests/bpf: Refactor stacktrace_map case with skeleton
  bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE
  selftests/bpf: Fix flaky bpf_cookie selftest
  selftests/bpf: Test changing packet data from global functions with a kfunc
  bpf: Emit struct bpf_xdp_sock type in vmlinux BTF
  selftests/bpf: Task_work selftest cleanup fixes
  MAINTAINERS: Delete inactive maintainers from AF_XDP
  bpf: Mark kfuncs as __noclone
  selftests/bpf: Add kprobe multi write ctx attach test
  selftests/bpf: Add kprobe write ctx attach test
  selftests/bpf: Add uprobe context ip register change test
  ...
2025-09-30 17:58:11 -07:00
Linus Torvalds
e4dcbdff11 Performance events updates for v6.18:
Core perf code updates:
 
  - Convert mmap() related reference counts to refcount_t. This
    is in reaction to the recently fixed refcount bugs, which
    could have been detected earlier and could have mitigated
    the bug somewhat. (Thomas Gleixner, Peter Zijlstra)
 
  - Clean up and simplify the callchain code, in preparation
    for sframes. (Steven Rostedt, Josh Poimboeuf)
 
 Uprobes updates:
 
  - Add support to optimize usdt probes on x86-64, which
    gives a substantial speedup. (Jiri Olsa)
 
  - Cleanups and fixes on x86 (Peter Zijlstra)
 
 PMU driver updates:
 
  - Various optimizations and fixes to the Intel PMU driver
    (Dapeng Mi)
 
 Misc cleanups and fixes:
 
  - Remove redundant __GFP_NOWARN (Qianfeng Rong)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmjWpGIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iHvxAAvO8qWbbhUdF3EZaFU0Wx6oh5KBhImU49
 VZ107xe9llA0Szy3hIl1YpdOQA2NAHtma6We/ebonrPVTTkcSCGq8absc+GahA3I
 CHIomx2hjD0OQ01aHvTqgHJUdFUQQ0yzE3+FY6Tsn05JsNZvDmqpAMIoMQT0LuuG
 7VvVRLBuDXtuMtNmGaGCvfDGKTZkGGxD6iZS1iWHuixvVAz4IECK0vYqSyh31UGA
 w9Jwa0thwjKm2EZTmcSKaHSM2zw3N8QXJ3SNPPThuMrtO6QDz2+3Da9kO+vhGcRP
 Jls9KnWC2wxNxqIs3dr80Mzn4qMplc67Ekx2tUqX4tYEGGtJQxW6tm3JOKKIgFMI
 g/KF9/WJPXp0rVI9mtoQkgndzyswR/ZJBAwfEQu+nAqlp3gmmQR9+MeYPCyNnyhB
 2g22PTMbXkihJmRPAVeH+WhwFy1YY3nsRhh61ha3/N0ULXTHUh0E+hWwUVMifYSV
 SwXqQx4srlo6RJJNTji1d6R3muNjXCQNEsJ0lCOX6ajVoxWZsPH2x7/W1A8LKmY+
 FLYQUi6X9ogQbOO3WxCjUhzp5nMTNA2vvo87MUzDlZOCLPqYZmqcjntHuXwdjPyO
 lPcfTzc2nK1Ud26bG3+p2Bk3fjqkX9XcTMFniOvjKfffEfwpAq4xRPBQ3uRlzn0V
 pf9067JYF+c=
 =sVXH
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull performance events updates from Ingo Molnar:
 "Core perf code updates:

   - Convert mmap() related reference counts to refcount_t. This is in
     reaction to the recently fixed refcount bugs, which could have been
     detected earlier and could have mitigated the bug somewhat (Thomas
     Gleixner, Peter Zijlstra)

   - Clean up and simplify the callchain code, in preparation for
     sframes (Steven Rostedt, Josh Poimboeuf)

  Uprobes updates:

   - Add support to optimize usdt probes on x86-64, which gives a
     substantial speedup (Jiri Olsa)

   - Cleanups and fixes on x86 (Peter Zijlstra)

  PMU driver updates:

   - Various optimizations and fixes to the Intel PMU driver (Dapeng Mi)

  Misc cleanups and fixes:

   - Remove redundant __GFP_NOWARN (Qianfeng Rong)"

* tag 'perf-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  selftests/bpf: Fix uprobe_sigill test for uprobe syscall error value
  uprobes/x86: Return error from uprobe syscall when not called from trampoline
  perf: Skip user unwind if the task is a kernel thread
  perf: Simplify get_perf_callchain() user logic
  perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL
  perf: Have get_perf_callchain() return NULL if crosstask and user are set
  perf: Remove get_perf_callchain() init_nr argument
  perf/x86: Print PMU counters bitmap in x86_pmu_show_pmu_cap()
  perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK
  perf/x86/intel: Change macro GLOBAL_CTRL_EN_PERF_METRICS to BIT_ULL(48)
  perf/x86: Add PERF_CAP_PEBS_TIMING_INFO flag
  perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
  perf/x86/intel: Use early_initcall() to hook bts_init()
  uprobes: Remove redundant __GFP_NOWARN
  selftests/seccomp: validate uprobe syscall passes through seccomp
  seccomp: passthrough uprobe systemcall without filtering
  selftests/bpf: Fix uprobe syscall shadow stack test
  selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe
  selftests/bpf: Add uprobe_regs_equal test
  selftests/bpf: Add optimized usdt variant for basic usdt test
  ...
2025-09-30 11:11:21 -07:00
Kumar Kartikeya Dwivedi
15cf39221e selftests/bpf: Add stress test for rqspinlock in NMI
Introduce a kernel module that will exercise lock acquisition in the NMI
path, and bias toward creating contention such that NMI waiters end up
being non-head waiters. Prior to the rqspinlock fix made in the commit
0d80e7f951 ("rqspinlock: Choose trylock fallback for NMI waiters"), it
was possible for the queueing path of non-head waiters to get stuck in
NMI, which this stress test reproduces fairly easily with just 3 CPUs.

Both AA and ABBA flavors are supported, and it will serve as a test case
for future fixes that address this corner case. More information about
the problem in question is available in the commit cited above. When the
fix is reverted, this stress test will lock up the system.

To enable this test automatically through the test_progs infrastructure,
add a load_module_params API to exercise both AA and ABBA cases when
running the test.

Note that the test runs for at most 5 seconds, and becomes a noop after
that, in order to allow the system to make forward progress. In
addition, CPU 0 is always kept untouched by the created threads and
NMIs. The test will automatically scale to the number of available
online CPUs.

Note that at least 3 CPUs are necessary to run this test, hence skip the
selftest in case the environment has less than 3 CPUs available.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250927205304.199760-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-28 03:18:40 -07:00
Daniel Borkmann
0e8e60e86c selftests/bpf: Add test case for different expected_attach_type
Add a small test case which adds two programs - one calling the other
through a tailcall - and check that BPF rejects them in case of different
expected_attach_type values:

  # ./vmtest.sh -- ./test_progs -t xdp_devmap
  [...]
  #641/1   xdp_devmap_attach/DEVMAP with programs in entries:OK
  #641/2   xdp_devmap_attach/DEVMAP with frags programs in entries:OK
  #641/3   xdp_devmap_attach/Verifier check of DEVMAP programs:OK
  #641/4   xdp_devmap_attach/DEVMAP with programs in entries on veth:OK
  #641     xdp_devmap_attach:OK
  Summary: 2/4 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20250926171201.188490-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-27 06:24:27 -07:00
Matthieu Baerts (NGI0)
c5273f6ca1 mptcp: pm: rename 'subflows' to 'extra_subflows'
A few variables linked to the Path-Managers are confusing, and it would
help current and future developers, to clarify them.

One of them is 'subflows', which in fact represents the number of extra
subflows: all the additional subflows created after the initial one, and
not the total number of subflows.

While at it, add an additional name for the corresponding variable in
MPTCP INFO: mptcpi_extra_subflows. Not to break the current uAPI, the
new name is added as a 'define' pointing to the former name. This will
then also help userspace devs.

No functional changes intended.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-5-ad126cc47c6b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-26 17:44:04 -07:00
Amery Hung
991e555eff selftests/bpf: Test changing packet data from kfunc
bpf_xdp_pull_data() is the first kfunc that changes packet data. Make
sure the verifier clear all packet pointers after calling packet data
changing kfunc.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250926164142.1850176-1-ameryhung@gmail.com
2025-09-26 10:44:51 -07:00
Tao Chen
d43029ff7d selftests/bpf: Add stacktrace map lookup_and_delete_elem test case
Add tests for stacktrace map lookup and delete:
1. use bpf_map_lookup_and_delete_elem to lookup and delete the target
   stack_id,
2. lookup the deleted stack_id again to double check.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250925175030.1615837-3-chen.dylane@linux.dev
2025-09-25 16:17:30 -07:00
Tao Chen
363b17e273 selftests/bpf: Refactor stacktrace_map case with skeleton
The loading method of the stacktrace_map test case looks too outdated,
refactor it with skeleton, and we can use global variable feature in
the next patch.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250925175030.1615837-2-chen.dylane@linux.dev
2025-09-25 16:17:14 -07:00
Mykyta Yatsenko
105eb5dc74 selftests/bpf: Fix flaky bpf_cookie selftest
bpf_cookie can fail on perf_event_open(), when it runs after the task_work
selftest. The task_work test causes perf to lower
sysctl_perf_event_sample_rate, and bpf_cookie uses sample_freq,
which is validated against that sysctl. As a result,
perf_event_open() rejects the attr if the (now tighter) limit is
exceeded.

>From perf_event_open():
if (attr.freq) {
	if (attr.sample_freq > sysctl_perf_event_sample_rate)
		return -EINVAL;
} else {
	if (attr.sample_period & (1ULL << 63))
		return -EINVAL;
}

Switch bpf_cookie to use sample_period, which is not checked against
sysctl_perf_event_sample_rate.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250925215230.265501-1-mykyta.yatsenko5@gmail.com
2025-09-25 15:55:44 -07:00
Amery Hung
1193c46c17 selftests/bpf: Test changing packet data from global functions with a kfunc
The verifier should invalidate all packet pointers after a packet data
changing kfunc is called. So, similar to commit 3f23ee5590
("selftests/bpf: test for changing packet data from global functions"),
test changing packet data from global functions to make sure packet
pointers are indeed invalidated.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250925170013.1752561-2-ameryhung@gmail.com
2025-09-25 14:52:09 -07:00
Mykyta Yatsenko
5730dacb3f selftests/bpf: Task_work selftest cleanup fixes
task_work selftest does not properly handle cleanup during failures:
 * destroy bpf_link
 * perf event fd is passed to bpf_link, no need to close it if link was
 created successfully
 * goto cleanup if fork() failed, close pipe.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250924142954.129519-2-mykyta.yatsenko5@gmail.com
2025-09-25 11:00:01 -07:00
Jakub Kicinski
5e3fee34f6 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaNNwBQAKCRAraS+Z+3y5
 E8heAQDdJTR9rwAL7gD79cldlHP5PTmjyidLIoFG/efaGSbN1AD9EdvrykDU4xOG
 aGaO8TooGUZf7vAL8tIFuMeydYvi/gM=
 =Qu4T
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-09-23

We've added 9 non-merge commits during the last 33 day(s) which contain
a total of 10 files changed, 480 insertions(+), 53 deletions(-).

The main changes are:

1) A new bpf_xdp_pull_data kfunc that supports pulling data from
   a frag into the linear area of a xdp_buff, from Amery Hung.

   This includes changes in the xdp_native.bpf.c selftest, which
   Nimrod's future work depends on.

   It is a merge from a stable branch 'xdp_pull_data' which has
   also been merged to bpf-next.

   There is a conflict with recent changes in 'include/net/xdp.h'
   in the net-next tree that will need to be resolved.

2) A compiler warning fix when CONFIG_NET=n in the recent dynptr
   skb_meta support, from Jakub Sitnicki.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests: drv-net: Pull data before parsing headers
  selftests/bpf: Test bpf_xdp_pull_data
  bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN
  bpf: Make variables in bpf_prog_test_run_xdp less confusing
  bpf: Clear packet pointers after changing packet data in kfuncs
  bpf: Support pulling non-linear xdp data
  bpf: Allow bpf_xdp_shrink_data to shrink a frag from head and tail
  bpf: Clear pfmemalloc flag when freeing all fragments
  bpf: Return an error pointer for skb metadata when CONFIG_NET=n
====================

Link: https://patch.msgid.link/20250924050303.2466356-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-24 10:22:37 -07:00
Jiri Olsa
3d237467a4 selftests/bpf: Add kprobe multi write ctx attach test
Adding test to check we can't attach kprobe multi program
that writes to the context.

It's x86_64 specific test.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-24 02:25:06 -07:00
Jiri Olsa
1b881ee294 selftests/bpf: Add kprobe write ctx attach test
Adding test to check we can't attach standard kprobe program that
writes to the context.

It's x86_64 specific test.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-24 02:25:06 -07:00
Jiri Olsa
6a4ea0d1cb selftests/bpf: Add uprobe context ip register change test
Adding test to check we can change the application execution
through instruction pointer change through uprobe program.

It's x86_64 specific test.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-24 02:25:06 -07:00
Jiri Olsa
7f8a05c5d3 selftests/bpf: Add uprobe context registers changes test
Adding test to check we can change common register values through
uprobe program.

It's x86_64 specific test.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250916215301.664963-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-24 02:25:06 -07:00
Martin KaFai Lau
34f033a6c9 Merge branch 'bpf-next/xdp_pull_data' into 'bpf-next/master'
Merge the xdp_pull_data stable branch into the master branch. No conflict.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-09-23 16:23:58 -07:00
Amery Hung
323302f54d selftests/bpf: Test bpf_xdp_pull_data
Test bpf_xdp_pull_data() with xdp packets with different layouts. The
xdp bpf program first checks if the layout is as expected. Then, it
calls bpf_xdp_pull_data(). Finally, it checks the 0xbb marker at offset
1024 using directly packet access.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250922233356.3356453-8-ameryhung@gmail.com
2025-09-23 13:35:12 -07:00
Amery Hung
fe9544ed1a bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN
To test bpf_xdp_pull_data(), an xdp packet containing fragments as well
as free linear data area after xdp->data_end needs to be created.
However, bpf_prog_test_run_xdp() always fills the linear area with
data_in before creating fragments, leaving no space to pull data. This
patch will allow users to specify the linear data size through
ctx->data_end.

Currently, ctx_in->data_end must match data_size_in and will not be the
final ctx->data_end seen by xdp programs. This is because ctx->data_end
is populated according to the xdp_buff passed to test_run. The linear
data area available in an xdp_buff, max_linear_sz, is alawys filled up
before copying data_in into fragments.

This patch will allow users to specify the size of data that goes into
the linear area. When ctx_in->data_end is different from data_size_in,
only ctx_in->data_end bytes of data will be put into the linear area when
creating the xdp_buff.

While ctx_in->data_end will be allowed to be different from data_size_in,
it cannot be larger than the data_size_in as there will be no data to
copy from user space. If it is larger than the maximum linear data area
size, the layout suggested by the user will not be honored. Data beyond
max_linear_sz bytes will still be copied into fragments.

Finally, since it is possible for a NIC to produce a xdp_buff with empty
linear data area, allow it when calling bpf_test_init() from
bpf_prog_test_run_xdp() so that we can test XDP kfuncs with such
xdp_buff. This is done by moving lower-bound check to callers as most of
them already do except bpf_prog_test_run_skb(). The change also fixes a
bug that allows passing an xdp_buff with data < ETH_HLEN. This can
happen when ctx is used and metadata is at least ETH_HLEN.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250922233356.3356453-7-ameryhung@gmail.com
2025-09-23 13:35:12 -07:00
Leon Hwang
1c6686bf7f selftests/bpf: Add union argument tests using fexit programs
Add test coverage for union argument support using fexit programs:

* 8B union argument - verify that the verifier accepts it and that fexit
  programs can trace such functions.
* 16B union argument - verify that the verifier accepts it and that
  fexit programs can access the argument, which is passed using two
  registers.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250919044110.23729-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-23 12:07:47 -07:00
Puranjay Mohan
f616549124 selftests: bpf: Add tests for signed loads from arena
Add tests for loading 8, 16, and 32 bits with sign extension from arena,
also verify that exception handling is working correctly and correct
assembly is being generated by the x86 and arm64 JITs.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20250923110157.18326-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-23 12:00:23 -07:00
Mykyta Yatsenko
c6ae18e0af selftests/bpf: add bpf task work stress tests
Add stress tests for BPF task-work scheduling kfuncs. The tests spawn
multiple threads that concurrently schedule task_work callbacks against
the same and different map values to exercise the kfuncs under high
contention.
Verify callbacks are reliably enqueued and executed with no drops.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20250923112404.668720-10-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-23 07:34:39 -07:00
Mykyta Yatsenko
39fd74dfd5 selftests/bpf: BPF task work scheduling tests
Introducing selftests that check BPF task work scheduling mechanism.
Validate that verifier does not accepts incorrect calls to
bpf_task_work_schedule kfunc.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250923112404.668720-9-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-23 07:34:39 -07:00
KP Singh
b720903e2b selftests/bpf: Enable signature verification for some lskel tests
The test harness uses the verify_sig_setup.sh to generate the required
key material for program signing.

Generate key material for signing LSKEL some lskel programs and use
xxd to convert the verification certificate into a C header file.

Finally, update the main test runner to load this
certificate into the session keyring via the add_key() syscall before
executing any tests. Use the session keyring in the tests with signed
programs.

Signed-off-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/r/20250921160120.9711-6-kpsingh@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-22 19:17:55 -07:00
Yonghong Song
5a427fddec selftests/bpf: Fix selftest verifier_arena_large failure
With latest llvm22, I got the following verification failure:

  ...
  ; int big_alloc2(void *ctx) @ verifier_arena_large.c:207
  0: (b4) w6 = 1                        ; R6_w=1
  ...
  ; if (err) @ verifier_arena_large.c:233
  53: (56) if w6 != 0x0 goto pc+62      ; R6=0
  54: (b7) r7 = -4                      ; R7_w=-4
  55: (18) r8 = 0x7f4000000000          ; R8_w=scalar()
  57: (bf) r9 = addr_space_cast(r8, 0, 1)       ; R8_w=scalar() R9_w=arena
  58: (b4) w6 = 5                       ; R6_w=5
  ; pg = page[i]; @ verifier_arena_large.c:238
  59: (bf) r1 = r7                      ; R1_w=-4 R7_w=-4
  60: (07) r1 += 4                      ; R1_w=0
  61: (79) r2 = *(u64 *)(r9 +0)         ; R2_w=scalar() R9_w=arena
  ; if (*pg != i) @ verifier_arena_large.c:239
  62: (bf) r3 = addr_space_cast(r2, 0, 1)       ; R2_w=scalar() R3_w=arena
  63: (71) r3 = *(u8 *)(r3 +0)          ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
  64: (5d) if r1 != r3 goto pc+51       ; R1_w=0 R3_w=0
  ; bpf_arena_free_pages(&arena, (void __arena *)pg, 2); @ verifier_arena_large.c:241
  65: (18) r1 = 0xff11000114548000      ; R1_w=map_ptr(map=arena,ks=0,vs=0)
  67: (b4) w3 = 2                       ; R3_w=2
  68: (85) call bpf_arena_free_pages#72675      ;
  69: (b7) r1 = 0                       ; R1_w=0
  ; page[i + 1] = NULL; @ verifier_arena_large.c:243
  70: (7b) *(u64 *)(r8 +8) = r1
  R8 invalid mem access 'scalar'
  processed 61 insns (limit 1000000) max_states_per_insn 0 total_states 6 peak_states 6 mark_read 2
  =============
  #489/5   verifier_arena_large/big_alloc2:FAIL

The main reason is that 'r8' in insn '70' is not an arena pointer.
Further debugging at llvm side shows that llvm commit ([1]) caused
the failure. For the original code:
  page[i] = NULL;
  page[i + 1] = NULL;
the llvm transformed it to something like below at source level:
  __builtin_memset(&page[i], 0, 16)
Such transformation prevents llvm BPFCheckAndAdjustIR pass from
generating proper addr_space_cast insns ([2]).

Adding support in llvm BPFCheckAndAdjustIR pass should work, but
not sure that such a pattern exists or not in real applications.
At the same time, simply adding a memory barrier between two 'page'
assignment can fix the issue.

  [1] https://github.com/llvm/llvm-project/pull/155415
  [2] https://github.com/llvm/llvm-project/pull/84410

Cc: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250920045805.3288551-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-20 15:49:35 -07:00
Eduard Zingerman
fdcecdff90 selftests/bpf: test cases for callchain sensitive live stack tracking
- simple propagation of read/write marks;
- joining read/write marks from conditional branches;
- avoid must_write marks in when same instruction accesses different
  stack offsets on different execution paths;
- avoid must_write marks in case same instruction accesses stack
  and non-stack pointers on different execution paths;
- read/write marks propagation to outer stack frame;
- independent read marks for different callchains ending with the same
  function;
- bpf_calls_callback() dependent logic in
  liveness.c:bpf_stack_slot_alive().

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-12-c3cd27bacc60@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19 09:27:24 -07:00
Eduard Zingerman
34c513be3d selftests/bpf: __not_msg() tag for test_loader framework
This patch adds tags __not_msg(<msg>) and __not_msg_unpriv(<msg>).
Test fails if <msg> is found in verifier log.

If __msg_not() is situated between __msg() tags framework matches
__msg() tags first, and then checks that <msg> is not present in a
portion of a log between bracketing __msg() tags.
__msg_not() tags bracketed by a same __msg() group are effectively
unordered.

The idea is borrowed from LLVM's CheckFile with its CHECK-NOT syntax.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-11-c3cd27bacc60@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19 09:27:23 -07:00
Eduard Zingerman
107e169799 bpf: disable and remove registers chain based liveness
Remove register chain based liveness tracking:
- struct bpf_reg_state->{parent,live} fields are no longer needed;
- REG_LIVE_WRITTEN marks are superseded by bpf_mark_stack_write()
  calls;
- mark_reg_read() calls are superseded by bpf_mark_stack_read();
- log.c:print_liveness() is superseded by logging in liveness.c;
- propagate_liveness() is superseded by bpf_update_live_stack();
- no need to establish register chains in is_state_visited() anymore;
- fix a bunch of tests expecting "_w" suffixes in verifier log
  messages.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-9-c3cd27bacc60@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19 09:27:23 -07:00
KP Singh
ea2e6467ac bpf: Return hashes of maps in BPF_OBJ_GET_INFO_BY_FD
Currently only array maps are supported, but the implementation can be
extended for other maps and objects. The hash is memoized only for
exclusive and frozen maps as their content is stable until the exclusive
program modifies the map.

This is required for BPF signing, enabling a trusted loader program to
verify a map's integrity. The loader retrieves
the map's runtime hash from the kernel and compares it against an
expected hash computed at build time.

Signed-off-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/r/20250914215141.15144-7-kpsingh@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-18 19:11:42 -07:00
KP Singh
6c850cbca8 selftests/bpf: Add tests for exclusive maps
Check if access is denied to another program for an exclusive map

Signed-off-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/r/20250914215141.15144-6-kpsingh@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-18 19:11:42 -07:00
Kumar Kartikeya Dwivedi
8b788d6638 selftests/bpf: Add tests for KF_RCU_PROTECTED
Add a couple of test cases to ensure RCU protection is kicked in
automatically, and the return type is as expected.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250917032755.4068726-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-18 15:36:17 -07:00
Kumar Kartikeya Dwivedi
1512231b6c bpf: Enforce RCU protection for KF_RCU_PROTECTED
Currently, KF_RCU_PROTECTED only applies to iterator APIs and that too
in a convoluted fashion: the presence of this flag on the kfunc is used
to set MEM_RCU in iterator type, and the lack of RCU protection results
in an error only later, once next() or destroy() methods are invoked on
the iterator. While there is no bug, this is certainly a bit
unintuitive, and makes the enforcement of the flag iterator specific.

In the interest of making this flag useful for other upcoming kfuncs,
e.g. scx_bpf_cpu_curr() [0][1], add enforcement for invoking the kfunc
in an RCU critical section in general.

This would also mean that iterator APIs using KF_RCU_PROTECTED will
error out earlier, instead of throwing an error for lack of RCU CS
protection when next() or destroy() methods are invoked.

In addition to this, if the kfuncs tagged KF_RCU_PROTECTED return a
pointer value, ensure that this pointer value is only usable in an RCU
critical section. There might be edge cases where the return value is
special and doesn't need to imply MEM_RCU semantics, but in general, the
assumption should hold for the majority of kfuncs, and we can revisit
things if necessary later.

  [0]: https://lore.kernel.org/all/20250903212311.369697-3-christian.loehle@arm.com
  [1]: https://lore.kernel.org/all/20250909195709.92669-1-arighi@nvidia.com

Tested-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250917032755.4068726-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-18 15:36:17 -07:00
Eduard Zingerman
a24a2dda70 selftests/bpf: trigger verifier.c:maybe_exit_scc() for a speculative state
This is a test case minimized from a syzbot reproducer from [1].
The test case triggers verifier.c:maybe_exit_scc() w/o
preceding call to verifier.c:maybe_enter_scc() on a speculative
symbolic execution path.

Here is verifier log for the test case:

  Live regs before insn:
        0: .......... (b7) r0 = 100
    1   1: 0......... (7b) *(u64 *)(r10 -512) = r0
    1   2: 0......... (b5) if r0 <= 0x0 goto pc-2
        3: 0......... (95) exit
  0: R1=ctx() R10=fp0
  0: (b7) r0 = 100                      ; R0_w=100
  1: (7b) *(u64 *)(r10 -512) = r0       ; R0_w=100 R10=fp0 fp-512_w=100
  2: (b5) if r0 <= 0x0 goto pc-2
  mark_precise: ...
  2: R0_w=100
  3: (95) exit

  from 2 to 1 (speculative execution): R0_w=scalar() R1=ctx() R10=fp0 fp-512_w=100
  1: R0_w=scalar() R1=ctx() R10=fp0 fp-512_w=100
  1: (7b) *(u64 *)(r10 -512) = r0
  processed 5 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

- Non-speculative execution path 0-3 does not allocate any checkpoints
  (and hence does not call maybe_enter_scc()), and schedules a
  speculative jump from 2 to 1.
- Speculative execution path stops immediately because of an infinite
  loop detection and triggers verifier.c:update_branch_counts() ->
  maybe_exit_scc() calls.

[1] https://lore.kernel.org/bpf/68c85acd.050a0220.2ff435.03a4.GAE@google.com/

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250916212251.3490455-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-17 11:19:58 -07:00
Paul Chaignon
180a46bc1a selftests/bpf: Test accesses to ctx padding
This patch adds tests covering the various paddings in ctx structures.
In case of sk_lookup BPF programs, the behavior is a bit different
because accesses to the padding are explicitly allowed. Other cases
result in a clear reject from the verifier.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/3dc5f025e350aeb2bb1c257b87c577518e574aeb.1758094761.git.paul.chaignon@gmail.com
2025-09-17 16:15:57 +02:00
Paul Chaignon
7c60f6e488 selftests/bpf: Move macros to bpf_misc.h
Move the sizeof_field and offsetofend macros from individual test files
to the common bpf_misc.h to avoid duplication.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/97a3f3788bd3aec309100bc073a5c77130e371fd.1758094761.git.paul.chaignon@gmail.com
2025-09-17 16:15:37 +02:00
Al Viro
1b8abbb121 bpf...d_path(): constify path argument
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-15 21:17:08 -04:00
Alan Maguire
3ae4c52708 selftests/bpf: More open-coded gettid syscall cleanup
Commit 0e2fb011a0 ("selftests/bpf: Clean up open-coded gettid syscall
invocations") addressed the issue that older libc may not have a gettid()
function call wrapper for the associated syscall.

A few more instances have crept into tests, use sys_gettid() instead, and
poison raw gettid() usage to avoid future issues.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250911163056.543071-1-alan.maguire@oracle.com
2025-09-15 20:39:24 +02:00
Kumar Kartikeya Dwivedi
a8250d167c selftests/bpf: Add a test for bpf_cgroup_from_id lookup in non-root cgns
Make sure that we only switch the cgroup namespace and enter a new
cgroup in a child process separate from test_progs, to not mess up the
environment for subsequent tests.

To remove this cgroup, we need to wait for the child to exit, and then
rmdir its cgroup. If the read call fails, or waitpid succeeds, we know
the child exited (read call would fail when the last pipe end is closed,
otherwise waitpid waits until exit(2) is called). We then invoke a newly
introduced remove_cgroup_pid() helper, that identifies cgroup path using
the passed in pid of the now dead child, instead of using the current
process pid (getpid()).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250915032618.1551762-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-15 10:53:15 -07:00
Saket Kumar Bhaskar
a9d4e9f0e8 selftests/bpf: Fix arena_spin_lock selftest failure
For systems having CONFIG_NR_CPUS set to > 1024 in kernel config
the selftest fails as arena_spin_lock_irqsave() returns EOPNOTSUPP.
(eg - incase of powerpc default value for CONFIG_NR_CPUS is 8192)

The selftest is skipped incase bpf program returns EOPNOTSUPP,
with a descriptive message logged.

Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Link: https://lore.kernel.org/r/20250913091337.1841916-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-15 10:49:18 -07:00
Leon Hwang
f7528e4412 selftests/bpf: Skip timer_interrupt case when bpf_timer is not supported
Like commit fbdd61c94b ("selftests/bpf: Skip timer cases when bpf_timer is not supported"),
'timer_interrupt' test case should be skipped if verifier rejects
bpf_timer with returning -EOPNOTSUPP.

cd tools/testing/selftests/bpf
./test_progs -t timer
461     timer_interrupt:SKIP
Summary: 6/0 PASSED, 7 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250915121657.28084-1-leon.hwang@linux.dev
2025-09-15 10:22:58 -07:00
Jiri Olsa
6d48436560 selftests/bpf: Fix uprobe_sigill test for uprobe syscall error value
The uprobe syscall now returns -ENXIO errno when called outside
kernel trampoline, fixing the current sigill test to reflect that
and renaming it to uprobe_error.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
2025-09-15 13:46:29 +02:00
Jakub Kicinski
fc3a281041 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc6).

Conflicts:

net/netfilter/nft_set_pipapo.c
net/netfilter/nft_set_pipapo_avx2.c
  c4eaca2e10 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
  84c1da7b38 ("netfilter: nft_set_pipapo: use avx2 algorithm for insertions too")

Only trivial adjacent changes (in a doc and a Makefile).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-11 17:40:13 -07:00
Puranjay Mohan
86f2225065 selftests/bpf: Add tests for arena fault reporting
Add selftests for testing the reporting of arena page faults through BPF
streams. Two new bpf programs are added that read and write to an
unmapped arena address and the fault reporting is verified in the
userspace through streams.

The added bpf programs need to access the user_vm_start in struct
bpf_arena, this is done by casting &arena to struct bpf_arena *, but
barrier_var() is used on this ptr before accessing ptr->user_vm_start;
to stop GCC from issuing an out-of-bound access due to the cast from
smaller map struct to larger "struct bpf_arena"

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250911145808.58042-7-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-11 13:00:44 -07:00
Puranjay Mohan
edd03fcd76 selftests: bpf: use __stderr in stream error tests
Start using __stderr directly in the bpf programs to test the reporting
of may_goto timeout detection and spin_lock dead lock detection.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250911145808.58042-6-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-11 13:00:44 -07:00
Puranjay Mohan
744eeb2b27 selftests: bpf: introduce __stderr and __stdout
Add __stderr and __stdout to validate the output of BPF streams for bpf
selftests. Similar to __xlated, __jited, etc., __stderr/out can be used
in the BPF progs to compare a string (regex supported) to the output in
the bpf streams.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250911145808.58042-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-11 13:00:44 -07:00
Alexei Starovoitov
5d87e96a49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc5
Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-11 09:34:37 -07:00
Leon Hwang
fbdd61c94b selftests/bpf: Skip timer cases when bpf_timer is not supported
When enable CONFIG_PREEMPT_RT, verifier will reject bpf_timer with
returning -EOPNOTSUPP.

Therefore, skip test cases when errno is EOPNOTSUPP.

cd tools/testing/selftests/bpf
./test_progs -t timer
125     free_timer:SKIP
456     timer:SKIP
457/1   timer_crash/array:SKIP
457/2   timer_crash/hash:SKIP
457     timer_crash:SKIP
458     timer_lockup:SKIP
459     timer_mim:SKIP
Summary: 5/0 PASSED, 6 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250910125740.52172-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-10 12:34:09 -07:00
Rong Tao
6624fb2f33 selftests/bpf: Add tests for bpf_strnstr
Add tests for bpf_strnstr():

    bpf_strnstr("", "", 0) = 0
    bpf_strnstr("hello world", "hello", 5) = 0
    bpf_strnstr(str, "hello", 4) = -ENOENT
    bpf_strnstr("", "a", 0) = -ENOENT

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_2ED218F8082565C95D86A804BDDA8DBA200A@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-09 15:07:58 -07:00
Ilya Leoshkevich
5d40c038c8 selftests/bpf: Fix "expression result unused" warnings with icecc
icecc is a compiler wrapper that distributes compile jobs over a build
farm [1]. It works by sending toolchain binaries and preprocessed
source code to remote machines.

Unfortunately using it with BPF selftests causes build failures due to
a clang bug [2]. The problem is that clang suppresses the
-Wunused-value warning if the unused expression comes from a macro
expansion. Since icecc compiles preprocessed source code, this
information is not available. This leads to -Wunused-value false
positives.

obj_new_no_struct() and obj_new_acq() use the bpf_obj_new() macro and
discard the result. arena_spin_lock_slowpath() uses two macros that
produce values and ignores the results. Add (void) casts to explicitly
indicate that this is intentional and suppress the warning.

An alternative solution is to change the macros to not produce values.
This would work today for the arena_spin_lock_slowpath() issue, but in
the future there may appear users who need them. Another potential
solution is to replace these macros with functions. Unfortunately this
would not work, because these macros work with unknown types and
control flow.

[1] https://github.com/icecc/icecream
[2] https://github.com/llvm/llvm-project/issues/142614

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250829030017.102615-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-09 15:07:58 -07:00
Daniel Borkmann
3aa9b9a165 selftests/bpf: Extend crypto_sanity selftest with invalid dst buffer
Small cleanup and test extension to probe the bpf_crypto_{encrypt,decrypt}()
kfunc when a bad dst buffer is passed in to assert that an error is returned.

Also, encrypt_sanity() and skb_crypto_setup() were explicit to set the global
status variable to zero before any test, so do the same for decrypt_sanity().
Do not explicitly zero the on-stack err before bpf_crypto_ctx_create() given
the kfunc is expected to do it internally for the success case.

Before kernel fix:

  # ./vmtest.sh -- ./test_progs -t crypto
  [...]
  [    1.531200] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.533388] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #87/1    crypto_basic/crypto_release:OK
  #87/2    crypto_basic/crypto_acquire:OK
  #87      crypto_basic:OK
  test_crypto_sanity:PASS:skel open 0 nsec
  test_crypto_sanity:PASS:ip netns add crypto_sanity_ns 0 nsec
  test_crypto_sanity:PASS:ip -net crypto_sanity_ns -6 addr add face::1/128 dev lo nodad 0 nsec
  test_crypto_sanity:PASS:ip -net crypto_sanity_ns link set dev lo up 0 nsec
  test_crypto_sanity:PASS:open_netns 0 nsec
  test_crypto_sanity:PASS:AF_ALG init fail 0 nsec
  test_crypto_sanity:PASS:if_nametoindex lo 0 nsec
  test_crypto_sanity:PASS:skb_crypto_setup fd 0 nsec
  test_crypto_sanity:PASS:skb_crypto_setup 0 nsec
  test_crypto_sanity:PASS:skb_crypto_setup retval 0 nsec
  test_crypto_sanity:PASS:skb_crypto_setup status 0 nsec
  test_crypto_sanity:PASS:create qdisc hook 0 nsec
  test_crypto_sanity:PASS:make_sockaddr 0 nsec
  test_crypto_sanity:PASS:attach encrypt filter 0 nsec
  test_crypto_sanity:PASS:encrypt socket 0 nsec
  test_crypto_sanity:PASS:encrypt send 0 nsec
  test_crypto_sanity:FAIL:encrypt status unexpected error: -5 (errno 95)
  #88      crypto_sanity:FAIL
  Summary: 1/2 PASSED, 0 SKIPPED, 1 FAILED

After kernel fix:

  # ./vmtest.sh -- ./test_progs -t crypto
  [...]
  [    1.540963] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.542404] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #87/1    crypto_basic/crypto_release:OK
  #87/2    crypto_basic/crypto_acquire:OK
  #87      crypto_basic:OK
  #88      crypto_sanity:OK
  Summary: 2/2 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20250829143657.318524-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-09 15:07:57 -07:00
Jiayuan Chen
f85981327a selftests/bpf: Fix incorrect array size calculation
The loop in bench_sockmap_prog_destroy() has two issues:

1. Using 'sizeof(ctx.fds)' as the loop bound results in the number of
   bytes, not the number of file descriptors, causing the loop to iterate
   far more times than intended.

2. The condition 'ctx.fds[0] > 0' incorrectly checks only the first fd for
   all iterations, potentially leaving file descriptors unclosed. Change
   it to 'ctx.fds[i] > 0' to check each fd properly.

These fixes ensure correct cleanup of all file descriptors when the
benchmark exits.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250909124721.191555-1-jiayuan.chen@linux.dev

Closes: https://lore.kernel.org/bpf/aLqfWuRR9R_KTe5e@stanley.mountain/
2025-09-09 09:23:47 -07:00
Feng Yang
93a83d0443 selftests/bpf: Fix the issue where the error code is 0
The error message printed here only uses the previous err value,
which results in it being printed as 0.
When bpf_map__attach_struct_ops encounters an error,
it uses libbpf_err_ptr(err) to set errno = -err and returns NULL.
Therefore, Using -errno can fix this issue.

Fix before:
run_subtest:FAIL:1019 bpf_map__attach_struct_ops failed for map pro_epilogue: err=0

Fix after:
run_subtest:FAIL:1019 bpf_map__attach_struct_ops failed for map pro_epilogue: err=-9

Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250908060810.1054341-1-yangfeng59949@163.com
2025-09-08 09:51:13 -07:00
Mykyta Yatsenko
e12873ee85 selftests/bpf: Add BPF program dump in veristat
Add the ability to dump BPF program instructions directly from veristat.
Previously, inspecting a program required separate bpftool invocations:
one to load and another to dump it, which meant running multiple
commands.
During active development, it's common for developers to use veristat
for testing verification. Integrating instruction dumping into veristat
reduces the need to switch tools and simplifies the workflow.
By making this information more readily accessible, this change aims
to streamline the BPF development cycle and improve usability for
developers.
This implementation leverages bpftool, by running it directly via popen
to avoid any code duplication and keep veristat simple.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250905140835.1416179-1-mykyta.yatsenko5@gmail.com
2025-09-05 12:12:09 -07:00
Leon Hwang
88a3bde432 selftests/bpf: Add case to test bpf_in_interrupt()
Add a timer test case to test 'bpf_in_interrupt()'.

cd tools/testing/selftests/bpf
./test_progs -t timer_interrupt
462     timer_interrupt:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250903140438.59517-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-04 09:05:58 -07:00
Leon Hwang
4b69e31329 selftests/bpf: Introduce experimental bpf_in_interrupt()
Filtering pid_tgid is meanlingless when the current task is preempted by
an interrupt.

To address this, introduce 'bpf_in_interrupt()' helper function, which
allows BPF programs to determine whether they are executing in interrupt
context.

'get_preempt_count()':

* On x86, '*(int *) bpf_this_cpu_ptr(&__preempt_count)'.
* On arm64, 'bpf_get_current_task_btf()->thread_info.preempt.count'.

Then 'bpf_in_interrupt()' will be:

* If !PREEMPT_RT, 'get_preempt_count() & (NMI_MASK | HARDIRQ_MASK
  | SOFTIRQ_MASK)'.
* If PREEMPT_RT, '(get_preempt_count() & (NMI_MASK | HARDIRQ_MASK))
  | (bpf_get_current_task_btf()->softirq_disable_cnt & SOFTIRQ_MASK)'.

As for other archs, it can be added support by updating
'get_preempt_count()'.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250903140438.59517-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-04 09:05:58 -07:00
Rong Tao
abc8a952d4 selftests/bpf: Test kfunc bpf_strcasecmp
Add testsuites for kfunc bpf_strcasecmp.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Link: https://lore.kernel.org/r/tencent_81A1A0ACC04B68158C57C4D151C46A832B07@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-04 09:00:57 -07:00
Menglong Dong
a85d888768 selftests/bpf: add benchmark testing for kprobe-multi-all
For now, the benchmark for kprobe-multi is single, which means there is
only 1 function is hooked during testing. Add the testing
"kprobe-multi-all", which will hook all the kernel functions during
the benchmark. And the "kretprobe-multi-all" is added too.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250904021011.14069-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-04 09:00:25 -07:00
Menglong Dong
adf6b57ce4 selftests/bpf: skip recursive functions for kprobe_multi
Some functions is recursive for the kprobe_multi and impact the benchmark
results. So just skip them.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250904021011.14069-3-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-04 09:00:25 -07:00
Menglong Dong
8bad31edf5 selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
We need to get all the kernel function that can be traced sometimes, so we
move the get_syms() and get_addrs() in kprobe_multi_test.c to
trace_helpers.c and rename it to bpf_get_ksyms() and bpf_get_addrs().

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250904021011.14069-2-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-04 09:00:25 -07:00
Ricardo B. Marlière
c9110e6f72 selftests/bpf: Fix count write in testapp_xdp_metadata_copy()
Commit 4b30209255 ("selftests/xsk: Add tail adjustment tests and support
check") added a new global to xsk_xdp_progs.c, but left out the access in
the testapp_xdp_metadata_copy() function. Since bpf_map_update_elem() will
write to the whole bss section, it gets truncated. Fix by writing to
skel_rx->bss->count directly.

Fixes: 4b30209255 ("selftests/xsk: Add tail adjustment tests and support check")
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250829-selftests-bpf-xsk_regression_fix-v1-1-5f5acdb9fe6b@suse.com
2025-09-03 16:46:36 -07:00
Ricardo B. Marlière
2a912258c9 selftests/bpf: Upon failures, exit with code 1 in test_xsk.sh
Currently, even if some subtests fails, the end result will still yield
"ok 1 selftests: bpf: test_xsk.sh". Fix it by exiting with 1 if there are
any failures.

Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20250828-selftests-bpf-test_xsk_ret-v1-1-e6656c01f397@suse.com
2025-09-03 16:44:22 -07:00
Ricardo B. Marlière
98857d111c selftests/bpf: Fix bpf_prog_detach2 usage in test_lirc_mode2
Commit e9fc3ce99b ("libbpf: Streamline error reporting for high-level
APIs") redefined the way that bpf_prog_detach2() returns. Therefore, adapt
the usage in test_lirc_mode2_user.c.

Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250828-selftests-bpf-v1-1-c7811cd8b98c@suse.com
2025-08-28 09:49:46 -07:00
Eric Dumazet
51132b99f0 udp: add drop_counters to udp socket
When a packet flood hits one or more UDP sockets, many cpus
have to update sk->sk_drops.

This slows down other cpus, because currently
sk_drops is in sock_write_rx group.

Add a socket_drop_counters structure to udp sockets.

Using dedicated cache lines to hold drop counters
makes sure that consumers no longer suffer from
false sharing if/when producers only change sk->sk_drops.

This adds 128 bytes per UDP socket.

Tested with the following stress test, sending about 11 Mpps
to a dual socket AMD EPYC 7B13 64-Core.

super_netperf 20 -t UDP_STREAM -H DUT -l10 -- -n -P,1000 -m 120
Note: due to socket lookup, only one UDP socket is receiving
packets on DUT.

Then measure receiver (DUT) behavior. We can see both
consumer and BH handlers can process more packets per second.

Before:

nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams                 615091             0.0
Udp6InErrors                    3904277            0.0
Udp6RcvbufErrors                3904277            0.0

After:

nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams                 816281             0.0
Udp6InErrors                    7497093            0.0
Udp6RcvbufErrors                7497093            0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-5-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-28 13:14:50 +02:00
Matt Fleming
737433c6a5 selftests/bpf: Add LPM trie microbenchmarks
Add benchmarks for the standard set of operations: LOOKUP, INSERT,
UPDATE, DELETE. Also include benchmarks to measure the overhead of the
bench framework itself (NOOP) as well as the overhead of generating keys
(BASELINE). Lastly, this includes a benchmark for FREE (trie_free())
which is known to have terrible performance for maps with many entries.

Benchmarks operate on tries without gaps in the key range, i.e. each
test begins or ends with a trie with valid keys in the range [0,
nr_entries). This is intended to cause maximum branching when traversing
the trie.

LOOKUP, UPDATE, DELETE, and FREE fill a BPF LPM trie from userspace
using bpf_map_update_batch() and run the corresponding benchmark
operation via bpf_loop(). INSERT starts with an empty map and fills it
kernel-side from bpf_loop(). FREE records the time to free a filled LPM
trie by attaching and destroying a BPF prog. NOOP measures the overhead
of the test harness by running an empty function with bpf_loop().
BASELINE is similar to NOOP except that the function generates a key.

Each operation runs 10,000 times using bpf_loop(). Note that this value
is intentionally independent of the number of entries in the LPM trie so
that the stability of the results isn't affected by the number of
entries.

For those benchmarks that need to reset the LPM trie once it's full
(INSERT) or empty (DELETE), throughput and latency results are scaled by
the fraction of a second the operation actually ran to ignore any time
spent reinitialising the trie.

By default, benchmarks run using sequential keys in the range [0,
nr_entries). BASELINE, LOOKUP, and UPDATE can use random keys via the
--random parameter but beware there is a runtime cost involved in
generating random keys. Other benchmarks are prohibited from using
random keys because it can skew the results, e.g. when inserting an
existing key or deleting a missing one.

All measurements are recorded from within the kernel to eliminate
syscall overhead. Most benchmarks run an XDP program to generate stats
but FREE needs to collect latencies using fentry/fexit on
map_free_deferred() because it's not possible to use fentry directly on
lpm_trie.c since commit c83508da56 ("bpf: Avoid deadlock caused by
nested kprobe and fentry bpf programs") and there's no way to
create/destroy a map from within an XDP program.

Here is example output from an AMD EPYC 9684X 96-Core machine for each
of the benchmarks using a trie with 10K entries and a 32-bit prefix
length, e.g.

  $ ./bench lpm-trie-$op \
  	--prefix_len=32  \
	--producers=1     \
	--nr_entries=10000

     noop: throughput   74.417 ± 0.032 M ops/s ( 74.417M ops/prod), latency   13.438 ns/op
 baseline: throughput   70.107 ± 0.171 M ops/s ( 70.107M ops/prod), latency   14.264 ns/op
   lookup: throughput    8.467 ± 0.047 M ops/s (  8.467M ops/prod), latency  118.109 ns/op
   insert: throughput    2.440 ± 0.015 M ops/s (  2.440M ops/prod), latency  409.290 ns/op
   update: throughput    2.806 ± 0.042 M ops/s (  2.806M ops/prod), latency  356.322 ns/op
   delete: throughput    4.625 ± 0.011 M ops/s (  4.625M ops/prod), latency  215.613 ns/op
     free: throughput    0.578 ± 0.006 K ops/s (  0.578K ops/prod), latency    1.730 ms/op

And the same benchmarks using random keys:

  $ ./bench lpm-trie-$op \
  	--prefix_len=32  \
	--producers=1     \
	--nr_entries=10000 \
	--random

     noop: throughput   74.259 ± 0.335 M ops/s ( 74.259M ops/prod), latency   13.466 ns/op
 baseline: throughput   35.150 ± 0.144 M ops/s ( 35.150M ops/prod), latency   28.450 ns/op
   lookup: throughput    7.119 ± 0.048 M ops/s (  7.119M ops/prod), latency  140.469 ns/op
   insert: N/A
   update: throughput    2.736 ± 0.012 M ops/s (  2.736M ops/prod), latency  365.523 ns/op
   delete: N/A
     free: N/A

Signed-off-by: Matt Fleming <mfleming@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20250827140149.1001557-1-matt@readmodwrite.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-27 17:28:14 -07:00
Puranjay Mohan
22b22bf9ee selftests/bpf: Enable timed may_goto tests for arm64
As arm64 JIT now supports timed may_goto instruction, make sure all
relevant tests run on this architecture. Some tests were enabled and
other required modifications to work properly on arm64.

 $ ./test_progs -a "stream*","*may_goto*",verifier_bpf_fastcall

 #404     stream_errors:OK
 [...]
 #406/2   stream_success/stream_cond_break:OK
 [...]
 #494/23  verifier_bpf_fastcall/may_goto_interaction_x86_64:SKIP
 #494/24  verifier_bpf_fastcall/may_goto_interaction_arm64:OK
 [...]
 #539/1   verifier_may_goto_1/may_goto 0:OK
 #539/2   verifier_may_goto_1/batch 2 of may_goto 0:OK
 #539/3   verifier_may_goto_1/may_goto batch with offsets 2/1/0:OK
 #539/4   verifier_may_goto_1/may_goto batch with offsets 2/0:OK
 #539     verifier_may_goto_1:OK
 #540/1   verifier_may_goto_2/C code with may_goto 0:OK
 #540     verifier_may_goto_2:OK
 Summary: 7/16 PASSED, 25 SKIPPED, 0 FAILED

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20250827113245.52629-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-27 17:16:22 -07:00
Jiawei Zhao
69424097ee selftests/bpf: Enrich subtest_basic_usdt case in selftests to cover SIB handling logic
When using GCC on x86-64 to compile an usdt prog with -O1 or higher
optimization, the compiler will generate SIB addressing mode for global
array, e.g. "1@-96(%rbp,%rax,8)".

In this patch:
- enrich subtest_basic_usdt test case to cover SIB addressing usdt argument spec
  handling logic

Signed-off-by: Jiawei Zhao <phoenix500526@163.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250827053128.1301287-3-phoenix500526@163.com
2025-08-27 15:48:05 -07:00
Shubham Sharma
d3abefe897 selftests/bpf: Fix typos and grammar in test sources
Fix spelling typos and grammar errors in BPF selftests source code.

Signed-off-by: Shubham Sharma <slopixelz@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250826125746.17983-1-slopixelz@gmail.com
2025-08-27 15:13:08 -07:00
Nandakumar Edamana
2660b9d477 bpf: Add selftest to check the verifier's abstract multiplication
Add new selftest to test the abstract multiplication technique(s) used
by the verifier, following the recent improvement in tnum
multiplication (tnum_mul). One of the newly added programs,
verifier_mul/mul_precise, results in a false positive with the old
tnum_mul, while the program passes with the latest one.

Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250826034524.2159515-2-nandakumar@nandakumar.co.in
2025-08-27 15:00:31 -07:00
Ilya Leoshkevich
21bce56940 selftests/bpf: Remove may_goto tests from DENYLIST.s390x
The may_goto instruction is now fully supported on s390x, including the
timed implementation, so remove the respective test from the denylist.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-6-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-26 16:51:52 -07:00
Ilya Leoshkevich
7197dbcba2 selftests/bpf: Enable timed may_goto verifier tests on s390x
Now that the timed may_goto implementation is available on s390x,
enable the respective verifier tests.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-5-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-26 16:51:52 -07:00
Ilya Leoshkevich
1e4e6b9e26 selftests/bpf: Add __arch_s390x macro
Make it possible to limit certain tests to s390x, just like it's
already done for x86_64, arm64, and riscv64.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-4-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-26 16:51:52 -07:00
Ilya Leoshkevich
b68dfcc12a selftests/bpf: Add a missing newline to the "bad arch spec" message
Fix error messages like this one:

  parse_test_spec:FAIL:569 bad arch spec: 's390x'process_subtest:FAIL:1153 Can't parse test spec for program 'may_goto_simple'

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-26 16:51:52 -07:00
Tiezhu Yang
d0f27ff27c selftests/bpf: Remove entries from config.{arch} already present in config
`config.{arch}` had entries already present in `config`.

When generating the config used by vmtest, concatenate the `config` file
with the `config.{arch}` one, making those entries duplicated, so remove
those duplications.

Use the following command to get the differences:

  $ comm -1 -2  <(sort tools/testing/selftests/bpf/config.x86_64) <(sort tools/testing/selftests/bpf/config)
  $ comm -1 -2  <(sort tools/testing/selftests/bpf/config.aarch64) <(sort tools/testing/selftests/bpf/config)
  $ comm -1 -2  <(sort tools/testing/selftests/bpf/config.riscv64) <(sort tools/testing/selftests/bpf/config)
  $ comm -1 -2  <(sort tools/testing/selftests/bpf/config.ppc64el) <(sort tools/testing/selftests/bpf/config)
  $ comm -1 -2  <(sort tools/testing/selftests/bpf/config.s390x) <(sort tools/testing/selftests/bpf/config)

This is similar with commit 7a42af4b94 ("selftests/bpf: Remove entries
from config.s390x already present in config").

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250826065057.11415-1-yangtiezhu@loongson.cn
2025-08-26 13:47:03 +02:00
Paul Chaignon
0780f54ab1 selftests/bpf: Tests for is_scalar_branch_taken tnum logic
This patch adds tests for the new jeq and jne logic in
is_scalar_branch_taken. The following shows the first test failing
before the previous patch is applied. Once the previous patch is
applied, the verifier can use the tnum values to deduce that instruction
7 is dead code.

  0: call bpf_get_prandom_u32#7  ; R0_w=scalar()
  1: w0 = w0                     ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  2: r0 >>= 30                   ; R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=3,var_off=(0x0; 0x3))
  3: r0 <<= 30                   ; R0_w=scalar(smin=0,smax=umax=umax32=0xc0000000,smax32=0x40000000,var_off=(0x0; 0xc0000000))
  4: r1 = r0                     ; R0_w=scalar(id=1,smin=0,smax=umax=umax32=0xc0000000,smax32=0x40000000,var_off=(0x0; 0xc0000000)) R1_w=scalar(id=1,smin=0,smax=umax=umax32=0xc0000000,smax32=0x40000000,var_off=(0x0; 0xc0000000))
  5: r1 += 1024                  ; R1_w=scalar(smin=umin=umin32=1024,smax=umax=umax32=0xc0000400,smin32=0x80000400,smax32=0x40000400,var_off=(0x400; 0xc0000000))
  6: if r1 != r0 goto pc+1       ; R0_w=scalar(id=1,smin=umin=umin32=1024,smax=umax=umax32=0xc0000000,smin32=0x80000400,smax32=0x40000000,var_off=(0x400; 0xc0000000)) R1_w=scalar(smin=umin=umin32=1024,smax=umax=umax32=0xc0000000,smin32=0x80000400,smax32=0x40000400,var_off=(0x400; 0xc0000000))
  7: r10 = 0
  frame pointer is read only

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/550004f935e2553bdb2fb1f09cbde7d0452112d0.1755694148.git.paul.chaignon@gmail.com
2025-08-22 18:12:34 +02:00
Hengqi Chen
21aeabb682 selftests/bpf: Use vmlinux.h for BPF programs
Some of the bpf test progs still use linux/libc headers.
Let's use vmlinux.h instead like the rest of test progs.
This will also ease cross compiling.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250821030254.398826-1-hengqi.chen@gmail.com
2025-08-21 11:32:25 -07:00
Jiri Olsa
52718438af selftests/bpf: Fix uprobe syscall shadow stack test
Now that we have uprobe syscall working properly with shadow stack,
we can remove testing limitations for shadow stack tests and make
sure uprobe gets properly optimized.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250821141557.13233-1-jolsa@kernel.org
2025-08-21 20:09:25 +02:00
Jiri Olsa
3abf4298c6 selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe
Changing the test_uretprobe_regs_change test to test both uprobe
and uretprobe by adding entry consumer handler to the testmod
and making it to change one of the registers.

Making sure that changed values both uprobe and uretprobe handlers
propagate to the user space.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-20-jolsa@kernel.org
2025-08-21 20:09:25 +02:00
Jiri Olsa
275eae6789 selftests/bpf: Add uprobe_regs_equal test
Changing uretprobe_regs_trigger to allow the test for both
uprobe and uretprobe and renaming it to uprobe_regs_equal.

We check that both uprobe and uretprobe probes (bpf programs)
see expected registers with few exceptions.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-19-jolsa@kernel.org
2025-08-21 20:09:25 +02:00
Jiri Olsa
875e1705ad selftests/bpf: Add optimized usdt variant for basic usdt test
Adding optimized usdt variant for basic usdt test to check that
usdt arguments are properly passed in optimized code path.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-18-jolsa@kernel.org
2025-08-21 20:09:24 +02:00
Jiri Olsa
c11661bd9a selftests/bpf: Add uprobe syscall sigill signal test
Make sure that calling uprobe syscall from outside uprobe trampoline
results in sigill signal.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-17-jolsa@kernel.org
2025-08-21 20:09:24 +02:00
Jiri Olsa
c8be59667c selftests/bpf: Add hit/attach/detach race optimized uprobe test
Adding test that makes sure parallel execution of the uprobe and
attach/detach of optimized uprobe on it works properly.

By default the test runs for 500ms, which is adjustable by using
BPF_SELFTESTS_UPROBE_SYSCALL_RACE_MSEC env variable.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-16-jolsa@kernel.org
2025-08-21 20:09:24 +02:00
Jiri Olsa
d5c86c3370 selftests/bpf: Add uprobe/usdt syscall tests
Adding tests for optimized uprobe/usdt probes.

Checking that we get expected trampoline and attached bpf programs
get executed properly.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-15-jolsa@kernel.org
2025-08-21 20:09:24 +02:00
Jiri Olsa
7932c4cf57 selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi
Renaming uprobe_syscall_executed prog to test_uretprobe_multi
to fit properly in the following changes that add more programs.

Plus adding pid filter and increasing executed variable.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-14-jolsa@kernel.org
2025-08-21 20:09:23 +02:00
Jiri Olsa
4e7005223e selftests/bpf: Reorg the uprobe_syscall test function
Adding __test_uprobe_syscall with non x86_64 stub to execute all the tests,
so we don't need to keep adding non x86_64 stub functions for new tests.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-13-jolsa@kernel.org
2025-08-21 20:09:23 +02:00
Jiri Olsa
17c3b00157 selftests/bpf: Import usdt.h from libbpf/usdt project
Importing usdt.h from libbpf/usdt project.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250720112133.244369-12-jolsa@kernel.org
2025-08-21 20:09:23 +02:00
Martin KaFai Lau
5c42715e63 Merge branch 'bpf-next/skb-meta-dynptr' into 'bpf-next/master'
Merge 'skb-meta-dynptr' branch into 'master' branch. No conflict.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-08-18 17:59:26 -07:00
Jakub Sitnicki
403fae5978 selftests/bpf: Cover metadata access from a modified skb clone
Demonstrate that, when processing an skb clone, the metadata gets truncated
if the program contains a direct write to either the payload or the
metadata, due to an implicit unclone in the prologue, and otherwise the
dynptr to the metadata is limited to being read-only.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-9-8a39e636e0fb@cloudflare.com
2025-08-18 10:29:43 -07:00
Jakub Sitnicki
bd1b51b319 selftests/bpf: Cover read/write to skb metadata at an offset
Exercise r/w access to skb metadata through an offset-adjusted dynptr,
read/write helper with an offset argument, and a slice starting at an
offset.

Also check for the expected errors when the offset is out of bounds.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-8-8a39e636e0fb@cloudflare.com
2025-08-18 10:29:43 -07:00
Jakub Sitnicki
ed93360807 selftests/bpf: Cover write access to skb metadata via dynptr
Add tests what exercise writes to skb metadata in two ways:
1. indirectly, using bpf_dynptr_write helper,
2. directly, using a read-write dynptr slice.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-7-8a39e636e0fb@cloudflare.com
2025-08-18 10:29:43 -07:00
Jakub Sitnicki
153f6bfd48 selftests/bpf: Cover read access to skb metadata via dynptr
Exercise reading from SKB metadata area in two new ways:
1. indirectly, with bpf_dynptr_read(), and
2. directly, with bpf_dynptr_slice().

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-6-8a39e636e0fb@cloudflare.com
2025-08-18 10:29:42 -07:00
Jakub Sitnicki
dd9f6cfb4e selftests/bpf: Parametrize test_xdp_context_tuntap
We want to add more test cases to cover different ways to access the
metadata area. Prepare for it. Pull up the skeleton management.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-5-8a39e636e0fb@cloudflare.com
2025-08-18 10:29:42 -07:00
Jakub Sitnicki
6dfd5e01e1 selftests/bpf: Pass just bpf_map to xdp_context_test helper
Prepare for parametrizing the xdp_context tests. The assert_test_result
helper doesn't need the whole skeleton. Pass just what it needs.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-4-8a39e636e0fb@cloudflare.com
2025-08-18 10:29:42 -07:00
Jakub Sitnicki
0e74eb4d57 selftests/bpf: Cover verifier checks for skb_meta dynptr type
dynptr for skb metadata behaves the same way as the dynptr for skb data
with one exception - writes to skb_meta dynptr don't invalidate existing
skb and skb_meta slices.

Duplicate those the skb dynptr tests which we can, since
bpf_dynptr_from_skb_meta kfunc can be called only from TC BPF, to cover the
skb_meta dynptr verifier checks.

Also add a couple of new tests (skb_data_valid_*) to ensure we don't
invalidate the slices in the mentioned case, which are specific to skb_meta
dynptr.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-3-8a39e636e0fb@cloudflare.com
2025-08-18 10:29:42 -07:00
Ilya Leoshkevich
1274163035 selftests/bpf: Clobber a lot of registers in tailcall_bpf2bpf_hierarchy tests
Clobbering a lot of registers and stack slots helps exposing tail call
counter overwrite bugs in JITs.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250813121016.163375-5-iii@linux.ibm.com
2025-08-18 15:08:30 +02:00
Yureka Lilian
7f8fa9d370 selftests/bpf: Add test for DEVMAP reuse
The test covers basic re-use of a pinned DEVMAP map,
with both matching and mismatching parameters.

Signed-off-by: Yureka Lilian <yuka@yuka.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250814180113.1245565-4-yuka@yuka.dev
2025-08-15 16:52:52 -07:00
Matt Bobrowski
c80d797206 bpf/selftests: Fix test_tcpnotify_user
Based on a bisect, it appears that commit 7ee9887703 ("timers:
Implement the hierarchical pull model") has somehow inadvertently
broken BPF selftest test_tcpnotify_user. The error that is being
generated by this test is as follows:

	FAILED: Wrong stats Expected 10 calls, got 8

It looks like the change allows timer functions to be run on CPUs
different from the one they are armed on. The test had pinned itself
to CPU 0, and in the past the retransmit attempts also occurred on CPU
0. The test had set the max_entries attribute for
BPF_MAP_TYPE_PERF_EVENT_ARRAY to 2 and was calling
bpf_perf_event_output() with BPF_F_CURRENT_CPU, so the entry was
likely to be in range. With the change to allow timers to run on other
CPUs, the current CPU tasked with performing the retransmit might be
bumped and in turn fall out of range, as the event will be filtered
out via __bpf_perf_event_output() using:

    if (unlikely(index >= array->map.max_entries))
            return -E2BIG;

A possible change would be to explicitly set the max_entries attribute
for perf_event_map in test_tcpnotify_kern.c to a value that's at least
as large as the number of CPUs. As it turns out however, if the field
is left unset, then the libbpf will determine the number of CPUs available
on the underlying system and update the max_entries attribute accordingly
in map_set_def_max_entries().

A further problem with the test is that it has a thread that continues
running up until the program exits. The main thread cleans up some
LIBBPF data structures, while the other thread continues to use them,
which inevitably will trigger a SIGSEGV. This can be dealt with by
telling the thread to run for as long as necessary and doing a
pthread_join on it before exiting the program.

Finally, I don't think binding the process to CPU 0 is meaningful for
this test any more, so get rid of that.

Fixes: 435f90a338 ("selftests/bpf: add a test case for sock_ops perf-event notification")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/aJ8kHhwgATmA3rLf@google.com
2025-08-15 13:05:29 -07:00
Pu Lehui
dc0fe95614 selftests/bpf: Enable arena atomics tests for RV64
Enable arena atomics tests for RV64.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20250719091730.2660197-11-pulehui@huaweicloud.com
2025-08-15 10:46:51 +02:00
Amery Hung
07866544e4 selftests/bpf: Copy test_kmods when installing selftest
Commit d6212d82bf ("selftests/bpf: Consolidate kernel modules into
common directory") consolidated the Makefile of test_kmods. However,
since it removed test_kmods from TEST_GEN_PROGS_EXTENDED, the kernel
modules required by bpf selftests are now missing from kselftest_install
when "make install". Fix it by adding test_kmod to TEST_GEN_FILES.

Fixes: d6212d82bf ("selftests/bpf: Consolidate kernel modules into common directory")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250812175039.2323570-1-ameryhung@gmail.com
2025-08-13 15:53:01 -07:00
Martin KaFai Lau
9e293d47bf Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Cross merge bpf/master after 6.17-rc1.

No conflict.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-08-12 12:14:02 -07:00
Amery Hung
ba7000f1c3 selftests/bpf: Test multi_st_ops and calling kfuncs from different programs
Test multi_st_ops and demonstrate how different bpf programs can call
a kfuncs that refers to the struct_ops instance in the same source file
by id. The id is defined as a global vairable and initialized before
attaching the skeleton. Kfuncs that take the id can hide the argument
with a macro to make it almost transparent to bpf program developers.

The test involves two struct_ops returning different values from
.test_1. In syscall and tracing programs, check if the correct value is
returned by a kfunc that calls .test_1.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250806162540.681679-4-ameryhung@gmail.com
2025-08-06 16:01:58 -07:00
Amery Hung
eeb52b6279 selftests/bpf: Add multi_st_ops that supports multiple instances
Current struct_ops in bpf_testmod only support attaching single instance.
Add multi_st_ops that supports multiple instances. The struct_ops uses map
id as the struct_ops id and will reject attachment with an existing id.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250806162540.681679-3-ameryhung@gmail.com
2025-08-06 16:01:56 -07:00
Linus Torvalds
806381e1a2 powerpc fixes for 6.17 #2
- Fixes for several issues in the powernv PCI hotplug path
  - Fix htmldoc generation for htm.rst in toctree
  - Add jit support for load_acquire and store_release in ppc64 bpf jit
 
 Thanks to: Bjorn Helgaas, Hari Bathini, Puranjay Mohan, Saket Kumar Bhaskar,
 Shawn Anastasio, Timothy Pearson, Vishal Parmar
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmiQDOcACgkQpnEsdPSH
 ZJRqTBAAkpaZRrbzX6P1fjoTb89esIfY7YSymmXrgMwoF1fgTMzPSjg2Lzwzcjfb
 Ednlo8Hy+Ei2SDvctNaqtOcZuidn81AJN41aFIu44S/Mj1cde/cuBMKUTqx+ZBO0
 gI4Bd4+V7yEv484PF7ZJga9K1VM85THCLgVWJ00VNHVHyvxgAuFiXdWmh6/qMUyi
 thvvLR+ANuAQz3S4VwbBg3AifDl6LXx2s5VB30xYxnPKzFNKZmnGXKwuOJH7rQe2
 J8v99n0tcXW1tRGE4pVykzXg4EXL5zgWT9fJ5EZxbeXaW9sqMxi4VjO4jSsrSZ+K
 q2v362Dyjgygel9aC2rzN8Q+P5horX2QBR7knJJGa0VtztUiPWKR8za7vGLbzlcm
 rUvnTuoY1dwFB72Sy3amALalvWscssL/1sHazvRv65RcciW7/PZNVEiZ0xh1RKqb
 J8nWlb+iNBf2z12qLmS5DvUaveaZG1eyndLeD/knsEC49DEOoEy3t7QM1F7KscRR
 mYPsEpfjF/D0r+vzb6zl2ykwhJf/t7BNu7MdXH7xbIpj5iwtrhSUfvmx6g2MThzA
 Vee2QvQACscdop3W/6xgATH4xoq96v1XxMmCLnZ/HVl2PorxO27ad4EMNO4sBG8c
 5agWHT7EnoUgwNF30DRtIHd7jNK/jt8++3kZx6CG9hdSboAM/pM=
 =tTms
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Madhavan Srinivasan:

 - Fixes for several issues in the powernv PCI hotplug path

 - Fix htmldoc generation for htm.rst in toctree

 - Add jit support for load_acquire and store_release in ppc64 bpf jit

Thanks to Bjorn Helgaas, Hari Bathini, Puranjay Mohan, Saket Kumar
Bhaskar, Shawn Anastasio, Timothy Pearson, and Vishal Parmar

* tag 'powerpc-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc64/bpf: Add jit support for load_acquire and store_release
  docs: powerpc: add htm.rst to toctree
  PCI: pnv_php: Enable third attention indicator state
  PCI: pnv_php: Fix surprise plug detection and recovery
  powerpc/eeh: Make EEH driver device hotplug safe
  powerpc/eeh: Export eeh_unfreeze_pe()
  PCI: pnv_php: Work around switches with broken presence detection
  PCI: pnv_php: Clean up allocated IRQs on unplug
2025-08-03 19:15:04 -07:00
Amery Hung
7841811417 selftests/bpf: Test concurrent task local data key creation
Test thread-safety of tld_create_key(). Since tld_create_key() does
not rely on locks but memory barriers and atomic operations to protect
the shared metadata, the thread-safety of the function is non-trivial.
Make sure concurrent tld_key_create(), both valid and invalid, can not
race and corrupt metatada, which may leads to TLDs not being thread-
specific or duplicate TLDs with the same name.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20250730185903.3574598-5-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-01 18:00:46 -07:00
Amery Hung
120f1a950e selftests/bpf: Test basic task local data operations
Test basic operations of task local data with valid and invalid
tld_create_key().

For invalid calls, make sure they return the right error code and check
that the TLDs are not inserted by running tld_get_data("
value_not_exists") on the bpf side. The call should a null pointer.

For valid calls, first make sure the TLDs are created by calling
tld_get_data() on the bpf side. The call should return a valid pointer.

Finally, verify that the TLDs are indeed task-specific (i.e., their
addresses do not overlap) with multiple user threads. This done by
writing values unique to each thread, reading them from both user space
and bpf, and checking if the value read back matches the value written.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20250730185903.3574598-4-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-01 18:00:46 -07:00
Amery Hung
31e838e1cd selftests/bpf: Introduce task local data
Task local data defines an abstract storage type for storing task-
specific data (TLD). This patch provides user space and bpf
implementation as header-only libraries for accessing task local data.

Task local data is a bpf task local storage map with two UPTRs:

- tld_meta_u, shared by all tasks of a process, consists of the total
  count and size of TLDs and an array of metadata of TLDs. A TLD
  metadata contains the size and name. The name is used to identify a
  specific TLD in bpf programs.

- u_tld_data points to a task-specific memory. It stores TLD data and
  the starting offset of data in a page.

  Task local design decouple user space and bpf programs. Since bpf
  program does not know the size of TLDs in compile time, u_tld_data
  is declared as a page to accommodate TLDs up to a page. As a result,
  while user space will likely allocate memory smaller than a page for
  actual TLDs, it needs to pin a page to kernel. It will pin the page
  that contains enough memory if the allocated memory spans across the
  page boundary.

The library also creates another task local storage map, tld_key_map,
to cache keys for bpf programs to speed up the access.

Below are the core task local data API:

                   User space                          BPF
  Define TLD       TLD_DEFINE_KEY(), tld_create_key()  -
  Init TLD object  -                                   tld_object_init()
  Get TLD data     tld_get_data()                      tld_get_data()

- TLD_DEFINE_KEY(), tld_create_key()

  A TLD is first defined by the user space with TLD_DEFINE_KEY() or
  tld_create_key(). TLD_DEFINE_KEY() defines a TLD statically and
  allocates just enough memory during initialization. tld_create_key()
  allows creating TLDs on the fly, but has a fix memory budget,
  TLD_DYN_DATA_SIZE.

  Internally, they all call __tld_create_key(), which iterates
  tld_meta_u->metadata to check if a TLD can be added. The total TLD
  size needs to fit into a page (limit of UPTR), and no two TLDs can
  have the same name. If a TLD can be added, u_tld_meta->cnt is
  increased using cmpxchg as there may be other concurrent
  __tld_create_key(). After a successful cmpxchg, the last available
  tld_meta_u->metadata now belongs to the calling thread. To prevent
  other threads from reading incomplete metadata while it is being
  updated, tld_meta_u->metadata->size is used to signal the completion.

  Finally, the offset, derived from adding up prior TLD sizes is then
  encapsulated as an opaque object key to prevent user misuse. The
  offset is guaranteed to be 8-byte aligned to prevent load/store
  tearing and allow atomic operations on it.

- tld_get_data()

  User space programs can pass the key to tld_get_data() to get a
  pointer to the associated TLD. The pointer will remain valid for the
  lifetime of the thread.

  tld_data_u is lazily allocated on the first call to tld_get_data().
  Trying to read task local data from bpf will result in -ENODATA
  during tld_object_init(). The task-specific memory need to be freed
  manually by calling tld_free() on thread exit to prevent memory leak
  or use TLD_FREE_DATA_ON_THREAD_EXIT.

- tld_object_init() (BPF)

  BPF programs need to call tld_object_init() before calling
  tld_get_data(). This is to avoid redundant map lookup in
  tld_get_data() by storing pointers to the map values on stack.
  The pointers are encapsulated as tld_object.

  tld_key_map is also created on the first time tld_object_init()
  is called to cache TLD keys successfully fetched by tld_get_data().

  bpf_task_storage_get(.., F_CREATE) needs to be retried since it may
  fail when another thread has already taken the percpu counter lock
  for the task local storage.

- tld_get_data() (BPF)

  BPF programs can also get a pointer to a TLD with tld_get_data().
  It uses the cached key in tld_key_map to locate the data in
  tld_data_u->data. If the cached key is not set yet (<= 0),
  __tld_fetch_key() will be called to iterate tld_meta_u->metadata
  and find the TLD by name. To prevent redundant string comparison
  in the future when the search fail, the tld_meta_u->cnt is stored
  in the non-positive range of the key. Next time, __tld_fetch_key()
  will be called only if there are new TLDs and the search will start
  from the newly added tld_meta_u->metadata using the old
  tld_meta_u-cnt.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20250730185903.3574598-3-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-01 18:00:46 -07:00
Paul Chaignon
d8d2d9d12f selftests/bpf: Test for unaligned flow_dissector ctx access
This patch adds tests for two context fields where unaligned accesses
were not properly rejected.

Note the new macro is similar to the existing narrow_load macro, but we
need a different description and access offset. Combining the two
macros into one is probably doable but I don't think it would help
readability.

vmlinux.h is included in place of bpf.h so we have the definition of
struct bpf_nf_ctx.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/bf014046ddcf41677fb8b98d150c14027e9fddba.1754039605.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-01 14:47:39 -07:00
Linus Torvalds
d9104cec3e bpf-next-6.17
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmiINnEACgkQ6rmadz2v
 bToBnA/9F+A3R6rTwGk4HK3xpfc/nm2Tanl3oRN7S2ub/mskDOtWSIyG6cVFZ0UG
 1fK6IkByyRIpAF/5qhdlw8drRXHkQtGLA0lP2L9llm4X1mHLofB18y9OeLrDE1WN
 KwNP06+IGX9W802lCGSIXOY+VmRscVfXSMokyQt2ilHplKjOnDqJcYkWupi3T2rC
 mz79FY9aEl2YrIcpj9RXz+8nwP49pZBuW2P0IM5PAIj4BJBXShrUp8T1nz94okNe
 NFsnAyRxjWpUT0McEgtA9WvpD9lZqujYD8Qp0KlGZWmI3vNpV5d9S1+dBcEb1n7q
 dyNMkTF3oRrJhhg4VqoHc6fVpzSEoZ9ZxV5Hx4cs+ganH75D4YbdGqx/7mR3DUgH
 MZh6rHF1pGnK7TAm7h5gl3ZRAOkZOaahbe1i01NKo9CEe5fSh3AqMyzJYoyGHRKi
 xDN39eQdWBNA+hm1VkbK2Bv93Rbjrka2Kj+D3sSSO9Bo/u3ntcknr7LW39idKz62
 Q8dkKHcCEtun7gjk0YXPF013y81nEohj1C+52BmJ2l5JitM57xfr6YOaQpu7DPDE
 AJbHx6ASxKdyEETecd0b+cXUPQ349zmRXy0+CDMAGKpBicC0H0mHhL14cwOY1Hfu
 EIpIjmIJGI3JNF6T5kybcQGSBOYebdV0FFgwSllzPvuYt7YsHCs=
 =/O3j
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Remove usermode driver (UMD) framework (Thomas Weißschuh)

 - Introduce Strongly Connected Component (SCC) in the verifier to
   detect loops and refine register liveness (Eduard Zingerman)

 - Allow 'void *' cast using bpf_rdonly_cast() and corresponding
   '__arg_untrusted' for global function parameters (Eduard Zingerman)

 - Improve precision for BPF_ADD and BPF_SUB operations in the verifier
   (Harishankar Vishwanathan)

 - Teach the verifier that constant pointer to a map cannot be NULL
   (Ihor Solodrai)

 - Introduce BPF streams for error reporting of various conditions
   detected by BPF runtime (Kumar Kartikeya Dwivedi)

 - Teach the verifier to insert runtime speculation barrier (lfence on
   x86) to mitigate speculative execution instead of rejecting the
   programs (Luis Gerhorst)

 - Various improvements for 'veristat' (Mykyta Yatsenko)

 - For CONFIG_DEBUG_KERNEL config warn on internal verifier errors to
   improve bug detection by syzbot (Paul Chaignon)

 - Support BPF private stack on arm64 (Puranjay Mohan)

 - Introduce bpf_cgroup_read_xattr() kfunc to read xattr of cgroup's
   node (Song Liu)

 - Introduce kfuncs for read-only string opreations (Viktor Malik)

 - Implement show_fdinfo() for bpf_links (Tao Chen)

 - Reduce verifier's stack consumption (Yonghong Song)

 - Implement mprog API for cgroup-bpf programs (Yonghong Song)

* tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (192 commits)
  selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite
  selftests/bpf: Add selftest for attaching tracing programs to functions in deny list
  bpf: Add log for attaching tracing programs to functions in deny list
  bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions
  bpf: Fix various typos in verifier.c comments
  bpf: Add third round of bounds deduction
  selftests/bpf: Test invariants on JSLT crossing sign
  selftests/bpf: Test cross-sign 64bits range refinement
  selftests/bpf: Update reg_bound range refinement logic
  bpf: Improve bounds when s64 crosses sign boundary
  bpf: Simplify bounds refinement from s32
  selftests/bpf: Enable private stack tests for arm64
  bpf, arm64: JIT support for private stack
  bpf: Move bpf_jit_get_prog_name() to core.c
  bpf, arm64: Fix fp initialization for exception boundary
  umd: Remove usermode driver framework
  bpf/preload: Don't select USERMODE_DRIVER
  selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure
  selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure
  selftests/bpf: Increase xdp data size for arm64 64K page size
  ...
2025-07-30 09:58:50 -07:00
Linus Torvalds
8be4d31cb8 Networking changes for 6.17.
Core & protocols
 ----------------
 
  - Wrap datapath globals into net_aligned_data, to avoid false sharing.
 
  - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container).
 
  - Add SO_INQ and SCM_INQ support to AF_UNIX.
 
  - Add SIOCINQ support to AF_VSOCK.
 
  - Add TCP_MAXSEG sockopt to MPTCP.
 
  - Add IPv6 force_forwarding sysctl to enable forwarding per interface.
 
  - Make TCP validation of whether packet fully fits in the receive
    window and the rcv_buf more strict. With increased use of HW
    aggregation a single "packet" can be multiple 100s of kB.
 
  - Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
    improves latency up to 33% for sockmap users.
 
  - Convert TCP send queue handling from tasklet to BH workque.
 
  - Improve BPF iteration over TCP sockets to see each socket exactly once.
 
  - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code.
 
  - Support enabling kernel threads for NAPI processing on per-NAPI
    instance basis rather than a whole device. Fully stop the kernel NAPI
    thread when threaded NAPI gets disabled. Previously thread would stick
    around until ifdown due to tricky synchronization.
 
  - Allow multicast routing to take effect on locally-generated packets.
 
  - Add output interface argument for End.X in segment routing.
 
  - MCTP: add support for gateway routing, improve bind() handling.
 
  - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink.
 
  - Add a new neighbor flag ("extern_valid"), which cedes refresh
    responsibilities to userspace. This is needed for EVPN multi-homing
    where a neighbor entry for a multi-homed host needs to be synced
    across all the VTEPs among which the host is multi-homed.
 
  - Support NUD_PERMANENT for proxy neighbor entries.
 
  - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM.
 
  - Add sequence numbers to netconsole messages. Unregister netconsole's
    console when all net targets are removed. Code refactoring.
    Add a number of selftests.
 
  - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
    should be used for an inbound SA lookup.
 
  - Support inspecting ref_tracker state via DebugFS.
 
  - Don't force bonding advertisement frames tx to ~333 ms boundaries.
    Add broadcast_neighbor option to send ARP/ND on all bonded links.
 
  - Allow providing upcall pid for the 'execute' command in openvswitch.
 
  - Remove DCCP support from Netfilter's conntrack.
 
  - Disallow multiple packet duplications in the queuing layer.
 
  - Prevent use of deprecated iptables code on PREEMPT_RT.
 
 Driver API
 ----------
 
  - Support RSS and hashing configuration over ethtool Netlink.
 
  - Add dedicated ethtool callbacks for getting and setting hashing fields.
 
  - Add support for power budget evaluation strategy in PSE /
    Power-over-Ethernet. Generate Netlink events for overcurrent etc.
 
  - Support DPLL phase offset monitoring across all device inputs.
    Support providing clock reference and SYNC over separate DPLL
    inputs.
 
  - Support traffic classes in devlink rate API for bandwidth management.
 
  - Remove rtnl_lock dependency from UDP tunnel port configuration.
 
 Device drivers
 --------------
 
  - Add a new Broadcom driver for 800G Ethernet (bnge).
 
  - Add a standalone driver for Microchip ZL3073x DPLL.
 
  - Remove IBM's NETIUCV device driver.
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
     - support zero-copy Tx of DMABUF memory
     - take page size into account for page pool recycling rings
    - Intel (100G, ice, idpf):
      - idpf: XDP and AF_XDP support preparations
      - idpf: add flow steering
      - add link_down_events statistic
      - clean up the TSPLL code
      - preparations for live VM migration
    - nVidia/Mellanox:
     - support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
     - optimize context memory usage for matchers
     - expose serial numbers in devlink info
     - support PCIe congestion metrics
    - Meta (fbnic):
      - add 25G, 50G, and 100G link modes to phylink
      - support dumping FW logs
    - Marvell/Cavium:
      - support for CN20K generation of the Octeon chips
    - Amazon:
      - add HW clock (without timestamping, just hypervisor time access)
 
  - Ethernet virtual:
    - VirtIO net:
      - support segmentation of UDP-tunnel-encapsulated packets
    - Google (gve):
      - support packet timestamping and clock synchronization
    - Microsoft vNIC:
      - add handler for device-originated servicing events
      - allow dynamic MSI-X vector allocation
      - support Tx bandwidth clamping
 
  - Ethernet NICs consumer, and embedded:
    - AMD:
      - amd-xgbe: hardware timestamping and PTP clock support
    - Broadcom integrated MACs (bcmgenet, bcmasp):
      - use napi_complete_done() return value to support NAPI polling
      - add support for re-starting auto-negotiation
    - Broadcom switches (b53):
      - support BCM5325 switches
      - add bcm63xx EPHY power control
    - Synopsys (stmmac):
      - lots of code refactoring and cleanups
    - TI:
      - icssg-prueth: read firmware-names from device tree
      - icssg: PRP offload support
    - Microchip:
      - lan78xx: convert to PHYLINK for improved PHY and MAC management
      - ksz: add KSZ8463 switch support
    - Intel:
      - support similar queue priority scheme in multi-queue and
        time-sensitive networking (taprio)
      - support packet pre-emption in both
    - RealTek (r8169):
      - enable EEE at 5Gbps on RTL8126
    - Airoha:
      - add PPPoE offload support
      - MDIO bus controller for Airoha AN7583
 
  - Ethernet PHYs:
    - support for the IPQ5018 internal GE PHY
    - micrel KSZ9477 switch-integrated PHYs:
      - add MDI/MDI-X control support
      - add RX error counters
      - add cable test support
      - add Signal Quality Indicator (SQI) reporting
    - dp83tg720: improve reset handling and reduce link recovery time
    - support bcm54811 (and its MII-Lite interface type)
    - air_en8811h: support resume/suspend
    - support PHY counters for QCA807x and QCA808x
    - support WoL for QCA807x
 
  - CAN drivers:
    - rcar_canfd: support for Transceiver Delay Compensation
    - kvaser: report FW versions via devlink dev info
 
  - WiFi:
    - extended regulatory info support (6 GHz)
    - add statistics and beacon monitor for Multi-Link Operation (MLO)
    - support S1G aggregation, improve S1G support
    - add Radio Measurement action fields
    - support per-radio RTS threshold
    - some work around how FIPS affects wifi, which was wrong (RC4 is used
      by TKIP, not only WEP)
    - improvements for unsolicited probe response handling
 
  - WiFi drivers:
    - RealTek (rtw88):
      - IBSS mode for SDIO devices
    - RealTek (rtw89):
      - BT coexistence for MLO/WiFi7
      - concurrent station + P2P support
      - support for USB devices RTL8851BU/RTL8852BU
    - Intel (iwlwifi):
      - use embedded PNVM in (to be released) FW images to fix
        compatibility issues
      - many cleanups (unused FW APIs, PCIe code, WoWLAN)
      - some FIPS interoperability
    - MediaTek (mt76):
      - firmware recovery improvements
      - more MLO work
    - Qualcomm/Atheros (ath12k):
      - fix scan on multi-radio devices
      - more EHT/Wi-Fi 7 features
      - encapsulation/decapsulation offload
    - Broadcom (brcm80211):
      - support SDIO 43751 device
 
  - Bluetooth:
    - hci_event: add support for handling LE BIG Sync Lost event
    - ISO: add socket option to report packet seqnum via CMSG
    - ISO: support SCM_TIMESTAMPING for ISO TS
 
  - Bluetooth drivers:
    - intel_pcie: support Function Level Reset
    - nxpuart: add support for 4M baudrate
    - nxpuart: implement powerup sequence, reset, FW dump, and FW loading
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmiFgLgACgkQMUZtbf5S
 IrvafxAAnQRwYBoIG+piCILx6z5pRvBGHkmEQ4AQgSCFuq2eO3ubwMFIqEybfma1
 5+QFjUZAV3OgGgKRBS2KGWxtSzdiF+/JGV1VOIN67sX3Mm0a2QgjA4n5CgKL0FPr
 o6BEzjX5XwG1zvGcBNQ5BZ19xUUKjoZQgTtnea8sZ57Fsp5RtRgmYRqoewNvNk/n
 uImh0NFsDVb0UeOpSzC34VD9l1dJvLGdui4zJAjno/vpvmT1DkXjoK419J/r52SS
 X+5WgsfJ6DkjHqVN1tIhhK34yWqBOcwGFZJgEnWHMkFIl2FqRfFKMHyqtfLlVnLA
 mnIpSyz8Sq2AHtx0TlgZ3At/Ri8p5+yYJgHOXcDKyABa8y8Zf4wrycmr6cV9JLuL
 z54nLEVnJuvfDVDVJjsLYdJXyhMpZFq6+uAItdxKaw8Ugp/QqG4QtoRj+XIHz4ZW
 z6OohkCiCzTwEISFK+pSTxPS30eOxq43kCspcvuLiwCCStJBRkRb5GdZA4dm7LA+
 1Od4ADAkHjyrFtBqTyyC2scX8UJ33DlAIpAYyIeS6w9Cj9EXxtp1z33IAAAZ03MW
 jJwIaJuc8bK2fWKMmiG7ucIXjPo4t//KiWlpkwwqLhPbjZgfDAcxq1AC2TLoqHBL
 y4EOgKpHDCMAghSyiFIAn2JprGcEt8dp+11B0JRXIn4Pm/eYDH8=
 =lqbe
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Wrap datapath globals into net_aligned_data, to avoid false sharing

   - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)

   - Add SO_INQ and SCM_INQ support to AF_UNIX

   - Add SIOCINQ support to AF_VSOCK

   - Add TCP_MAXSEG sockopt to MPTCP

   - Add IPv6 force_forwarding sysctl to enable forwarding per interface

   - Make TCP validation of whether packet fully fits in the receive
     window and the rcv_buf more strict. With increased use of HW
     aggregation a single "packet" can be multiple 100s of kB

   - Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
     improves latency up to 33% for sockmap users

   - Convert TCP send queue handling from tasklet to BH workque

   - Improve BPF iteration over TCP sockets to see each socket exactly
     once

   - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code

   - Support enabling kernel threads for NAPI processing on per-NAPI
     instance basis rather than a whole device. Fully stop the kernel
     NAPI thread when threaded NAPI gets disabled. Previously thread
     would stick around until ifdown due to tricky synchronization

   - Allow multicast routing to take effect on locally-generated packets

   - Add output interface argument for End.X in segment routing

   - MCTP: add support for gateway routing, improve bind() handling

   - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink

   - Add a new neighbor flag ("extern_valid"), which cedes refresh
     responsibilities to userspace. This is needed for EVPN multi-homing
     where a neighbor entry for a multi-homed host needs to be synced
     across all the VTEPs among which the host is multi-homed

   - Support NUD_PERMANENT for proxy neighbor entries

   - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM

   - Add sequence numbers to netconsole messages. Unregister
     netconsole's console when all net targets are removed. Code
     refactoring. Add a number of selftests

   - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
     should be used for an inbound SA lookup

   - Support inspecting ref_tracker state via DebugFS

   - Don't force bonding advertisement frames tx to ~333 ms boundaries.
     Add broadcast_neighbor option to send ARP/ND on all bonded links

   - Allow providing upcall pid for the 'execute' command in openvswitch

   - Remove DCCP support from Netfilter's conntrack

   - Disallow multiple packet duplications in the queuing layer

   - Prevent use of deprecated iptables code on PREEMPT_RT

  Driver API:

   - Support RSS and hashing configuration over ethtool Netlink

   - Add dedicated ethtool callbacks for getting and setting hashing
     fields

   - Add support for power budget evaluation strategy in PSE /
     Power-over-Ethernet. Generate Netlink events for overcurrent etc

   - Support DPLL phase offset monitoring across all device inputs.
     Support providing clock reference and SYNC over separate DPLL
     inputs

   - Support traffic classes in devlink rate API for bandwidth
     management

   - Remove rtnl_lock dependency from UDP tunnel port configuration

  Device drivers:

   - Add a new Broadcom driver for 800G Ethernet (bnge)

   - Add a standalone driver for Microchip ZL3073x DPLL

   - Remove IBM's NETIUCV device driver

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support zero-copy Tx of DMABUF memory
         - take page size into account for page pool recycling rings
      - Intel (100G, ice, idpf):
         - idpf: XDP and AF_XDP support preparations
         - idpf: add flow steering
         - add link_down_events statistic
         - clean up the TSPLL code
         - preparations for live VM migration
      - nVidia/Mellanox:
         - support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
         - optimize context memory usage for matchers
         - expose serial numbers in devlink info
         - support PCIe congestion metrics
      - Meta (fbnic):
         - add 25G, 50G, and 100G link modes to phylink
         - support dumping FW logs
      - Marvell/Cavium:
         - support for CN20K generation of the Octeon chips
      - Amazon:
         - add HW clock (without timestamping, just hypervisor time access)

   - Ethernet virtual:
      - VirtIO net:
         - support segmentation of UDP-tunnel-encapsulated packets
      - Google (gve):
         - support packet timestamping and clock synchronization
      - Microsoft vNIC:
         - add handler for device-originated servicing events
         - allow dynamic MSI-X vector allocation
         - support Tx bandwidth clamping

   - Ethernet NICs consumer, and embedded:
      - AMD:
         - amd-xgbe: hardware timestamping and PTP clock support
      - Broadcom integrated MACs (bcmgenet, bcmasp):
         - use napi_complete_done() return value to support NAPI polling
         - add support for re-starting auto-negotiation
      - Broadcom switches (b53):
         - support BCM5325 switches
         - add bcm63xx EPHY power control
      - Synopsys (stmmac):
         - lots of code refactoring and cleanups
      - TI:
         - icssg-prueth: read firmware-names from device tree
         - icssg: PRP offload support
      - Microchip:
         - lan78xx: convert to PHYLINK for improved PHY and MAC management
         - ksz: add KSZ8463 switch support
      - Intel:
         - support similar queue priority scheme in multi-queue and
           time-sensitive networking (taprio)
         - support packet pre-emption in both
      - RealTek (r8169):
         - enable EEE at 5Gbps on RTL8126
      - Airoha:
         - add PPPoE offload support
         - MDIO bus controller for Airoha AN7583

   - Ethernet PHYs:
      - support for the IPQ5018 internal GE PHY
      - micrel KSZ9477 switch-integrated PHYs:
         - add MDI/MDI-X control support
         - add RX error counters
         - add cable test support
         - add Signal Quality Indicator (SQI) reporting
      - dp83tg720: improve reset handling and reduce link recovery time
      - support bcm54811 (and its MII-Lite interface type)
      - air_en8811h: support resume/suspend
      - support PHY counters for QCA807x and QCA808x
      - support WoL for QCA807x

   - CAN drivers:
      - rcar_canfd: support for Transceiver Delay Compensation
      - kvaser: report FW versions via devlink dev info

   - WiFi:
      - extended regulatory info support (6 GHz)
      - add statistics and beacon monitor for Multi-Link Operation (MLO)
      - support S1G aggregation, improve S1G support
      - add Radio Measurement action fields
      - support per-radio RTS threshold
      - some work around how FIPS affects wifi, which was wrong (RC4 is
        used by TKIP, not only WEP)
      - improvements for unsolicited probe response handling

   - WiFi drivers:
      - RealTek (rtw88):
         - IBSS mode for SDIO devices
      - RealTek (rtw89):
         - BT coexistence for MLO/WiFi7
         - concurrent station + P2P support
         - support for USB devices RTL8851BU/RTL8852BU
      - Intel (iwlwifi):
         - use embedded PNVM in (to be released) FW images to fix
           compatibility issues
         - many cleanups (unused FW APIs, PCIe code, WoWLAN)
         - some FIPS interoperability
      - MediaTek (mt76):
         - firmware recovery improvements
         - more MLO work
      - Qualcomm/Atheros (ath12k):
         - fix scan on multi-radio devices
         - more EHT/Wi-Fi 7 features
         - encapsulation/decapsulation offload
      - Broadcom (brcm80211):
         - support SDIO 43751 device

   - Bluetooth:
      - hci_event: add support for handling LE BIG Sync Lost event
      - ISO: add socket option to report packet seqnum via CMSG
      - ISO: support SCM_TIMESTAMPING for ISO TS

   - Bluetooth drivers:
      - intel_pcie: support Function Level Reset
      - nxpuart: add support for 4M baudrate
      - nxpuart: implement powerup sequence, reset, FW dump, and FW loading"

* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits)
  dpll: zl3073x: Fix build failure
  selftests: bpf: fix legacy netfilter options
  ipv6: annotate data-races around rt->fib6_nsiblings
  ipv6: fix possible infinite loop in fib6_info_uses_dev()
  ipv6: prevent infinite loop in rt6_nlmsg_size()
  ipv6: add a retry logic in net6_rt_notify()
  vrf: Drop existing dst reference in vrf_ip6_input_dst
  net/sched: taprio: align entry index attr validation with mqprio
  net: fsl_pq_mdio: use dev_err_probe
  selftests: rtnetlink.sh: remove esp4_offload after test
  vsock: remove unnecessary null check in vsock_getname()
  igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
  stmmac: xsk: fix negative overflow of budget in zerocopy mode
  dt-bindings: ieee802154: Convert at86rf230.txt yaml format
  net: dsa: microchip: Disable PTP function of KSZ8463
  net: dsa: microchip: Setup fiber ports for KSZ8463
  net: dsa: microchip: Write switch MAC address differently for KSZ8463
  net: dsa: microchip: Use different registers for KSZ8463
  net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
  dt-bindings: net: dsa: microchip: Add KSZ8463 switch support
  ...
2025-07-30 08:58:55 -07:00
KaFai Wan
51d3750aba selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite
Delete fexit_noreturns.c files and migrate the cases into
tracing_failure.c files.

The result:

 $ tools/testing/selftests/bpf/test_progs -t tracing_failure/fexit_noreturns
 #467/4   tracing_failure/fexit_noreturns:OK
 #467     tracing_failure:OK
 Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-5-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 19:39:30 -07:00
KaFai Wan
a32f6f17a7 selftests/bpf: Add selftest for attaching tracing programs to functions in deny list
The result:

 $ tools/testing/selftests/bpf/test_progs -t tracing_failure/tracing_deny
 #468/3   tracing_failure/tracing_deny:OK
 #468     tracing_failure:OK
 Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-4-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 19:39:29 -07:00
KaFai Wan
a5a6b29a70 bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions
With this change, we know the precise rejected function name when
attaching fexit/fmod_ret to __noreturn functions from log.

$ ./fexit
libbpf: prog 'fexit': BPF program load failed: -EINVAL
libbpf: prog 'fexit': -- BEGIN PROG LOAD LOG --
Attaching fexit/fmod_ret to __noreturn function 'do_exit' is rejected.

Suggested-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250724151454.499040-2-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 19:39:29 -07:00
Linus Torvalds
7e7bc8335b vfs-6.17-rc1.bpf
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCjwAKCRCRxhvAZXjc
 osnVAQCv4rM7sF4yJvGlm1myIJcJy5Sabk2q31qMdI1VHmkcOwD+Mxs7d1aByTS8
 /6djhVleq6lcT2LpP9j8YI3Rb+x30QY=
 =PF3o
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs bpf updates from Christian Brauner:
 "These changes allow bpf to read extended attributes from cgroupfs.

  This is useful in redirecting AF_UNIX socket connections based on
  cgroup membership of the socket. One use-case is the ability to
  implement log namespaces in systemd so services and containers are
  redirected to different journals"

* tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  selftests/kernfs: test xattr retrieval
  selftests/bpf: Add tests for bpf_cgroup_read_xattr
  bpf: Mark cgroup_subsys_state->cgroup RCU safe
  bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node
  kernfs: remove iattr_mutex
2025-07-28 14:42:31 -07:00
Paul Chaignon
5dbb19b16a bpf: Add third round of bounds deduction
Commit d7f0087381 ("bpf: try harder to deduce register bounds from
different numeric domains") added a second call to __reg_deduce_bounds
in reg_bounds_sync because a single call wasn't enough to converge to a
fixed point in terms of register bounds.

With patch "bpf: Improve bounds when s64 crosses sign boundary" from
this series, Eduard noticed that calling __reg_deduce_bounds twice isn't
enough anymore to converge. The first selftest added in "selftests/bpf:
Test cross-sign 64bits range refinement" highlights the need for a third
call to __reg_deduce_bounds. After instruction 7, reg_bounds_sync
performs the following bounds deduction:

  reg_bounds_sync entry:          scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
  __update_reg_bounds:            scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146)
  __reg_deduce_bounds:
      __reg32_deduce_bounds:      scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
      __reg64_deduce_bounds:      scalar(smin=-655,smax=0xeffffeee,smin32=-783,smax32=-146,umin32=0xfffffcf1,umax32=0xffffff6e)
      __reg_deduce_mixed_bounds:  scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
  __reg_deduce_bounds:
      __reg32_deduce_bounds:      scalar(smin=-655,smax=0xeffffeee,umin=umin32=0xfffffcf1,umax=0xffffffffffffff6e,smin32=-783,smax32=-146,umax32=0xffffff6e)
      __reg64_deduce_bounds:      scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
      __reg_deduce_mixed_bounds:  scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e)
  __reg_bound_offset:             scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))
  __update_reg_bounds:            scalar(smin=-655,smax=smax32=-146,umin=0xfffffffffffffd71,umax=0xffffffffffffff6e,smin32=-783,umin32=0xfffffcf1,umax32=0xffffff6e,var_off=(0xfffffffffffffc00; 0x3ff))

In particular, notice how:
1. In the first call to __reg_deduce_bounds, __reg32_deduce_bounds
   learns new u32 bounds.
2. __reg64_deduce_bounds is unable to improve bounds at this point.
3. __reg_deduce_mixed_bounds derives new u64 bounds from the u32 bounds.
4. In the second call to __reg_deduce_bounds, __reg64_deduce_bounds
   improves the smax and umin bounds thanks to patch "bpf: Improve
   bounds when s64 crosses sign boundary" from this series.
5. Subsequent functions are unable to improve the ranges further (only
   tnums). Yet, a better smin32 bound could be learned from the smin
   bound.

__reg32_deduce_bounds is able to improve smin32 from smin, but for that
we need a third call to __reg_deduce_bounds.

As discussed in [1], there may be a better way to organize the deduction
rules to learn the same information with less calls to the same
functions. Such an optimization requires further analysis and is
orthogonal to the present patchset.

Link: https://lore.kernel.org/bpf/aIKtSK9LjQXB8FLY@mail.gmail.com/ [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/79619d3b42e5525e0e174ed534b75879a5ba15de.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 10:02:13 -07:00
Paul Chaignon
f96841bbf4 selftests/bpf: Test invariants on JSLT crossing sign
The improvement of the u64/s64 range refinement fixed the invariant
violation that was happening on this test for BPF_JSLT when crossing the
sign boundary.

After this patch, we have one test remaining with a known invariant
violation. It's the same test as fixed here but for 32 bits ranges.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/ad046fb0016428f1a33c3b81617aabf31b51183f.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 10:02:13 -07:00
Paul Chaignon
26e5e346a5 selftests/bpf: Test cross-sign 64bits range refinement
This patch adds coverage for the new cross-sign 64bits range refinement
logic. The three tests cover the cases when the u64 and s64 ranges
overlap (1) in the negative portion of s64, (2) in the positive portion
of s64, and (3) in both portions.

The first test is a simplified version of a BPF program generated by
syzkaller that caused an invariant violation [1]. It looks like
syzkaller could not extract the reproducer itself (and therefore didn't
report it to the mailing list), but I was able to extract it from the
console logs of a crash.

The principle is similar to the invariant violation described in
commit 6279846b9b ("bpf: Forget ranges when refining tnum after
JSET"): the verifier walks a dead branch, uses the condition to refine
ranges, and ends up with inconsistent ranges. In this case, the dead
branch is when we fallthrough on both jumps. The new refinement logic
improves the bounds such that the second jump is properly detected as
always-taken and the verifier doesn't end up walking a dead branch.

The second and third tests are inspired by the first, but rely on
condition jumps to prepare the bounds instead of ALU instructions. An
R10 write is used to trigger a verifier error when the bounds can't be
refined.

Link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/a0e17b00dab8dabcfa6f8384e7e151186efedfdd.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 10:02:12 -07:00
Paul Chaignon
da653de268 selftests/bpf: Update reg_bound range refinement logic
This patch updates the range refinement logic in the reg_bound test to
match the new logic from the previous commit. Without this change, tests
would fail because we end with more precise ranges than the tests
expect.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/b7f6b1fbe03373cca4e1bb6a113035a6cd2b3ff7.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-28 10:02:12 -07:00
Puranjay Mohan
cf2a6de32c powerpc64/bpf: Add jit support for load_acquire and store_release
Add JIT support for the load_acquire and store_release instructions. The
implementation is similar to the kernel where:

        load_acquire  => plain load -> lwsync
        store_release => lwsync -> plain store

To test the correctness of the implementation, following selftests were
run:

  [fedora@linux-kernel bpf]$ sudo ./test_progs -a \
  verifier_load_acquire,verifier_store_release,atomics
  #11/1    atomics/add:OK
  #11/2    atomics/sub:OK
  #11/3    atomics/and:OK
  #11/4    atomics/or:OK
  #11/5    atomics/xor:OK
  #11/6    atomics/cmpxchg:OK
  #11/7    atomics/xchg:OK
  #11      atomics:OK
  #519/1   verifier_load_acquire/load-acquire, 8-bit:OK
  #519/2   verifier_load_acquire/load-acquire, 8-bit @unpriv:OK
  #519/3   verifier_load_acquire/load-acquire, 16-bit:OK
  #519/4   verifier_load_acquire/load-acquire, 16-bit @unpriv:OK
  #519/5   verifier_load_acquire/load-acquire, 32-bit:OK
  #519/6   verifier_load_acquire/load-acquire, 32-bit @unpriv:OK
  #519/7   verifier_load_acquire/load-acquire, 64-bit:OK
  #519/8   verifier_load_acquire/load-acquire, 64-bit @unpriv:OK
  #519/9   verifier_load_acquire/load-acquire with uninitialized
  src_reg:OK
  #519/10  verifier_load_acquire/load-acquire with uninitialized src_reg
  @unpriv:OK
  #519/11  verifier_load_acquire/load-acquire with non-pointer src_reg:OK
  #519/12  verifier_load_acquire/load-acquire with non-pointer src_reg
  @unpriv:OK
  #519/13  verifier_load_acquire/misaligned load-acquire:OK
  #519/14  verifier_load_acquire/misaligned load-acquire @unpriv:OK
  #519/15  verifier_load_acquire/load-acquire from ctx pointer:OK
  #519/16  verifier_load_acquire/load-acquire from ctx pointer @unpriv:OK
  #519/17  verifier_load_acquire/load-acquire with invalid register R15:OK
  #519/18  verifier_load_acquire/load-acquire with invalid register R15
  @unpriv:OK
  #519/19  verifier_load_acquire/load-acquire from pkt pointer:OK
  #519/20  verifier_load_acquire/load-acquire from flow_keys pointer:OK
  #519/21  verifier_load_acquire/load-acquire from sock pointer:OK
  #519     verifier_load_acquire:OK
  #556/1   verifier_store_release/store-release, 8-bit:OK
  #556/2   verifier_store_release/store-release, 8-bit @unpriv:OK
  #556/3   verifier_store_release/store-release, 16-bit:OK
  #556/4   verifier_store_release/store-release, 16-bit @unpriv:OK
  #556/5   verifier_store_release/store-release, 32-bit:OK
  #556/6   verifier_store_release/store-release, 32-bit @unpriv:OK
  #556/7   verifier_store_release/store-release, 64-bit:OK
  #556/8   verifier_store_release/store-release, 64-bit @unpriv:OK
  #556/9   verifier_store_release/store-release with uninitialized
  src_reg:OK
  #556/10  verifier_store_release/store-release with uninitialized src_reg
  @unpriv:OK
  #556/11  verifier_store_release/store-release with uninitialized
  dst_reg:OK
  #556/12  verifier_store_release/store-release with uninitialized dst_reg
  @unpriv:OK
  #556/13  verifier_store_release/store-release with non-pointer
  dst_reg:OK
  #556/14  verifier_store_release/store-release with non-pointer dst_reg
  @unpriv:OK
  #556/15  verifier_store_release/misaligned store-release:OK
  #556/16  verifier_store_release/misaligned store-release @unpriv:OK
  #556/17  verifier_store_release/store-release to ctx pointer:OK
  #556/18  verifier_store_release/store-release to ctx pointer @unpriv:OK
  #556/19  verifier_store_release/store-release, leak pointer to stack:OK
  #556/20  verifier_store_release/store-release, leak pointer to stack
  @unpriv:OK
  #556/21  verifier_store_release/store-release, leak pointer to map:OK
  #556/22  verifier_store_release/store-release, leak pointer to map
  @unpriv:OK
  #556/23  verifier_store_release/store-release with invalid register
  R15:OK
  #556/24  verifier_store_release/store-release with invalid register R15
  @unpriv:OK
  #556/25  verifier_store_release/store-release to pkt pointer:OK
  #556/26  verifier_store_release/store-release to flow_keys pointer:OK
  #556/27  verifier_store_release/store-release to sock pointer:OK
  #556     verifier_store_release:OK
  Summary: 3/55 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Tested-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250717202935.29018-2-puranjay@kernel.org
2025-07-28 08:13:35 +05:30
Puranjay Mohan
e9f545d0d3 selftests/bpf: Enable private stack tests for arm64
As arm64 JIT now supports private stack, make sure all relevant tests
run on arm64 architecture.

Relevant tests:

 #415/1   struct_ops_private_stack/private_stack:OK
 #415/2   struct_ops_private_stack/private_stack_fail:OK
 #415/3   struct_ops_private_stack/private_stack_recur:OK
 #415     struct_ops_private_stack:OK
 #549/1   verifier_private_stack/Private stack, single prog:OK
 #549/2   verifier_private_stack/Private stack, subtree > MAX_BPF_STACK:OK
 #549/3   verifier_private_stack/No private stack:OK
 #549/4   verifier_private_stack/Private stack, callback:OK
 #549/5   verifier_private_stack/Private stack, exception in mainprog:OK
 #549/6   verifier_private_stack/Private stack, exception in subprog:OK
 #549/7   verifier_private_stack/Private stack, async callback, not nested:OK
 #549/8   verifier_private_stack/Private stack, async callback, potential nesting:OK
 #549     verifier_private_stack:OK
 Summary: 2/11 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250724120257.7299-4-puranjay@kernel.org
2025-07-26 21:27:15 +02:00
Jakub Kicinski
38b74b212a selftests: bpf: fix legacy netfilter options
Recent commit to add NETFILTER_XTABLES_LEGACY missed setting
a couple of configs to y. They are still enabled but as modules
which appears to have upset BPF CI, e.g.:

   test_bpf_nf_ct:FAIL:iptables-legacy -t raw -A PREROUTING -j CONNMARK --set-mark 42/0 unexpected error: 768 (errno 0)

Fixes: 3c3ab65f00 ("selftests: net: Enable legacy netfilter legacy options.")
Link: https://patch.msgid.link/20250726155349.1161845-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-26 12:05:09 -07:00
Yonghong Song
4a5dcb3373 selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure
For arm64 64K page size, the xdp data size was set to be more than 64K
in one of previous patches. This will cause failure for bpf_dynptr_memset().
Since the failure of bpf_dynptr_memset() is expected with 64K page size,
return success.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250725043440.209266-1-yonghong.song@linux.dev
2025-07-25 18:20:44 -07:00
Yonghong Song
90f791a975 selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure
For arm64 64K page size, the bpf_dynptr_copy() in test dynptr/test_dynptr_copy_xdp
will succeed, but the test will failure with 4K page size. This patch made a change
so the test will fail expectedly for both 4K and 64K page sizes.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://patch.msgid.link/20250725043435.208974-1-yonghong.song@linux.dev
2025-07-25 18:20:43 -07:00
Yonghong Song
4c82768e41 selftests/bpf: Increase xdp data size for arm64 64K page size
With arm64 64K page size, the following 4 subtests failed:
  #97/25   dynptr/test_probe_read_user_dynptr:FAIL
  #97/26   dynptr/test_probe_read_kernel_dynptr:FAIL
  #97/27   dynptr/test_probe_read_user_str_dynptr:FAIL
  #97/28   dynptr/test_probe_read_kernel_str_dynptr:FAIL

These failures are due to function bpf_dynptr_check_off_len() in
include/linux/bpf.h where there is a test
  if (len > size || offset > size - len)
    return -E2BIG;
With 64K page size, the 'offset' is greater than 'size - len',
which caused the test failure.

For 64KB page size, this patch increased the xdp buffer size from 5000 to
90000. The above 4 test failures are fixed as 'size' value is increased.
But it introduced two new failures:
  #97/4    dynptr/test_dynptr_copy_xdp:FAIL
  #97/12   dynptr/test_dynptr_memset_xdp_chunks:FAIL

These two failures will be addressed in subsequent patches.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://patch.msgid.link/20250725043430.208469-1-yonghong.song@linux.dev
2025-07-25 18:20:43 -07:00
Florian Westphal
3c3ab65f00 selftests: net: Enable legacy netfilter legacy options.
Some specified options rely on NETFILTER_XTABLES_LEGACY to be enabled.
IP_NF_TARGET_TTL for instance depends on IP_NF_MANGLE which in turn
depends on IP_NF_IPTABLES_LEGACY -> NETFILTER_XTABLES_LEGACY.

Enable relevant iptables config options explicitly, this is needed
to avoid breakage when symbols related to iptables-legacy
will depend on NETFILTER_LEGACY resp. IP_TABLES_LEGACY.

This also means that the classic tables (Kernel modules) will
not be enabled by default, so enable them too.

Signed-off-by: Florian Westphal <fw@strlen.de>
[bigeasy: Split out the config bits from the main patch]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25 18:38:55 +02:00
Jakub Kicinski
a4f5759b6f bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaIJYlAAKCRAraS+Z+3y5
 E5MFAQDW29BJyjRbB75oy6RxmFZX+xFmGgmy1XO3w822gIwgzQD/WzhsmFPDYv/F
 7iOpLvez6zTySUdTJXJGCTvYJG5EHwU=
 =U8S4
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-07-24

We've added 3 non-merge commits during the last 3 day(s) which contain
a total of 4 files changed, 40 insertions(+), 15 deletions(-).

The main changes are:

1) Improved verifier error message for incorrect narrower load from
   pointer field in ctx, from Paul Chaignon.

2) Disabled migration in nf_hook_run_bpf to address a syzbot report,
   from Kuniyuki Iwashima.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Test invalid narrower ctx load
  bpf: Reject narrower access to pointer ctx fields
  bpf: Disable migration in nf_hook_run_bpf().
====================

Link: https://patch.msgid.link/20250724173306.3578483-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 18:02:24 -07:00
Jakub Kicinski
8b5a19b4ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc8).

Conflicts:

drivers/net/ethernet/microsoft/mana/gdma_main.c
  9669ddda18 ("net: mana: Fix warnings for missing export.h header inclusion")
  7553911210 ("net: mana: Allocate MSI-X vectors dynamically")
https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/ti/icssg/icssg_prueth.h
  6e86fb73de ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG")
  ffe8a49091 ("net: ti: icssg-prueth: Read firmware-names from device tree")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 11:10:46 -07:00
Paul Chaignon
ba578b87fe selftests/bpf: Test invalid narrower ctx load
This patch adds selftests to cover invalid narrower loads on the
context. These used to cause kernel warnings before the previous patch.
To trigger the warning, the load had to be aligned, to read an affected
context field (ex., skb->sk), and not starting at the beginning of the
field.

The nine new cases all fail without the previous patch.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/44cd83ea9c6868079943f0a436c6efa850528cc1.1753194596.git.paul.chaignon@gmail.com
2025-07-23 19:35:56 -07:00
Alexei Starovoitov
beb1097ec8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc6
Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-18 12:15:59 -07:00
Jakub Kicinski
ffe5aedc43 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaHlCFwAKCRAraS+Z+3y5
 E6qQAP9jVyIq+bKkZhRkew07cDNbYB01rJkJEO0Y/N7hnTyfwgD+PhiXGv5FiPp9
 8iM3d51QKCOLlR/h3zc2RqR72S17RQA=
 =ZaJz
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-07-17

We've added 13 non-merge commits during the last 20 day(s) which contain
a total of 4 files changed, 712 insertions(+), 84 deletions(-).

The main changes are:

1) Avoid skipping or repeating a sk when using a TCP bpf_iter,
   from Jordan Rife.

2) Clarify the driver requirement on using the XDP metadata,
   from Song Yoong Siang

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  doc: xdp: Clarify driver implementation for XDP Rx metadata
  selftests/bpf: Add tests for bucket resume logic in established sockets
  selftests/bpf: Create iter_tcp_destroy test program
  selftests/bpf: Create established sockets in socket iterator tests
  selftests/bpf: Make ehash buckets configurable in socket iterator tests
  selftests/bpf: Allow for iteration over multiple states
  selftests/bpf: Allow for iteration over multiple ports
  selftests/bpf: Add tests for bucket resume logic in listening sockets
  bpf: tcp: Avoid socket skips and repeats during iteration
  bpf: tcp: Use bpf_tcp_iter_batch_item for bpf_tcp_iter_state batch items
  bpf: tcp: Get rid of st_bucket_done
  bpf: tcp: Make sure iter->batch always contains a full bucket snapshot
  bpf: tcp: Make mem flags configurable through bpf_iter_tcp_realloc_batch
====================

Link: https://patch.msgid.link/20250717191731.4142326-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17 18:07:37 -07:00
Puranjay Mohan
0769857a07 selftests/bpf: fix implementation of smp_mb()
As BPF doesn't include any barrier instructions, smp_mb() is implemented
by doing a dummy value returning atomic operation. Such an operation
acts a full barrier as enforced by LKMM and also by the work in progress
BPF memory model.

If the returned value is not used, clang[1] can optimize the value
returning atomic instruction in to a normal atomic instruction which
provides no ordering guarantees.

Mark the variable as volatile so the above optimization is never
performed and smp_mb() works as expected.

[1] https://godbolt.org/z/qzze7bG6z

Fixes: 88d706ba7c ("selftests/bpf: Introduce arena spin lock")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20250710175434.18829-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:38:52 -07:00
Tao Chen
fd60aa0a45 bpf/selftests: Add selftests for token info
A previous change added bpf_token_info to get token info with
bpf_get_obj_info_by_fd, this patch adds a new test for token info.

 #461/12  token/bpf_token_info:OK

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Link: https://lore.kernel.org/r/20250716134654.1162635-2-chen.dylane@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:38:05 -07:00
Ilya Leoshkevich
d459dbbbfa selftests/bpf: Stress test attaching a BPF prog to another BPF prog
Add a test that invokes a BPF prog in a loop, while concurrently
attaching and detaching another BPF prog to and from it. This helps
identifying race conditions in bpf_arch_text_poke().

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250716194524.48109-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:32:31 -07:00
Alexis Lothoré (eBPF Foundation)
4a760d2d7a selftests/bpf: enable tracing_struct tests for arm64
Now that the constraint preventing attachment to functions consuming
struct on stack has been removed from the kernel (and moved to pahole,
with a slightly smarter detection, to prevent only those that are
packed), re-enable the tracing_struct tests for arm64.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20250709-arm64_relax_jit_comp-v1-2-3850fe189092@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16 18:28:30 -07:00
Yonghong Song
e860a98c8a selftests/bpf: Fix build error due to certain uninitialized variables
With the latest llvm21 compiler, I hit several errors when building bpf
selftests. Some of errors look like below:

  test_maps.c:565:40: error: variable 'val' is uninitialized when passed as a
      const pointer argument here [-Werror,-Wuninitialized-const-pointer]
    565 |         assert(bpf_map_update_elem(fd, NULL, &val, 0) < 0 &&
        |                                               ^~~

  prog_tests/bpf_iter.c:400:25: error: variable 'c' is uninitialized when passed
      as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
  400 |         write(finish_pipe[1], &c, 1);
      |                                ^

Some other errors have similar the pattern as the above.

These errors are fixed by initializing those variables properly.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250715185910.3659447-1-yonghong.song@linux.dev
2025-07-15 14:38:58 -07:00
Jordan Rife
f126f0ce7c selftests/bpf: Add tests for bucket resume logic in established sockets
Replicate the set of test cases used for UDP socket iterators to test
similar scenarios for TCP established sockets.

Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
2025-07-14 15:12:54 -07:00
Jordan Rife
8fc0c5a82d selftests/bpf: Create iter_tcp_destroy test program
Prepare for bucket resume tests for established TCP sockets by creating
a program to immediately destroy and remove sockets from the TCP ehash
table, since close() is not deterministic.

Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
2025-07-14 15:12:54 -07:00
Jordan Rife
07ebabbbfe selftests/bpf: Create established sockets in socket iterator tests
Prepare for bucket resume tests for established TCP sockets by creating
established sockets. Collect socket fds from connect() and accept()
sides and pass them to test cases.

Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
2025-07-14 15:12:52 -07:00
Jordan Rife
08327292e7 selftests/bpf: Make ehash buckets configurable in socket iterator tests
Prepare for bucket resume tests for established TCP sockets by making
the number of ehash buckets configurable. Subsequent patches force all
established sockets into the same bucket by setting ehash_buckets to
one.

Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
2025-07-14 15:11:39 -07:00
Jordan Rife
f00468124a selftests/bpf: Allow for iteration over multiple states
Add parentheses around loopback address check to fix up logic and make
the socket state filter configurable for the TCP socket iterators.
Iterators can skip the socket state check by setting ss to 0.

Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
2025-07-14 12:09:09 -07:00
Jordan Rife
346066c327 selftests/bpf: Allow for iteration over multiple ports
Prepare to test TCP socket iteration over both listening and established
sockets by allowing the BPF iterator programs to skip the port check.

Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
2025-07-14 12:09:09 -07:00
Jordan Rife
da1d987d3b selftests/bpf: Add tests for bucket resume logic in listening sockets
Replicate the set of test cases used for UDP socket iterators to test
similar scenarios for TCP listening sockets.

Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
2025-07-14 12:09:09 -07:00
Paul Chaignon
d81526a6eb selftests/bpf: Range analysis test case for JSET
This patch adds coverage for the warning detected by syzkaller and fixed
in the previous patch. Without the previous patch, this test fails with:

  verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds
  violation u64=[0x0, 0x0] s64=[0x0, 0x0] u32=[0x1, 0x0] s32=[0x0, 0x0]
  var_off=(0x0, 0x0)(1)

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/c7893be1170fdbcf64e0200c110cdbd360ce7086.1752171365.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-11 10:45:25 -07:00
Emil Tsalapatis
9f9559f0ac selftests/bpf: add selftests for bpf_arena_reserve_pages
Add selftests for the new bpf_arena_reserve_pages kfunc.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250709191312.29840-3-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-11 10:43:54 -07:00
Eduard Zingerman
ad97cb2ed0 selftests/bpf: Remove enum64 case from __arg_untrusted test suite
The enum64 type used by verifier_global_ptr_args test case requires
CONFIG_SCHED_CLASS_EXT. At the moment selftets do not depend on this
option. There are just a few enum64 types in the kernel. Instead of
tying selftests to implementation details of unrelated sub-systems,
just remove enum64 test case. Simple enums are covered and that should
be sufficient.

Fixes: 68cca81fd5 ("selftests/bpf: tests for __arg_untrusted void * global func params")
Reported-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Tested-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20250708220856.3059578-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-08 18:57:47 -07:00
Jason Xing
680acde13f selftests/bpf: add a new test to check the consumer update case
The subtest sends 33 packets at one time on purpose to see if xsk
exitting __xsk_generic_xmit() updates the global consumer of tx queue
when reaching the max loop (max_tx_budget, 32 by default). The number 33
can avoid xskq_cons_peek_desc() updates the consumer when it's about to
quit sending, to accurately check if the issue that the first patch
resolves remains. The new case will not check this issue in zero copy
mode.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250703141712.33190-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08 18:28:07 -07:00
Paul Chaignon
192e3aa145 selftests/bpf: Negative test case for tail call map
This patch adds a negative test case for the following verifier error.

    expected prog array map for tail call

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/aGu0i1X_jII-3aFa@mail.gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:46:13 -07:00
Luis Gerhorst
92974cef83 selftests/bpf: Add Spectre v4 tests
Add the following tests:

1. A test with an (unimportant) ldimm64 (16 byte insn) and a
   Spectre-v4--induced nospec that clarifies and serves as a basic
   Spectre v4 test.

2. Make sure a Spectre v4 nospec_result does not prevent a Spectre v1
   nospec from being added before the dangerous instruction (tests that
   [1] is fixed).

3. Combine the two, which is the combination that triggers the warning
   in [2]. This is because the unanalyzed stack write has nospec_result
   set, but the ldimm64 (which was just analyzed) had incremented
   insn_idx by 2. That violates the assertion that nospec_result is only
   used after insns that increment insn_idx by 1 (i.e., stack writes).

[1] https://lore.kernel.org/bpf/4266fd5de04092aa4971cbef14f1b4b96961f432.camel@gmail.com/
[2] https://lore.kernel.org/bpf/685b3c1b.050a0220.2303ee.0010.GAE@google.com/

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Link: https://lore.kernel.org/r/20250705190908.1756862-3-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:32:34 -07:00
Saket Kumar Bhaskar
0f626c98fd selftests/bpf: Set CONFIG_PACKET=y for selftests
BPF selftest fails to build with below error:

  CLNG-BPF [test_progs] lsm_cgroup.bpf.o
progs/lsm_cgroup.c:105:21: error: variable has incomplete type 'struct sockaddr_ll'
  105 |         struct sockaddr_ll sa = {};
      |                            ^
progs/lsm_cgroup.c:105:9: note: forward declaration of 'struct sockaddr_ll'
  105 |         struct sockaddr_ll sa = {};
      |                ^
1 error generated.

lsm_cgroup selftest requires sockaddr_ll structure which is not there
in vmlinux.h when the kernel is built with CONFIG_PACKET=m.

Enabling CONFIG_PACKET=y ensures that sockaddr_ll is available in vmlinux,
allowing it to be captured in the generated vmlinux.h for bpf selftests.

Reported-by: Sachin P Bappalige <sachinpb@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20250707071735.705137-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:31:57 -07:00
Eduard Zingerman
68cca81fd5 selftests/bpf: tests for __arg_untrusted void * global func params
Check usage of __arg_untrusted parameters of primitive type:
- passing of {trusted, untrusted, map value, scalar value, values with
  variable offset} to untrusted `void *`, `char *` or enum is ok;
- varifier represents such parameters as rdonly_untrusted_mem(sz=0).

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-9-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:07 -07:00
Eduard Zingerman
54ac2c9418 selftests/bpf: test cases for __arg_untrusted
Check usage of __arg_untrusted parameters with PTR_TO_BTF_ID:
- combining __arg_untrusted with other tags is forbidden;
- non-kernel (program local) types for __arg_untrusted are forbidden;
- passing of {trusted, untrusted, map value, scalar value, values with
  variable offset} to untrusted is ok;
- passing of PTR_TO_BTF_ID with a different type to untrusted is ok;
- passing of untrusted to trusted is forbidden.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:07 -07:00
Eduard Zingerman
f1f5d6f25d selftests/bpf: ptr_to_btf_id struct walk ending with primitive pointer
Validate that reading a PTR_TO_BTF_ID field produces a value of type
PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED, if field is a pointer to a
primitive type.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:06 -07:00
Eduard Zingerman
2d5c91e1cc bpf: rdonly_untrusted_mem for btf id walk pointer leafs
When processing a load from a PTR_TO_BTF_ID, the verifier calculates
the type of the loaded structure field based on the load offset.
For example, given the following types:

  struct foo {
    struct foo *a;
    int *b;
  } *p;

The verifier would calculate the type of `p->a` as a pointer to
`struct foo`. However, the type of `p->b` is currently calculated as a
SCALAR_VALUE.

This commit updates the logic for processing PTR_TO_BTF_ID to instead
calculate the type of p->b as PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED.
This change allows further dereferencing of such pointers (using probe
memory instructions).

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250704230354.1323244-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-07 08:25:06 -07:00
Kumar Kartikeya Dwivedi
5697683e13 selftests/bpf: Add tests for prog streams
Add selftests to stress test the various facets of the stream API,
memory allocation pattern, and ensuring dumping support is tested and
functional.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-13-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03 19:30:07 -07:00
Ihor Solodrai
7b29689263 selftests/bpf: Add test cases for bpf_dynptr_memset()
Add tests to verify the behavior of bpf_dynptr_memset():
  * normal memset 0
  * normal memset non-0
  * memset with an offset
  * memset in dynptr that was adjusted
  * error: size overflow
  * error: offset+size overflow
  * error: readonly dynptr
  * memset into non-linear xdp dynptr

Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/bpf/20250702210309.3115903-3-isolodrai@meta.com
2025-07-03 15:21:20 -07:00
Mykyta Yatsenko
38d95beb4b selftests/bpf: Allow veristat compile standalone
Veristat is synced into the standalone repo, where it compiles without
kernel private dependencies. This patch fixes compilation errors in
standalone veristat.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250702175622.358405-1-mykyta.yatsenko5@gmail.com
2025-07-02 11:45:14 -07:00
Paul Chaignon
7ec899ac90 selftests/bpf: Negative test case for ref_obj_id in args
This patch adds a test case, as shown below, for the verifier error
"more than one arg with ref_obj_id".

    0: (b7) r2 = 20
    1: (b7) r3 = 0
    2: (18) r1 = 0xffff92cee3cbc600
    4: (85) call bpf_ringbuf_reserve#131
    5: (55) if r0 == 0x0 goto pc+3
    6: (bf) r1 = r0
    7: (bf) r2 = r0
    8: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204
    9: (95) exit

This error is currently incorrectly reported as a verifier bug, with a
warning. The next patch in this series will address that.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/3ba78e6cda47ccafd6ea70dadbc718d020154664.1751463262.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-02 10:43:34 -07:00
Eduard Zingerman
a90f5f7370 selftests/bpf: null checks for rdonly_untrusted_mem should be preserved
Test case checking that verifier does not assume rdonly_untrusted_mem
values as not null.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250702073620.897517-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-02 10:43:30 -07:00
Matteo Croce
07ee18a0bc selftests/bpf: Don't call fsopen() as privileged user
In the BPF token example, the fsopen() syscall is called as privileged
user. This is unneeded because fsopen() can be called also as
unprivileged user from the user namespace.
As the `fs_fd` file descriptor which was sent back and forth is still the
same, keep it open instead of cloning and closing it twice via SCM_RIGHTS.

cfr. https://github.com/systemd/systemd/pull/36134

Signed-off-by: Matteo Croce <teknoraver@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/bpf/20250701183123.31781-1-technoboy85@gmail.com
2025-07-02 10:42:54 -07:00
Song Liu
21eebc655b
selftests/bpf: Add tests for bpf_cgroup_read_xattr
Add tests for different scenarios with bpf_cgroup_read_xattr:
1. Read cgroup xattr from bpf_cgroup_from_id;
2. Read cgroup xattr from bpf_cgroup_ancestor;
3. Read cgroup xattr from css_iter;
4. Use bpf_cgroup_read_xattr in LSM hook security_socket_connect.
5. Use bpf_cgroup_read_xattr in cgroup program.

Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/20250623063854.1896364-5-song@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-02 14:18:20 +02:00
Paul Chaignon
bf4807c89d selftests/bpf: Add negative test cases for snprintf
This patch adds a couple negative test cases with a trailing % at the
end of the format string. The %p% case was fixed by the previous commit,
whereas the %s% case was already successfully rejected before.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/0669bf6eb4f9e5bb10e949d60311c06e2d942447.1751395489.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-01 15:22:46 -07:00
Colin Ian King
1230be8209 selftests/bpf: Fix spelling mistake "subtration" -> "subtraction"
There are spelling mistakes in description text. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250630125528.563077-1-colin.i.king@gmail.com
2025-07-01 13:28:56 -07:00
Mykyta Yatsenko
cce3fee729 selftests/bpf: Enable dynptr/test_probe_read_user_str_dynptr
Enable previously disabled dynptr/test_probe_read_user_str_dynptr test,
after the fix it depended on was merged into bpf-next.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250630133515.1108325-1-mykyta.yatsenko5@gmail.com
2025-07-01 12:42:23 -07:00
Eduard Zingerman
c4b1be928e selftests/bpf: bpf_rdonly_cast u{8,16,32,64} access tests
Tests with aligned and misaligned memory access of different sizes via
pointer returned by bpf_rdonly_cast().

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250627015539.1439656-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-27 18:55:28 -07:00
Mykyta Yatsenko
ffaff1804e selftests/bpf: improve error messages in veristat
Return error if preset parsing fails. Avoid proceeding with veristat run
if preset does not parse.
Before:
```
./veristat set_global_vars.bpf.o -G "arr[999999999999999999999] = 1"
Failed to parse value '999999999999999999999'
Processing 'set_global_vars.bpf.o'...
File                   Program           Verdict  Duration (us)  Insns  States  Program size  Jited size
---------------------  ----------------  -------  -------------  -----  ------  ------------  ----------
set_global_vars.bpf.o  test_set_globals  success             27     64       0            82           0
---------------------  ----------------  -------  -------------  -----  ------  ------------  ----------
Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs.
```
After:
```
./veristat set_global_vars.bpf.o -G "arr[999999999999999999999] = 1"
Failed to parse value '999999999999999999999'
Failed to parse global variable presets: arr[999999999999999999999] = 1
```

Improve error messages:
 * If preset struct member can't be found.
 * Array index out of bounds

Extract rtrim function.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250627144342.686896-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-27 18:54:46 -07:00
Song Liu
bacdf5a0e6 selftests/bpf: Fix cgroup_xattr/read_cgroupfs_xattr
cgroup_xattr/read_cgroupfs_xattr has two issues:

1. cgroup_xattr/read_cgroupfs_xattr messes up lo without creating a netns
   first. This causes issue with other tests.

   Fix this by using a different hook (lsm.s/file_open) and not messing
   with lo.

2. cgroup_xattr/read_cgroupfs_xattr sets up cgroups without proper
   mount namespaces.

   Fix this by using the existing cgroup helpers. A new helper
   set_cgroup_xattr() is added to set xattr on cgroup files.

Fixes: f4fba2d6d2 ("selftests/bpf: Add tests for bpf_cgroup_read_xattr")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Closes: https://lore.kernel.org/bpf/CAADnVQ+iqMi2HEj_iH7hsx+XJAsqaMWqSDe4tzcGAnehFWA9Sw@mail.gmail.com/
Signed-off-by: Song Liu <song@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20250627191221.765921-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-27 14:34:08 -07:00
Alexei Starovoitov
48d998af99 Merge branch 'vfs-6.17.bpf' of https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Merge branch 'vfs-6.17.bpf' from vfs tree into bpf-next/master
and resolve conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26 19:01:04 -07:00
Jakub Kicinski
32155c6fd9 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iIsEABYKADMWIQTFp0I1jqZrAX+hPRXbK58LschIgwUCaF3LFhUcZGFuaWVsQGlv
 Z2VhcmJveC5uZXQACgkQ2yufC7HISINtRgD+JagJmBokoPnsk7DfauJnVhaP95aV
 tsnna+fU1kGwS7MBAMINCoLyeISiD/XG0O+Om38czhhglWbl4+TgrthegPkE
 =opKf
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2025-06-27

We've added 6 non-merge commits during the last 8 day(s) which contain
a total of 6 files changed, 120 insertions(+), 20 deletions(-).

The main changes are:

1) Fix RCU usage in task_cls_state() for BPF programs using helpers like
   bpf_get_cgroup_classid_curr() outside of networking, from Charalampos
   Mitrodimas.

2) Fix a sockmap race between map_update and a pending workqueue from
   an earlier map_delete freeing the old psock where both pointed to the
   same psock->sk, from Jiayuan Chen.

3) Fix a data corruption issue when using bpf_msg_pop_data() in kTLS which
   failed to recalculate the ciphertext length, also from Jiayuan Chen.

4) Remove xdp_redirect_map{,_err} trace events since they are unused and
   also hide XDP trace events under CONFIG_BPF_SYSCALL, from Steven Rostedt.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  xdp: tracing: Hide some xdp events under CONFIG_BPF_SYSCALL
  xdp: Remove unused events xdp_redirect_map and xdp_redirect_map_err
  net, bpf: Fix RCU usage in task_cls_state() for BPF programs
  selftests/bpf: Add test to cover ktls with bpf_msg_pop_data
  bpf, ktls: Fix data corruption when using bpf_msg_pop_data() in ktls
  bpf, sockmap: Fix psock incorrectly pointing to sk
====================

Link: https://patch.msgid.link/20250626230111.24772-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26 17:13:17 -07:00
Mykyta Yatsenko
583588594b selftests/bpf: Test array presets in veristat
Modify existing veristat tests to verify that array presets are applied
as expected.
Introduce few negative tests as well to check that common error modes
are handled.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250625165904.87820-4-mykyta.yatsenko5@gmail.com
2025-06-26 10:28:51 -07:00
Mykyta Yatsenko
edc99d0b02 selftests/bpf: Support array presets in veristat
Implement support for presetting values for array elements in veristat.
For example:
```
sudo ./veristat set_global_vars.bpf.o -G "arr[3] = 1"
```
Arrays of structures and structure of arrays work, but each individual
scalar value has to be set separately: `foo[1].bar[2] = value`.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250625165904.87820-3-mykyta.yatsenko5@gmail.com
2025-06-26 10:28:50 -07:00
Mykyta Yatsenko
be898cb5cb selftests/bpf: Separate var preset parsing in veristat
Refactor var preset parsing in veristat to simplify implementation.
Prepare parsed variable beforehand so that parsing logic is separated
from functionality of calculating offsets and searching fields.
Introduce rvalue struct, storing either int or enum (string value),
will be reused in the next patch, extract parsing rvalue into a
separate function.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250625165904.87820-2-mykyta.yatsenko5@gmail.com
2025-06-26 10:28:50 -07:00
Alexei Starovoitov
886178a33a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc3
Cross-merge BPF, perf and other fixes after downstream PRs.
It restores BPF CI to green after critical fix
commit bc4394e5e7 ("perf: Fix the throttle error of some clock events")

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26 09:49:39 -07:00
Viktor Malik
e8763fb66a selftests/bpf: Add tests for string kfuncs
Add both positive and negative tests cases using string kfuncs added in
the previous patches.

Positive tests check that the functions work as expected.

Negative tests pass various incorrect strings to the kfuncs and check
for the expected error codes:
  -E2BIG  when passing too long strings
  -EFAULT when trying to read inaccessible kernel memory
  -ERANGE when passing userspace pointers on arches with non-overlapping
          address spaces

A majority of the tests use the RUN_TESTS helper which executes BPF
programs with BPF_PROG_TEST_RUN and check for the expected return value.
An exception to this are tests for long strings as we need to memset the
long string from userspace (at least I haven't found an ergonomic way to
memset it from a BPF program), which cannot be done using the RUN_TESTS
infrastructure.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/090451a2e60c9ae1dceb4d1bfafa3479db5c7481.1750917800.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26 09:44:46 -07:00
Viktor Malik
a55b7d3932 selftests/bpf: Allow macros in __retval
Allow macro expansion for values passed to the `__retval` and
`__retval_unpriv` attributes. This is especially useful for testing
programs which return various error codes.

With this change, the code for parsing special literals can be made
simpler, as the literals are defined via macros. The only exception is
INT_MIN which expands to (-INT_MAX -1), which is not single number and
cannot be parsed by strtol. So, we instead use a prefixed literal
_INT_MIN in __retval and handle it separately (assign the expected
return to INT_MIN). Also, strtol cannot handle the "ll" suffix so change
the value of POINTER_VALUE from 0xcafe4all to 0xbadcafe.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/a6c6b551ae0575351faa7b7a1df52f9341a5cbe8.1750917800.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26 09:44:46 -07:00
Willem de Bruijn
5e9388f798 selftests/bpf: adapt one more case in test_lru_map to the new target_free
The below commit that updated BPF_MAP_TYPE_LRU_HASH free target,
also updated tools/testing/selftests/bpf/test_lru_map to match.

But that missed one case that passes with 4 cores, but fails at
higher cpu counts.

Update test_lru_sanity3 to also adjust its expectation of target_free.

This time tested with 1, 4, 16, 64 and 384 cpu count.

Fixes: d4adf1c9ee ("bpf: Adjust free target to avoid global starvation of LRU map")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20250625210412.2732970-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25 15:19:36 -07:00
Eduard Zingerman
12ed81f823 selftests/bpf: check operations on untrusted ro pointers to mem
The following cases are tested:
- it is ok to load memory at any offset from rdonly_untrusted_mem;
- rdonly_untrusted_mem offset/bounds are not tracked;
- writes into rdonly_untrusted_mem are forbidden;
- atomic operations on rdonly_untrusted_mem are forbidden;
- rdonly_untrusted_mem can't be passed as a memory argument of a
  helper of kfunc;
- it is ok to use PTR_TO_MEM and PTR_TO_BTF_ID in a same load
  instruction.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625182414.30659-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25 15:13:16 -07:00
Song Liu
2945434e24 selftests/bpf: Add tests for BPF_NEG range tracking logic
BPF_REG now has range tracking logic. Add selftests for BPF_NEG.
Specifically, return value of LSM hook lsm.s/socket_connect is used to
show that the verifer tracks BPF_NEG(1) falls in the [-4095, 0] range;
while BPF_NEG(100000) does not fall in that range.

Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625164025.3310203-3-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25 15:12:18 -07:00
Song Liu
aced132599 bpf: Add range tracking for BPF_NEG
Add range tracking for instruction BPF_NEG. Without this logic, a trivial
program like the following will fail

    volatile bool found_value_b;
    SEC("lsm.s/socket_connect")
    int BPF_PROG(test_socket_connect)
    {
        if (!found_value_b)
                return -1;
        return 0;
    }

with verifier log:

"At program exit the register R0 has smin=0 smax=4294967295 should have
been in [-4095, 0]".

This is because range information is lost in BPF_NEG:

0: R1=ctx() R10=fp0
; if (!found_value_b) @ xxxx.c:24
0: (18) r1 = 0xffa00000011e7048       ; R1_w=map_value(...)
2: (71) r0 = *(u8 *)(r1 +0)           ; R0_w=scalar(smin32=0,smax=255)
3: (a4) w0 ^= 1                       ; R0_w=scalar(smin32=0,smax=255)
4: (84) w0 = -w0                      ; R0_w=scalar(range info lost)

Note that, the log above is manually modified to highlight relevant bits.

Fix this by maintaining proper range information with BPF_NEG, so that
the verifier will know:

4: (84) w0 = -w0                      ; R0_w=scalar(smin32=-255,smax=0)

Also updated selftests based on the expected behavior.

Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625164025.3310203-2-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25 15:12:17 -07:00
Yonghong Song
d69bafe6ee selftests/bpf: Fix usdt multispec failure with arm64/clang20 selftest build
When building the selftest with arm64/clang20, the following test failed:
  ...
  ubtest_multispec_usdt:PASS:usdt_100_called 0 nsec
  subtest_multispec_usdt:PASS:usdt_100_sum 0 nsec
  subtest_multispec_usdt:FAIL:usdt_300_bad_attach unexpected pointer: 0xaaaad82a2a80
  #471/2   usdt/multispec:FAIL
  #471     usdt:FAIL

But arm64/gcc11 built kernel selftests succeeded. Further debug found arm64/clang
generated code has much less argument pattern after dedup, but gcc generated
code has a lot more.

Check usdt probes with usdt.test.o on arm64 platform:

with gcc11 build binary:
  stapsdt              0x0000002e       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x00000000000054f8, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[sp]
  stapsdt              0x00000031       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x0000000000005510, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[sp, 4]
  ...
  stapsdt              0x00000032       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x0000000000005660, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[sp, 60]
  ...
  stapsdt              0x00000034       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x00000000000070e8, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[sp, 1192]
  stapsdt              0x00000034       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x0000000000007100, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[sp, 1196]
  ...
  stapsdt              0x00000032       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x0000000000009ec4, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[sp, 60]

with clang20 build binary:
  stapsdt              0x0000002e       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x00000000000009a0, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[x9]
  stapsdt              0x0000002e       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x00000000000009b8, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[x9]
  ...
  stapsdt              0x0000002e       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x0000000000002590, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[x9]
  stapsdt              0x0000002e       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x00000000000025a8, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[x8]
  ...
  stapsdt              0x0000002f       NT_STAPSDT (SystemTap probe descriptors)
    Provider: test
    Name: usdt_300
    Location: 0x0000000000007fdc, Base: 0x0000000000000000, Semaphore: 0x0000000000000008
    Arguments: -4@[x10]

There are total 300 locations for usdt_300. For gcc11 built binary, there are
300 spec's. But for clang20 built binary, there are 3 spec's. The default
BPF_USDT_MAX_SPEC_CNT is 256, so bpf_program__attach_usdt() will fail for gcc
but it will succeed with clang.

To fix the problem, do not do bpf_program__attach_usdt() for usdt_300
with arm64/clang setup.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250624211802.2198821-1-yonghong.song@linux.dev
2025-06-25 12:52:54 -07:00
Adin Scannell
fa6f092cc0 libbpf: Fix possible use-after-free for externs
The `name` field in `obj->externs` points into the BTF data at initial
open time. However, some functions may invalidate this after opening and
before loading (e.g. `bpf_map__set_value_size`), which results in
pointers into freed memory and undefined behavior.

The simplest solution is to simply `strdup` these strings, similar to
the `essent_name`, and free them at the same time.

In order to test this path, the `global_map_resize` BPF selftest is
modified slightly to ensure the presence of an extern, which causes this
test to fail prior to the fix. Given there isn't an obvious API or error
to test against, I opted to add this to the existing test as an aspect
of the resizing feature rather than duplicate the test.

Fixes: 9d0a23313b ("libbpf: Add capability for resizing datasec maps")
Signed-off-by: Adin Scannell <amscanne@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250625050215.2777374-1-amscanne@meta.com
2025-06-25 12:28:58 -07:00
Harishankar Vishwanathan
e1d794541b selftests/bpf: Add testcases for BPF_ADD and BPF_SUB
The previous commit improves the precision in scalar(32)_min_max_add,
and scalar(32)_min_max_sub. The improvement in precision occurs in cases
when all outcomes overflow or underflow, respectively.

This commit adds selftests that exercise those cases.

This commit also adds selftests for cases where the output register
state bounds for u(32)_min/u(32)_max are conservatively set to unbounded
(when there is partial overflow or underflow).

Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu>
Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu>
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250623040359.343235-3-harishankar.vishwanathan@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-24 18:48:34 -07:00
Jerome Marchand
b8a205486e selftests/bpf: Convert test_sysctl to prog_tests
Convert test_sysctl test to prog_tests with minimal change to the
tests themselves.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250619140603.148942-3-jmarchan@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-23 21:50:44 -07:00
Luis Gerhorst
3ce7cdde66 selftests/bpf: Support ppc64el in vmtest
With a rootfs built using libbpf's BPF CI [1], we can run specific tests
as follows:

$ ../libbpf-ci/rootfs/mkrootfs_debian.sh --arch ppc64el --distro noble
$ PLATFORM=ppc64el CROSS_COMPILE=powerpc64le-linux-gnu- \
    tools/testing/selftests/bpf/vmtest.sh \
    -l libbpf-vmtest-rootfs-*-noble-ppc64el.tar.zst \
    -- ./test_progs -t verifier_array_access

Does not include a DENYLIST or support for KVM for now.

[1] https://github.com/libbpf/ci

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Link: https://lore.kernel.org/r/20250619140854.2135283-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-23 09:34:58 -07:00
Song Liu
f4fba2d6d2
selftests/bpf: Add tests for bpf_cgroup_read_xattr
Add tests for different scenarios with bpf_cgroup_read_xattr:
1. Read cgroup xattr from bpf_cgroup_from_id;
2. Read cgroup xattr from bpf_cgroup_ancestor;
3. Read cgroup xattr from css_iter;
4. Use bpf_cgroup_read_xattr in LSM hook security_socket_connect.
5. Use bpf_cgroup_read_xattr in cgroup program.

Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/20250623063854.1896364-5-song@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-23 13:03:12 +02:00
Slava Imameev
f8b19aeca1 selftests/bpf: Add test for bpftool access to read-only protected maps
Add selftest cases that validate bpftool's expected behavior when
accessing maps protected from modification via security_bpf_map.

The test includes a BPF program attached to security_bpf_map with two maps:
- A protected map that only allows read-only access
- An unprotected map that allows full access

The test script attaches the BPF program to security_bpf_map and
verifies that for the bpftool map command:
- Read access works on both maps
- Write access fails on the protected map
- Write access succeeds on the unprotected map
- These behaviors remain consistent when the maps are pinned

Signed-off-by: Slava Imameev <slava.imameev@crowdstrike.com>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20250620151812.13952-2-slava.imameev@crowdstrike.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-20 11:13:03 -07:00
Willem de Bruijn
d4adf1c9ee bpf: Adjust free target to avoid global starvation of LRU map
BPF_MAP_TYPE_LRU_HASH can recycle most recent elements well before the
map is full, due to percpu reservations and force shrink before
neighbor stealing. Once a CPU is unable to borrow from the global map,
it will once steal one elem from a neighbor and after that each time
flush this one element to the global list and immediately recycle it.

Batch value LOCAL_FREE_TARGET (128) will exhaust a 10K element map
with 79 CPUs. CPU 79 will observe this behavior even while its
neighbors hold 78 * 127 + 1 * 15 == 9921 free elements (99%).

CPUs need not be active concurrently. The issue can appear with
affinity migration, e.g., irqbalance. Each CPU can reserve and then
hold onto its 128 elements indefinitely.

Avoid global list exhaustion by limiting aggregate percpu caches to
half of map size, by adjusting LOCAL_FREE_TARGET based on cpu count.
This change has no effect on sufficiently large tables.

Similar to LOCAL_NR_SCANS and lru->nr_scans, introduce a map variable
lru->free_target. The extra field fits in a hole in struct bpf_lru.
The cacheline is already warm where read in the hot path. The field is
only accessed with the lru lock held.

Tested-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20250618215803.3587312-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-18 18:50:14 -07:00
Eduard Zingerman
cd7312a78f selftests/bpf: include limits.h needed for PATH_MAX directly
Constant PATH_MAX is used in function unpriv_helpers.c:open_config().
This constant is provided by include file <limits.h>.
The dependency was added by commit [1], which does not include
<limits.h> directly, relying instead on <limits.h> being included from
zlib.h -> zconf.h.
As it turns out, this is not the case for all systems, e.g. on
Fedora 41 zlib 1.3.1 is used, and there <limits.h> is not included
from zconf.h. Hence, there is a compilation error on Fedora 41.

[1] commit fc2915bb8b ("selftests/bpf: More precise cpu_mitigations state detection")

Fixes: fc2915bb8b ("selftests/bpf: More precise cpu_mitigations state detection")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/20250618093134.3078870-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-18 06:52:07 -07:00
James Bottomley
bd07bd12f2 bpf: Fix key serial argument of bpf_lookup_user_key()
The underlying lookup_user_key() function uses a signed 32 bit integer
for key serial numbers because legitimate serial numbers are positive
(and > 3) and keyrings are negative.  Using a u32 for the keyring in
the bpf function doesn't currently cause any conversion problems but
will start to trip the signed to unsigned conversion warnings when the
kernel enables them, so convert the argument to signed (and update the
tests accordingly) before it acquires more users.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Link: https://lore.kernel.org/r/84cdb0775254d297d75e21f577089f64abdfbd28.camel@HansenPartnership.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-17 18:15:27 -07:00
Mykyta Yatsenko
66ab68c9de selftests/bpf: Fix unintentional switch case fall through
Break from switch expression after parsing -n CLI argument in veristat,
instead of falling through and enabling comparison mode.

Fixes: a5c57f81eb ("veristat: add ability to set BPF_F_TEST_SANITY_STRICT flag with -r flag")
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250617121536.1320074-1-mykyta.yatsenko5@gmail.com
2025-06-17 13:24:31 -07:00
Eduard Zingerman
fc2915bb8b selftests/bpf: More precise cpu_mitigations state detection
test_progs and test_verifier binaries execute unpriv tests under the
following conditions:
- unpriv BPF is enabled;
- CPU mitigations are enabled (see [1] for details).

The detection of the "mitigations enabled" state is performed by
unpriv_helpers.c:get_mitigations_off() via inspecting kernel boot
command line, looking for a parameter "mitigations=off".

Such detection scheme won't work for certain configurations,
e.g. when CONFIG_CPU_MITIGATIONS is disabled and boot parameter is
not supplied.

Miss-detection leads to test_progs executing tests meant to be run
only with mitigations enabled, e.g.
verifier_and.c:known_subreg_with_unknown_reg(), and reporting false
failures.

Internally, verifier sets bpf_verifier_env->bypass_spec_{v1,v4}
basing on the value returned by kernel/cpu.c:cpu_mitigations_off().
This function is backed by a variable kernel/cpu.c:cpu_mitigations.

This state is not fully introspect-able via sysfs. The closest proxy
is /sys/devices/system/cpu/vulnerabilities/spectre_v1, but it reports
"vulnerable" state only if mitigations are disabled *and* current cpu
is vulnerable, while verifier does not check cpu state.

There are only two ways the kernel/cpu.c:cpu_mitigations can be set:
- via boot parameter;
- via CONFIG_CPU_MITIGATIONS option.

This commit updates unpriv_helpers.c:get_mitigations_off() to scan
/boot/config-$(uname -r) and /proc/config.gz for
CONFIG_CPU_MITIGATIONS value in addition to boot command line check.

Tested using the following configurations:
- mitigations enabled (unpriv tests are enabled)
- mitigations disabled via boot cmdline (unpriv tests skipped)
- mitigations disabled via CONFIG_CPU_MITIGATIONS
  (unpriv tests skipped)

[1] https://lore.kernel.org/bpf/20231025031144.5508-1-laoar.shao@gmail.com/

Reported-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250617005710.1066165-2-eddyz87@gmail.com
2025-06-17 13:23:49 -07:00
Yonghong Song
a633dab4b4 selftests/bpf: Fix RELEASE build failure with gcc14
With gcc14, when building with RELEASE=1, I hit four below compilation
failure:

Error 1:
  In file included from test_loader.c:6:
  test_loader.c: In function ‘run_subtest’: test_progs.h:194:17:
      error: ‘retval’ may be used uninitialized in this function
   [-Werror=maybe-uninitialized]
    194 |                 fprintf(stdout, ##format);           \
        |                 ^~~~~~~
  test_loader.c:958:13: note: ‘retval’ was declared here
    958 |         int retval, err, i;
        |             ^~~~~~

  The uninitialized var 'retval' actually could cause incorrect result.

Error 2:
  In function ‘test_fd_array_cnt’:
  prog_tests/fd_array.c:71:14: error: ‘btf_id’ may be used uninitialized in this
      function [-Werror=maybe-uninitialized]
     71 |         fd = bpf_btf_get_fd_by_id(id);
        |              ^~~~~~~~~~~~~~~~~~~~~~~~
  prog_tests/fd_array.c:302:15: note: ‘btf_id’ was declared here
    302 |         __u32 btf_id;
        |               ^~~~~~

  Changing ASSERT_GE to ASSERT_EQ can fix the compilation error. Otherwise,
  there is no functionality change.

Error 3:
  prog_tests/tailcalls.c: In function ‘test_tailcall_hierarchy_count’:
  prog_tests/tailcalls.c:1402:23: error: ‘fentry_data_fd’ may be used uninitialized
      in this function [-Werror=maybe-uninitialized]
     1402 |                 err = bpf_map_lookup_elem(fentry_data_fd, &i, &val);
          |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  The code is correct. The change intends to silence gcc errors.

Error 4: (this error only happens on arm64)
  In file included from prog_tests/log_buf.c:4:
  prog_tests/log_buf.c: In function ‘bpf_prog_load_log_buf’:
  ./test_progs.h:390:22: error: ‘log_buf’ may be used uninitialized [-Werror=maybe-uninitialized]
    390 |         int ___err = libbpf_get_error(___res);             \
        |                      ^~~~~~~~~~~~~~~~~~~~~~~~
  prog_tests/log_buf.c:158:14: note: in expansion of macro ‘ASSERT_OK_PTR’
    158 |         if (!ASSERT_OK_PTR(log_buf, "log_buf_alloc"))
        |              ^~~~~~~~~~~~~
  In file included from selftests/bpf/tools/include/bpf/bpf.h:32,
                 from ./test_progs.h:36:
  selftests/bpf/tools/include/bpf/libbpf_legacy.h:113:17:
    note: by argument 1 of type ‘const void *’ to ‘libbpf_get_error’ declared here
    113 | LIBBPF_API long libbpf_get_error(const void *ptr);
        |                 ^~~~~~~~~~~~~~~~

  Adding a pragma to disable maybe-uninitialized fixed the issue.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250617044956.2686668-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-17 10:18:30 -07:00
Song Liu
a766cfbbeb bpf: Mark dentry->d_inode as trusted_or_null
LSM hooks such as security_path_mknod() and security_inode_rename() have
access to newly allocated negative dentry, which has NULL d_inode.
Therefore, it is necessary to do the NULL pointer check for d_inode.

Also add selftests that checks the verifier enforces the NULL pointer
check.

Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20250613052857.1992233-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-17 08:40:59 -07:00
Eduard Zingerman
4a4b84ba9e selftests/bpf: verify jset handling in CFG computation
A test case to check if both branches of jset are explored when
computing program CFG.

At 'if r1 & 0x7 ...':
- register 'r2' is computed alive only if jump branch of jset
  instruction is followed;
- register 'r0' is computed alive only if fallthrough branch of jset
  instruction is followed.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250613175331.3238739-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-13 11:51:19 -07:00
Eduard Zingerman
67cdcc405b veristat: Memory accounting for bpf programs
This commit adds a new field mem_peak / "Peak memory (MiB)" field to a
set of gathered statistics. The field is intended as an estimate for
peak verifier memory consumption for processing of a given program.

Mechanically stat is collected as follows:
- At the beginning of handle_verif_mode() a new cgroup is created
  and veristat process is moved into this cgroup.
- At each program load:
  - bpf_object__load() is split into bpf_object__prepare() and
    bpf_object__load() to avoid accounting for memory allocated for
    maps;
  - before bpf_object__load():
    - a write to "memory.peak" file of the new cgroup is used to reset
      cgroup statistics;
    - updated value is read from "memory.peak" file and stashed;
  - after bpf_object__load() "memory.peak" is read again and
    difference between new and stashed values is used as a metric.

If any of the above steps fails veristat proceeds w/o collecting
mem_peak information for a program, reporting mem_peak as -1.

While memcg provides data in bytes (converted from pages), veristat
converts it to megabytes to avoid jitter when comparing results of
different executions.

The change has no measurable impact on veristat running time.

A correlation between "Peak states" and "Peak memory" fields provides
a sanity check for gathered statistics, e.g. a sample of data for
sched_ext programs:

Program                   Peak states  Peak memory (MiB)
------------------------  -----------  -----------------
lavd_select_cpu                  2153                 44
lavd_enqueue                     1982                 41
lavd_dispatch                    3480                 28
layered_dispatch                 1417                 17
layered_enqueue                   760                 11
lavd_cpu_offline                  349                  6
lavd_cpu_online                   349                  6
lavd_init                         394                  6
rusty_init                        350                  5
layered_select_cpu                391                  4
...
rusty_stopping                    134                  1
arena_topology_node_init          170                  0

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250613072147.3938139-3-eddyz87@gmail.com
2025-06-13 10:44:12 -07:00
Song Liu
ccefa19335 bpf/veristat: Fix veristat for map type BPF_MAP_TYPE_CGRP_STORAGE
BPF_MAP_TYPE_CGRP_STORAGE doesn't allow non-zero max_entries. So veristat
should not set it to 1.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250613050001.1058733-1-song@kernel.org
2025-06-13 10:08:22 -07:00
Yonghong Song
44df9e0d4e selftests/bpf: Fix xdp_do_redirect failure with 64KB page size
On arm64 with 64KB page size, the selftest xdp_do_redirect failed like
below:
  ...
  test_xdp_do_redirect:PASS:pkt_count_tc 0 nsec
  test_max_pkt_size:PASS:prog_run_max_size 0 nsec
  test_max_pkt_size:FAIL:prog_run_too_big unexpected prog_run_too_big: actual -28 != expected -22

With 64KB page size, the xdp frame size will be much bigger so
the existing test will fail.

Adjust various parameters so the test can also work on 64K page size.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035042.2208630-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 19:07:51 -07:00
Yonghong Song
96fcf7e7a7 selftests/bpf: Fix two net related test failures with 64K page size
When running BPF selftests on arm64 with a 64K page size, I encountered
the following two test failures:
  sockmap_basic/sockmap skb_verdict change tail:FAIL
  tc_change_tail:FAIL

With further debugging, I identified the root cause in the following
kernel code within __bpf_skb_change_tail():

    u32 max_len = BPF_SKB_MAX_LEN;
    u32 min_len = __bpf_skb_min_len(skb);
    int ret;

    if (unlikely(flags || new_len > max_len || new_len < min_len))
        return -EINVAL;

With a 4K page size, new_len = 65535 and max_len = 16064, the function
returns -EINVAL. However, With a 64K page size, max_len increases to
261824, allowing execution to proceed further in the function. This is
because BPF_SKB_MAX_LEN scales with the page size and larger page sizes
result in higher max_len values.

Updating the new_len parameter in both tests based on actual kernel
page size resolved both failures.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035037.2207911-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 19:07:51 -07:00
Yonghong Song
4fc012daf9 bpf: Fix an issue in bpf_prog_test_run_xdp when page size greater than 4K
The bpf selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow failed on
arm64 with 64KB page:
   xdp_adjust_tail/xdp_adjust_frags_tail_grow:FAIL

In bpf_prog_test_run_xdp(), the xdp->frame_sz is set to 4K, but later on
when constructing frags, with 64K page size, the frag data_len could
be more than 4K. This will cause problems in bpf_xdp_frags_increase_tail().

To fix the failure, the xdp->frame_sz is set to be PAGE_SIZE so kernel
can test different page size properly. With the kernel change, the user
space and bpf prog needs adjustment. Currently, the MAX_SKB_FRAGS default
value is 17, so for 4K page, the maximum packet size will be less than 68K.
To test 64K page, a bigger maximum packet size than 68K is desired. So two
different functions are implemented for subtest xdp_adjust_frags_tail_grow.
Depending on different page size, different data input/output sizes are used
to adapt with different page size.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250612035032.2207498-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 19:07:51 -07:00
Fushuai Wang
6a4bd31f68 selftests/bpf: fix signedness bug in redir_partial()
When xsend() returns -1 (error), the check 'n < sizeof(buf)' incorrectly
treats it as success due to unsigned promotion. Explicitly check for -1
first.

Fixes: a4b7193d8e ("selftests/bpf: Add sockmap test for redirecting partial skb data")
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Link: https://lore.kernel.org/r/20250612084208.27722-1-wangfushuai@baidu.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Eduard Zingerman
5159482fdb selftests/bpf: tests with a loop state missing read/precision mark
The test case absent_mark_in_the_middle_state is equivalent of the
following C program:

   1: r8 = bpf_get_prandom_u32();
   2: r6 = -32;
   3: bpf_iter_num_new(&fp[-8], 0, 10);
   4: if (unlikely(bpf_get_prandom_u32()))
   5:   r6 = -31;
   6: for (;;) {
   7:   if (!bpf_iter_num_next(&fp[-8]))
   8:     break;
   9:   if (unlikely(bpf_get_prandom_u32()))
  10:     *(u64 *)(fp + r6) = 7;
  11: }
  12: bpf_iter_num_destroy(&fp[-8]);
  13: return 0;

W/o a fix that instructs verifier to ignore branches count for loop
entries verification proceeds as follows:
- 1-4, state is {r6=-32,fp-8=active};
- 6, checkpoint A is created with {r6=-32,fp-8=active};
- 7, checkpoint B is created with {r6=-32,fp-8=active},
     push state {r6=-32,fp-8=active} from 7 to 9;
- 8,12,13, {r6=-32,fp-8=drained}, exit;
- pop state with {r6=-32,fp-8=active} from 7 to 9;
- 9, push state {r6=-32,fp-8=active} from 9 to 10;
- 6, checkpoint C is created with {r6=-32,fp-8=active};
- 7, checkpoint A is hit, no precision propagated for r6 to C;
- pop state {r6=-32,fp-8=active} from 9 to 10;
- 10, state is {r6=-31,fp-8=active}, r6 is marked as read and precise,
      these marks are propagated to checkpoints A and B (but not C, as
      it is not the parent of current state;
- 6, {r6=-31,fp-8=active} checkpoint C is hit, because r6 is not
     marked precise for this checkpoint;
- the program is accepted, despite a possibility of unaligned u64
  stack access at offset -31.

The test case absent_mark_in_the_middle_state2 is similar except the
following change:

       r8 = bpf_get_prandom_u32();
       r6 = -32;
       bpf_iter_num_new(&fp[-8], 0, 10);
       if (unlikely(bpf_get_prandom_u32())) {
         r6 = -31;
 + jump_into_loop:
 +       goto +0;
 +       goto loop;
 +     }
 +     if (unlikely(bpf_get_prandom_u32()))
 +       goto jump_into_loop;
 + loop:
       for (;;) {
         if (!bpf_iter_num_next(&fp[-8]))
           break;
         if (unlikely(bpf_get_prandom_u32()))
           *(u64 *)(fp + r6) = 7;
       }
       bpf_iter_num_destroy(&fp[-8])
       return 0

The goal is to check that read/precision marks are propagated to
checkpoint created at 'goto +0' that resides outside of the loop.

The test case absent_mark_in_the_middle_state3 is a bit different and
is equivalent to the C program below:

    int absent_mark_in_the_middle_state3(void)
    {
      bpf_iter_num_new(&fp[-8], 0, 10)
      loop1(-32, &fp[-8])
      loop1_wrapper(&fp[-8])
      bpf_iter_num_destroy(&fp[-8])
    }

    int loop1(num, iter)
    {
      while (bpf_iter_num_next(iter)) {
        if (unlikely(bpf_get_prandom_u32()))
          *(fp + num) = 7;
      }
      return 0
    }

    int loop1_wrapper(iter)
    {
      r6 = -32;
      if (unlikely(bpf_get_prandom_u32()))
        r6 = -31;
      loop1(r6, iter);
      return 0;
    }

The unsafe state is reached in a similar manner, but the loop is
located inside a subprogram that is called from two locations in the
main subprogram. This detail is important for exercising
bpf_scc_visit->backedges memory management.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250611200836.4135542-11-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-12 16:52:43 -07:00
Yonghong Song
517b088a84 selftests/bpf: Fix cgroup_mprog_ordering failure due to uninitialized variable
On arm64, the cgroup_mprog_ordering selftest failed with test_progs run
when building with clang compiler. The reason is due to socklen_t optlen
not initialized.

In kernel function do_ip_getsockopt(), we have

        if (copy_from_sockptr(&len, optlen, sizeof(int)))
                return -EFAULT;
        if (len < 0)
                return -EINVAL;

The above 'len' variable is a negative value and hence the test failed.

But the test is okay on x86_64. I checked the x86_64 asm code and I didn't
see explicit initialization of 'optlen' but its value is 0 so kernel
didn't return error. This should be a pure luck.

Fix the bug by initializing 'oplen' var properly.

Fixes: e422d5f118 ("selftests/bpf: Add two selftests for mprog API based cgroup progs")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250611162103.1623692-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-11 12:32:50 -07:00
Jiayuan Chen
f1c025773f selftests/bpf: Add test to cover ktls with bpf_msg_pop_data
The selftest can reproduce an issue where using bpf_msg_pop_data() in
ktls causes errors on the receiving end.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20250609020910.397930-3-jiayuan.chen@linux.dev
2025-06-11 16:59:45 +02:00
Luis Gerhorst
4a8765d9a5 selftests/bpf: Add test for Spectre v1 mitigation
This is based on the gadget from the description of commit 9183671af6db
("bpf: Fix leakage under speculation on mispredicted branches").

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250603212814.338867-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:10 -07:00
Luis Gerhorst
d6f1c85f22 bpf: Fall back to nospec for Spectre v1
This implements the core of the series and causes the verifier to fall
back to mitigating Spectre v1 using speculation barriers. The approach
was presented at LPC'24 [1] and RAID'24 [2].

If we find any forbidden behavior on a speculative path, we insert a
nospec (e.g., lfence speculation barrier on x86) before the instruction
and stop verifying the path. While verifying a speculative path, we can
furthermore stop verification of that path whenever we encounter a
nospec instruction.

A minimal example program would look as follows:

	A = true
	B = true
	if A goto e
	f()
	if B goto e
	unsafe()
e:	exit

There are the following speculative and non-speculative paths
(`cur->speculative` and `speculative` referring to the value of the
push_stack() parameters):

- A = true
- B = true
- if A goto e
  - A && !cur->speculative && !speculative
    - exit
  - !A && !cur->speculative && speculative
    - f()
    - if B goto e
      - B && cur->speculative && !speculative
        - exit
      - !B && cur->speculative && speculative
        - unsafe()

If f() contains any unsafe behavior under Spectre v1 and the unsafe
behavior matches `state->speculative &&
error_recoverable_with_nospec(err)`, do_check() will now add a nospec
before f() instead of rejecting the program:

	A = true
	B = true
	if A goto e
	nospec
	f()
	if B goto e
	unsafe()
e:	exit

Alternatively, the algorithm also takes advantage of nospec instructions
inserted for other reasons (e.g., Spectre v4). Taking the program above
as an example, speculative path exploration can stop before f() if a
nospec was inserted there because of Spectre v4 sanitization.

In this example, all instructions after the nospec are dead code (and
with the nospec they are also dead code speculatively).

For this, it relies on the fact that speculation barriers generally
prevent all later instructions from executing if the speculation was not
correct:

* On Intel x86_64, lfence acts as full speculation barrier, not only as
  a load fence [3]:

    An LFENCE instruction or a serializing instruction will ensure that
    no later instructions execute, even speculatively, until all prior
    instructions complete locally. [...] Inserting an LFENCE instruction
    after a bounds check prevents later operations from executing before
    the bound check completes.

  This was experimentally confirmed in [4].

* On AMD x86_64, lfence is dispatch-serializing [5] (requires MSR
  C001_1029[1] to be set if the MSR is supported, this happens in
  init_amd()). AMD further specifies "A dispatch serializing instruction
  forces the processor to retire the serializing instruction and all
  previous instructions before the next instruction is executed" [8]. As
  dispatch is not specific to memory loads or branches, lfence therefore
  also affects all instructions there. Also, if retiring a branch means
  it's PC change becomes architectural (should be), this means any
  "wrong" speculation is aborted as required for this series.

* ARM's SB speculation barrier instruction also affects "any instruction
  that appears later in the program order than the barrier" [6].

* PowerPC's barrier also affects all subsequent instructions [7]:

    [...] executing an ori R31,R31,0 instruction ensures that all
    instructions preceding the ori R31,R31,0 instruction have completed
    before the ori R31,R31,0 instruction completes, and that no
    subsequent instructions are initiated, even out-of-order, until
    after the ori R31,R31,0 instruction completes. The ori R31,R31,0
    instruction may complete before storage accesses associated with
    instructions preceding the ori R31,R31,0 instruction have been
    performed

Regarding the example, this implies that `if B goto e` will not execute
before `if A goto e` completes. Once `if A goto e` completes, the CPU
should find that the speculation was wrong and continue with `exit`.

If there is any other path that leads to `if B goto e` (and therefore
`unsafe()`) without going through `if A goto e`, then a nospec will
still be needed there. However, this patch assumes this other path will
be explored separately and therefore be discovered by the verifier even
if the exploration discussed here stops at the nospec.

This patch furthermore has the unfortunate consequence that Spectre v1
mitigations now only support architectures which implement BPF_NOSPEC.
Before this commit, Spectre v1 mitigations prevented exploits by
rejecting the programs on all architectures. Because some JITs do not
implement BPF_NOSPEC, this patch therefore may regress unpriv BPF's
security to a limited extent:

* The regression is limited to systems vulnerable to Spectre v1, have
  unprivileged BPF enabled, and do NOT emit insns for BPF_NOSPEC. The
  latter is not the case for x86 64- and 32-bit, arm64, and powerpc
  64-bit and they are therefore not affected by the regression.
  According to commit a6f6a95f25 ("LoongArch, bpf: Fix jit to skip
  speculation barrier opcode"), LoongArch is not vulnerable to Spectre
  v1 and therefore also not affected by the regression.

* To the best of my knowledge this regression may therefore only affect
  MIPS. This is deemed acceptable because unpriv BPF is still disabled
  there by default. As stated in a previous commit, BPF_NOSPEC could be
  implemented for MIPS based on GCC's speculation_barrier
  implementation.

* It is unclear which other architectures (besides x86 64- and 32-bit,
  ARM64, PowerPC 64-bit, LoongArch, and MIPS) supported by the kernel
  are vulnerable to Spectre v1. Also, it is not clear if barriers are
  available on these architectures. Implementing BPF_NOSPEC on these
  architectures therefore is non-trivial. Searching GCC and the kernel
  for speculation barrier implementations for these architectures
  yielded no result.

* If any of those regressed systems is also vulnerable to Spectre v4,
  the system was already vulnerable to Spectre v4 attacks based on
  unpriv BPF before this patch and the impact is therefore further
  limited.

As an alternative to regressing security, one could still reject
programs if the architecture does not emit BPF_NOSPEC (e.g., by removing
the empty BPF_NOSPEC-case from all JITs except for LoongArch where it
appears justified). However, this will cause rejections on these archs
that are likely unfounded in the vast majority of cases.

In the tests, some are now successful where we previously had a
false-positive (i.e., rejection). Change them to reflect where the
nospec should be inserted (using __xlated_unpriv) and modify the error
message if the nospec is able to mitigate a problem that previously
shadowed another problem (in that case __xlated_unpriv does not work,
therefore just add a comment).

Define SPEC_V1 to avoid duplicating this ifdef whenever we check for
nospec insns using __xlated_unpriv, define it here once. This also
improves readability. PowerPC can probably also be added here. However,
omit it for now because the BPF CI currently does not include a test.

Limit it to EPERM, EACCES, and EINVAL (and not everything except for
EFAULT and ENOMEM) as it already has the desired effect for most
real-world programs. Briefly went through all the occurrences of EPERM,
EINVAL, and EACCESS in verifier.c to validate that catching them like
this makes sense.

Thanks to Dustin for their help in checking the vendor documentation.

[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating
    Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and
    Precise Spectre Defenses for Untrusted Linux Kernel Extensions")
[3] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/runtime-speculative-side-channel-mitigations.html
    ("Managed Runtime Speculative Execution Side Channel Mitigations")
[4] https://dl.acm.org/doi/pdf/10.1145/3359789.3359837 ("Speculator: a
    tool to analyze speculative execution attacks and mitigations" -
    Section 4.6 "Stopping Speculative Execution")
[5] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/software-techniques-for-managing-speculation.pdf
    ("White Paper - SOFTWARE TECHNIQUES FOR MANAGING SPECULATION ON AMD
    PROCESSORS - REVISION 5.09.23")
[6] https://developer.arm.com/documentation/ddi0597/2020-12/Base-Instructions/SB--Speculation-Barrier-
    ("SB - Speculation Barrier - Arm Armv8-A A32/T32 Instruction Set
    Architecture (2020-12)")
[7] https://wiki.raptorcs.com/w/images/5/5f/OPF_PowerISA_v3.1C.pdf
    ("Power ISA™ - Version 3.1C - May 26, 2024 - Section 9.2.1 of Book
    III")
[8] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf
    ("AMD64 Architecture Programmer’s Manual Volumes 1–5 - Revision 4.08
    - April 2024 - 7.6.4 Serializing Instructions")

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Dustin Nguyen <nguyen@cs.fau.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603212428.338473-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:10 -07:00
Tao Chen
d77efc0ef5 selftests/bpf: Add cookies check for tracing fill_link_info test
Adding tests for getting cookie with fill_link_info for tracing.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606165818.3394397-2-chen.dylane@linux.dev
2025-06-09 16:45:17 -07:00
Ihor Solodrai
260b862918 selftests/bpf: Add test cases with CONST_PTR_TO_MAP null checks
A test requires the following to happen:
  * CONST_PTR_TO_MAP value is checked for null
  * the code in the null branch fails verification

Add test cases:
* direct global map_ptr comparison to null
* lookup inner map, then two checks (the first transforms
  map_value_or_null into map_ptr)
* lookup inner map, spill-fill it, then check for null
* use an array of ringbufs to recreate a common coding pattern [1]

[1] https://lore.kernel.org/bpf/CAEf4BzZNU0gX_sQ8k8JaLe1e+Veth3Rk=4x7MDhv=hQxvO8EDw@mail.gmail.com/

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-4-isolodrai@meta.com
2025-06-09 16:42:04 -07:00
Ihor Solodrai
eb6c992784 selftests/bpf: Add cmp_map_pointer_with_const test
Add a test for CONST_PTR_TO_MAP comparison with a non-0 constant. A
BPF program with this code must not pass verification in unpriv.

Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-3-isolodrai@meta.com
2025-06-09 16:42:04 -07:00
Ihor Solodrai
5534e58f2e bpf: Make reg_not_null() true for CONST_PTR_TO_MAP
When reg->type is CONST_PTR_TO_MAP, it can not be null. However the
verifier explores the branches under rX == 0 in check_cond_jmp_op()
even if reg->type is CONST_PTR_TO_MAP, because it was not checked for
in reg_not_null().

Fix this by adding CONST_PTR_TO_MAP to the set of types that are
considered non nullable in reg_not_null().

An old "unpriv: cmp map pointer with zero" selftest fails with this
change, because now early out correctly triggers in
check_cond_jmp_op(), making the verification to pass.

In practice verifier may allow pointer to null comparison in unpriv,
since in many cases the relevant branch and comparison op are removed
as dead code. So change the expected test result to __success_unpriv.

Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250609183024.359974-2-isolodrai@meta.com
2025-06-09 16:42:04 -07:00
Yonghong Song
e422d5f118 selftests/bpf: Add two selftests for mprog API based cgroup progs
Two tests are added:
  - cgroup_mprog_opts, which mimics tc_opts.c ([1]). Both prog and link
    attach are tested. Some negative tests are also included.
  - cgroup_mprog_ordering, which actually runs the program with some mprog
    API flags.

  [1] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/tc_opts.c

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163156.2429955-1-yonghong.song@linux.dev
2025-06-09 16:28:31 -07:00
Yonghong Song
c1bb68656b selftests/bpf: Move some tc_helpers.h functions to test_progs.h
Move static inline functions id_from_prog_fd() and id_from_link_fd()
from prog_tests/tc_helpers.h to test_progs.h so these two functions
can be reused for later cgroup mprog selftests.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250606163151.2429325-1-yonghong.song@linux.dev
2025-06-09 16:28:30 -07:00
Yonghong Song
bbc7bd658d selftests/bpf: Fix a user_ringbuf failure with arm64 64KB page size
The ringbuf max_entries must be PAGE_ALIGNED. See kernel function
ringbuf_map_alloc(). So for arm64 64KB page size, adjust max_entries
properly.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607013626.1553001-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:43 -07:00
Yonghong Song
8c8c5e3c85 selftests/bpf: Fix ringbuf/ringbuf_write test failure with arm64 64KB page size
The ringbuf max_entries must be PAGE_ALIGNED. See kernel function
ringbuf_map_alloc(). So for arm64 64KB page size, adjust max_entries
and other related metrics properly.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607013621.1552332-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:43 -07:00
Yonghong Song
377d371590 selftests/bpf: Fix bpf_mod_race test failure with arm64 64KB page size
Currently, uffd_register.range.len is set to 4096 for command
'ioctl(uffd, UFFDIO_REGISTER, &uffd_register)'. For arm64 64KB page size,
the len must be 64KB size aligned as page size alignment is required.
See fs/userfaultfd.c:validate_unaligned_range().

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607013615.1551783-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:43 -07:00
Yonghong Song
ae8824037a selftests/bpf: Reduce test_xdp_adjust_frags_tail_grow logs
For selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow, if tested failure,
I see a long list of log output like

    ...
    test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec
    test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec
    test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec
    test_xdp_adjust_frags_tail_grow:PASS:9Kb+10b-untouched 0 nsec
    ...

There are total 7374 lines of the above which is too much. Let us
only issue such logs when it is an assert failure.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250607013610.1551399-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-06 19:21:43 -07:00
Rong Tao
64a064ce33 selftests/bpf: rbtree: Fix incorrect global variable usage
Within __add_three() function, should use function parameters instead of
global variables. So that the variables groot_nested.inner.root and
groot_nested.inner.glock in rbtree_add_nodes_nested() are tested
correctly.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Link: https://lore.kernel.org/r/tencent_3DD7405C0839EBE2724AC5FA357B5402B105@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-05 13:55:26 -07:00
Blake Jones
a570f386f3 Tests for the ".emit_strings" functionality in the BTF dumper.
When this mode is turned on, "emit_zeroes" and "compact" have no effect,
and embedded NUL characters always terminate printing of an array.

Signed-off-by: Blake Jones <blakejones@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250603203701.520541-2-blakejones@google.com
2025-06-05 13:45:16 -07:00
Tao Chen
25a0d04d38 selftests/bpf: Add cookies check for raw_tp fill_link_info test
Adding tests for getting cookie with fill_link_info for raw_tp.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250603154309.3063644-2-chen.dylane@linux.dev
2025-06-05 11:44:52 -07:00
Yonghong Song
baa39c169d selftests/bpf: Fix selftest btf_tag/btf_type_tag_percpu_vmlinux_helper failure
Ihor Solodrai reported selftest 'btf_tag/btf_type_tag_percpu_vmlinux_helper'
failure ([1]) during 6.16 merge window. The failure log:

  ...
  7: (15) if r0 == 0x0 goto pc+1        ; R0=ptr_css_rstat_cpu()
  ; *(volatile int *)rstat; @ btf_type_tag_percpu.c:68
  8: (61) r1 = *(u32 *)(r0 +0)
  cannot access ptr member updated_children with moff 0 in struct css_rstat_cpu with off 0 size 4

Two changes are needed. First, 'struct cgroup_rstat_cpu' needs to be
replaced with 'struct css_rstat_cpu' to be consistent with new data
structure. Second, layout of 'css_rstat_cpu' is changed compared
to 'cgroup_rstat_cpu'. The first member becomes a pointer so
the bpf prog needs to do 8-byte load instead of 4-byte load.

  [1] https://lore.kernel.org/bpf/6f688f2e-7d26-423a-9029-d1b1ef1c938a@linux.dev/

Cc: Ihor Solodrai <ihor.solodrai@linux.dev>
Cc: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: JP Kobryn <inwardvessel@gmail.com>
Link: https://lore.kernel.org/r/20250529201151.1787575-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-01 13:07:47 -07:00
Saket Kumar Bhaskar
4b65d5ae97 selftests/bpf: Fix bpf selftest build error
On linux-next, build for bpf selftest displays an error due to
mismatch in the expected function signature of bpf_testmod_test_read
and bpf_testmod_test_write.

Commit 97d06802d1 ("sysfs: constify bin_attribute argument of bin_attribute::read/write()")
changed the required type for struct bin_attribute to const struct bin_attribute.

To resolve the error, update corresponding signature for the callback.

Fixes: 97d06802d1 ("sysfs: constify bin_attribute argument of bin_attribute::read/write()")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/e915da49-2b9a-4c4c-a34f-877f378129f6@linux.ibm.com/
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Link: https://lore.kernel.org/r/20250512091108.2015615-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-01 12:57:41 -07:00
Linus Torvalds
b78f1293f9 tracing updates for v6.16:
- Have module addresses get updated in the persistent ring buffer
 
   The addresses of the modules from the previous boot are saved in the
   persistent ring buffer. If the same modules are loaded and an address is
   in the old buffer points to an address that was both saved in the
   persistent ring buffer and is loaded in memory, shift the address to point
   to the address that is loaded in memory in the trace event.
 
 - Print function names for irqs off and preempt off callsites
 
   When ignoring the print fmt of a trace event and just printing the fields
   directly, have the fields for preempt off and irqs off events still show
   the function name (via kallsyms) instead of just showing the raw address.
 
 - Clean ups of the histogram code
 
   The histogram functions saved over 800 bytes on the stack to process
   events as they come in. Instead, create per-cpu buffers that can hold this
   information and have a separate location for each context level (thread,
   softirq, IRQ and NMI).
 
   Also add some more comments to the code.
 
 - Add "common_comm" field for histograms
 
   Add "common_comm" that uses the current->comm as a field in an event
   histogram and acts like any of the other fields of the event.
 
 - Show "subops" in the enabled_functions file
 
   When the function graph infrastructure is used, a subsystem has a "subops"
   that it attaches its callback function to. Instead of the
   enabled_functions just showing a function calling the function that calls
   the subops functions, also show the subops functions that will get called
   for that function too.
 
 - Add "copy_trace_marker" option to instances
 
   There are cases where an instance is created for tooling to write into,
   but the old tooling has the top level instance hardcoded into the
   application. New tools want to consume the data from an instance and not
   the top level buffer. By adding a copy_trace_marker option, whenever the
   top instance trace_marker is written into, a copy of it is also written
   into the instance with this option set. This allows new tools to read what
   old tools are writing into the top buffer.
 
   If this option is cleared by the top instance, then what is written into
   the trace_marker is not written into the top instance. This is a way to
   redirect the trace_marker writes into another instance.
 
 - Have tracepoints created by DECLARE_TRACE() use trace_<name>_tp()
 
   If a tracepoint is created by DECLARE_TRACE() instead of TRACE_EVENT(),
   then it will not be exposed via tracefs. Currently there's no way to
   differentiate in the kernel the tracepoint functions between those that
   are exposed via tracefs or not. A calling convention has been made
   manually to append a "_tp" prefix for events created by DECLARE_TRACE().
   Instead of doing this manually, force it so that all DECLARE_TRACE()
   events have this notation.
 
 - Use __string() for task->comm in some sched events
 
   Instead of hardcoding the comm to be TASK_COMM_LEN in some of the
   scheduler events use __string() which makes it dynamic. Note, if these
   events are parsed by user space it they may break, and the event may have
   to be converted back to the hardcoded size.
 
 - Have function graph "depth" be unsigned to the user
 
   Internally to the kernel, the "depth" field of the function graph event is
   signed due to -1 being used for end of boundary. What actually gets
   recorded in the event itself is zero or positive. Reflect this to user
   space by showing "depth" as unsigned int and be consistent across all
   events.
 
 - Allow an arbitrary long CPU string to osnoise_cpus_write()
 
   The filtering of which CPUs to write to can exceed 256 bytes. If a machine
   has 256 CPUs, and the filter is to filter every other CPU, the write would
   take a string larger than 256 bytes. Instead of using a fixed size buffer
   on the stack that is 256 bytes, allocate it to handle what is passed in.
 
 - Stop having ftrace check the per-cpu data "disabled" flag
 
   The "disabled" flag in the data structure passed to most ftrace functions
   is checked to know if tracing has been disabled or not. This flag was
   added back in 2008 before the ring buffer had its own way to disable
   tracing. The "disable" flag is now not always set when needed, and the
   ring buffer flag should be used in all locations where the disabled is
   needed. Since the "disable" flag is redundant and incorrect, stop using it.
   Fix up some locations that use the "disable" flag to use the ring buffer
   info.
 
 - Use a new tracer_tracing_disable/enable() instead of data->disable flag
 
   There's a few cases that set the data->disable flag to stop tracing, but
   this flag is not consistently used. It is also an on/off switch where if a
   function set it and calls another function that sets it, the called
   function may incorrectly enable it.
 
   Use a new trace_tracing_disable() and tracer_tracing_enable() that uses a
   counter and can be nested. These use the ring buffer flags which are
   always checked making the disabling more consistent.
 
 - Save the trace clock in the persistent ring buffer
 
   Save what clock was used for tracing in the persistent ring buffer and set
   it back to that clock after a reboot.
 
 - Remove unused reference to a per CPU data pointer in mmiotrace functions
 
 - Remove unused buffer_page field from trace_array_cpu structure
 
 - Remove more strncpy() instances
 
 - Other minor clean ups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaDhiqRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qkheAQDpyRHoXF1AIoEqyahDax8f3vpZQeCH
 B/mn+YJmU1wuVgEA7AFALov5SHKv4IzoARz68GXtR0jGhP5D8uebUhUqDAQ=
 =WmFG
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Have module addresses get updated in the persistent ring buffer

   The addresses of the modules from the previous boot are saved in the
   persistent ring buffer. If the same modules are loaded and an address
   is in the old buffer points to an address that was both saved in the
   persistent ring buffer and is loaded in memory, shift the address to
   point to the address that is loaded in memory in the trace event.

 - Print function names for irqs off and preempt off callsites

   When ignoring the print fmt of a trace event and just printing the
   fields directly, have the fields for preempt off and irqs off events
   still show the function name (via kallsyms) instead of just showing
   the raw address.

 - Clean ups of the histogram code

   The histogram functions saved over 800 bytes on the stack to process
   events as they come in. Instead, create per-cpu buffers that can hold
   this information and have a separate location for each context level
   (thread, softirq, IRQ and NMI).

   Also add some more comments to the code.

 - Add "common_comm" field for histograms

   Add "common_comm" that uses the current->comm as a field in an event
   histogram and acts like any of the other fields of the event.

 - Show "subops" in the enabled_functions file

   When the function graph infrastructure is used, a subsystem has a
   "subops" that it attaches its callback function to. Instead of the
   enabled_functions just showing a function calling the function that
   calls the subops functions, also show the subops functions that will
   get called for that function too.

 - Add "copy_trace_marker" option to instances

   There are cases where an instance is created for tooling to write
   into, but the old tooling has the top level instance hardcoded into
   the application. New tools want to consume the data from an instance
   and not the top level buffer. By adding a copy_trace_marker option,
   whenever the top instance trace_marker is written into, a copy of it
   is also written into the instance with this option set. This allows
   new tools to read what old tools are writing into the top buffer.

   If this option is cleared by the top instance, then what is written
   into the trace_marker is not written into the top instance. This is a
   way to redirect the trace_marker writes into another instance.

 - Have tracepoints created by DECLARE_TRACE() use trace_<name>_tp()

   If a tracepoint is created by DECLARE_TRACE() instead of
   TRACE_EVENT(), then it will not be exposed via tracefs. Currently
   there's no way to differentiate in the kernel the tracepoint
   functions between those that are exposed via tracefs or not. A
   calling convention has been made manually to append a "_tp" prefix
   for events created by DECLARE_TRACE(). Instead of doing this
   manually, force it so that all DECLARE_TRACE() events have this
   notation.

 - Use __string() for task->comm in some sched events

   Instead of hardcoding the comm to be TASK_COMM_LEN in some of the
   scheduler events use __string() which makes it dynamic. Note, if
   these events are parsed by user space it they may break, and the
   event may have to be converted back to the hardcoded size.

 - Have function graph "depth" be unsigned to the user

   Internally to the kernel, the "depth" field of the function graph
   event is signed due to -1 being used for end of boundary. What
   actually gets recorded in the event itself is zero or positive.
   Reflect this to user space by showing "depth" as unsigned int and be
   consistent across all events.

 - Allow an arbitrary long CPU string to osnoise_cpus_write()

   The filtering of which CPUs to write to can exceed 256 bytes. If a
   machine has 256 CPUs, and the filter is to filter every other CPU,
   the write would take a string larger than 256 bytes. Instead of using
   a fixed size buffer on the stack that is 256 bytes, allocate it to
   handle what is passed in.

 - Stop having ftrace check the per-cpu data "disabled" flag

   The "disabled" flag in the data structure passed to most ftrace
   functions is checked to know if tracing has been disabled or not.
   This flag was added back in 2008 before the ring buffer had its own
   way to disable tracing. The "disable" flag is now not always set when
   needed, and the ring buffer flag should be used in all locations
   where the disabled is needed. Since the "disable" flag is redundant
   and incorrect, stop using it. Fix up some locations that use the
   "disable" flag to use the ring buffer info.

 - Use a new tracer_tracing_disable/enable() instead of data->disable
   flag

   There's a few cases that set the data->disable flag to stop tracing,
   but this flag is not consistently used. It is also an on/off switch
   where if a function set it and calls another function that sets it,
   the called function may incorrectly enable it.

   Use a new trace_tracing_disable() and tracer_tracing_enable() that
   uses a counter and can be nested. These use the ring buffer flags
   which are always checked making the disabling more consistent.

 - Save the trace clock in the persistent ring buffer

   Save what clock was used for tracing in the persistent ring buffer
   and set it back to that clock after a reboot.

 - Remove unused reference to a per CPU data pointer in mmiotrace
   functions

 - Remove unused buffer_page field from trace_array_cpu structure

 - Remove more strncpy() instances

 - Other minor clean ups and fixes

* tag 'trace-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (36 commits)
  tracing: Fix compilation warning on arm32
  tracing: Record trace_clock and recover when reboot
  tracing/sched: Use __string() instead of fixed lengths for task->comm
  tracepoint: Have tracepoints created with DECLARE_TRACE() have _tp suffix
  tracing: Cleanup upper_empty() in pid_list
  tracing: Allow the top level trace_marker to write into another instances
  tracing: Add a helper function to handle the dereference arg in verifier
  tracing: Remove unnecessary "goto out" that simply returns ret is trigger code
  tracing: Fix error handling in event_trigger_parse()
  tracing: Rename event_trigger_alloc() to trigger_data_alloc()
  tracing: Replace deprecated strncpy() with strscpy() for stack_trace_filter_buf
  tracing: Remove unused buffer_page field from trace_array_cpu structure
  tracing: Use atomic_inc_return() for updating "disabled" counter in irqsoff tracer
  tracing: Convert the per CPU "disabled" counter to local from atomic
  tracing: branch: Use trace_tracing_is_on_cpu() instead of "disabled" field
  ring-buffer: Add ring_buffer_record_is_on_cpu()
  tracing: Do not use per CPU array_buffer.data->disabled for cpumask
  ftrace: Do not disabled function graph based on "disabled" field
  tracing: kdb: Use tracer_tracing_on/off() instead of setting per CPU disabled
  tracing: Use tracer_tracing_disable() instead of "disabled" field for ftrace_dump_one()
  ...
2025-05-29 21:04:36 -07:00
Linus Torvalds
90b83efa67 bpf-next-6.16
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmg3NqgACgkQ6rmadz2v
 bTpNUQ/8DPeYtn3nskpsP2OwFy6O3hhfCe6gjOAmUVSk000xbG+AcI/h1DnGZWgk
 xlVcEs93ekzUzHd7k1+RJ2c5yDLXieLJAtb66rbFU1enkxs2cWlcWSKE6K/gaoh3
 G1BCARVlKwtrJhrVrsXtYP/eGZxKRSUZFK7xhtCk7lp7sRI3xkTLE+FJBcDkTJ6W
 HwF14i3zO+BkqNGdFwwlASCCqRItSNBBiM3KjW1DbETOTfAKlvCTrcgdUiODqxhF
 PNnULW+xmICABDFlKfDMlUAGNlSHKjiI3+g31LdblA5eyEhIqiCRgBGFYoCnsluk
 qUauRSie61KqC7fxN3qVpC3bXJfD1td7uIvoqSkDLtTv8a5+HAoiohzi1qBzCayl
 LAGkBYewAfDtdDDjNY38JLH2RCdyY6zG9DhqghPHdPlM7zj7L5zZgj34igEwesMM
 mfj9TuFFF99yfX5UUeSxKpDGR1eO4Ew0p7tg8CRs8Fqh6AIQSmboREZrsncVRCTS
 4SDHSI4KcO4LO2pEKzy+X4dewganN7aESnQG34iG0liyvDDwJOgUnDWLRwPLas7k
 3b/zIfBLxOJpA5R+0hhAMtjMA4NgyKJf4yFZwEieuasQjvzwTApi24YhZ/b3HSEB
 2Dp8kHEEbwezv0OFFz/fJ88dNQnrDmtJ+QByN/liA8kj4Yuh2+Q=
 =j3t8
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Fix and improve BTF deduplication of identical BTF types (Alan
   Maguire and Andrii Nakryiko)

 - Support up to 12 arguments in BPF trampoline on arm64 (Xu Kuohai and
   Alexis Lothoré)

 - Support load-acquire and store-release instructions in BPF JIT on
   riscv64 (Andrea Parri)

 - Fix uninitialized values in BPF_{CORE,PROBE}_READ macros (Anton
   Protopopov)

 - Streamline allowed helpers across program types (Feng Yang)

 - Support atomic update for hashtab of BPF maps (Hou Tao)

 - Implement json output for BPF helpers (Ihor Solodrai)

 - Several s390 JIT fixes (Ilya Leoshkevich)

 - Various sockmap fixes (Jiayuan Chen)

 - Support mmap of vmlinux BTF data (Lorenz Bauer)

 - Support BPF rbtree traversal and list peeking (Martin KaFai Lau)

 - Tests for sockmap/sockhash redirection (Michal Luczaj)

 - Introduce kfuncs for memory reads into dynptrs (Mykyta Yatsenko)

 - Add support for dma-buf iterators in BPF (T.J. Mercier)

 - The verifier support for __bpf_trap() (Yonghong Song)

* tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (135 commits)
  bpf, arm64: Remove unused-but-set function and variable.
  selftests/bpf: Add tests with stack ptr register in conditional jmp
  bpf: Do not include stack ptr register in precision backtracking bookkeeping
  selftests/bpf: enable many-args tests for arm64
  bpf, arm64: Support up to 12 function arguments
  bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem()
  bpf: Avoid __bpf_prog_ret0_warn when jit fails
  bpftool: Add support for custom BTF path in prog load/loadall
  selftests/bpf: Add unit tests with __bpf_trap() kfunc
  bpf: Warn with __bpf_trap() kfunc maybe due to uninitialized variable
  bpf: Remove special_kfunc_set from verifier
  selftests/bpf: Add test for open coded dmabuf_iter
  selftests/bpf: Add test for dmabuf_iter
  bpf: Add open coded dmabuf iterator
  bpf: Add dmabuf iterator
  dma-buf: Rename debugfs symbols
  bpf: Fix error return value in bpf_copy_from_user_dynptr
  libbpf: Use mmap to parse vmlinux BTF from sysfs
  selftests: bpf: Add a test for mmapable vmlinux BTF
  btf: Allow mmap of vmlinux btf
  ...
2025-05-28 15:52:42 -07:00
Linus Torvalds
1b98f357da Networking changes for 6.16.
Core
 ----
 
  - Implement the Device Memory TCP transmit path, allowing zero-copy
    data transmission on top of TCP from e.g. GPU memory to the wire.
 
  - Move all the IPv6 routing tables management outside the RTNL scope,
    under its own lock and RCU. The route control path is now 3x times
    faster.
 
  - Convert queue related netlink ops to instance lock, reducing
    again the scope of the RTNL lock. This improves the control plane
    scalability.
 
  - Refactor the software crc32c implementation, removing unneeded
    abstraction layers and improving significantly the related
    micro-benchmarks.
 
  - Optimize the GRO engine for UDP-tunneled traffic, for a 10%
    performance improvement in related stream tests.
 
  - Cover more per-CPU storage with local nested BH locking; this is a
    prep work to remove the current per-CPU lock in local_bh_disable()
    on PREMPT_RT.
 
  - Introduce and use nlmsg_payload helper, combining buffer bounds
    verification with accessing payload carried by netlink messages.
 
 Netfilter
 ---------
 
  - Rewrite the procfs conntrack table implementation, improving
    considerably the dump performance. A lot of user-space tools
    still use this interface.
 
  - Implement support for wildcard netdevice in netdev basechain
    and flowtables.
 
  - Integrate conntrack information into nft trace infrastructure.
 
  - Export set count and backend name to userspace, for better
    introspection.
 
 BPF
 ---
 
  - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops
    programs and can be controlled in similar way to traditional qdiscs
    using the "tc qdisc" command.
 
  - Refactor the UDP socket iterator, addressing long standing issues
    WRT duplicate hits or missed sockets.
 
 Protocols
 ---------
 
  - Improve TCP receive buffer auto-tuning and increase the default
    upper bound for the receive buffer; overall this improves the single
    flow maximum thoughput on 200Gbs link by over 60%.
 
  - Add AFS GSSAPI security class to AF_RXRPC; it provides transport
    security for connections to the AFS fileserver and VL server.
 
  - Improve TCP multipath routing, so that the sources address always
    matches the nexthop device.
 
  - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS,
    and thus preventing DoS caused by passing around problematic FDs.
 
  - Retire DCCP socket. DCCP only receives updates for bugs, and major
    distros disable it by default. Its removal allows for better
    organisation of TCP fields to reduce the number of cache lines hit
    in the fast path.
 
  - Extend TCP drop-reason support to cover PAWS checks.
 
 Driver API
 ----------
 
  - Reorganize PTP ioctl flag support to require an explicit opt-in for
    the drivers, avoiding the problem of drivers not rejecting new
    unsupported flags.
 
  - Converted several device drivers to timestamping APIs.
 
  - Introduce per-PHY ethtool dump helpers, improving the support for
    dump operations targeting PHYs.
 
 Tests and tooling
 -----------------
 
  - Add support for classic netlink in user space C codegen, so that
    ynl-c can now read, create and modify links, routes addresses and
    qdisc layer configuration.
 
  - Add ynl sub-types for binary attributes, allowing ynl-c to output
    known struct instead of raw binary data, clarifying the classic
    netlink output.
 
  - Extend MPTCP selftests to improve the code-coverage.
 
  - Add tests for XDP tail adjustment in AF_XDP.
 
 New hardware / drivers
 ----------------------
 
  - OpenVPN virtual driver: offload OpenVPN data channels processing
    to the kernel-space, increasing the data transfer throughput WRT
    the user-space implementation.
 
  - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC.
 
  - Broadcom asp-v3.0 ethernet driver.
 
  - AMD Renoir ethernet device.
 
  - ReakTek MT9888 2.5G ethernet PHY driver.
 
  - Aeonsemi 10G C45 PHYs driver.
 
 Drivers
 -------
 
  - Ethernet high-speed NICs:
    - nVidia/Mellanox (mlx5):
      - refactor the stearing table handling to reduce significantly
        the amount of memory used
      - add support for complex matches in H/W flow steering
      - improve flow streeing error handling
      - convert to netdev instance locking
    - Intel (100G, ice, igb, ixgbe, idpf):
      - ice: add switchdev support for LLDP traffic over VF
      - ixgbe: add firmware manipulation and regions devlink support
      - igb: introduce support for frame transmission premption
      - igb: adds persistent NAPI configuration
      - idpf: introduce RDMA support
      - idpf: add initial PTP support
    - Meta (fbnic):
      - extend hardware stats coverage
      - add devlink dev flash support
    - Broadcom (bnxt):
      - add support for RX-side device memory TCP
    - Wangxun (txgbe):
      - implement support for udp tunnel offload
      - complete PTP and SRIOV support for AML 25G/10G devices
 
  - Ethernet NICs embedded and virtual:
    - Google (gve):
      - add device memory TCP TX support
    - Amazon (ena):
      - support persistent per-NAPI config
    - Airoha:
      - add H/W support for L2 traffic offload
      - add per flow stats for flow offloading
    - RealTek (rtl8211): add support for WoL magic packet
    - Synopsys (stmmac):
      - dwmac-socfpga 1000BaseX support
      - add Loongson-2K3000 support
      - introduce support for hardware-accelerated VLAN stripping
    - Broadcom (bcmgenet):
      - expose more H/W stats
    - Freescale (enetc, dpaa2-eth):
      - enetc: add MAC filter, VLAN filter RSS and loopback support
      - dpaa2-eth: convert to H/W timestamping APIs
    - vxlan: convert FDB table to rhashtable, for better scalabilty
    - veth: apply qdisc backpressure on full ring to reduce TX drops
 
  - Ethernet switches:
    - Microchip (kzZ88x3): add ETS scheduler support
 
  - Ethernet PHYs:
    - RealTek (rtl8211):
      - add support for WoL magic packet
      - add support for PHY LEDs
 
  - CAN:
    - Adds RZ/G3E CANFD support to the rcar_canfd driver.
    - Preparatory work for CAN-XL support.
    - Add self-tests framework with support for CAN physical interfaces.
 
  - WiFi:
    - mac80211:
      - scan improvements with multi-link operation (MLO)
    - Qualcomm (ath12k):
      - enable AHB support for IPQ5332
      - add monitor interface support to QCN9274
      - add multi-link operation support to WCN7850
      - add 802.11d scan offload support to WCN7850
      - monitor mode for WCN7850, better 6 GHz regulatory
    - Qualcomm (ath11k):
      - restore hibernation support
    - MediaTek (mt76):
      - WiFi-7 improvements
      - implement support for mt7990
    - Intel (iwlwifi):
      - enhanced multi-link single-radio (EMLSR) support on 5 GHz links
      - rework device configuration
    - RealTek (rtw88):
      - improve throughput for RTL8814AU
    - RealTek (rtw89):
      - add multi-link operation support
      - STA/P2P concurrency improvements
      - support different SAR configs by antenna
 
  - Bluetooth:
    - introduce HCI Driver protocol
    - btintel_pcie: do not generate coredump for diagnostic events
    - btusb: add HCI Drv commands for configuring altsetting
    - btusb: add RTL8851BE device 0x0bda:0xb850
    - btusb: add new VID/PID 13d3/3584 for MT7922
    - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925
    - btnxpuart: implement host-wakeup feature
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmg3D64SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkcIsQAK2eEc+BxQer975wzvtMg6gF9eoex4a+
 rZ7jxfDzDtNvTauoQsrpehDZp0FnySaVGCU36lHGB2OvDnhCpPc5hXzKDWQpOuqQ
 SHrGG3/6FTbdTG/HfHUcbNyrUzIf53SADSObiQ3qg4gyEQ3sCpcOKtVtMcU8rvsY
 /HqMnsJWFaROUMjMtCcnUSgjmeY9kBvha3sTXUqgeRugEOCvZD7z4rpqFIcQqHw7
 e2Fi8dwIXEYNxqPp6MRq2qdyUTewCRruE8ZIMAFuhtfYeMElUZMPlqlMENX3AzTQ
 cr0EgwcFOUxRA7oZRxhoBNBsVXavtSpQr4ZDoWplxP4aQ37n5tc1E9Q72axpB/Og
 FbJRl6GvWYnCd8071BczgmfHlKaTAigPvt2Z4r6JjM5I/Bij/IZ3k+On1OTuOAj/
 EqfFkdZ0a5cfKrwUMP+oSGtSAywkMVUtnIKJlZeRbjSj2432sCfe2jVAlS8ELM43
 3LUgXYrAKtA87g171LlsRu5EEpI5QmqPb+i5LpPlEXe2TJEgPisyfecJ3NafF/2+
 j575lm+TFNm9NTNhGGjDPEvw0djI5wSGGMe9J4gC74eWi6s5t6C4cuUf84TKWdwR
 x+9H0IB7rfFncAwXHJuUUtzd+fPHaYzs5dDGbSgMQOXr1cr1wlubCK8mQ1r/Wt/a
 3GjFIOQKW2Q5
 =t/Tz
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "Core:

   - Implement the Device Memory TCP transmit path, allowing zero-copy
     data transmission on top of TCP from e.g. GPU memory to the wire.

   - Move all the IPv6 routing tables management outside the RTNL scope,
     under its own lock and RCU. The route control path is now 3x times
     faster.

   - Convert queue related netlink ops to instance lock, reducing again
     the scope of the RTNL lock. This improves the control plane
     scalability.

   - Refactor the software crc32c implementation, removing unneeded
     abstraction layers and improving significantly the related
     micro-benchmarks.

   - Optimize the GRO engine for UDP-tunneled traffic, for a 10%
     performance improvement in related stream tests.

   - Cover more per-CPU storage with local nested BH locking; this is a
     prep work to remove the current per-CPU lock in local_bh_disable()
     on PREMPT_RT.

   - Introduce and use nlmsg_payload helper, combining buffer bounds
     verification with accessing payload carried by netlink messages.

  Netfilter:

   - Rewrite the procfs conntrack table implementation, improving
     considerably the dump performance. A lot of user-space tools still
     use this interface.

   - Implement support for wildcard netdevice in netdev basechain and
     flowtables.

   - Integrate conntrack information into nft trace infrastructure.

   - Export set count and backend name to userspace, for better
     introspection.

  BPF:

   - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops
     programs and can be controlled in similar way to traditional qdiscs
     using the "tc qdisc" command.

   - Refactor the UDP socket iterator, addressing long standing issues
     WRT duplicate hits or missed sockets.

  Protocols:

   - Improve TCP receive buffer auto-tuning and increase the default
     upper bound for the receive buffer; overall this improves the
     single flow maximum thoughput on 200Gbs link by over 60%.

   - Add AFS GSSAPI security class to AF_RXRPC; it provides transport
     security for connections to the AFS fileserver and VL server.

   - Improve TCP multipath routing, so that the sources address always
     matches the nexthop device.

   - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS,
     and thus preventing DoS caused by passing around problematic FDs.

   - Retire DCCP socket. DCCP only receives updates for bugs, and major
     distros disable it by default. Its removal allows for better
     organisation of TCP fields to reduce the number of cache lines hit
     in the fast path.

   - Extend TCP drop-reason support to cover PAWS checks.

  Driver API:

   - Reorganize PTP ioctl flag support to require an explicit opt-in for
     the drivers, avoiding the problem of drivers not rejecting new
     unsupported flags.

   - Converted several device drivers to timestamping APIs.

   - Introduce per-PHY ethtool dump helpers, improving the support for
     dump operations targeting PHYs.

  Tests and tooling:

   - Add support for classic netlink in user space C codegen, so that
     ynl-c can now read, create and modify links, routes addresses and
     qdisc layer configuration.

   - Add ynl sub-types for binary attributes, allowing ynl-c to output
     known struct instead of raw binary data, clarifying the classic
     netlink output.

   - Extend MPTCP selftests to improve the code-coverage.

   - Add tests for XDP tail adjustment in AF_XDP.

  New hardware / drivers:

   - OpenVPN virtual driver: offload OpenVPN data channels processing to
     the kernel-space, increasing the data transfer throughput WRT the
     user-space implementation.

   - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC.

   - Broadcom asp-v3.0 ethernet driver.

   - AMD Renoir ethernet device.

   - ReakTek MT9888 2.5G ethernet PHY driver.

   - Aeonsemi 10G C45 PHYs driver.

  Drivers:

   - Ethernet high-speed NICs:
       - nVidia/Mellanox (mlx5):
           - refactor the steering table handling to significantly
             reduce the amount of memory used
           - add support for complex matches in H/W flow steering
           - improve flow streeing error handling
           - convert to netdev instance locking
       - Intel (100G, ice, igb, ixgbe, idpf):
           - ice: add switchdev support for LLDP traffic over VF
           - ixgbe: add firmware manipulation and regions devlink support
           - igb: introduce support for frame transmission premption
           - igb: adds persistent NAPI configuration
           - idpf: introduce RDMA support
           - idpf: add initial PTP support
       - Meta (fbnic):
           - extend hardware stats coverage
           - add devlink dev flash support
       - Broadcom (bnxt):
           - add support for RX-side device memory TCP
       - Wangxun (txgbe):
           - implement support for udp tunnel offload
           - complete PTP and SRIOV support for AML 25G/10G devices

   - Ethernet NICs embedded and virtual:
       - Google (gve):
           - add device memory TCP TX support
       - Amazon (ena):
           - support persistent per-NAPI config
       - Airoha:
           - add H/W support for L2 traffic offload
           - add per flow stats for flow offloading
       - RealTek (rtl8211): add support for WoL magic packet
       - Synopsys (stmmac):
           - dwmac-socfpga 1000BaseX support
           - add Loongson-2K3000 support
           - introduce support for hardware-accelerated VLAN stripping
       - Broadcom (bcmgenet):
           - expose more H/W stats
       - Freescale (enetc, dpaa2-eth):
           - enetc: add MAC filter, VLAN filter RSS and loopback support
           - dpaa2-eth: convert to H/W timestamping APIs
       - vxlan: convert FDB table to rhashtable, for better scalabilty
       - veth: apply qdisc backpressure on full ring to reduce TX drops

   - Ethernet switches:
       - Microchip (kzZ88x3): add ETS scheduler support

   - Ethernet PHYs:
       - RealTek (rtl8211):
           - add support for WoL magic packet
           - add support for PHY LEDs

   - CAN:
       - Adds RZ/G3E CANFD support to the rcar_canfd driver.
       - Preparatory work for CAN-XL support.
       - Add self-tests framework with support for CAN physical interfaces.

   - WiFi:
       - mac80211:
           - scan improvements with multi-link operation (MLO)
       - Qualcomm (ath12k):
           - enable AHB support for IPQ5332
           - add monitor interface support to QCN9274
           - add multi-link operation support to WCN7850
           - add 802.11d scan offload support to WCN7850
           - monitor mode for WCN7850, better 6 GHz regulatory
       - Qualcomm (ath11k):
           - restore hibernation support
       - MediaTek (mt76):
           - WiFi-7 improvements
           - implement support for mt7990
       - Intel (iwlwifi):
           - enhanced multi-link single-radio (EMLSR) support on 5 GHz links
           - rework device configuration
       - RealTek (rtw88):
           - improve throughput for RTL8814AU
       - RealTek (rtw89):
           - add multi-link operation support
           - STA/P2P concurrency improvements
           - support different SAR configs by antenna

   - Bluetooth:
       - introduce HCI Driver protocol
       - btintel_pcie: do not generate coredump for diagnostic events
       - btusb: add HCI Drv commands for configuring altsetting
       - btusb: add RTL8851BE device 0x0bda:0xb850
       - btusb: add new VID/PID 13d3/3584 for MT7922
       - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925
       - btnxpuart: implement host-wakeup feature"

* tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits)
  selftests/bpf: Fix bpf selftest build warning
  selftests: netfilter: Fix skip of wildcard interface test
  net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames
  net: openvswitch: Fix the dead loop of MPLS parse
  calipso: Don't call calipso functions for AF_INET sk.
  selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem
  net_sched: hfsc: Address reentrant enqueue adding class to eltree twice
  octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback
  octeontx2-pf: QOS: Perform cache sync on send queue teardown
  net: mana: Add support for Multi Vports on Bare metal
  net: devmem: ncdevmem: remove unused variable
  net: devmem: ksft: upgrade rx test to send 1K data
  net: devmem: ksft: add 5 tuple FS support
  net: devmem: ksft: add exit_wait to make rx test pass
  net: devmem: ksft: add ipv4 support
  net: devmem: preserve sockc_err
  page_pool: fix ugly page_pool formatting
  net: devmem: move list_add to net_devmem_bind_dmabuf.
  selftests: netfilter: nft_queue.sh: include file transfer duration in log message
  net: phy: mscc: Fix memory leak when using one step timestamping
  ...
2025-05-28 15:24:36 -07:00
Linus Torvalds
3b66e6b3c0 cgroup: Changes for v6.16
- cgroup rstat shared the tracking tree across all controlers with the
   rationale being that a cgroup which is using one resource is likely to be
   using other resources at the same time (ie. if something is allocating
   memory, it's probably consuming CPU cycles). However, this turned out to
   not scale very well especially with memcg using rstat for internal
   operations which made memcg stat read and flush patterns substantially
   different from other controllers. JP Kobryn split the rstat tree per
   controller.
 
 - cgroup BPF support was hooking into cgroup init/exit paths directly.
   Convert them to use a notifier chain instead so that other usages can be
   added easily. The two of the patches which implement this are mislabeled
   as belonging to sched_ext instead of cgroup. Sorry.
 
 - Relatively minor cpuset updates.
 
 - Documentation updates.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaDYUmA4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGRhbAP90v8QwUkWEKGQSam8JY3by7PvrW6pV5ot+BGuM
 4xu3BAEAjsJ9FdiwYLwKYqG7y59xhhBFOo6GpcP52kPp3znl+QQ=
 =6MIT
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - cgroup rstat shared the tracking tree across all controllers with the
   rationale being that a cgroup which is using one resource is likely
   to be using other resources at the same time (ie. if something is
   allocating memory, it's probably consuming CPU cycles).

   However, this turned out to not scale very well especially with memcg
   using rstat for internal operations which made memcg stat read and
   flush patterns substantially different from other controllers. JP
   Kobryn split the rstat tree per controller.

 - cgroup BPF support was hooking into cgroup init/exit paths directly.

   Convert them to use a notifier chain instead so that other usages can
   be added easily. The two of the patches which implement this are
   mislabeled as belonging to sched_ext instead of cgroup. Sorry.

 - Relatively minor cpuset updates

 - Documentation updates

* tag 'cgroup-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (23 commits)
  sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier
  sched_ext: Introduce cgroup_lifetime_notifier
  cgroup: Minor reorganization of cgroup_create()
  cgroup, docs: cpu controller's interaction with various scheduling policies
  cgroup, docs: convert space indentation to tab indentation
  cgroup: avoid per-cpu allocation of size zero rstat cpu locks
  cgroup, docs: be specific about bandwidth control of rt processes
  cgroup: document the rstat per-cpu initialization
  cgroup: helper for checking rstat participation of css
  cgroup: use subsystem-specific rstat locks to avoid contention
  cgroup: use separate rstat trees for each subsystem
  cgroup: compare css to cgroup::self in helper for distingushing css
  cgroup: warn on rstat usage by early init subsystems
  cgroup/cpuset: drop useless cpumask_empty() in compute_effective_exclusive_cpumask()
  cgroup/rstat: Improve cgroup_rstat_push_children() documentation
  cgroup: fix goto ordering in cgroup_init()
  cgroup: fix pointer check in css_rstat_init()
  cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUs
  cgroup/cpuset: Fix obsolete comment in cpuset_css_offline()
  cgroup/cpuset: Always use cpu_active_mask
  ...
2025-05-27 20:59:53 -07:00
Yonghong Song
5ffb537e41 selftests/bpf: Add tests with stack ptr register in conditional jmp
Add two tests:
  - one test has 'rX <op> r10' where rX is not r10, and
  - another test has 'rX <op> rY' where rX and rY are not r10
    but there is an early insn 'rX = r10'.

Without previous verifier change, both tests will fail.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250524041340.4046304-1-yonghong.song@linux.dev
2025-05-27 14:09:12 -07:00
Alexis Lothoré (eBPF Foundation)
149ead9d7e selftests/bpf: enable many-args tests for arm64
Now that support for up to 12 args is enabled for tracing programs on
ARM64, enable the existing tests for this feature on this architecture.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20250527-many_args_arm64-v3-2-3faf7bb8e4a2@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27 10:50:31 -07:00
Yonghong Song
92de53d247 selftests/bpf: Add unit tests with __bpf_trap() kfunc
Add some inline-asm tests and C tests where __bpf_trap() or
__builtin_trap() is used in the code. The __builtin_trap()
test is guarded with llvm21 ([1]) since otherwise the compilation
failure will happen.

  [1] https://github.com/llvm/llvm-project/pull/131731

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250523205331.1291734-1-yonghong.song@linux.dev
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27 10:30:07 -07:00
T.J. Mercier
7594dcb71f selftests/bpf: Add test for open coded dmabuf_iter
Use the same test buffers as the traditional iterator and a new BPF map
to verify the test buffers can be found with the open coded dmabuf
iterator.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250522230429.941193-6-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27 09:51:26 -07:00
T.J. Mercier
ae5d2c59ec selftests/bpf: Add test for dmabuf_iter
This test creates a udmabuf, and a dmabuf from the system dmabuf heap,
and uses a BPF program that prints dmabuf metadata with the new
dmabuf_iter to verify they can be found.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250522230429.941193-5-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27 09:51:26 -07:00
Linus Torvalds
ddddf9d64f Performance events updates for v6.16:
Core & generic-arch updates:
 
  - Add support for dynamic constraints and propagate it to
    the Intel driver (Kan Liang)
 
  - Fix & enhance driver-specific throttling support (Kan Liang)
 
  - Record sample last_period before updating on the
    x86 and PowerPC platforms (Mark Barnett)
 
  - Make perf_pmu_unregister() usable (Peter Zijlstra)
 
  - Unify perf_event_free_task() / perf_event_exit_task_context()
    (Peter Zijlstra)
 
  - Simplify perf_event_release_kernel() and perf_event_free_task()
    (Peter Zijlstra)
 
  - Allocate non-contiguous AUX pages by default (Yabin Cui)
 
 Uprobes updates:
 
  - Add support to emulate NOP instructions (Jiri Olsa)
 
  - selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa)
 
 x86 Intel PMU enhancements:
 
  - Support Intel Auto Counter Reload [ACR] (Kan Liang)
 
  - Add PMU support for Clearwater Forest (Dapeng Mi)
 
  - Arch-PEBS preparatory changes: (Dapeng Mi)
 
    - Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
    - Decouple BTS initialization from PEBS initialization
    - Introduce pairs of PEBS static calls
 
 x86 AMD PMU enhancements:
 
  - Use hrtimer for handling overflows in the AMD uncore driver
    (Sandipan Das)
 
  - Prevent UMC counters from saturating (Sandipan Das)
 
 Fixes and cleanups:
 
  - Fix put_ctx() ordering (Frederic Weisbecker)
 
  - Fix irq work dereferencing garbage (Frederic Weisbecker)
 
  - Misc fixes and cleanups (Changbin Du, Frederic Weisbecker,
    Ian Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang,
    Sandipan Das, Thorsten Blum)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmgy4zoRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j6QRAAvQ4GBPrdJLb8oXkLjCmWSp9PfM1h2IW0
 reUrcV0BPRAwz4T60QEU2KyiEjvKxNghR6bNw4i3slAZ8EFwP9eWE/0ZYOo5+W/N
 wv8vsopv/oZd2L2G5TgxDJf+tLPkqnTvp651LmGAbquPFONN1lsya9UHVPnt2qtv
 fvFhjW6D828VoevRcUCsdoEUNlFDkUYQ2c3M1y5H2AI6ILDVxLsp5uYtuVUP+2lQ
 7UI/elqRIIblTGT7G9LvTGiXZMm8T58fe1OOLekT6NdweJ3XEt1kMdFo/SCRYfzU
 eDVVVLSextZfzBXNPtAEAlM3aSgd8+4m5sACiD1EeOUNjo5J9Sj1OOCa+bZGF/Rl
 XNv5Kcp6Kh1T4N5lio8DE/NabmHDqDMbUGfud+VTS8uLLku4kuOWNMxJTD1nQ2Zz
 BMfJhP89G9Vk07F9fOGuG1N6mKhIKNOgXh0S92tB7XDHcdJegueu2xh4ZszBL1QK
 JVXa4DbnDj+y0LvnV+A5Z6VILr5RiCAipDb9ascByPja6BbN10Nf9Aj4nWwRTwbO
 ut5OK/fDKmSjEHn1+a42d4iRxdIXIWhXCyxEhH+hJXEFx9htbQ3oAbXAEedeJTlT
 g9QYGAjL96QEd0CqviorV8KyU59nVkEPoLVCumXBZ0WWhNwU6GdAmsW1hLfxQdLN
 sp+XHhfxf8M=
 =tPRs
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events updates from Ingo Molnar:
 "Core & generic-arch updates:

   - Add support for dynamic constraints and propagate it to the Intel
     driver (Kan Liang)

   - Fix & enhance driver-specific throttling support (Kan Liang)

   - Record sample last_period before updating on the x86 and PowerPC
     platforms (Mark Barnett)

   - Make perf_pmu_unregister() usable (Peter Zijlstra)

   - Unify perf_event_free_task() / perf_event_exit_task_context()
     (Peter Zijlstra)

   - Simplify perf_event_release_kernel() and perf_event_free_task()
     (Peter Zijlstra)

   - Allocate non-contiguous AUX pages by default (Yabin Cui)

  Uprobes updates:

   - Add support to emulate NOP instructions (Jiri Olsa)

   - selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa)

  x86 Intel PMU enhancements:

   - Support Intel Auto Counter Reload [ACR] (Kan Liang)

   - Add PMU support for Clearwater Forest (Dapeng Mi)

   - Arch-PEBS preparatory changes: (Dapeng Mi)
       - Parse CPUID archPerfmonExt leaves for non-hybrid CPUs
       - Decouple BTS initialization from PEBS initialization
       - Introduce pairs of PEBS static calls

  x86 AMD PMU enhancements:

   - Use hrtimer for handling overflows in the AMD uncore driver
     (Sandipan Das)

   - Prevent UMC counters from saturating (Sandipan Das)

  Fixes and cleanups:

   - Fix put_ctx() ordering (Frederic Weisbecker)

   - Fix irq work dereferencing garbage (Frederic Weisbecker)

   - Misc fixes and cleanups (Changbin Du, Frederic Weisbecker, Ian
     Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang, Sandipan
     Das, Thorsten Blum)"

* tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  perf/headers: Clean up <linux/perf_event.h> a bit
  perf/uapi: Clean up <uapi/linux/perf_event.h> a bit
  perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h>
  mips/perf: Remove driver-specific throttle support
  xtensa/perf: Remove driver-specific throttle support
  sparc/perf: Remove driver-specific throttle support
  loongarch/perf: Remove driver-specific throttle support
  csky/perf: Remove driver-specific throttle support
  arc/perf: Remove driver-specific throttle support
  alpha/perf: Remove driver-specific throttle support
  perf/apple_m1: Remove driver-specific throttle support
  perf/arm: Remove driver-specific throttle support
  s390/perf: Remove driver-specific throttle support
  powerpc/perf: Remove driver-specific throttle support
  perf/x86/zhaoxin: Remove driver-specific throttle support
  perf/x86/amd: Remove driver-specific throttle support
  perf/x86/intel: Remove driver-specific throttle support
  perf: Only dump the throttle log for the leader
  perf: Fix the throttle logic for a group
  perf/core: Add the is_event_in_freq_mode() helper to simplify the code
  ...
2025-05-26 15:40:23 -07:00
Linus Torvalds
181d8e399f vfs-6.16-rc1.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPTwAKCRCRxhvAZXjc
 om0+AQDMxKLweJXplqQQ7jxuvW2dEa60YpE2EalEKWGg9YA3KgEA3nI4kyKMKn7Y
 PRFXgIcKvhs62oJLKsq8SGQUqExqvAE=
 =atEw
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Use folios for symlinks in the page cache

     FUSE already uses folios for its symlinks. Mirror that conversion
     in the generic code and the NFS code. That lets us get rid of a few
     folio->page->folio conversions in this path, and some of the few
     remaining users of read_cache_page() / read_mapping_page()

   - Try and make a few filesystem operations killable on the VFS
     inode->i_mutex level

   - Add sysctl vfs_cache_pressure_denom for bulk file operations

     Some workloads need to preserve more dentries than we currently
     allow through out sysctl interface

     A HDFS servers with 12 HDDs per server, on a HDFS datanode startup
     involves scanning all files and caching their metadata (including
     dentries and inodes) in memory. Each HDD contains approximately 2
     million files, resulting in a total of ~20 million cached dentries
     after initialization

     To minimize dentry reclamation, they set vfs_cache_pressure to 1.
     Despite this configuration, memory pressure conditions can still
     trigger reclamation of up to 50% of cached dentries, reducing the
     cache from 20 million to approximately 10 million entries. During
     the subsequent cache rebuild period, any HDFS datanode restart
     operation incurs substantial latency penalties until full cache
     recovery completes

     To maintain service stability, more dentries need to be preserved
     during memory reclamation. The current minimum reclaim ratio (1/100
     of total dentries) remains too aggressive for such workload. This
     patch introduces vfs_cache_pressure_denom for more granular cache
     pressure control

     The configuration [vfs_cache_pressure=1,
     vfs_cache_pressure_denom=10000] effectively maintains the full 20
     million dentry cache under memory pressure, preventing datanode
     restart performance degradation

   - Avoid some jumps in inode_permission() using likely()/unlikely()

   - Avid a memory access which is most likely a cache miss when
     descending into devcgroup_inode_permission()

   - Add fastpath predicts for stat() and fdput()

   - Anonymous inodes currently don't come with a proper mode causing
     issues in the kernel when we want to add useful VFS debug assert.
     Fix that by giving them a proper mode and masking it off when we
     report it to userspace which relies on them not having any mode

   - Anonymous inodes currently allow to change inode attributes because
     the VFS falls back to simple_setattr() if i_op->setattr isn't
     implemented. This means the ownership and mode for every single
     user of anon_inode_inode can be changed. Block that as it's either
     useless or actively harmful. If specific ownership is needed the
     respective subsystem should allocate anonymous inodes from their
     own private superblock

   - Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock

   - Add proper tests for anonymous inode behavior

   - Make it easy to detect proper anonymous inodes and to ensure that
     we can detect them in codepaths such as readahead()

  Cleanups:

   - Port pidfs to the new anon_inode_{g,s}etattr() helpers

   - Try to remove the uselib() system call

   - Add unlikely branch hint return path for poll

   - Add unlikely branch hint on return path for core_sys_select

   - Don't allow signals to interrupt getdents copying for fuse

   - Provide a size hint to dir_context for during readdir()

   - Use writeback_iter directly in mpage_writepages

   - Update compression and mtime descriptions in initramfs
     documentation

   - Update main netfs API document

   - Remove useless plus one in super_cache_scan()

   - Remove unnecessary NULL-check guards during setns()

   - Add separate separate {get,put}_cgroup_ns no-op cases

  Fixes:

   - Fix typo in root= kernel parameter description

   - Use KERN_INFO for infof()|info_plog()|infofc()

   - Correct comments of fs_validate_description()

   - Mark an unlikely if condition with unlikely() in
     vfs_parse_monolithic_sep()

   - Delete macro fsparam_u32hex()

   - Remove unused and problematic validate_constant_table()

   - Fix potential unsigned integer underflow in fs_name()

   - Make file-nr output the total allocated file handles"

* tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits)
  fs: Pass a folio to page_put_link()
  nfs: Use a folio in nfs_get_link()
  fs: Convert __page_get_link() to use a folio
  fs/read_write: make default_llseek() killable
  fs/open: make do_truncate() killable
  fs/open: make chmod_common() and chown_common() killable
  include/linux/fs.h: add inode_lock_killable()
  readdir: supply dir_context.count as readdir buffer size hint
  vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
  fuse: don't allow signals to interrupt getdents copying
  Documentation: fix typo in root= kernel parameter description
  include/cgroup: separate {get,put}_cgroup_ns no-op case
  kernel/nsproxy: remove unnecessary guards
  fs: use writeback_iter directly in mpage_writepages
  fs: remove useless plus one in super_cache_scan()
  fs: add S_ANON_INODE
  fs: remove uselib() system call
  device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()
  fs/fs_parse: Remove unused and problematic validate_constant_table()
  fs: touch up predicts in inode_permission()
  ...
2025-05-26 09:02:39 -07:00
Lorenz Bauer
828226b69f selftests: bpf: Add a test for mmapable vmlinux BTF
Add a basic test for the ability to mmap /sys/kernel/btf/vmlinux.
Ensure that the data is valid BTF and that it is padded with zero.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20250520-vmlinux-mmap-v5-2-e8c941acc414@isovalent.com
2025-05-23 10:06:28 -07:00
Kuniyuki Iwashima
ae4f2f59e1 tcp: Restrict SO_TXREHASH to TCP socket.
sk->sk_txrehash is only used for TCP.

Let's restrict SO_TXREHASH to TCP to reflect this.

Later, we will make sk_txrehash a part of the union for other
protocol families.

Note that we need to modify BPF selftest not to get/set
SO_TEREHASH for non-TCP sockets.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23 10:24:18 +01:00
Michal Luczaj
c04eeeb2af selftests/bpf: sockmap_listen cleanup: Drop af_inet SOCK_DGRAM redir tests
Remove tests covered by sockmap_redir.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20250515-selftests-sockmap-redir-v3-8-a1ea723f7e7e@rbox.co
2025-05-22 14:26:59 -07:00
Michal Luczaj
f3de1cf621 selftests/bpf: sockmap_listen cleanup: Drop af_unix redir tests
Remove tests covered by sockmap_redir.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20250515-selftests-sockmap-redir-v3-7-a1ea723f7e7e@rbox.co
2025-05-22 14:26:58 -07:00
Michal Luczaj
9266e49d60 selftests/bpf: sockmap_listen cleanup: Drop af_vsock redir tests
Remove tests covered by sockmap_redir.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20250515-selftests-sockmap-redir-v3-6-a1ea723f7e7e@rbox.co
2025-05-22 14:26:58 -07:00
Michal Luczaj
f0709263a0 selftests/bpf: Add selftest for sockmap/hashmap redirection
Test redirection logic. All supported and unsupported redirect combinations
are tested for success and failure respectively.

BPF_MAP_TYPE_SOCKMAP
BPF_MAP_TYPE_SOCKHASH
	x
sk_msg-to-egress
sk_msg-to-ingress
sk_skb-to-egress
sk_skb-to-ingress
	x
AF_INET, SOCK_STREAM
AF_INET6, SOCK_STREAM
AF_INET, SOCK_DGRAM
AF_INET6, SOCK_DGRAM
AF_UNIX, SOCK_STREAM
AF_UNIX, SOCK_DGRAM
AF_VSOCK, SOCK_STREAM
AF_VSOCK, SOCK_SEQPACKET

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20250515-selftests-sockmap-redir-v3-5-a1ea723f7e7e@rbox.co
2025-05-22 14:26:58 -07:00
Michal Luczaj
f266905bb3 selftests/bpf: Introduce verdict programs for sockmap_redir
Instead of piggybacking on test_sockmap_listen, introduce
test_sockmap_redir especially for sockmap redirection tests.

Suggested-by: Jiayuan Chen <mrpre@163.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20250515-selftests-sockmap-redir-v3-4-a1ea723f7e7e@rbox.co
2025-05-22 14:26:58 -07:00
Michal Luczaj
b57482b0fe selftests/bpf: Add u32()/u64() to sockmap_helpers
Add integer wrappers for convenient sockmap usage.

While there, fix misaligned trailing slashes.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20250515-selftests-sockmap-redir-v3-3-a1ea723f7e7e@rbox.co
2025-05-22 14:26:58 -07:00
Michal Luczaj
d87857946d selftests/bpf: Add socket_kind_to_str() to socket_helpers
Add function that returns string representation of socket's domain/type.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20250515-selftests-sockmap-redir-v3-2-a1ea723f7e7e@rbox.co
2025-05-22 14:26:58 -07:00
Michal Luczaj
fb1131d5e1 selftests/bpf: Support af_unix SOCK_DGRAM socket pair creation
Handle af_unix in init_addr_loopback(). For pair creation, bind() the peer
socket to make SOCK_DGRAM connect() happy.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20250515-selftests-sockmap-redir-v3-1-a1ea723f7e7e@rbox.co
2025-05-22 14:26:58 -07:00
Mykyta Yatsenko
5ead949920 selftests/bpf: Add SKIP_LLVM makefile variable
Introduce SKIP_LLVM makefile variable that allows to avoid using llvm
dependencies when building BPF selftests. This is different from
existing feature-llvm, as the latter is a result of automatic detection
and should not be set by user explicitly.
Avoiding llvm dependencies could be useful for environments that do not
have them, given that as of now llvm dependencies are required only by
jit_disasm_helpers.c.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250522013813.125428-1-mykyta.yatsenko5@gmail.com
2025-05-22 09:31:03 -07:00
Alan Maguire
02f5e7c1f3 selftests/bpf: Test multi-split BTF
Extend split BTF test to cover case where we create split BTF on top of
existing split BTF and add info to it; ensure that such BTF can be
created and handled by searching within it, dumping/comparing to expected.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250519165935.261614-3-alan.maguire@oracle.com
2025-05-20 16:22:30 -07:00
JP Kobryn
5da3bfa029 cgroup: use separate rstat trees for each subsystem
Different subsystems may call cgroup_rstat_updated() within the same
cgroup, resulting in a tree of pending updates from multiple subsystems.
When one of these subsystems is flushed via cgroup_rstat_flushed(), all
other subsystems with pending updates on the tree will also be flushed.

Change the paradigm of having a single rstat tree for all subsystems to
having separate trees for each subsystem. This separation allows for
subsystems to perform flushes without the side effects of other subsystems.
As an example, flushing the cpu stats will no longer cause the memory stats
to be flushed and vice versa.

In order to achieve subsystem-specific trees, change the tree node type
from cgroup to cgroup_subsys_state pointer. Then remove those pointers from
the cgroup and instead place them on the css. Finally, change update/flush
functions to make use of the different node type (css). These changes allow
a specific subsystem to be associated with an update or flush. Separate
rstat trees will now exist for each unique subsystem.

Since updating/flushing will now be done at the subsystem level, there is
no longer a need to keep track of updated css nodes at the cgroup level.
The list management of these nodes done within the cgroup (rstat_css_list
and related) has been removed accordingly.

Conditional guards for checking validity of a given css were placed within
css_rstat_updated/flush() to prevent undefined behavior occuring from kfunc
usage in bpf programs. Guards were also placed within css_rstat_init/exit()
in order to help consolidate calls to them. At call sites for all four
functions, the existing guards were removed.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-19 10:28:59 -10:00
Mykyta Yatsenko
b615ce5fbe selftests/bpf: Remove unnecessary link dependencies
Remove llvm dependencies from binaries that do not use llvm libraries.
Filter out libxml2 from llvm dependencies, as it seems that
it is not actually used. This patch reduced link dependencies
for BPF selftests.
The next line was adding llvm dependencies to every target in the
makefile, while the only targets that require those are test
runnners (test_progs, test_progs-no_alu32,...):
```
$(OUTPUT)/$(TRUNNER_BINARY): LDLIBS += $$(LLVM_LDLIBS)
```

Before this change:
ldd linux/tools/testing/selftests/bpf/veristat
    linux-vdso.so.1 (0x00007ffd2c3fd000)
    libelf.so.1 => /lib64/libelf.so.1 (0x00007fe1dcf89000)
    libz.so.1 => /lib64/libz.so.1 (0x00007fe1dcf6f000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fe1dce94000)
    libzstd.so.1 => /lib64/libzstd.so.1 (0x00007fe1dcddd000)
    libxml2.so.2 => /lib64/libxml2.so.2 (0x00007fe1dcc54000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe1dca00000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fe1dc600000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fe1dcfb1000)
    liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fe1dc9d4000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe1dcc38000)

After:
ldd linux/tools/testing/selftests/bpf/veristat
    linux-vdso.so.1 (0x00007ffc83370000)
    libelf.so.1 => /lib64/libelf.so.1 (0x00007f4b87515000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f4b874fb000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f4b87200000)
    libzstd.so.1 => /lib64/libzstd.so.1 (0x00007f4b87444000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f4b8753d000)

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250516195522.311769-1-mykyta.yatsenko5@gmail.com
2025-05-19 09:30:38 -07:00
Kuniyuki Iwashima
4dd372de3f selftests/bpf: Relax TCPOPT_WINDOW validation in test_tcp_custom_syncookie.c.
The custom syncookie test expects TCPOPT_WINDOW to be 7 based on the
kernel’s behaviour at the time, but the upcoming series [0] will bump
it to 10.

Let's relax the test to allow any valid TCPOPT_WINDOW value in the
range 1–14.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/netdev/20250513193919.1089692-1-edumazet@google.com/ #[0]
Link: https://patch.msgid.link/20250514214021.85187-1-kuniyu@amazon.com
2025-05-14 15:13:24 -07:00
Steven Rostedt
ac01fa73f5 tracepoint: Have tracepoints created with DECLARE_TRACE() have _tp suffix
Most tracepoints in the kernel are created with TRACE_EVENT(). The
TRACE_EVENT() macro (and DECLARE_EVENT_CLASS() and DEFINE_EVENT() where in
reality, TRACE_EVENT() is just a helper macro that calls those other two
macros), will create not only a tracepoint (the function trace_<event>()
used in the kernel), it also exposes the tracepoint to user space along
with defining what fields will be saved by that tracepoint.

There are a few places that tracepoints are created in the kernel that are
not exposed to userspace via tracefs. They can only be accessed from code
within the kernel. These tracepoints are created with DEFINE_TRACE()

Most of these tracepoints end with "_tp". This is useful as when the
developer sees that, they know that the tracepoint is for in-kernel only
(meaning it can only be accessed inside the kernel, either directly by the
kernel or indirectly via modules and BPF programs) and is not exposed to
user space.

Instead of making this only a process to add "_tp", enforce it by making
the DECLARE_TRACE() append the "_tp" suffix to the tracepoint. This
requires adding DECLARE_TRACE_EVENT() macros for the TRACE_EVENT() macro
to use that keeps the original name.

Link: https://lore.kernel.org/all/20250418083351.20a60e64@gandalf.local.home/

Cc: netdev <netdev@vger.kernel.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250510163730.092fad5b@gandalf.local.home
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-14 11:19:32 -04:00
Mykyta Yatsenko
c61bcd29ed selftests/bpf: introduce tests for dynptr copy kfuncs
Introduce selftests verifying newly-added dynptr copy kfuncs.
Covering contiguous and non-contiguous memory backed dynptrs.

Disable test_probe_read_user_str_dynptr that triggers bug in
strncpy_from_user_nofault. Patch to fix the issue [1].

[1] https://patchwork.kernel.org/project/linux-mm/patch/20250422131449.57177-1-mykyta.yatsenko5@gmail.com/

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20250512205348.191079-4-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-12 18:32:47 -07:00
Mykyta Yatsenko
3a320ed325 selftests/bpf: Allow skipping docs compilation
Currently rst2man is required to build bpf selftests, as the tool is
used by Makefile.docs. rst2man may be missing in some build
environments and is not essential for selftests. It makes sense to
allow user to skip building docs.

This patch adds SKIP_DOCS variable into bpf selftests Makefile that when
set to 1 allows skipping building docs, for example:
make -C tools/testing/selftests TARGETS=bpf SKIP_DOCS=1

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250510002450.365613-1-mykyta.yatsenko5@gmail.com
2025-05-12 15:18:46 -07:00
Gregory Bell
af8a5125a0 selftests/bpf: test_verifier verbose log overflows
Tests:
 - 458/p ld_dw: xor semi-random 64-bit imms, test 5
 - 501/p scale: scale test 1
 - 502/p scale: scale test 2

fail in verbose mode due to bpf_vlog[] overflowing. These tests
generate large verifier logs that exceed the current buffer size,
causing them to fail to load.

Increase the size of the bpf_vlog[] buffer to accommodate larger
logs and prevent false failures during test runs with verbose output.

Signed-off-by: Gregory Bell <grbell@redhat.com>
Link: https://lore.kernel.org/r/e49267100f07f099a5877a3a5fc797b702bbaf0c.1747058195.git.grbell@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-12 10:43:43 -07:00
Gregory Bell
c5bcc8c781 selftests/bpf: test_verifier verbose causes erroneous failures
When running test_verifier with the -v flag and a test with
`expected_ret==VERBOSE_ACCEPT`, the opts.log_level is unintentionally
overwritten because the verbose flag takes precedence. This leads to
a mismatch in the expected and actual contents of bpf_vlog, causing
tests to fail incorrectly.

Reorder the conditional logic that sets opts.log_level to preserve
the expected log level and prevent it from being overridden by -v.

Signed-off-by: Gregory Bell <grbell@redhat.com>
Link: https://lore.kernel.org/r/182bf00474f817c99f968a9edb119882f62be0f8.1747058195.git.grbell@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-12 10:43:43 -07:00
Jiayuan Chen
be48b8790a selftests/bpf: Add test to cover sockmap with ktls
The selftest can reproduce an issue where we miss the uncharge operation
when freeing msg, which will cause the following warning. We fixed the
issue and added this reproducer to selftest to ensure it will not happen
again.

------------[ cut here ]------------
WARNING: CPU: 1 PID: 40 at net/ipv4/af_inet.c inet_sock_destruct+0x173/0x1d5
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
Workqueue: events sk_psock_destroy
RIP: 0010:inet_sock_destruct+0x173/0x1d5
RSP: 0018:ffff8880085cfc18 EFLAGS: 00010202
RAX: 1ffff11003dbfc00 RBX: ffff88801edfe3e8 RCX: ffffffff822f5af4
RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff88801edfe16c
RBP: ffff88801edfe184 R08: ffffed1003dbfc31 R09: 0000000000000000
R10: ffffffff822f5ab7 R11: ffff88801edfe187 R12: ffff88801edfdec0
R13: ffff888020376ac0 R14: ffff888020376ac0 R15: ffff888020376a60
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000556365155830 CR3: 000000001d6aa000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 __sk_destruct+0x46/0x222
 sk_psock_destroy+0x22f/0x242
 process_one_work+0x504/0x8a8
 ? process_one_work+0x39d/0x8a8
 ? __pfx_process_one_work+0x10/0x10
 ? worker_thread+0x44/0x2ae
 ? __list_add_valid_or_report+0x83/0xea
 ? srso_return_thunk+0x5/0x5f
 ? __list_add+0x45/0x52
 process_scheduled_works+0x73/0x82
 worker_thread+0x1ce/0x2ae

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250425060015.6968-3-jiayuan.chen@linux.dev
2025-05-09 18:10:13 -07:00
Jiri Olsa
d57293db64 selftests/bpf: Add link info test for ref_ctr_offset retrieval
Adding link info test for ref_ctr_offset retrieval for both
uprobe and uretprobe probes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/bpf/20250509153539.779599-3-jolsa@kernel.org
2025-05-09 13:01:08 -07:00
Luis Gerhorst
cf15cdc0f0 selftests/bpf: Fix caps for __xlated/jited_unpriv
Currently, __xlated_unpriv and __jited_unpriv do not work because the
BPF syscall will overwrite info.jited_prog_len and info.xlated_prog_len
with 0 if the process is not bpf_capable(). This bug was not noticed
before, because there is no test that actually uses
__xlated_unpriv/__jited_unpriv.

To resolve this, simply restore the capabilities earlier (but still
after loading the program). Adding this here unconditionally is fine
because the function first checks that the capabilities were initialized
before attempting to restore them.

This will be important later when we add tests that check whether a
speculation barrier was inserted in the correct location.

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Fixes: 9c9f733913 ("selftests/bpf: allow checking xlated programs in verifier_* tests")
Fixes: 7d743e4c75 ("selftests/bpf: __jited test tag to check disassembly after jit")
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250501073603.1402960-2-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09 11:29:11 -07:00
Peilin Ye
d3131466b4 selftests/bpf: Enable non-arena load-acquire/store-release selftests for riscv64
For riscv64, enable all BPF_{LOAD_ACQ,STORE_REL} selftests except the
arena_atomics/* ones (not guarded behind CAN_USE_LOAD_ACQ_STORE_REL),
since arena access is not yet supported.

Acked-by: Björn Töpel <bjorn@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/9d878fa99a72626208a8eed3c04c4140caf77fda.1746588351.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09 10:05:27 -07:00
Peilin Ye
0357f29de8 selftests/bpf: Verify zero-extension behavior in load-acquire tests
Verify that 8-, 16- and 32-bit load-acquires are zero-extending by using
immediate values with their highest bit set.  Do the same for the 64-bit
variant to keep the style consistent.

Acked-by: Björn Töpel <bjorn@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/11097fd515f10308b3941469ee4c86cb8872db3f.1746588351.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09 10:05:27 -07:00
Peilin Ye
6e492ffcab selftests/bpf: Avoid passing out-of-range values to __retval()
Currently, we pass 0x1234567890abcdef to __retval() for the following
two tests:

  verifier_load_acquire/load_acquire_64
  verifier_store_release/store_release_64

However, the upper 32 bits of that value are being ignored, since
__retval() expects an int.  Actually, the tests would still pass even if
I change '__retval(0x1234567890abcdef)' to e.g. '__retval(0x90abcdef)'.

Restructure the tests a bit to test the entire 64-bit values properly.
Do the same to their 8-, 16- and 32-bit variants as well to keep the
style consistent.

Fixes: ff3afe5da9 ("selftests/bpf: Add selftests for load-acquire and store-release instructions")
Acked-by: Björn Töpel <bjorn@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/d67f4c6f6ee0d0388cbce1f4892ec4176ee2d604.1746588351.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09 10:05:27 -07:00
Peilin Ye
13fdecf345 selftests/bpf: Use CAN_USE_LOAD_ACQ_STORE_REL when appropriate
Instead of open-coding the conditions, use
'#ifdef CAN_USE_LOAD_ACQ_STORE_REL' to guard the following tests:

  verifier_precision/bpf_load_acquire
  verifier_precision/bpf_store_release
  verifier_store_release/*

Note that, for the first two tests in verifier_precision.c, switching to
'#ifdef CAN_USE_LOAD_ACQ_STORE_REL' means also checking if
'__clang_major__ >= 18', which has already been guaranteed by the outer
'#if' check.

Acked-by: Björn Töpel <bjorn@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com> # QEMU/RVA23
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/45d7e025f6e390a8ff36f08fc51e31705ac896bd.1746588351.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09 10:05:27 -07:00
Martin KaFai Lau
29318b4d5d selftests/bpf: Add test for bpf_list_{front,back}
This patch adds the "list_peek" test to use the new
bpf_list_{front,back} kfunc.

The test_{front,back}* tests ensure that the return value
is a non_own_ref node pointer and requires the spinlock to be held.

Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # check non_own_ref marking
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250506015857.817950-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-06 10:21:06 -07:00
Martin KaFai Lau
47ada65c5c selftests/bpf: Add tests for bpf_rbtree_{root,left,right}
This patch has a much simplified rbtree usage from the
kernel sch_fq qdisc. It has a "struct node_data" which can be
added to two different rbtrees which are ordered by different keys.

The test first populates both rbtrees. Then search for a lookup_key
from the "groot0" rbtree. Once the lookup_key is found, that node
refcount is taken. The node is then removed from another "groot1"
rbtree.

While searching the lookup_key, the test will also try to remove
all rbnodes in the path leading to the lookup_key.

The test_{root,left,right}_spinlock_true tests ensure that the
return value of the bpf_rbtree functions is a non_own_ref node pointer.
This is done by forcing an verifier error by calling a helper
bpf_jiffies64() while holding the spinlock. The tests then
check for the verifier message
"call bpf_rbtree...R0=rcu_ptr_or_null_node..."

The other test_{root,left,right}_spinlock_false tests ensure that
they must be called with spinlock held.

Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # Check non_own_ref marking
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250506015857.817950-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-06 10:21:05 -07:00
Martin KaFai Lau
2ddef1783c bpf: Allow refcounted bpf_rb_node used in bpf_rbtree_{remove,left,right}
The bpf_rbtree_{remove,left,right} requires the root's lock to be held.
They also check the node_internal->owner is still owned by that root
before proceeding, so it is safe to allow refcounted bpf_rb_node
pointer to be used in these kfuncs.

In a bpf fq implementation which is much closer to the kernel fq,
https://lore.kernel.org/bpf/20250418224652.105998-13-martin.lau@linux.dev/,
a networking flow (allocated by bpf_obj_new) can be added to two different
rbtrees. There are cases that the flow is searched from one rbtree,
held the refcount of the flow, and then removed from another rbtree:

struct fq_flow {
	struct bpf_rb_node	fq_node;
	struct bpf_rb_node	rate_node;
	struct bpf_refcount	refcount;
	unsigned long		sk_long;
};

int bpf_fq_enqueue(...)
{
	/* ... */

	bpf_spin_lock(&root->lock);
	while (can_loop) {
		/* ... */
		if (!p)
			break;
		gc_f = bpf_rb_entry(p, struct fq_flow, fq_node);
		if (gc_f->sk_long == sk_long) {
			f = bpf_refcount_acquire(gc_f);
			break;
		}
		/* ... */
	}
	bpf_spin_unlock(&root->lock);

	if (f) {
		bpf_spin_lock(&q->lock);
		bpf_rbtree_remove(&q->delayed, &f->rate_node);
		bpf_spin_unlock(&q->lock);
	}
}

bpf_rbtree_{left,right} do not need this change but are relaxed together
with bpf_rbtree_remove instead of adding extra verifier logic
to exclude these kfuncs.

To avoid bi-sect failure, this patch also changes the selftests together.

The "rbtree_api_remove_unadded_node" is not expecting verifier's error.
The test now expects bpf_rbtree_remove(&groot, &m->node) to return NULL.
The test uses __retval(0) to ensure this NULL return value.

Some of the "only take non-owning..." failure messages are changed also.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20250506015857.817950-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-06 10:21:05 -07:00
Ihor Solodrai
a28fe31603 selftests/bpf: Remove sockmap_ktls disconnect_after_delete test
"sockmap_ktls disconnect_after_delete" is effectively moot after
disconnect has been disabled for TLS [1][2]. Remove the test
completely.

[1] https://lore.kernel.org/bpf/20250416170246.2438524-1-ihor.solodrai@linux.dev/
[2] https://lore.kernel.org/netdev/20250404180334.3224206-1-kuba@kernel.org/

Signed-off-by: Ihor Solodrai <isolodrai@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250502185221.1556192-1-isolodrai@meta.com
2025-05-05 14:20:04 -07:00
Jakub Kicinski
b4cd2ee54c bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaBVftAAKCRAraS+Z+3y5
 EzJ7AP9dtYuBHmU0tB5upuLzQ9sVxaCXs2linfRKK40A5YJDcgD/fBQqPzhxCqZR
 moHEqelMLgrAUcyro5egdRJZPcaTfQE=
 =6Las
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-05-02

We've added 14 non-merge commits during the last 10 day(s) which contain
a total of 13 files changed, 740 insertions(+), 121 deletions(-).

The main changes are:

1) Avoid skipping or repeating a sk when using a UDP bpf_iter,
   from Jordan Rife.

2) Fixed a crash when a bpf qdisc is set in
   the net.core.default_qdisc, from Amery Hung.

3) A few other fixes in the bpf qdisc, from Amery Hung.
   - Always call qdisc_watchdog_init() in the .init prologue such that
     the .reset/.destroy epilogue can always call qdisc_watchdog_cancel()
     without issue.
   - bpf_qdisc_init_prologue() was incorrectly returning an error
     when the bpf qdisc is set as the default_qdisc and the mq is creating
     the default_qdisc. It is now fixed.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Cleanup bpf qdisc selftests
  selftests/bpf: Test attaching a bpf qdisc with incomplete operators
  bpf: net_sched: Make some Qdisc_ops ops mandatory
  selftests/bpf: Test setting and creating bpf qdisc as default qdisc
  bpf: net_sched: Fix bpf qdisc init prologue when set as default qdisc
  selftests/bpf: Add tests for bucket resume logic in UDP socket iterators
  selftests/bpf: Return socket cookies from sock_iter_batch progs
  bpf: udp: Avoid socket skips and repeats during iteration
  bpf: udp: Use bpf_udp_iter_batch_item for bpf_udp_iter_state batch items
  bpf: udp: Get rid of st_bucket_done
  bpf: udp: Make sure iter->batch always contains a full bucket snapshot
  bpf: udp: Make mem flags configurable through bpf_iter_udp_realloc_batch
  bpf: net_sched: Fix using bpf qdisc as default qdisc
  selftests/bpf: Fix compilation errors
====================

Link: https://patch.msgid.link/20250503010755.4030524-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-05 13:22:58 -07:00
Amery Hung
2f9838e257 selftests/bpf: Cleanup bpf qdisc selftests
Some cleanups:
- Remove unnecessary kfuncs declaration
- Use _ns in the test name to run tests in a separate net namespace
- Call skeleton __attach() instead of bpf_map__attach_struct_ops() to
  simplify tests.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02 15:51:17 -07:00
Amery Hung
6cda0e2c47 selftests/bpf: Test attaching a bpf qdisc with incomplete operators
Implement .destroy in bpf_fq and bpf_fifo as it is now mandatory.

Test attaching a bpf qdisc with a missing operator .init. This is not
allowed as bpf qdisc qdisc_watchdog_cancel() could have been called with
an uninitialized timer.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02 15:42:48 -07:00
Amery Hung
6d080362c3 selftests/bpf: Test setting and creating bpf qdisc as default qdisc
First, test that bpf qdisc can be set as default qdisc. Then, attach
an mq qdisc to see if bpf qdisc can be successfully created and grafted.

The test is a sequential test as net.core.default_qdisc is global.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02 15:28:29 -07:00
Jordan Rife
c58dcc1dbe selftests/bpf: Add tests for bucket resume logic in UDP socket iterators
Introduce a set of tests that exercise various bucket resume scenarios:

* remove_seen resumes iteration after removing a socket from the bucket
  that we've already processed. Before, with the offset-based approach,
  this test would have skipped an unseen socket after resuming
  iteration. With the cookie-based approach, we now see all sockets
  exactly once.
* remove_unseen exercises the condition where the next socket that we
  would have seen is removed from the bucket before we resume iteration.
  This tests the scenario where we need to scan past the first cookie in
  our remembered cookies list to find the socket from which to resume
  iteration.
* remove_all exercises the condition where all sockets we remembered
  were removed from the bucket to make sure iteration terminates and
  returns no more results.
* add_some exercises the condition where a few, but not enough to
  trigger a realloc, sockets are added to the head of the current bucket
  between reads. Before, with the offset-based approach, this test would
  have repeated sockets we've already seen. With the cookie-based
  approach, we now see all sockets exactly once.
* force_realloc exercises the condition that we need to realloc the
  batch on a subsequent read, since more sockets than can be held in the
  current batch array were added to the current bucket. This exercies
  the logic inside bpf_iter_udp_realloc_batch that copies cookies into
  the new batch to make sure nothing is skipped or repeated.

Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02 12:07:53 -07:00
Jordan Rife
4a0614e18c selftests/bpf: Return socket cookies from sock_iter_batch progs
Extend the iter_udp_soreuse and iter_tcp_soreuse programs to write the
cookie of the current socket, so that we can track the identity of the
sockets that the iterator has seen so far. Update the existing do_test
function to account for this change to the iterator program output. At
the same time, teach both programs to work with AF_INET as well.

Signed-off-by: Jordan Rife <jordan@jrife.io>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-05-02 12:07:53 -07:00
Jakub Kicinski
337079d31f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc5).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01 15:11:38 -07:00
Alan Maguire
f263336a41 selftests/bpf: Add btf dedup test covering module BTF dedup
Recently issues were observed with module BTF deduplication failures
[1].  Add a dedup selftest that ensures that core kernel types are
referenced from split BTF as base BTF types.  To do this use bpf_testmod
functions which utilize core kernel types, specifically

ssize_t
bpf_testmod_test_write(struct file *file, struct kobject *kobj,
                       struct bin_attribute *bin_attr,
                       char *buf, loff_t off, size_t len);

__bpf_kfunc struct sock *bpf_kfunc_call_test3(struct sock *sk);

__bpf_kfunc void bpf_kfunc_call_test_pass_ctx(struct __sk_buff *skb);

For each of these ensure that the types they reference -
struct file, struct kobject, struct bin_attr etc - are in base BTF.
Note that because bpf_testmod.ko is built with distilled base BTF
the associated reference types - i.e. the PTR that points at a
"struct file" - will be in split BTF.  As a result the test resolves
typedef and pointer references and verifies the pointed-at or
typedef'ed type is in base BTF.  Because we use BTF from
/sys/kernel/btf/bpf_testmod relocation has occurred for the
referenced types and they will be base - not distilled base - types.

For large-scale dedup issues, we see such types appear in split BTF and
as a result this test fails.  Hence it is proposed as a test which will
fail when large-scale dedup issues have occurred.

[1] https://lore.kernel.org/dwarves/CAADnVQL+-LiJGXwxD3jEUrOonO-fX0SZC8496dVzUXvfkB7gYQ@mail.gmail.com/

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250430134249.2451066-1-alan.maguire@oracle.com
2025-05-01 14:05:49 -07:00
Lorenzo Bianconi
3678331ca7 selftests/bpf: xdp_metadata: Check XDP_REDIRCT support for dev-bound progs
Improve xdp_metadata bpf selftest in order to check it is possible for a
XDP dev-bound program to perform XDP_REDIRECT into a DEVMAP but it is still
not allowed to attach a XDP dev-bound program to a DEVMAP entry.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
2025-05-01 12:54:06 -07:00
Feng Yang
1ce65102d2 selftests/bpf: Fix compilation errors
If the CONFIG_NET_SCH_BPF configuration is not enabled,
the BPF test compilation will report the following error:
In file included from progs/bpf_qdisc_fq.c:39:
progs/bpf_qdisc_common.h:17:51: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility]
   17 | void bpf_qdisc_skb_drop(struct sk_buff *p, struct bpf_sk_buff_ptr *to_free) __ksym;
      |                                                   ^
progs/bpf_qdisc_fq.c:309:14: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility]
  309 |              struct bpf_sk_buff_ptr *to_free)
      |                     ^
progs/bpf_qdisc_fq.c:309:14: error: declaration of 'struct bpf_sk_buff_ptr' will not be visible outside of this function [-Werror,-Wvisibility]
progs/bpf_qdisc_fq.c:308:5: error: conflicting types for '____bpf_fq_enqueue'

Fixes: 11c701639b ("selftests/bpf: Add a basic fifo qdisc test")
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250428033445.58113-1-yangfeng59949@163.com
2025-04-29 13:47:28 -07:00
T.J. Mercier
38d976c32d selftests/bpf: Fix kmem_cache iterator draining
The closing parentheses around the read syscall is misplaced, causing
single byte reads from the iterator instead of buf sized reads. While
the end result is the same, many more read calls than necessary are
performed.

$ tools/testing/selftests/bpf/vmtest.sh  "./test_progs -t kmem_cache_iter"
145/1   kmem_cache_iter/check_task_struct:OK
145/2   kmem_cache_iter/check_slabinfo:OK
145/3   kmem_cache_iter/open_coded_iter:OK
145     kmem_cache_iter:OK
Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED

Fixes: a496d0cdc8 ("selftests/bpf: Add a test for kmem_cache_iter")
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20250428180256.1482899-1-tjmercier@google.com
2025-04-29 13:21:48 -07:00
Alexei Starovoitov
224ee86639 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc4
Cross-merge bpf and other fixes after downstream PRs.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-28 08:40:45 -07:00
Peilin Ye
f000791078 selftests/bpf: Correct typo in __clang_major__ macro
Make sure that CAN_USE_BPF_ST test (compute_live_registers/store) is
enabled when __clang_major__ >= 18.

Fixes: 2ea8f6a1cd ("selftests/bpf: test cases for compute_live_registers()")
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/20250425213712.1542077-1-yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-25 16:56:10 -07:00
Brandon Kammerdiener
3d9c463f95 selftests/bpf: add test for softlock when modifying hashmap while iterating
Add test that modifies the map while it's being iterated in such a way that
hangs the kernel thread unless the _safe fix is applied to
bpf_for_each_hash_elem.

Signed-off-by: Brandon Kammerdiener <brandon.kammerdiener@intel.com>
Link: https://lore.kernel.org/r/20250424153246.141677-3-brandon.kammerdiener@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
2025-04-25 08:36:59 -07:00
Ilya Leoshkevich
be55219915 selftests/bpf: Fix endianness issue in __qspinlock declaration
Copy the big-endian field declarations from qspinlock_types.h,
otherwise some properties won't hold on big-endian systems. For
example, assigning lock->val = 1 should result in lock->locked == 1,
which is not the case there.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250424165525.154403-4-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-24 17:24:28 -07:00
Ilya Leoshkevich
0240e5a943 selftests/bpf: Fix arena_spin_lock on systems with less than 16 CPUs
test_arena_spin_lock_size() explicitly requires having at least 2 CPUs,
but if the machine has less than 16, then pthread_setaffinity_np() call
in spin_lock_thread() fails.

Cap threads to the number of CPUs.

Alternative solutions are raising the number of required CPUs to 16, or
pinning multiple threads to the same CPU, but they are not that useful.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250424165525.154403-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-24 17:24:28 -07:00
Ilya Leoshkevich
ddfd1f30b5 selftests/bpf: Fix arena_spin_lock.c build dependency
Changing bpf_arena_spin_lock.h does not lead to recompiling
arena_spin_lock.c. By convention, all BPF progs depend on all
header files in progs/, so move this header file there. There
are no other users besides arena_spin_lock.c.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250424165525.154403-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-24 17:24:27 -07:00
Ilya Leoshkevich
60400cd2b9 selftests/bpf: Set MACs during veth creation in tc_redirect
tc_redirect/tc_redirect_dtime fails intermittently on some systems
with:

   (network_helpers.c:303: errno: Operation now in progress) Failed to connect to server

The problem is that on these systems systemd-networkd and systemd-udevd
are installed in the default configuration, which includes:

    /usr/lib/systemd/network/99-default.link
    /usr/lib/udev/rules.d/80-net-setup-link.rules

These configs instruct systemd to change MAC addresses of newly created
interfaces, which includes the ones created by BPF selftests. In this
particular case it causes SYN+ACK packets to be dropped, because they
get the PACKET_OTHERHOST type - the fact that this causes a connect()
on a blocking socket to return -EINPROGRESS looks like a bug, which
needs to be investigated separately.

systemd won't change the MAC address if the kernel reports that it was
already set by userspace; the NET_ADDR_SET check in
link_generate_new_hw_addr() is responsible for this.

In order to eliminate the race window between systemd and the test,
set MAC addresses during link creation. Ignore checkpatch's "quoted
string split across lines" warning, since it points to a command line,
and not a user-visible message.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250416124845.584362-1-iii@linux.ibm.com
2025-04-23 16:41:49 -07:00
KaFai Wan
4c0a42c500 selftests/bpf: Add test to access const void pointer argument in tracing program
Adding verifier test for accessing const void pointer argument in
tracing programs.

The test program loads 1st argument of bpf_fentry_test10 function
which is const void pointer and checks that verifier allows that.

Signed-off-by: KaFai Wan <mannkafai@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250423121329.3163461-3-mannkafai@gmail.com
2025-04-23 11:26:22 -07:00
Ihor Solodrai
f2858f3081 selftests/bpf: Mitigate sockmap_ktls disconnect_after_delete failure
"sockmap_ktls disconnect_after_delete" test has been failing on BPF CI
after recent merges from netdev:
* https://github.com/kernel-patches/bpf/actions/runs/14458537639
* https://github.com/kernel-patches/bpf/actions/runs/14457178732

It happens because disconnect has been disabled for TLS [1], and it
renders the test case invalid.

Removing all the test code creates a conflict between bpf and
bpf-next, so for now only remove the offending assert [2].

The test will be removed later on bpf-next.

[1] https://lore.kernel.org/netdev/20250404180334.3224206-1-kuba@kernel.org/
[2] https://lore.kernel.org/bpf/cfc371285323e1a3f3b006bfcf74e6cf7ad65258@linux.dev/

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/bpf/20250416170246.2438524-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-23 10:50:46 -07:00
Feng Yang
9b72f3e5b7 selftests/bpf: Add test for attaching kprobe with long event names
This test verifies that attaching kprobe/kretprobe with long event names
does not trigger EINVAL errors.

Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250417014848.59321-4-yangfeng59949@163.com
2025-04-22 17:13:37 -07:00
Feng Yang
e1be7c45d2 selftests/bpf: Add test for attaching uprobe with long event names
This test verifies that attaching uprobe/uretprobe with long event names
does not trigger EINVAL errors.

Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250417014848.59321-3-yangfeng59949@163.com
2025-04-22 17:13:37 -07:00
Malaya Kumar Rout
be2fea9c07 selftests/bpf: Close the file descriptor to avoid resource leaks
Static analysis found an issue in bench_htab_mem.c and sk_assign.c

cppcheck output before this patch:
tools/testing/selftests/bpf/benchs/bench_htab_mem.c:284:3: error: Resource leak: fd [resourceLeak]
tools/testing/selftests/bpf/prog_tests/sk_assign.c:41:3: error: Resource leak: tc [resourceLeak]

cppcheck output after this patch:
No resource leaks found

Fix the issue by closing the file descriptors fd and tc.

Signed-off-by: Malaya Kumar Rout <malayarout91@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250421174405.26080-1-malayarout91@gmail.com
2025-04-22 14:29:58 -07:00
Jakub Kicinski
07e32237ed bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaAFFvgAKCRAraS+Z+3y5
 E/1OAP9SGmTMgHuHLlF8en+MaYdtwgcHy6uurXgbSQAAV/RwwQEAh2oXZE1D9I7a
 EtxsaJYqbbhD09RPwWa2Rd8iJrJYXQk=
 =qcoU
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-04-17

We've added 12 non-merge commits during the last 9 day(s) which contain
a total of 18 files changed, 1748 insertions(+), 19 deletions(-).

The main changes are:

1) bpf qdisc support, from Amery Hung.
   A qdisc can be implemented in bpf struct_ops programs and
   can be used the same as other existing qdiscs in the
   "tc qdisc" command.

2) Add xsk tail adjustment tests, from Tushar Vyavahare.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Test attaching bpf qdisc to mq and non root
  selftests/bpf: Add a bpf fq qdisc to selftest
  selftests/bpf: Add a basic fifo qdisc test
  libbpf: Support creating and destroying qdisc
  bpf: net_sched: Disable attaching bpf qdisc to non root
  bpf: net_sched: Support updating bstats
  bpf: net_sched: Add a qdisc watchdog timer
  bpf: net_sched: Add basic bpf qdisc kfuncs
  bpf: net_sched: Support implementation of Qdisc_ops in bpf
  bpf: Prepare to reuse get_ctx_arg_idx
  selftests/xsk: Add tail adjustment tests and support check
  selftests/xsk: Add packet stream replacement function
====================

Link: https://patch.msgid.link/20250417184338.3152168-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-21 18:51:08 -07:00
Alexei Starovoitov
5709be4c35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc3
Cross-merge bpf and other fixes after downstream PRs.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-21 08:04:38 -07:00
Christian Brauner
79beea2db0
fs: remove uselib() system call
This system call has been deprecated for quite a while now.
Let's try and remove it from the kernel completely.

Link: https://lore.kernel.org/20250415-kanufahren-besten-02ac00e6becd@brauner
Acked-by: Kees Cook <kees@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-21 10:27:59 +02:00
Jiri Olsa
fe8e5a3215 selftests/bpf: Add 5-byte NOP uprobe trigger benchmark
Add a 5-byte NOP uprobe trigger benchmark (x86_64 specific) to measure
uprobes/uretprobes on top of NOP5 instructions.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20250414083647.1234007-2-jolsa@kernel.org
2025-04-18 09:03:45 +02:00
Amery Hung
2b7b5b7f10 selftests/bpf: Test attaching bpf qdisc to mq and non root
Until we are certain that existing classful qdiscs work with bpf qdisc,
make sure we don't allow attaching a bpf qdisc to non root. Meanwhile,
attaching to mq is allowed.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250409214606.2000194-11-ameryhung@gmail.com
2025-04-17 10:54:41 -07:00
Amery Hung
2b59bd9e4e selftests/bpf: Add a bpf fq qdisc to selftest
This test implements a more sophisticated qdisc using bpf. The bpf fair-
queueing (fq) qdisc gives each flow an equal chance to transmit data. It
also respects the timestamp of skb for rate limiting.

Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250409214606.2000194-10-ameryhung@gmail.com
2025-04-17 10:54:41 -07:00
Amery Hung
11c701639b selftests/bpf: Add a basic fifo qdisc test
This selftest includes a bare minimum fifo qdisc, which simply enqueues
sk_buffs into the back of a bpf list and dequeues from the front of the
list.

Signed-off-by: Amery Hung <amery.hung@bytedance.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20250409214606.2000194-9-ameryhung@gmail.com
2025-04-17 10:54:41 -07:00
Jiapeng Chong
7d0b43b68d selftest/bpf/benchs: Remove duplicate sys/types.h header
./tools/testing/selftests/bpf/benchs/bench_sockmap.c: sys/types.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=20436
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250415061459.11644-1-jiapeng.chong@linux.alibaba.com
2025-04-15 11:03:57 -07:00
Linus Torvalds
b676ac484f bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmf6sD8ACgkQ6rmadz2v
 bTq86w//bbg2S1ZhSXXQvgRSbxfecvJ0r6XGDOaMsKxPXcqpbaMoSCYx2D8puO+b
 xm0vc+5qXlzuTHq9I8flDKrWdA+/sHxLQhXjcBA796vaY6IgJEnapf3kENyzZ3Vp
 agpNPlZe9FLaANDRivTFPVgzVjr07/3eL7VKItASksb/3yjBSa+vrIJVfGF1krQT
 slxTMzVMzB+p0MdKVjmeGn5EodWXp8TdVzQBPb8vnCn7U1h1HULSh4j1+nZ/Z1yr
 zC4/pVPmdDJe1H8ghBGm4f0nY+EwXPtZiVbXnYS2FhgjvthRKFYIyxN9F6kg7AD7
 NG0T6xw/QYNfPTR40PSiV/WHhH5qa2zRVtlepVU7tqqmsyRXi+0Eq/MfJyiuNzgN
 WWmJec0O/Ax4r2Xs/QgX3mFlRnLNi5gmc7fuOARmayAlqElZ9QdB2x6ebW5Fk4Qx
 9oyQACpcu6/oUKgeMSo52MDa82wUPPxpC6qdsefmQYaAcOKM5MD4SNd+eEnfX03E
 RAaItTW9az57a2BL9C/ejJO/SwY4Er+O8B3PO7GaKiURMSZa5nVlY+2QB2fJy6TA
 7IvSYjFD5E4risMbZgPFCqWkQ0yHbY7zEn/tbcNC5AFZoKv70jELPQTLPXq7UPLe
 BuKoL9VJyeXF7E1MQqQH33q3tfcwlIL++piCNHvTQoPadEba2dM=
 =Mezb
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Followup fixes for resilient spinlock (Kumar Kartikeya Dwivedi):
     - Make res_spin_lock test less verbose, since it was spamming BPF
       CI on failure, and make the check for AA deadlock stronger
     - Fix rebasing mistake and use architecture provided
       res_smp_cond_load_acquire
     - Convert BPF maps (queue_stack and ringbuf) to resilient spinlock
       to address long standing syzbot reports

 - Make sure that classic BPF load instruction from SKF_[NET|LL]_OFF
   offsets works when skb is fragmeneted (Willem de Bruijn)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Convert ringbuf map to rqspinlock
  bpf: Convert queue_stack map to rqspinlock
  bpf: Use architecture provided res_smp_cond_load_acquire
  selftests/bpf: Make res_spin_lock AA test condition stronger
  selftests/net: test sk_filter support for SKF_NET_OFF on frags
  bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
  selftests/bpf: Make res_spin_lock test less verbose
2025-04-12 12:48:10 -07:00
Kumar Kartikeya Dwivedi
1ddb9ad2ac selftests/bpf: Make res_spin_lock AA test condition stronger
Let's make sure that we see a EDEADLK and ETIMEDOUT whenever checking
for the AA tests (in case of simple AA and AA after exhausting 31
entries).

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250410170023.2670683-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-10 12:45:37 -07:00
Tushar Vyavahare
4b30209255 selftests/xsk: Add tail adjustment tests and support check
Introduce tail adjustment functionality in xskxceiver using
bpf_xdp_adjust_tail(). Add `xsk_xdp_adjust_tail` to modify packet sizes
and drop unmodified packets. Implement `is_adjust_tail_supported` to check
helper availability. Develop packet resizing tests, including shrinking
and growing scenarios, with functions for both single-buffer and
multi-buffer cases. Update the test framework to handle various scenarios
and adjust MTU settings. These changes enhance the testing of packet tail
adjustments, improving AF_XDP framework reliability.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250410033116.173617-3-tushar.vyavahare@intel.com
2025-04-10 10:08:55 -07:00
Tushar Vyavahare
3e730fe2af selftests/xsk: Add packet stream replacement function
Add pkt_stream_replace_ifobject function to replace the packet stream for
a given ifobject.

Enable separate TX and RX packet replacement, allowing RX side packet
length adjustments using bpf_xdp_adjust_tail() in the upcoming patch.
Currently, pkt_stream_replace() works on both TX and RX packet streams,
and this new function provides the ability to modify one of them.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250410033116.173617-2-tushar.vyavahare@intel.com
2025-04-10 10:07:48 -07:00
Hou Tao
7c6fb1cf33 selftests/bpf: Add test case for atomic update of fd htab
Add a test case to verify the atomic update of existing elements in the
htab of maps. The test proceeds in three steps:

1) fill the outer map with keys in the range [0, 8]
For each inner array map, the value of its first element is set as the
key used to lookup the inner map.

2) create 16 threads to lookup these keys concurrently
Each lookup thread first lookups the inner map, then it checks whether
the first value of the inner array map is the same as the key used to
lookup the inner map.

3) create 8 threads to overwrite these keys concurrently
Each update thread first creates an inner array, it sets the first value
of the array to the key used to update the outer map, then it uses the
key and the inner map to update the outer map.

Without atomic update support, the lookup operation may return -ENOENT
during the lookup of outer map, or return -EINVAL during the comparison
of the first value in the inner map and the key used for inner map, and
the test will fail. After the atomic update change, both the lookup and
the comparison will succeed.

Given that the update of outer map is slow, the test case sets the loop
number for each thread as 5 to reduce the total running time. However,
the loop number could also be adjusted through FD_HTAB_LOOP_NR
environment variable.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250401062250.543403-7-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09 20:12:54 -07:00
Jiayuan Chen
7b2fa44de5 selftest/bpf/benchs: Add benchmark for sockmap usage
Add TCP+sockmap-based benchmark.
Since sockmap's own update and delete operations are generally less
critical, the performance of the fast forwarding framework built upon
it is the key aspect.

Also with cgset/cgexec, we can observe the behavior of sockmap under
memory pressure.

The benchmark can be run with:
'''
./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress
'''

In the future, we plan to move socket_helpers.h out of the prog_tests
directory to make it accessible for the benchmark. This will enable
better support for various socket types.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250407142234.47591-5-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-09 19:59:00 -07:00