Commit Graph

5219 Commits

Author SHA1 Message Date
Eduard Zingerman
223ffb6a3d selftests/bpf: add reproducer for spurious precision propagation through calls
Add a test for the scenario described in the previous commit:
an iterator loop with two paths where one ties r2/r7 via
shared scalar id and skips a call, while the other goes
through the call. Precision marks from the linked registers
get spuriously propagated to the call path via
propagate_precision(), hitting "backtracking call unexpected
regs" in backtrack_insn().

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-linked-regs-and-propagate-precision-v1-2-18e859be570d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-06 21:50:05 -08:00
Eduard Zingerman
2658a1720a bpf: collect only live registers in linked regs
Fix an inconsistency between func_states_equal() and
collect_linked_regs():
- regsafe() uses check_ids() to verify that cached and current states
  have identical register id mapping.
- func_states_equal() calls regsafe() only for registers computed as
  live by compute_live_registers().
- clean_live_states() is supposed to remove dead registers from cached
  states, but it can skip states belonging to an iterator-based loop.
- collect_linked_regs() collects all registers sharing the same id,
  ignoring the marks computed by compute_live_registers().
  Linked registers are stored in the state's jump history.
- backtrack_insn() marks all linked registers for an instruction
  as precise whenever one of the linked registers is precise.

The above might lead to a scenario:
- There is an instruction I with register rY known to be dead at I.
- Instruction I is reached via two paths: first A, then B.
- On path A:
  - There is an id link between registers rX and rY.
  - Checkpoint C is created at I.
  - Linked register set {rX, rY} is saved to the jump history.
  - rX is marked as precise at I, causing both rX and rY
    to be marked precise at C.
- On path B:
  - There is no id link between registers rX and rY,
    otherwise register states are sub-states of those in C.
  - Because rY is dead at I, check_ids() returns true.
  - Current state is considered equal to checkpoint C,
    propagate_precision() propagates spurious precision
    mark for register rY along the path B.
  - Depending on a program, this might hit verifier_bug()
    in the backtrack_insn(), e.g. if rY ∈  [r1..r5]
    and backtrack_insn() spots a function call.

The reproducer program is in the next patch.
This was hit by sched_ext scx_lavd scheduler code.

Changes in tests:
- verifier_scalar_ids.c selftests need modification to preserve
  some registers as live for __msg() checks.
- exceptions_assert.c adjusted to match changes in the verifier log,
  R0 is dead after conditional instruction and thus does not get
  range.
- precise.c adjusted to match changes in the verifier log, register r9
  is dead after comparison and it's range is not important for test.

Reported-by: Emil Tsalapatis <emil@etsalapatis.com>
Fixes: 0fb3cf6110 ("bpf: use register liveness information for func_states_equal")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-linked-regs-and-propagate-precision-v1-1-18e859be570d@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-06 21:49:40 -08:00
Eduard Zingerman
d87c9305a8 Revert "selftests/bpf: Update reg_bound range refinement logic"
This reverts commit da653de268.
Removed logic is now covered by range_refine_in_halves()
which handles both 32-bit and 64-bit refinements.

Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-bpf-32-bit-range-overflow-v3-3-f7f67e060a6b@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-06 18:16:17 -08:00
Eduard Zingerman
f81fdfd167 selftests/bpf: test refining u32/s32 bounds when ranges cross min/max boundary
Two test cases for signed/unsigned 32-bit bounds refinement
when s32 range crosses the sign boundary:
- s32 range [S32_MIN..1] overlapping with u32 range [3..U32_MAX],
  s32 range tail before sign boundary overlaps with u32 range.
- s32 range [-3..5] overlapping with u32 range [0..S32_MIN+3],
  s32 range head after the sign boundary overlaps with u32 range.

This covers both branches added in the __reg32_deduce_bounds().

Also, crossing_32_bit_signed_boundary_2() no longer triggers invariant
violations.

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Reviewed-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-bpf-32-bit-range-overflow-v3-2-f7f67e060a6b@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-06 18:16:17 -08:00
Eduard Zingerman
fbc7aef517 bpf: Fix u32/s32 bounds when ranges cross min/max boundary
Same as in __reg64_deduce_bounds(), refine s32/u32 ranges
in __reg32_deduce_bounds() in the following situations:

- s32 range crosses U32_MAX/0 boundary, positive part of the s32 range
  overlaps with u32 range:

  0                                                   U32_MAX
  |  [xxxxxxxxxxxxxx u32 range xxxxxxxxxxxxxx]              |
  |----------------------------|----------------------------|
  |xxxxx s32 range xxxxxxxxx]                       [xxxxxxx|
  0                     S32_MAX S32_MIN                    -1

- s32 range crosses U32_MAX/0 boundary, negative part of the s32 range
  overlaps with u32 range:

  0                                                   U32_MAX
  |              [xxxxxxxxxxxxxx u32 range xxxxxxxxxxxxxx]  |
  |----------------------------|----------------------------|
  |xxxxxxxxx]                       [xxxxxxxxxxxx s32 range |
  0                     S32_MAX S32_MIN                    -1

- No refinement if ranges overlap in two intervals.

This helps for e.g. consider the following program:

   call %[bpf_get_prandom_u32];
   w0 &= 0xffffffff;
   if w0 < 0x3 goto 1f;    // on fall-through u32 range [3..U32_MAX]
   if w0 s> 0x1 goto 1f;   // on fall-through s32 range [S32_MIN..1]
   if w0 s< 0x0 goto 1f;   // range can be narrowed to  [S32_MIN..-1]
   r10 = 0;
1: ...;

The reg_bounds.c selftest is updated to incorporate identical logic,
refinement based on non-overflowing range halves:

  ((x ∩ [0, smax]) ∩ (y ∩ [0, smax])) ∪
  ((x ∩ [smin,-1]) ∩ (y ∩ [smin,-1]))

Reported-by: Andrea Righi <arighi@nvidia.com>
Reported-by: Emil Tsalapatis <emil@etsalapatis.com>
Closes: https://lore.kernel.org/bpf/aakqucg4vcujVwif@gpd4/T/
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260306-bpf-32-bit-range-overflow-v3-1-f7f67e060a6b@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-06 18:16:06 -08:00
Andrey Grodzovsky
a28441dd29 selftests/bpf: Add tests for kprobe.session optimization
Extend existing kprobe_multi_test subtests to validate the
kprobe.session exact function name optimization:

In kprobe_multi_session.c, add test_kprobe_syms which attaches a
kprobe.session program to an exact function name (bpf_fentry_test1)
exercising the fast syms[] path that bypasses kallsyms parsing.  It
calls session_check() so bpf_fentry_test1 is hit by both the wildcard
and exact probes, and test_session_skel_api validates
kprobe_session_result[0] == 4 (entry + return from each probe).

In test_attach_api_fails, add fail_7 and fail_8 verifying error code
consistency between the wildcard pattern path (slow, parses kallsyms)
and the exact function name path (fast, uses syms[] array).  Both
paths must return -ENOENT for non-existent functions.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260302200837.317907-4-andrey.grodzovsky@crowdstrike.com
2026-03-05 15:14:53 -08:00
Sun Jian
7f20d371fd selftests/bpf: bpf_cookie: Make perf_event subtest trigger reliably
The perf_event subtest relies on SW_CPU_CLOCK sampling to trigger the BPF
program, but the current CPU burn loop can be too short on slower systems
and may fail to generate any overflow sample. This leaves pe_res unchanged
and makes the test flaky.

Make burn_cpu() take a loop count and use a longer burn only for the
perf_event subtest. Also scope perf_event_open() to the current task to
avoid wasting samples on unrelated activity.

Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260228074555.122950-3-sun.jian.kdev@gmail.com
2026-03-05 15:07:40 -08:00
Sun Jian
74d3305e62 selftests/bpf: bpf_cookie: Skip kprobe_multi tests without bpf_testmod
The kprobe_multi subtests rely on bpf_testmod fentry ksyms.

When bpf_testmod isn't available, libbpf fails to resolve
bpf_testmod_fentry_test* and skeleton load fails with -ESRCH, causing
false failures.

Skip these subtests when env.has_testmod is false.

Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260228074555.122950-2-sun.jian.kdev@gmail.com
2026-03-05 15:07:40 -08:00
Josef Bacik
fefeeec612 selftests/bpf: Add test for btf__add_btf() with split BTF sources
Add a test that verifies btf__add_btf() correctly handles merging
multiple split BTF objects that share the same base BTF. The test
creates two sibling split BTFs on a common base, merges them into
a combined split BTF, and validates that base type references are
preserved while split type references are properly remapped.

Assisted-by: Claude:claude-opus-4-6

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/64a8c947bff1ae89efa9ba8c099466477762490f.1772657690.git.josef@toxicpanda.com
2026-03-05 15:03:04 -08:00
Linus Torvalds
abacaf5599 Including fixes from CAN, netfilter and wireless.
Current release - new code bugs:
 
  - sched: cake: fixup cake_mq rate adjustment for diffserv config
 
  - wifi: fix missing ieee80211_eml_params member initialization
 
 Previous releases - regressions:
 
  - tcp: give up on stronger sk_rcvbuf checks (for now)
 
 Previous releases - always broken:
 
  - net: fix rcu_tasks stall in threaded busypoll
 
  - sched: fq: clear q->band_pkt_count[] in fq_reset()
 
  - sched: only allow act_ct to bind to clsact/ingress qdiscs and
    shared blocks
 
  - bridge: check relevant per-VLAN options in VLAN range grouping
 
  - xsk: fix fragment node deletion to prevent buffer leak
 
 Misc:
 
  - spring cleanup of inactive maintainers
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmmptYEACgkQMUZtbf5S
 Irsraw/+L+L512Sbh1UlmbZjhT+AQkERHNkkfUMfXAeVb4uwHOqaydVdffvqRlTT
 zOK8Oqzqf5ojRezDZ02skXnJTh39MF9IFiugF9JHApxwT2ALv0S7PXPFUJeRQeAY
 +OiLT5+iy8wMfM6eryL6OtpM9PC8zwzH32oCYd5m4Ixf90Woj5G7x8Vooz7wUg1n
 0cAliam8QLIRBrKXqctf7J8n23AK+WcrLcAt58J+qWCGqiiXdJXMvWXv1PjQ7vs/
 KZysy0QaGwh3rw+5SquXmXwjhNIvvs58v3NV/4QbBdIKfJ5uYpTpyVgXJBQ6B4Jv
 8SATHNwGbuUHuZl8OHn9ysaPCE3ZuD5pMnHbLnbKR6fyic95GxMIx/BNAOVvvwOH
 l+GWEqch8hy6r+BVAJsoSEJzIf9aqUAlEhy0wEhVOP15yn5RWfMRQKpAaD6JKQYm
 0Q6i+PsdS8xaANcUzi1Ec6aqyaX+iIBY6srE/twU3PW23Uv2ejqAG89x4s7t9LPu
 GdMQ+iAEsR8Auph8Y5mshs4e9MrdlD3jzPCiFhkrqncWl/UcPpBgmHlD80vkTa1/
 miMyYG5wq3g9pAFT43aAuoE85K6ZdIW0xGp3wGYMiW8Zy6Ea5EdnM2Wg8kbi/om0
 W0pjfcI/2FInsZqK0g/PDeccWFKxl8C1SnfNDvy9rJHBwMkZHm4=
 =XGBM
 -----END PGP SIGNATURE-----

Merge tag 'net-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from CAN, netfilter and wireless.

  Current release - new code bugs:

   - sched: cake: fixup cake_mq rate adjustment for diffserv config

   - wifi: fix missing ieee80211_eml_params member initialization

  Previous releases - regressions:

   - tcp: give up on stronger sk_rcvbuf checks (for now)

  Previous releases - always broken:

   - net: fix rcu_tasks stall in threaded busypoll

   - sched:
      - fq: clear q->band_pkt_count[] in fq_reset()
      - only allow act_ct to bind to clsact/ingress qdiscs and shared
        blocks

   - bridge: check relevant per-VLAN options in VLAN range grouping

   - xsk: fix fragment node deletion to prevent buffer leak

  Misc:

   - spring cleanup of inactive maintainers"

* tag 'net-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits)
  xdp: produce a warning when calculated tailroom is negative
  net: enetc: use truesize as XDP RxQ info frag_size
  libeth, idpf: use truesize as XDP RxQ info frag_size
  i40e: use xdp.frame_sz as XDP RxQ info frag_size
  i40e: fix registering XDP RxQ info
  ice: change XDP RxQ frag_size from DMA write length to xdp.frame_sz
  ice: fix rxq info registering in mbuf packets
  xsk: introduce helper to determine rxq->frag_size
  xdp: use modulo operation to calculate XDP frag tailroom
  selftests/tc-testing: Add tests exercising act_ife metalist replace behaviour
  net/sched: act_ife: Fix metalist update behavior
  selftests: net: add test for IPv4 route with loopback IPv6 nexthop
  net: ipv6: fix panic when IPv4 route references loopback IPv6 nexthop
  net: vxlan: fix nd_tbl NULL dereference when IPv6 is disabled
  net: bridge: fix nd_tbl NULL dereference when IPv6 is disabled
  MAINTAINERS: remove Thomas Falcon from IBM ibmvnic
  MAINTAINERS: remove Claudiu Manoil and Alexandre Belloni from Ocelot switch
  MAINTAINERS: replace Taras Chornyi with Elad Nachman for Marvell Prestera
  MAINTAINERS: remove Jonathan Lemon from OpenCompute PTP
  MAINTAINERS: replace Clark Wang with Frank Li for Freescale FEC
  ...
2026-03-05 11:00:46 -08:00
Feng Yang
e8ae16d65a selftests/bpf: Add selftests for the invocation of bpf_lwt_xmit_push_encap
Calling bpf_lwt_xmit_push_encap will not cause a crash when dst is missing.

Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260304094429.168521-3-yangfeng59949@163.com
2026-03-04 16:44:29 -08:00
Viktor Malik
05c9b2eda8 selftests/bpf: Split module_attach into subtests
The test verifies attachment to various hooks in a kernel module,
however, everything is flattened into a single test. This makes it
impossible to run or skip test cases selectively.

Isolate each BPF program into a separate subtest. This is done by
disabling auto-loading of programs and loading and testing each program
separately.

At the same time, modernize the test to use ASSERT* instead of CHECK and
replace `return` by `goto cleanup` where necessary.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/20260225120904.1529112-1-vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03 09:23:57 -08:00
Emil Tsalapatis
422f1efabd selftests: bpf: Add tests for void global subprogs
Add additional testing for void global functions. The tests
ensure that calls to void global functions properly keep
R0 invalid. Also make sure that exception callbacks still
require a return value.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260228184759.108145-6-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03 08:47:23 -08:00
Emil Tsalapatis
8446ded1e1 bpf: Allow void global functions in the verifier
Global subprogs are currently not allowed to return void. Adjust
verifier logic to allow global functions with a void return type.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260228184759.108145-5-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03 08:47:23 -08:00
Alexis Lothoré (eBPF Foundation)
7242b0951d selftests/bpf: drop test_bpftool.sh
The test_bpftool.sh script runs a python unittest script checking
bpftool json output on different commands. As part of the ongoing effort
to get rid of any standalone test, this script should either be
converted to test_progs or removed.

As validating bpftool json output does not bring much value to the test
base (and because it would need test_progs to bring in a json parser),
remove the standalone test script.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20260227-bpftool_feature-v1-1-a25860fd52fb@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03 08:45:53 -08:00
Kumar Kartikeya Dwivedi
f6312e7175 selftests/bpf: Add tests for ctx fixed offset support
Add tests to ensure PTR_TO_CTX supports fixed offsets for program types
that don't rewrite accesses to it. Ensure that variable offsets and
negative offsets are still rejected. An extra test also checks writing
into ctx with modified offset for syscall progs. Other program types do
not support writes (notably, writable tracepoints offer a pointer for a
writable buffer through ctx, but don't allow writing to the ctx itself).

Before the fix made in the previous commit, these tests do not succeed,
except the ones testing for failures regardless of the change.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260227005725.1247305-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03 08:45:16 -08:00
Puranjay Mohan
b1d6bd5462 bpf, arm64: Use ORR-based MOV for general-purpose registers
The A64_MOV macro unconditionally uses ADD Rd, Rn, #0 to implement
register moves. While functionally correct, this is not the canonical
encoding when both operands are general-purpose registers.

On AArch64, MOV has two aliases depending on the operand registers:
  - MOV <Xd|SP>, <Xn|SP>  →  ADD <Xd|SP>, <Xn|SP>, #0
  - MOV <Xd>, <Xn>        →  ORR <Xd>, XZR, <Xn>

The ADD form is required when the stack pointer is involved (as ORR
does not accept SP), while the ORR form is the preferred encoding for
general-purpose registers.

The ORR encoding is also measurably faster on modern microarchitectures.
A microbenchmark [1] comparing dependent chains of MOV (ORR) vs ADD #0
on an ARM Neoverse-V2 (72-core, 3.4 GHz) shows:

  === mov (ORR Xd, XZR, Xn) ===
  run1 cycles/op=0.749859456
  run2 cycles/op=0.749991250
  run3 cycles/op=0.749601847
  avg cycles/op=0.749817518

  === add0 (ADD Xd, Xn, #0) ===
  run1 cycles/op=1.004777689
  run2 cycles/op=1.004558266
  run3 cycles/op=1.004806559
  avg cycles/op=1.004714171

The ORR form completes in ~0.75 cycles/op vs ~1.00 cycles/op for ADD #0,
a ~25% improvement. This is likely because the CPU's register renaming
hardware can eliminate ORR-based moves, while ADD #0 must go through the
ALU pipeline.

Update A64_MOV to select the appropriate encoding at JIT time:
use ADD when either register is A64_SP, and ORR (via
aarch64_insn_gen_move_reg()) otherwise.

Update verifier_private_stack selftests to expect "mov x7, x0" instead
of "add x7, x0, #0x0" in the JITed instruction checks, matching the
new ORR-based encoding.

[1] https://github.com/puranjaymohan/scripts/blob/main/arm64/bench/run_mov_vs_add0.sh

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20260225134339.2723288-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03 08:43:10 -08:00
Jiri Olsa
0c4fc6bd61 selftests/bpf: Add usdt trigger bench
Adding usdt trigger bench for usdt:
 trig-usdt-nop  - usdt on top of nop1 instruction
 trig-usdt-nop5 - usdt on top of nop1/nop5 combo

Adding it to benchs/run_bench_uprobes.sh script.

Example run on x86_64 kernel with uprobe syscall:

  # ./benchs/run_bench_uprobes.sh
  usermode-count :  152.507 ± 0.098M/s
  syscall-count  :   14.309 ± 0.093M/s
  uprobe-nop     :    3.190 ± 0.012M/s
  uprobe-push    :    3.057 ± 0.004M/s
  uprobe-ret     :    1.095 ± 0.009M/s
  uprobe-nop5    :    7.305 ± 0.034M/s
  uretprobe-nop  :    2.175 ± 0.005M/s
  uretprobe-push :    2.109 ± 0.003M/s
  uretprobe-ret  :    0.945 ± 0.002M/s
  uretprobe-nop5 :    3.530 ± 0.006M/s
  usdt-nop       :    3.235 ± 0.008M/s   <-- added
  usdt-nop5      :    7.511 ± 0.045M/s   <-- added

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260224103915.1369690-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03 08:39:22 -08:00
Jiri Olsa
304841967c selftests/bpf: Add test for checking correct nop of optimized usdt
Adding test that attaches bpf program on usdt probe in 2 scenarios;

- attach program on top of usdt_1, which is single nop instruction,
  so the probe stays on nop instruction and is not optimized.

- attach program on top of usdt_2 which is probe defined on top
  of nop,nop5 combo, so the probe is placed on top of nop5 and
  is optimized.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260224103915.1369690-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03 08:39:22 -08:00
Jiri Olsa
0c178e9deb selftests/bpf: Emit nop,nop5 instructions combo for x86_64 arch
Syncing latest usdt.h change [1].

Now that we have nop5 optimization support in kernel, let's emit
nop,nop5 for usdt probe. We leave it up to the library to use
desirable nop instruction.

[1] c9865d1589
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260224103915.1369690-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03 08:39:22 -08:00
Menglong Dong
c1eee8d1e8 selftests/bpf: factor out get_func_* tests for fsession
The fsession is already supported by x86_64, arm64, riscv and s390, so we
don't need to disable it in the compile time according to the
architecture. Factor out the testings for it. Therefore, the testing can
be disabled for the architecture that doesn't support it manually.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20260224092208.1395085-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03 08:36:26 -08:00
Ilya Leoshkevich
0f87614c4d bpf/s390: Implement get_preempt_count()
exe_ctx test fails on s390, because get_preempt_count() is not
implemented and its fallback path always returns 0. Implement it
using the new bpf_get_lowcore() kfunc.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20260217160813.100855-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03 08:35:07 -08:00
Ilya Leoshkevich
6fe54677bc s390: Introduce bpf_get_lowcore() kfunc
Implementing BPF version of preempt_count() requires accessing lowcore
from BPF. Since lowcore can be relocated, open-coding
(struct lowcore *)0 does not work, so add a kfunc.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20260217160813.100855-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03 08:35:07 -08:00
Jiayuan Chen
181cafbd8a selftests/bpf: add test for xdp_bonding xmit_hash_policy compat
Add a selftest to verify that changing xmit_hash_policy to vlan+srcmac
is rejected when a native XDP program is loaded on a bond in 802.3ad
mode.  Without the fix in bond_option_xmit_hash_policy_set(), the change
succeeds silently, creating an inconsistent state that triggers a kernel
WARNING in dev_xdp_uninstall() when the bond is torn down.

The test attaches native XDP to a bond0 (802.3ad, layer2+3), then
attempts to switch xmit_hash_policy to vlan+srcmac and asserts the
operation fails.  It also verifies the change succeeds after XDP is
detached, confirming the rejection is specific to the XDP-loaded state.

Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://patch.msgid.link/20260226080306.98766-3-jiayuan.chen@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-03 10:47:38 +01:00
Alexei Starovoitov
309d8808ee Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf before 7.0-rc2
Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-01 09:04:00 -08:00
Paul Chaignon
024cea2d64 selftests/bpf: Avoid simplification of crafted bounds test
The reg_bounds_crafted tests validate the verifier's range analysis
logic. They focus on the actual ranges and thus ignore the tnum. As a
consequence, they carry the assumption that the tested cases can be
reproduced in userspace without using the tnum information.

Unfortunately, the previous change the refinement logic breaks that
assumption for one test case:

  (u64)2147483648 (u32)<op> [4294967294; 0x100000000]

The tested bytecode is shown below. Without our previous improvement, on
the false branch of the condition, R7 is only known to have u64 range
[0xfffffffe; 0x100000000]. With our improvement, and using the tnum
information, we can deduce that R7 equals 0x100000000.

  19: (bc) w0 = w6                ; R6=0x80000000
  20: (bc) w0 = w7                ; R7=scalar(smin=umin=0xfffffffe,smax=umax=0x100000000,smin32=-2,smax32=0,var_off=(0x0; 0x1ffffffff))
  21: (be) if w6 <= w7 goto pc+3  ; R6=0x80000000 R7=0x100000000

R7's tnum is (0; 0x1ffffffff). On the false branch, regs_refine_cond_op
refines R7's u32 range to [0; 0x7fffffff]. Then, __reg32_deduce_bounds
refines the s32 range to 0 using u32 and finally also sets u32=0.
From this, __reg_bound_offset improves the tnum to (0; 0x100000000).
Finally, our previous patch uses this new tnum to deduce that it only
intersect with u64=[0xfffffffe; 0x100000000] in a single value:
0x100000000.

Because the verifier uses the tnum to reach this constant value, the
selftest is unable to reproduce it by only simulating ranges. The
solution implemented in this patch is to change the test case such that
there is more than one overlap value between u64 and the tnum. The max.
u64 value is thus changed from 0x100000000 to 0x300000000.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/50641c6a7ef39520595dcafa605692427c1006ec.1772225741.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-27 16:11:50 -08:00
Paul Chaignon
e6ad477d1b selftests/bpf: Test refinement of single-value tnum
This patch introduces selftests to cover the new bounds refinement
logic introduced in the previous patch. Without the previous patch,
the first two tests fail because of the invariant violation they
trigger. The last test fails because the R10 access is not detected as
dead code. In addition, all three tests fail because of R0 having a
non-constant value in the verifier logs.

In addition, the last two cases are covering the negative cases: when we
shouldn't refine the bounds because the u64 and tnum overlap in at least
two values.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/90d880c8cf587b9f7dc715d8961cd1b8111d01a8.1772225741.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-27 16:11:50 -08:00
Kumar Kartikeya Dwivedi
2939d7b3b0 selftests/bpf: Add tests for special fields races
Add a couple of tests to ensure that the refcount drops to zero when we
exercise the race where creation of a special field succeeds the logical
bpf_obj_free_fields done when deleting an element. Prior to previous
changes, the fields would be freed eagerly and repopulate and end up
leaking, causing the reference to not drop down correctly. Running this
test on a kernel without fixes will cause a hang in delete_module, since
the module reference stays active due to the leaked kptr not dropping
it. After the fixes tests succeed as expected.

Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260227224806.646888-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-27 15:39:00 -08:00
T.J. Mercier
6881af27f9 selftests/bpf: Fix OOB read in dmabuf_collector
Dmabuf name allocations can be less than DMA_BUF_NAME_LEN characters,
but bpf_probe_read_kernel always tries to read exactly that many bytes.
If a name is less than DMA_BUF_NAME_LEN characters,
bpf_probe_read_kernel will read past the end. bpf_probe_read_kernel_str
stops at the first NUL terminator so use it instead, like
iter_dmabuf_for_each already does.

Fixes: ae5d2c59ec ("selftests/bpf: Add test for dmabuf_iter")
Reported-by: Jerome Lee <jaewookl@quicinc.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Link: https://lore.kernel.org/r/20260225003349.113746-1-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-26 11:28:04 -08:00
Ihor Solodrai
60e3cbef35 selftests/bpf: Fix a memory leak in xdp_flowtable test
test_progs run with ASAN reported [1]:

  ==126==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
      #0 0x7f1ff3cfa340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
      #1 0x5610c15bb520 in bpf_program_attach_fd /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/lib/bpf/libbpf.c:13164
      #2 0x5610c15bb740 in bpf_program__attach_xdp /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/lib/bpf/libbpf.c:13204
      #3 0x5610c14f91d3 in test_xdp_flowtable /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c:138
      #4 0x5610c1533566 in run_one_test /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/test_progs.c:1406
      #5 0x5610c1537fb0 in main /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/test_progs.c:2097
      #6 0x7f1ff25df1c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb)
      #7 0x7f1ff25df28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb)
      #8 0x5610c0bd3180 in _start (/tmp/work/vmtest/vmtest/selftests/bpf/test_progs+0x593180) (BuildId: cdf9f103f42307dc0a2cd6cfc8afcbc1366cf8bd)

Fix by properly destroying bpf_link on exit in xdp_flowtable test.

[1] https://github.com/kernel-patches/vmtest/actions/runs/22361085418/job/64716490680

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://lore.kernel.org/r/20260225003351.465104-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-26 11:27:08 -08:00
Hari Bathini
8ebfe65e22 selftests/bpf: Test accounting of tail calls when prog is NULL
Test whether tail call count is incorrectly accounted for, when the
tail call fails due to a missing BPF program.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/r/20260216090802.1805655-1-hbathini@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 17:09:27 -08:00
Ihor Solodrai
c89b50cc6b selftests/bpf: Fix flakiness of task_local_storage/sys_enter_exit
The test_sys_enter_exit test was setting target_pid before attaching
the BPF programs, which causes syscalls made during the attach phase
to be counted. This is flaky because, apparently, there is no
guarantee that both on_enter and on_exit will trigger during the
attachment.

Move the target_pid assignment to after task_local_storage__attach()
so that only explicit sys_gettid() calls are counted.

Reported-by: BPF CI Bot (Claude Opus 4.6) <bot+bpf-ci@kernel.org>
Closes: https://github.com/kernel-patches/vmtest/issues/448
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260224211202.214325-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 14:53:28 -08:00
Ihor Solodrai
4c9d07865c selftests/bpf: Don't override SIGSEGV handler with ASAN
test_progs has custom SIGSEGV handler, which interferes with the
address sanitizer [1]. Add an #ifndef to avoid this.

Additionally, declare an __asan_on_error() to dump the test logs in
the same way it happens in the custom SIGSEGV handler.

[1] https://lore.kernel.org/bpf/73d832948b01dbc0ebc60d85574bdf8537f3a810.camel@gmail.com/

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223191118.655185-3-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
a2714e7303 selftests/bpf: Check BPFTOOL env var in detect_bpftool_path()
The bpftool_maps_access and bpftool_metadata tests may fail on BPF CI
with "command not found", depending on a workflow.
This happens because detect_bpftool_path() only checks two hardcoded
relative paths:
  - ./tools/sbin/bpftool
  - ../tools/sbin/bpftool

Add support for a BPFTOOL environment variable that allows specifying
the exact path to the bpftool binary.

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223191118.655185-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
ad90ecedad selftests/bpf: Fix out-of-bounds array access bugs reported by ASAN
- kmem_cache_iter: remove unnecessary debug output
- lwt_seg6local: change the type of foobar to char[]
  - the sizeof(foobar) returned the pointer size and not a string
    length as intended
- verifier_log: increase prog_name buffer size in verif_log_subtest()
  - compiler has a conservative estimate of fixed_log_sz value, making
    ASAN complain on snprint() call

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223191118.655185-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
3e711c8e47 selftests/bpf: Fix array bounds warning in jit_disasm_helpers
Compiler cannot infer upper bound for labels.cnt and warns about
potential buffer overflow in snprintf. Add an explicit bounds
check (... && i < MAX_LOCAL_LABELS) in the loop condition to fix the
warning.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-18-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
2bb270a0ac selftests/bpf: Free bpf_object in test_sysctl
ASAN reported a resource leak due to the bpf_object not being tracked
in test_sysctl. Add obj field to struct sysctl_test to properly clean
it up.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-17-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
71dca2950f selftests/bpf: Fix resource leaks caused by missing cleanups
ASAN reported a number of resource leaks:
  - Add missing  *__destroy(skel) calls
  - Replace bpf_link__detach() with bpf_link__destroy() where appropriate
  - cgrp_local_storage: Add bpf_link__destroy() when bpf_iter_create fails
  - dynptr: Add missing bpf_object__close()

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-16-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
68b8bea1ee selftests/bpf: Fix double thread join in uprobe_multi_test
ASAN reported a "joining already joined thread" error. The
release_child() may be called multiple times for the same struct
child.

Fix by resetting child->thread to 0 after pthread_join.

Also memset(0) static child variable in test_attach_api().

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-15-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
f7ac552be7 selftests/bpf: Fix use-after-free in xdp_metadata test
ASAN reported a use-after-free in close_xsk().

The xsk->socket internally references xsk->umem via socket->ctx->umem,
so the socket must be deleted before the umem. Fix the order of
operations in close_xsk().

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-14-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
ff9511410d veristat: Fix a memory leak for preset ENUMERATOR
ASAN detected a memory leak in veristat. The cleanup code handling
ENUMERATOR value missed freeing strdup-ed svalue. Fix it.

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-13-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
3eb4a2e399 selftests/bpf: Fix cleanup in check_fd_array_cnt__fd_array_too_big()
The Close() macro uses the passed in expression three times, which
leads to repeated execution in case it has side effects. That is,
Close(i--) would decrement i three times.

ASAN caught a stack-buffer-undeflow error at a point where this was
overlooked. Fix it.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-12-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
9d0272c91f selftests/bpf: Fix memory leaks in tests
Fix trivial memory leaks detected by userspace ASAN:
  - htab_update: free value buffer in test_reenter_update cleanup
  - test_xsk: inline pkt_stream_replace() in testapp_stats_rx_full()
    and testapp_stats_fill_empty()
  - testing_helpers: free buffer allocated by getline() in
    parse_test_list_file

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-11-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
a1a771bd64 selftests/bpf: Refactor bpf_get_ksyms() trace helper
ASAN reported a memory leak in bpf_get_ksyms(): it allocates a struct
ksyms internally and never frees it.

Move struct ksyms to trace_helpers.h and return it from the
bpf_get_ksyms(), giving ownership to the caller. Add filtered_syms and
filtered_cnt fields to the ksyms to hold the filtered array of
symbols, previously returned by bpf_get_ksyms().

Fixup the call sites: kprobe_multi_test and bench_trigger.

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260223190736.649171-10-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
45897ced3c selftests/bpf: Add DENYLIST.asan
Add a denylist file for tests that should be skipped when built with
userspace ASAN:

    $ make ... SAN_CFLAGS="-fsanitize=address -fno-omit-frame-pointer"

Skip the following tests:

- *arena*: userspace ASAN does not understand BPF arena maps and gets
  confused particularly when map_extra is non-zero
  - non-zero map_extra leads to mmap with MAP_FIXED, and ASAN treats
    this as an unknown memory region
- task_local_data: ASAN complains about "incorrect" aligned_alloc()
  usage, but it's intentional in the test
- uprobe_multi_test: very slow with ASAN enabled

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-9-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
4021848a90 selftests/bpf: Pass through build flags to bpftool and resolve_btfids
EXTRA_* and SAN_* build flags were not correctly propagated to bpftool
and resolve_btids when building selftests/bpf. This led to various
build errors on attempt to build with SAN_CFLAGS="-fsanitize=address",
for example.

Fix the makefiles to address this:
  - Pass SAN_CFLAGS/SAN_LDFLAGS to bpftool and resolve_btfids build
  - Propagate EXTRA_LDFLAGS to resolve_btfids link command
  - Use pkg-config to detect zlib and zstd for resolve_btfids, similar
    libelf handling

Also check for ASAN flag in selftests/bpf/Makefile for convenience.

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-7-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
9d8685239e selftests/bpf: Use memcpy() for bounded non-NULL-terminated copies
Replace strncpy() with memcpy() in cases where the source is
non-NULL-terminated and the copy length is known.

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-6-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:49 -08:00
Ihor Solodrai
3ed0bc2d49 selftests/bpf: Use strscpy in bpftool_helpers.c
Replace strncpy() calls in bpftool_helpers.c with strscpy().

Pass the destination buffer size to detect_bpftool_path() instead of
hardcoding BPFTOOL_PATH_MAX_LEN.

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-5-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:48 -08:00
Ihor Solodrai
43277d8f57 selftests/bpf: Replace strncpy() with strscpy()
strncpy() does not guarantee NULL-termination and is considered
deprecated [1]. Replace strncpy() calls with strscpy().

[1] https://docs.kernel.org/process/deprecated.html#strncpy-on-nul-terminated-strings

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-4-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-24 08:19:33 -08:00
Ihor Solodrai
6a087ac23f selftests/bpf: Replace strcpy() calls with strscpy()
strcpy() does not perform bounds checking and is considered deprecated
[1]. Replace strcpy() calls with strscpy() defined in bpf_util.h.

[1] https://docs.kernel.org/process/deprecated.html#strcpy

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-3-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-23 18:40:07 -08:00
Ihor Solodrai
63c49efc98 selftests/bpf: Add simple strscpy() implementation
Replace bpf_strlcpy() in bpf_util.h with a sized_strscpy(), which is a
simplified sized_strscpy() from the kernel (lib/string.c [1]). It:
  * takes a count (destination size) parameter
  * guarantees NULL-termination
  * returns the number of characters copied or -E2BIG

Re-define strscpy macro similar to in-kernel implementation [2]: allow
the count parameter to be optional.

Add #ifdef-s to tools/include/linux/args.h, as they may be defined in
other system headers (for example, __CONCAT in sys/cdefs.h).

Fixup the single existing bpf_strlcpy() call in cgroup_helpers.c

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/string.c?h=v6.19#n113
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/string.h?h=v6.19#n91

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-23 18:40:07 -08:00
Amery Hung
055d8dd553 selftests/bpf: Update task local data license
Change the license of task local data mini library to LGPL-2.1 or
BSD-2-Clause to allow it being in a wider range of projects.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260219225849.2426421-1-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-23 18:06:47 -08:00
Abhishek Dubey
99726ece0c selftests/bpf: Enable test for instruction array on arm64
As arm64 JIT now supports instruction array, make sure
all relevant tests run on this architecture.

Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
Acked-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20260223203511.118475-1-adubey@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-23 18:05:07 -08:00
Kaitao Cheng
ed8fa4b889 selftests/bpf: Add test case for rbtree nodes that contain both bpf_refcount and kptr fields.
Allow bpf_kptr_xchg to directly operate on pointers marked with
NON_OWN_REF | MEM_RCU.

In the example demonstrated in this patch, as long as "struct
bpf_refcount ref" exists, the __kptr pointer is guaranteed to
carry the MEM_RCU flag. The ref member itself does not need to
be explicitly used.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
Link: https://lore.kernel.org/r/20260214124042.62229-6-pilgrimtao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-23 17:37:06 -08:00
Kaitao Cheng
580fa3430b selftests/bpf: Add supplementary tests for bpf_kptr_xchg
1. Allow using bpf_kptr_xchg while holding a lock.
2. When the rb_node contains a __kptr pointer, we do not need to
   perform a remove-read-add operation.

This patch implements the following workflow:
1. Construct a rbtree with 16 elements.
2. Traverse the rbtree, locate the kptr pointer in the target node,
   and read the content pointed to by the pointer.
3. Remove all nodes from the rbtree.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Link: https://lore.kernel.org/r/20260214124042.62229-4-pilgrimtao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-23 17:37:06 -08:00
Gregory Bell
18a1d365e8 selftests/bpf: align build_id test mapping to 64K page size
Some architectures require mappings to be aligned to the system page size.
The build_id selftest currently uses a smaller alignment, which can result
in madvise operations executing on a different page than intended.

Increase the mapping alignment to 64K so the buffer is page-aligned on
all supported architectures.

Signed-off-by: Gregory Bell <grbell@redhat.com>
Link: https://lore.kernel.org/r/93543253b32d1cb178ab6e31e4291e387ba1c372.1771338492.git.grbell@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-19 11:29:41 -08:00
Gregory Bell
d820fa3114 selftests/bpf: fix flaky build_id test
The build_id selftest occasionally fails because MADV_PAGEOUT
does not guarantee the immediate eviction of the page. The test
assumes eviction happens and proceeds without verifying
that the page was actually reclaimed, leading to false test
failures.

Fix the test by retrying the page-out sequence until eviction
is successful, instead of relying on a single MADV_PAGEOUT attempt.

Signed-off-by: Gregory Bell <grbell@redhat.com>
Link: https://lore.kernel.org/r/038bd27c69dd3a16958894fcb19e4fb6fbfe317e.1771338492.git.grbell@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-19 11:29:41 -08:00
Alexei Starovoitov
b0a67f310b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf before 7.0-rc1
Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-19 11:08:53 -08:00
Matthieu Baerts (NGI0)
1e5c009126 selftests/bpf: Remove hexdump dependency
The verification signature header generation requires converting a
binary certificate to a C array. Previously this only worked with xxd,
and a switch to hexdump has been done in commit b640d556a2
("selftests/bpf: Remove xxd util dependency").

hexdump is a more common utility program, yet it might not be installed
by default. When it is not installed, BPF selftests build without
errors, but tests_progs is unusable: it exits with the 255 code and
without any error messages. When manually reproducing the issue, it is
not too hard to find out that the generated verification_cert.h file is
incorrect, but that's time consuming. When digging the BPF selftests
build logs, this line can be seen amongst thousands others, but ignored:

  /bin/sh: 2: hexdump: not found

Here, od is used instead of hexdump. od is coming from the coreutils
package, and this new od command produces the same output when using od
from GNU coreutils, uutils, and even busybox. This is more portable, and
it produces a similar results to what was done before with hexdump:
there is an extra comma at the end instead of trailing whitespaces,
but the C code is not impacted.

Fixes: b640d556a2 ("selftests/bpf: Remove xxd util dependency")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20260218-bpf-sft-hexdump-od-v2-1-2f9b3ee5ab86@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-18 15:11:20 -08:00
Ihor Solodrai
b3dfa128f7 selftests/bpf: Use vmlinux.h in test_xdp_meta
- Replace linux/* includes with vmlinux.h
- Include errno.h
- Include bpf_tracing_net.h for TC_ACT_* and ETH_*
- Use BPF_STDERR instead of BPF_STREAM_STDERR

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260218215651.2057673-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-18 14:58:47 -08:00
Saket Kumar Bhaskar
4c51f90d45 selftests/bpf: Add powerpc support for get_preempt_count() in selftest
get_preempt_count() is enabled to return preempt_count for powerpc,
so that bpf_in_interrupt()/bpf_in_nmi()/bpf_in_serving_softirq()/
bpf_in_task()/bpf_in_hardirq()/get_preempt_count() works for
powerpc as well.

Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/r/20260212092558.370623-1-skb99@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-13 14:42:38 -08:00
Eduard Zingerman
022ac07508 bpf: use reg->var_off instead of reg->off for pointers
This commit consolidates static and varying pointer offset tracking
logic. All offsets are now represented solely using `.var_off` and
min/max fields. The reasons are twofold:
- This simplifies pointer tracking code, as each relevant function
  needs to check the `.var_off` field anyway.
- It makes it easier to widen pointer registers for the purpose of loop
  convergence checks, by forgoing the `regsafe()` logic demanding
  `.off` fields to be identical.

The changes are spread across many functions and are hard to group
into smaller patches. Some of the logical changes include:
- Checks in __check_ptr_off_reg() are reordered so that the
  tnum_is_const() check is done before operating on reg->var_off.value.
- check_packet_access() now uses check_mem_region_access() to handle
  possible 'off' overflow cases.
- In check_helper_mem_access() utility functions like
  check_packet_access() are now called with 'off=0', as these utility
  functions now account for the complete register offset range.
- In check_reg_type() a call to __check_ptr_off_reg() is added before
  a call to btf_struct_ids_match(). This prevents
  btf_struct_ids_match() from potentially working on non-constant
  reg->var_off.value.
- regsafe() is relaxed to avoid comparing '.off' field for pointers.

As a precaution, the changes are verified in [1] by adding a pass
checking that no pointer has non-zero '.off' field on each
do_check_insn() iteration.

[1] https://github.com/eddyz87/bpf/tree/ptrs-off-migration

Notable selftests changes:
- `.var_off` value changed because it now combines static and varying
  offsets. Affected tests:
  - linked_list/incorrect_node_var_off
  - linked_list/incorrect_head_var_off2
  - verifier_align/packet_variable_offset

- Overflowing `smax_value` bound leads to a pointer with big negative
  or positive offset to be rejected immediately (previously overflowing
  `rX += const` instruction updated `.off` field avoiding the overflow).
  Affected tests:
  - verifier_align/dubious_pointer_arithmetic
  - verifier_bounds/var_off_insn_off_test1

- Invalid access to packet now reports full offset inside a packet.
  Affected tests:
  - verifier_direct_packet_access/test23_x_pkt_ptr_4

- A change in check_mem_region_access() behavior:
  when register `.smin_value` is negative, it reports
  "rX min value is negative..." before calling into __check_mem_access()
  which reports "invalid access to ...".
  In the tests below, the `.off` field was negative, while `.smin_value`
  remained positive. This is no longer the case after the changes in
  this commit. Affected tests:
  - verifier_gotox/jump_table_invalid_mem_acceess_neg
  - verifier_helper_packet_access/test15_cls_helper_fail_sub
  - verifier_helper_value_access/imm_out_of_bound_2
  - verifier_helper_value_access/reg_out_of_bound_2
  - verifier_meta_access/meta_access_test2
  - verifier_value_ptr_arith/known_scalar_from_different_maps
  - lower_oob_arith_test_1
  - value_ptr_known_scalar_3
  - access_value_ptr_known_scalar

- Usage of check_mem_region_access() instead of __check_mem_access()
  in check_packet_access() changes the reported message from
  "rX offset is outside ..." to "rX min/max value is outside ...".
  Affected tests:
  - verifier_xdp_direct_packet_access/*

- In check_func_arg_reg_off() the check for zero offset now operates
  on `.var_off` field instead of `.off` field. For tests where the
  pattern looks like `kfunc(reg_with_var_off, ...)`, this changes the
  reported error:
  - previously the error "variable ... access ... disallowed"
    was reported by __check_ptr_off_reg();
  - now "R1 must have zero offset ..." is reported by
    check_func_arg_reg_off() itself.
  Affected tests:
  - verifier/calls.c
    "calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset"

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260212-ptrs-off-migration-v2-2-00820e4d3438@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-13 14:41:22 -08:00
Menglong Dong
0265c1fd91 selftests/bpf: enable fsession_test on riscv64
Now that the RISC-V trampoline JIT supports BPF_TRACE_FSESSION, run
the fsession selftest on riscv64 as well as x86_64.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Tested-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20260208053311.698352-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-13 14:14:27 -08:00
Kumar Kartikeya Dwivedi
2669dde7a8 selftests/bpf: Fix map_kptr grace period wait
Commit c27cea4416 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
broke map_kptr selftest since it removed the function we were kprobing.
Use a new kfunc that invokes call_rcu_tasks_trace and sets a program
provided pointer to an integer to 1. Technically this can be unsafe if
the memory being written to from the callback disappears, but this is
just for usage in a test where we ensure we spin until we see the value
to be set to 1, so it's ok.

Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Fixes: c27cea4416 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260211185747.3630539-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-13 14:14:27 -08:00
Ihor Solodrai
48f624c3dc selftests/bpf: Adjust selftest due to function rename
do_filp_open() was renamed in commit
541003b576 ("rename do_filp_open() to do_file_open()")

This broke test_profiler, because it uses a kretprobe on that
function. Fix it by renaming accordingly.

Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Closes: https://lore.kernel.org/bpf/djwjf2vfb7gro3rfag666bojod6ytcectahnb5z6hx2hawimtj@sx47ghzjg4lw/
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260210235855.215679-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-13 14:14:27 -08:00
Eduard Zingerman
f632de6e19 selftests/bpf: Migrate align.c tests to test_loader framework
While working on pointer tracking changes I found it necessary to
update expected log messages in align.c series of tests.
As a preliminary step, migrate these tests to test_loader framework.

The tests in question load BPF program and check if expected log is
produced, the log is specified as:

	.matches = {
		...
		{4, "R3", "32"},
		...
	}

Where:
- '4' is an *instruction number* (contrary to the field name in
  struct bpf_reg_match).
- 'R3' is the name of the register to check.
- '32' is the value expected for this register.

Mimic the same logic using __msg macro.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260211051310.2782558-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-13 14:12:28 -08:00
Linus Torvalds
136114e0ab mm.git review status for linus..mm-nonmm-stable
Total patches:       107
 Reviews/patch:       1.07
 Reviewed rate:       67%
 
 - The 2 patch series "ocfs2: give ocfs2 the ability to reclaim
   suballocator free bg" from Heming Zhao saves disk space by teaching
   ocfs2 to reclaim suballocator block group space.
 
 - The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one
   bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in
   various places.
 
 - The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than
   PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if
   VMCOREINFO_BYTES ever exceeds the page size.
 
 - The 7 patch series "kallsyms: Prevent invalid access when showing
   module buildid" from Petr Mladek cleans up kallsyms code related to
   module buildid and fixes an invalid access crash when printing
   backtraces.
 
 - The 3 patch series "Address page fault in
   ima_restore_measurement_list()" from Harshit Mogalapalli fixes a
   kexec-related crash that can occur when booting the second-stage kernel
   on x86.
 
 - The 6 patch series "kho: ABI headers and Documentation updates" from
   Mike Rapoport updates the kexec handover ABI documentation.
 
 - The 4 patch series "Align atomic storage" from Finn Thain adds the
   __aligned attribute to atomic_t and atomic64_t definitions to get
   natural alignment of both types on csky, m68k, microblaze, nios2,
   openrisc and sh.
 
 - The 2 patch series "kho: clean up page initialization logic" from
   Pratyush Yadav simplifies the page initialization logic in
   kho_restore_page().
 
 - The 6 patch series "Unload linux/kernel.h" from Yury Norov moves
   several things out of kernel.h and into more appropriate places.
 
 - The 7 patch series "don't abuse task_struct.group_leader" from Oleg
   Nesterov removes the usage of ->group_leader when it is "obviously
   unnecessary".
 
 - The 5 patch series "list private v2 & luo flb" from Pasha Tatashin
   adds some infrastructure improvements to the live update orchestrator.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY4giAAKCRDdBJ7gKXxA
 jgusAQDnKkP8UWTqXPC1jI+OrDJGU5ciAx8lzLeBVqMKzoYk9AD/TlhT2Nlx+Ef6
 0HCUHUD0FMvAw/7/Dfc6ZKxwBEIxyww=
 =mmsH
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
   disk space by teaching ocfs2 to reclaim suballocator block group
   space (Heming Zhao)

 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
   ARRAY_END() macro and uses it in various places (Alejandro Colomar)

 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
   the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
   page size (Pnina Feder)

 - "kallsyms: Prevent invalid access when showing module buildid" cleans
   up kallsyms code related to module buildid and fixes an invalid
   access crash when printing backtraces (Petr Mladek)

 - "Address page fault in ima_restore_measurement_list()" fixes a
   kexec-related crash that can occur when booting the second-stage
   kernel on x86 (Harshit Mogalapalli)

 - "kho: ABI headers and Documentation updates" updates the kexec
   handover ABI documentation (Mike Rapoport)

 - "Align atomic storage" adds the __aligned attribute to atomic_t and
   atomic64_t definitions to get natural alignment of both types on
   csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)

 - "kho: clean up page initialization logic" simplifies the page
   initialization logic in kho_restore_page() (Pratyush Yadav)

 - "Unload linux/kernel.h" moves several things out of kernel.h and into
   more appropriate places (Yury Norov)

 - "don't abuse task_struct.group_leader" removes the usage of
   ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)

 - "list private v2 & luo flb" adds some infrastructure improvements to
   the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-12 12:13:01 -08:00
Amery Hung
97b859b5ed selftests/bpf: Fix outdated test on storage->smap
bpf_local_storage_free() already does not rely on local_storage->smap
since switching to kmalloc_nolock(). As local_storage->smap is removed,
fix the outdated test by dropping the local_storage->smap check. Keep
the second map in task local storage map test to test that multiple
elements can be added to the storage similar to sk storage test.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-18-ameryhung@gmail.com
2026-02-06 14:48:05 -08:00
Amery Hung
cdce7b0848 selftests/bpf: Choose another percpu variable in bpf for btf_dump test
bpf_cgrp_storage_busy has been removed. Use bpf_bprintf_nest_level
instead. This percpu variable is also in the bpf subsystem so that
if it is removed in the future, BPF-CI will catch this type of CI-
breaking change.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-17-ameryhung@gmail.com
2026-02-06 14:48:05 -08:00
Amery Hung
e02cf06b85 selftests/bpf: Remove test_task_storage_map_stress_lookup
Remove a test in test_maps that checks if the updating of the percpu
counter in task local storage map is preemption safe as the percpu
counter is now removed.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-16-ameryhung@gmail.com
2026-02-06 14:48:05 -08:00
Amery Hung
902a79b638 selftests/bpf: Update task_local_storage/task_storage_nodeadlock test
Adjust the error code we are checking against as
bpf_task_storage_delete() now returns -EDEADLK or -ETIMEDOUT when
deadlock happens.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-15-ameryhung@gmail.com
2026-02-06 14:48:05 -08:00
Amery Hung
e4772031d1 selftests/bpf: Update task_local_storage/recursion test
Update the expected result of the selftest as recursion of task local
storage syscall and helpers have been relaxed. Now that the percpu
counter is removed, task local storage helpers, bpf_task_storage_get()
and bpf_task_storage_delete() can now run on the same CPU at the same
time unless they cause deadlock.

Note that since there is no percpu counter preventing recursion in
task local storage helpers, bpf_trampoline now catches the recursion
of on_update as reported by recursion_misses.

on_enter: tp_btf/sys_enter
on_update: fentry/bpf_local_storage_update

           Old behavior                         New behavior
           ____________                         ____________
on_enter                             on_enter
  bpf_task_storage_get(&map_a)         bpf_task_storage_get(&map_a)
    bpf_task_storage_trylock succeed     bpf_local_storage_update(&map_a)
    bpf_local_storage_update(&map_a)

    on_update                            on_update
      bpf_task_storage_get(&map_a)         bpf_task_storage_get(&map_a)
        bpf_task_storage_trylock fail        on_update::misses++ (1)
        return NULL                        create and return map_a::ptr

                                           map_a::ptr += 1 (1)

                                           bpf_task_storage_delete(&map_a)
                                             return 0

      bpf_task_storage_get(&map_b)         bpf_task_storage_get(&map_b)
        bpf_task_storage_trylock fail        on_update::misses++ (2)
        return NULL                        create and return map_b::ptr

                                           map_b::ptr += 1 (1)

    create and return map_a::ptr         create and return map_a::ptr
  map_a::ptr = 200                     map_a::ptr = 200

  bpf_task_storage_get(&map_b)         bpf_task_storage_get(&map_b)
    bpf_task_storage_trylock succeed     lockless lookup succeed
    bpf_local_storage_update(&map_b)     return map_b::ptr

    on_update
      bpf_task_storage_get(&map_a)
        bpf_task_storage_trylock fail
        lockless lookup succeed
        return map_a::ptr

      map_a::ptr += 1 (201)

      bpf_task_storage_delete(&map_a)
        bpf_task_storage_trylock fail
        return -EBUSY
      nr_del_errs++ (1)

      bpf_task_storage_get(&map_b)
        bpf_task_storage_trylock fail
        return NULL

    create and return ptr

  map_b::ptr = 100

Expected result:

map_a::ptr = 201                          map_a::ptr = 200
map_b::ptr = 100                          map_b::ptr = 1
nr_del_err = 1                            nr_del_err = 0
on_update::recursion_misses = 0           on_update::recursion_misses = 2
On_enter::recursion_misses = 0            on_enter::recursion_misses = 0

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-14-ameryhung@gmail.com
2026-02-06 14:48:05 -08:00
Amery Hung
d652f425d5 selftests/bpf: Update sk_storage_omem_uncharge test
Check sk_omem_alloc when the caller of bpf_local_storage_destroy()
returns. bpf_local_storage_destroy() now returns the memory to uncharge
to the caller instead of directly uncharge. Therefore, in the
sk_storage_omem_uncharge, check sk_omem_alloc when bpf_sk_storage_free()
returns instead of bpf_local_storage_destroy().

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-13-ameryhung@gmail.com
2026-02-06 14:48:04 -08:00
Larysa Zaremba
88af9fefed selftests/xsk: fix number of Tx frags in invalid packet
The issue occurs in TOO_MANY_FRAGS test case when xdp_zc_max_segs is set to
an odd number.

TOO_MANY_FRAGS test case contains an invalid packet consisting of
(xdp_zc_max_segs) frags. Every frag, even the last one has XDP_PKT_CONTD
flag set. This packet is expected to be dropped. After that, there is a
valid linear packet, which is expected to be received back.

Once (xdp_zc_max_segs) is an odd number, the last packet cannot be
received, if packet forwarding between Rx and Tx interfaces relies on
the ethernet header, e.g. checks for ETH_P_LOOPBACK. Packet is malformed,
if all traffic is looped.

Turns out, sending function processes multiple invalid frags as if they
were in 2-frag packets. So once the invalid mbuf packet contains an odd
number of those, the valid packet after gets paired with the previous
invalid descriptor, and hence does not get an ethernet header generated, so
it is either dropped or malformed.

Make invalid packets in verbatim mode always have only a single frag. For
such packets, number of frags is otherwise meaningless, as descriptor flags
are pre-configured in verbatim mode and packet data is not generated for
invalid descriptors.

Fixes: 697604492b ("selftests/xsk: add invalid descriptor test for multi-buffer")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20260203155103.2305816-3-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-06 09:36:36 -08:00
Larysa Zaremba
42e41b2a0a selftests/xsk: properly handle batch ending in the middle of a packet
Referenced commit reduced the scope of the variable pkt, so now it has to
be reinitialized via pkt_stream_get_next_rx_pkt(), which also increments
some counters. When the packet is interrupted by the batch ending, pkt
stream therefore proceeds to the next packet, while xsk ring still contains
the previous one, this results in a pkt_nb mismatch.

Decrement the affected counters when packet is interrupted.

Fixes: 8913e653e9 ("selftests/xsk: Iterate over all the sockets in the receive pkts function")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20260203155103.2305816-2-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-06 09:36:36 -08:00
Puranjay Mohan
47fcf4dc0a selftests/bpf: Add tests for improved linked register tracking
Add tests for linked register tracking with negative offsets, BPF_SUB,
and alu32. These test for all edge cases like overflows, etc.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260204151741.2678118-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-04 13:35:29 -08:00
Puranjay Mohan
7a433e5193 bpf: Support negative offsets, BPF_SUB, and alu32 for linked register tracking
Previously, the verifier only tracked positive constant deltas between
linked registers using BPF_ADD. This limitation meant patterns like:

  r1 = r0;
  r1 += -4;
  if r1 s>= 0 goto l0_%=;   // r1 >= 0 implies r0 >= 4
  // verifier couldn't propagate bounds back to r0
  if r0 != 0 goto l0_%=;
	r0 /= 0; // Verifier thinks this is reachable
  l0_%=:

Similar limitation exists for 32-bit registers.

With this change, the verifier can now track negative deltas in reg->off
enabling bound propagation for the above pattern.

For alu32, we make sure the destination register has the upper 32 bits
as 0s before creating the link. BPF_ADD_CONST is split into
BPF_ADD_CONST64 and BPF_ADD_CONST32, the latter is used in case of alu32
and sync_linked_regs uses this to zext the result if known_reg has this
flag.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260204151741.2678118-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-04 13:35:28 -08:00
Tianci Cao
56415363e0 selftests/bpf: Add tests for BPF_END bitwise tracking
Now BPF_END has bitwise tracking support. This patch adds selftests to
cover various cases of BPF_END (`bswap(16|32|64)`, `be(16|32|64)`,
`le(16|32|64)`) with bitwise propagation.

This patch is based on existing `verifier_bswap.c`, and add several
types of new tests:

1. Unconditional byte swap operations:
   - bswap16/bswap32/bswap64 with unknown bytes

2. Endian conversion operations (architecture-aware):
   - be16/be32/be64: convert to big-endian
     * on little-endian: do swap
     * on big-endian: truncation (16/32-bit) or no-op (64-bit)
   - le16/le32/le64: convert to little-endian
     * on big-endian: do swap
     * on little-endian: truncation (16/32-bit) or no-op (64-bit)

Each test simulates realistic networking scenarios where a value is
masked with unknown bits (e.g., var_off=(0x0; 0x3f00), range=[0,0x3f00]),
then byte-swapped, and the verifier must prove the result stays within
expected bounds.

Specifically, these selftests are based on dead code elimination:
If the BPF verifier can precisely track bitwise through byte swap
operations, it can prune the trap path (invalid memory access) that
should be unreachable, allowing the program to pass verification.
If bitwise tracking is incorrect, the verifier cannot prove the trap
is unreachable, causing verification failure.

The tests use preprocessor conditionals (#ifdef __BYTE_ORDER__) to
verify correct behavior on both little-endian and big-endian
architectures, and require Clang 18+ for bswap instruction support.

Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Co-developed-by: Yazhou Tang <tangyazhou518@outlook.com>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260204111503.77871-3-ziye@zju.edu.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-04 13:23:28 -08:00
Alexei Starovoitov
6e65cf81ac selftests/bpf: Strengthen timer_start_deadlock test
Strengthen timer_start_deadlock test and check for recursion now

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260204055147.54960-5-alexei.starovoitov@gmail.com
2026-02-04 13:12:50 -08:00
Alexei Starovoitov
67ee5ad27d selftests/bpf: Add a testcase for deadlock avoidance
Add a testcase that checks that deadlock avoidance is working
as expected.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260204055147.54960-3-alexei.starovoitov@gmail.com
2026-02-04 13:12:50 -08:00
Alexei Starovoitov
b135beb077 selftests/bpf: Add a test to stress bpf_timer_start and map_delete race
Add a test to stress bpf_timer_start and map_delete race

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-10-alexei.starovoitov@gmail.com
2026-02-03 16:58:47 -08:00
Mykyta Yatsenko
3f7a841520 selftests/bpf: Removed obsolete tests
Now bpf_timer can be used in tracepoints, so these tests are no longer
relevant.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-9-alexei.starovoitov@gmail.com
2026-02-03 16:58:47 -08:00
Mykyta Yatsenko
083c5a4bab selftests/bpf: Add timer stress test in NMI context
Add stress tests for BPF timers that run in NMI context using perf_event
programs attached to PERF_COUNT_HW_CPU_CYCLES.

The tests cover three scenarios:
- nmi_race: Tests concurrent timer start and async cancel operations
- nmi_update: Tests updating a map element (effectively deleting and
  inserting new for array map) from within a timer callback
- nmi_cancel: Tests timer self-cancellation attempt.

A common test_common() helper is used to share timer setup logic across
all test modes.

The tests spawn multiple threads in a child process to generate
perf events, which trigger the BPF programs in NMI context. Hit counters
verify that the NMI code paths were actually exercised.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-8-alexei.starovoitov@gmail.com
2026-02-03 16:58:47 -08:00
Mykyta Yatsenko
fe9d205cec selftests/bpf: Verify bpf_timer_cancel_async works
Add test that verifies that bpf_timer_cancel_async works: can cancel
callback successfully.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-7-alexei.starovoitov@gmail.com
2026-02-03 16:58:47 -08:00
Mykyta Yatsenko
d02fdd7195 selftests/bpf: Add stress test for timer async cancel
Extend BPF timer selftest to run stress test for async cancel.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-6-alexei.starovoitov@gmail.com
2026-02-03 16:58:47 -08:00
Mykyta Yatsenko
10653c0dd8 selftests/bpf: Refactor timer selftests
Refactor timer selftests, extracting stress test into a separate test.
This makes it easier to debug test failures and allows to extend.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-5-alexei.starovoitov@gmail.com
2026-02-03 16:58:47 -08:00
Emil Tsalapatis
4d99137eea selftests/bpf: Add selftests for stream functions under lock
Add a selftest to ensure BPF stream functions can now be called
while holding a lock.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260203180424.14057-5-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-03 10:41:16 -08:00
Emil Tsalapatis
954fa97e21 selftests/bpf: Add selftests for bpf_stream_print_stack
Add selftests for the new bpf_stream_print_stack kfunc.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260203180424.14057-3-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-03 10:41:16 -08:00
Puranjay Mohan
f6ef5584cc selftests/bpf: Add a test for ids=0 to verifier_scalar_ids test
Test that two registers with their id=0 (unlinked) in the cached state
can be mapped to a single id (linked) in the current state.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260203165102.2302462-6-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-03 10:34:33 -08:00
Puranjay Mohan
b0388bafa4 bpf: Relax scalar id equivalence for state pruning
Scalar register IDs are used by the verifier to track relationships
between registers and enable bounds propagation across those
relationships. Once an ID becomes singular (i.e. only a single
register/stack slot carries it), it can no longer contribute to bounds
propagation and effectively becomes stale. The previous commit makes the
verifier clear such ids before caching the state.

When comparing the current and cached states for pruning, these stale
IDs can cause technically equivalent states to be considered different
and thus prevent pruning.

For example, in the selftest added in the next commit, two registers -
r6 and r7 are not linked to any other registers and get cached with
id=0, in the current state, they are both linked to each other with
id=A.  Before this commit, check_scalar_ids would give temporary ids to
r6 and r7 (say tid1 and tid2) and then check_ids() would map tid1->A,
and when it would see tid2->A, it would not consider these state
equivalent.

Relax scalar ID equivalence by treating rold->id == 0 as "independent":
if the old state did not rely on any ID relationships for a register,
then any ID/linking present in the current state only adds constraints
and is always safe to accept for pruning. Implement this by returning
true immediately in check_scalar_ids() when old_id == 0.

Maintain correctness for the opposite direction (old_id != 0 && cur_id
== 0) by still allocating a temporary ID for cur_id == 0. This avoids
incorrectly allowing multiple independent current registers (id==0) to
satisfy a single linked old ID during mapping.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260203165102.2302462-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-03 10:34:23 -08:00
Leon Hwang
7f10da2133 selftests/bpf: Enable get_func_args and get_func_ip tests on arm64
Allow get_func_args, and get_func_ip fsession selftests to run on arm64.

Acked-by: Puranjay Mohan <puranjay@kernel.org>
Tested-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260131144950.16294-4-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-31 13:51:04 -08:00
Leon Hwang
8798902f2b bpf: Add bpf_jit_supports_fsession()
The added fsession does not prevent running on those architectures, that
haven't added fsession support.

For example, try to run fsession tests on arm64:

test_fsession_basic:PASS:fsession_test__open_and_load 0 nsec
test_fsession_basic:PASS:fsession_attach 0 nsec
check_result:FAIL:test_run_opts err unexpected error: -14 (errno 14)

In order to prevent such errors, add bpf_jit_supports_fsession() to guard
those architectures.

Fixes: 2d419c4465 ("bpf: add fsession support")
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Tested-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260131144950.16294-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-31 13:51:04 -08:00
Paul Chaignon
f0b5b3d6b5 selftests/bpf: Test access from RO map from xdp_store_bytes
This new test simply checks that helper bpf_xdp_store_bytes can
successfully read from a read-only map.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/4fdb934a713b2d7cf133288c77f6cfefe9856440.1769875479.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-31 13:49:43 -08:00
Jiri Olsa
4173b494d9 selftests/bpf: Allow to benchmark trigger with stacktrace
Adding support to call bpf_get_stackid helper from trigger programs,
so far added for kprobe multi.

Adding the --stacktrace/-g option to enable it.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260126211837.472802-7-jolsa@kernel.org
2026-01-30 13:40:09 -08:00
Jiri Olsa
e5d532be4a selftests/bpf: Add stacktrace ips test for fentry/fexit
Adding test that attaches fentry/fexitand verifies the
ORC stacktrace matches expected functions.

The test is only for ORC unwinder to keep it simple.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260126211837.472802-6-jolsa@kernel.org
2026-01-30 13:40:08 -08:00
Jiri Olsa
7373f97e86 selftests/bpf: Add stacktrace ips test for kprobe/kretprobe
Adding test that attaches kprobe/kretprobe and verifies the
ORC stacktrace matches expected functions.

The test is only for ORC unwinder to keep it simple.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260126211837.472802-5-jolsa@kernel.org
2026-01-30 13:40:08 -08:00
Jiri Olsa
0207f94971 selftests/bpf: Fix kprobe multi stacktrace_ips test
We now include the attached function in the stack trace,
fixing the test accordingly.

Fixes: c9e208fa93 ("selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260126211837.472802-4-jolsa@kernel.org
2026-01-30 13:40:08 -08:00
Changwoo Min
cd77618c41 selftests/bpf: Make bpf get_preempt_count() work for v6.14+ kernels
Recent x86 kernels export __preempt_count as a ksym, while some old kernels
between v6.1 and v6.14 expose the preemption counter via
pcpu_hot.preempt_count. The existing selftest helper unconditionally
dereferenced __preempt_count, which breaks BPF program loading on such old
kernels.

Make the x86 preemption count lookup version-agnostic by:
- Marking __preempt_count and pcpu_hot as weak ksyms.
- Introducing a BTF-described pcpu_hot___local layout with
  preserve_access_index.
- Selecting the appropriate access path at runtime using ksym availability
  and bpf_ksym_exists() and bpf_core_field_exists().

This allows a single BPF binary to run correctly across kernel versions
(e.g., v6.18 vs. v6.13) without relying on compile-time version checks.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://lore.kernel.org/r/20260130021843.154885-1-changwoo@igalia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-30 12:20:25 -08:00
Jiri Olsa
15ac1adf0f selftests/bpf: Add test for sleepable program tailcalls
Adding test that makes sure we can't mix sleepable and non-sleepable
bpf programs in the BPF_MAP_TYPE_PROG_ARRAY map and that we can do
tail call in the sleepable program.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260130081208.1130204-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-30 12:17:47 -08:00
Luis Gerhorst
60d2c438c1 bpf: Test nospec after dead stack write in helper
Without the fix from the previous commit, the selftest fails:

$ ./tools/testing/selftests/bpf/vmtest.sh -- \
        ./test_progs -t verifier_unpriv
[...]
run_subtest:PASS:obj_open_mem 0 nsec
libbpf: BTF loading error: -EPERM
libbpf: Error loading .BTF into kernel: -EPERM. BTF is optional, ignoring.
libbpf: prog 'unpriv_nospec_after_helper_stack_write': BPF program load failed: -EFAULT
libbpf: prog 'unpriv_nospec_after_helper_stack_write': failed to load: -EFAULT
libbpf: failed to load object 'verifier_unpriv'
run_subtest:FAIL:unexpected_load_failure unexpected error: -14 (errno 14)
VERIFIER LOG:
=============
0: R1=ctx() R10=fp0
0: (b7) r0 = 0                        ; R0=P0
1: (55) if r0 != 0x1 goto pc+6 2: R0=Pscalar() R1=ctx() R10=fp0
2: (b7) r2 = 0                        ; R2=P0
3: (bf) r3 = r10                      ; R3=fp0 R10=fp0
4: (07) r3 += -16                     ; R3=fp-16
5: (b7) r4 = 4                        ; R4=P4
6: (b7) r5 = 0                        ; R5=P0
7: (85) call bpf_skb_load_bytes_relative#68
verifier bug: speculation barrier after jump instruction may not have the desired effect (BPF_CLASS(insn->code) == BPF_JMP || BPF_CLASS(insn->code) == BPF_JMP32)
processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
=============
[...]

The test is based on the PoC from the report.

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Reported-by: Dongliang Mu <dzm91@hust.edu.cn>
Link: https://lore.kernel.org/bpf/7678017d-b760-4053-a2d8-a6879b0dbeeb@hust.edu.cn/
Link: https://lore.kernel.org/r/20260127115912.3026761-3-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-28 18:41:57 -08:00
Mykyta Yatsenko
b640d556a2 selftests/bpf: Remove xxd util dependency
The verification signature header generation requires converting a
binary certificate to a C array. Previously this only worked with
xxd (part of vim-common package).
As xxd may not be available on some systems building selftests, it makes
sense to substitute it with more common utils: hexdump, wc, sed to
generate equivalent C array output.

Tested by generating header with both xxd and hexdump and comparing
them.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20260128190552.242335-1-mykyta.yatsenko5@gmail.com
2026-01-28 13:23:51 -08:00
Jiayuan Chen
17e2ce02bf selftests/bpf: Add tests for FIONREAD and copied_seq
This commit adds two new test functions: one to reproduce the bug reported
by syzkaller [1], and another to cover the calculation of copied_seq.

The tests primarily involve installing  and uninstalling sockmap on
sockets, then reading data to verify proper functionality.

Additionally, extend the do_test_sockmap_skb_verdict_fionread() function
to support UDP FIONREAD testing.

[1] https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20260124113314.113584-4-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-27 09:12:04 -08:00
Matt Bobrowski
1456ebb291 selftests/bpf: cover BPF_CGROUP_ITER_CHILDREN control option
Extend some of the existing CSS iterator selftests such that they
cover the newly introduced BPF_CGROUP_ITER_CHILDREN iterator control
option.

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20260127085112.3608687-2-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-27 09:06:03 -08:00
Alexis Lothoré (eBPF Foundation)
2d96bbdfd3 selftests/bpf: convert test_bpftool_map_access.sh into test_progs framework
The test_bpftool_map.sh script tests that maps read/write accesses
are being properly allowed/refused by the kernel depending on a specific
fmod_ret program being attached on security_bpf_map function.

Rewrite this test to integrate it in the test_progs. The
new test spawns a few subtests:

  #36/1    bpftool_maps_access/unprotected_unpinned:OK
  #36/2    bpftool_maps_access/unprotected_pinned:OK
  #36/3    bpftool_maps_access/protected_unpinned:OK
  #36/4    bpftool_maps_access/protected_pinned:OK
  #36/5    bpftool_maps_access/nested_maps:OK
  #36/6    bpftool_maps_access/btf_list:OK
  #36      bpftool_maps_access:OK
  Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20260123-bpftool-tests-v4-3-a6653a7f28e7@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-26 18:47:31 -08:00
Alexis Lothoré (eBPF Foundation)
1c0b505908 selftests/bpf: convert test_bpftool_metadata.sh into test_progs framework
The test_bpftool_metadata.sh script validates that bpftool properly
returns in its ouptput any metadata generated by bpf programs through
some .rodata sections.

Port this test to the test_progs framework so that it can be executed
automatically in CI. The new test, similarly to the former script,
checks that valid data appears both for textual output and json output,
as well as for both data not used at all and used data. For the json
check part, the expected json string is hardcoded to avoid bringing a
new external dependency (eg: a json deserializer) for test_progs.
As the test is now converted into test_progs, remove the former script.

The newly converted test brings two new subtests:

  #37/1    bpftool_metadata/metadata_unused:OK
  #37/2    bpftool_metadata/metadata_used:OK
  #37      bpftool_metadata:OK
  Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20260123-bpftool-tests-v4-2-a6653a7f28e7@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-26 18:47:31 -08:00
Alexis Lothoré (eBPF Foundation)
f21fae5774 selftests/bpf: Add a few helpers for bpftool testing
In order to integrate some bpftool tests into test_progs, define a few
specific helpers that allow to execute bpftool commands, while possibly
retrieving the command output. Those helpers most notably set the
path to the bpftool binary under test. This version checks different
possible paths relative to the directories where the different
test_progs runners are executed, as we want to make sure not to
accidentally use a bootstrap version of the binary.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20260123-bpftool-tests-v4-1-a6653a7f28e7@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-26 18:47:21 -08:00
Leon Hwang
78980b4c7f selftests/bpf: Harden cpu flags test for lru_percpu_hash map
CI occasionally reports failures in the
percpu_alloc/cpu_flag_lru_percpu_hash selftest, for example:

 First test_progs failure (test_progs_no_alu32-x86_64-llvm-21):
 #264/15 percpu_alloc/cpu_flag_lru_percpu_hash
 ...
 test_percpu_map_op_cpu_flag:FAIL:bpf_map_lookup_batch value on specified cpu unexpected bpf_map_lookup_batch value on specified cpu: actual 0 != expected 3735929054

The unexpected value indicates that an element was removed from the map.
However, the test never calls delete_elem(), so the only possible cause
is LRU eviction.

This can happen when the current task migrates to another CPU: an
update_elem() triggers eviction because there is no available LRU node
on local freelist and global freelist.

Harden the test against this behavior by provisioning sufficient spare
elements. Set max_entries to 'nr_cpus * 2' and restrict the test to using
the first nr_cpus entries, ensuring that updates do not spuriously trigger
LRU eviction.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260119133417.19739-1-leon.hwang@linux.dev
2026-01-26 10:59:07 -08:00
Changwoo Min
221b5e76c1 selftests/bpf: Add tests for execution context helpers
Add a new selftest suite `exe_ctx` to verify the accuracy of the
bpf_in_task(), bpf_in_hardirq(), and bpf_in_serving_softirq() helpers
introduced in bpf_experimental.h.

Testing these execution contexts deterministically requires crossing
context boundaries within a single CPU. To achieve this, the test
implements a "Trigger-Observer" pattern using bpf_testmod:

1. Trigger: A BPF syscall program calls a new bpf_testmod kfunc
   bpf_kfunc_trigger_ctx_check().
2. Task to HardIRQ: The kfunc uses irq_work_queue() to trigger a
   self-IPI on the local CPU.
3. HardIRQ to SoftIRQ: The irq_work handler calls a dummy function
   (observed by BPF fentry) and then schedules a tasklet to
   transition into SoftIRQ context.

The user-space runner ensures determinism by pinning itself to CPU 0
before execution, forcing the entire interrupt chain to remain on a
single core. Dummy noinline functions with compiler barriers are
added to bpf_testmod.c to serve as stable attachment points for
fentry programs. A retry loop is used in user-space to wait for the
asynchronous SoftIRQ to complete.

Note that testing on s390x is avoided because supporting those helpers
purely in BPF on s390x is not possible at this point.

Reviewed-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://lore.kernel.org/r/20260125115413.117502-3-changwoo@igalia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25 08:20:50 -08:00
Changwoo Min
c31df36bd2 selftests/bpf: Introduce execution context detection helpers
Introduce bpf_in_nmi(), bpf_in_hardirq(), bpf_in_serving_softirq(), and
bpf_in_task() inline helpers in bpf_experimental.h. These allow BPF
programs to query the current execution context with higher granularity
than the existing bpf_in_interrupt() helper.

While BPF programs can often infer their context from attachment points,
subsystems like sched_ext may call the same BPF logic from multiple
contexts (e.g., task-to-task wake-ups vs. interrupt-to-task wake-ups).
These helpers provide a reliable way for logic to branch based on
the current CPU execution state.

Implementing these as BPF-native inline helpers wrapping
get_preempt_count() allows the compiler and JIT to inline the logic. The
implementation accounts for differences in preempt_count layout between
standard and PREEMPT_RT kernels.

Reviewed-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://lore.kernel.org/r/20260125115413.117502-2-changwoo@igalia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25 08:20:50 -08:00
Menglong Dong
cb4bfacfb0 selftests/bpf: test fsession mixed with fentry and fexit
Test the fsession when it is used together with fentry, fexit.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20260124062008.8657-14-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24 18:49:37 -08:00
Menglong Dong
8909b3fb23 selftests/bpf: add testcases for fsession cookie
Test session cookie for fsession. Multiple fsession BPF progs is attached
to bpf_fentry_test1() and session cookie is read and write in the
testcase.

bpf_get_func_ip() will influence the layout of the session cookies, so we
test the cookie in two case: with and without bpf_get_func_ip().

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20260124062008.8657-13-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24 18:49:36 -08:00
Menglong Dong
a5533a6eaa selftests/bpf: test bpf_get_func_* for fsession
Test following bpf helper for fsession:
  bpf_get_func_arg()
  bpf_get_func_arg_cnt()
  bpf_get_func_ret()
  bpf_get_func_ip()

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20260124062008.8657-12-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24 18:49:36 -08:00
Menglong Dong
f7afef5617 selftests/bpf: add testcases for fsession
Add testcases for BPF_TRACE_FSESSION. The function arguments and return
value are tested both in the entry and exit. And the kfunc
bpf_session_is_ret() is also tested.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20260124062008.8657-11-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24 18:49:36 -08:00
Menglong Dong
8fe4dc4f64 bpf: change prototype of bpf_session_{cookie,is_return}
Add the function argument of "void *ctx" to bpf_session_cookie() and
bpf_session_is_return(), which is a preparation of the next patch.

The two kfunc is seldom used now, so it will not introduce much effect
to change their function prototype.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20260124062008.8657-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24 18:49:35 -08:00
Menglong Dong
2d419c4465 bpf: add fsession support
The fsession is something that similar to kprobe session. It allow to
attach a single BPF program to both the entry and the exit of the target
functions.

Introduce the struct bpf_fsession_link, which allows to add the link to
both the fentry and fexit progs_hlist of the trampoline.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260124062008.8657-2-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24 18:49:35 -08:00
Yonghong Song
c7900f225a selftests/bpf: Fix xdp_pull_data failure with 64K page
If the argument 'pull_len' of run_test() is 'PULL_MAX' or
'PULL_MAX | PULL_PLUS_ONE', the eventual pull_len size
will close to the page size. On arm64 systems with 64K pages,
the pull_len size will be close to 64K. But the existing buffer
will be close to 9000 which is not enough to pull.

For those failed run_tests(), make buff size to
  pg_sz + (pg_sz / 2)
This way, there will be enough buffer space to pull
regardless of page size.

Tested-by: Alan Maguire <alan.maguire@oracle.com>
Cc: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260123055128.495265-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24 16:51:22 -08:00
Yonghong Song
d8df878140 selftests/bpf: Fix task_local_data failure with 64K page
On arm64 systems with 64K pages, the selftest task_local_data has the following
failures:
  ...
  test_task_local_data_basic:PASS:tld_create_key 0 nsec
  test_task_local_data_basic:FAIL:tld_create_key unexpected tld_create_key: actual 0 != expected -28
  ...
  test_task_local_data_basic_thread:PASS:run task_main 0 nsec
  test_task_local_data_basic_thread:FAIL:task_main retval unexpected error: 2 (errno 0)
  test_task_local_data_basic_thread:FAIL:tld_get_data value0 unexpected tld_get_data value0: actual 0 != expected 6268
  ...
  #447/1   task_local_data/task_local_data_basic:FAIL
  ...
  #447/2   task_local_data/task_local_data_race:FAIL
  #447     task_local_data:FAIL

When TLD_DYN_DATA_SIZE is 64K page size, for
  struct tld_meta_u {
       _Atomic __u8 cnt;
       __u16 size;
        struct tld_metadata metadata[];
  };
field 'cnt' would overflow. For example, for 4K page, 'cnt' will
be 4096/64 = 64. But for 64K page, 'cnt' will be 65536/64 = 1024
and 'cnt' is not enough for 1024. To accommodate 64K page,
'_Atomic __u8 cnt' becomes '_Atomic __u16 cnt'. A few other places
are adjusted accordingly.

In test_task_local_data.c, the value for TLD_DYN_DATA_SIZE is changed
from 4096 to (getpagesize() - 8) since the maximum buffer size for
TLD_DYN_DATA_SIZE is (getpagesize() - 8).

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Cc: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260123055122.494352-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24 16:51:21 -08:00
Kery Qi
a32ae26584 selftests/bpf: Fix resource leak in serial_test_wq on attach failure
When wq__attach() fails, serial_test_wq() returns early without calling
wq__destroy(), leaking the skeleton resources allocated by
wq__open_and_load(). This causes ASAN leak reports in selftests runs.

Fix this by jumping to a common clean_up label that calls wq__destroy()
on all exit paths after successful open_and_load.

Note that the early return after wq__open_and_load() failure is correct
and doesn't need fixing, since that function returns NULL on failure
(after internally cleaning up any partial allocations).

Fixes: 8290dba519 ("selftests/bpf: wq: add bpf_wq_start() checks")
Signed-off-by: Kery Qi <qikeyu2017@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20260121094114.1801-3-qikeyu2017@gmail.com
2026-01-21 16:12:10 -08:00
Yuzuki Ishiyama
f4924ad0b1 selftests/bpf: Test kfunc bpf_strncasecmp
Add testsuites for kfunc bpf_strncasecmp.

Signed-off-by: Yuzuki Ishiyama <ishiyama@hpc.is.uec.ac.jp>
Acked-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/20260121033328.1850010-3-ishiyama@hpc.is.uec.ac.jp
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21 09:42:53 -08:00
Menglong Dong
1ed7977643 selftests/bpf: test bpf_get_func_arg() for tp_btf
Test bpf_get_func_arg() and bpf_get_func_arg_cnt() for tp_btf. The code
is most copied from test1 and test2.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260121044348.113201-3-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21 09:31:35 -08:00
Menglong Dong
4fca95095c selftests/bpf: test the jited inline of bpf_get_current_task
Add the testcase for the jited inline of bpf_get_current_task().

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260120070555.233486-3-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 20:39:01 -08:00
Li RongQing
e700f5d156 watchdog: softlockup: panic when lockup duration exceeds N thresholds
The softlockup_panic sysctl is currently a binary option: panic
immediately or never panic on soft lockups.

Panicking on any soft lockup, regardless of duration, can be overly
aggressive for brief stalls that may be caused by legitimate operations. 
Conversely, never panicking may allow severe system hangs to persist
undetected.

Extend softlockup_panic to accept an integer threshold, allowing the
kernel to panic only when the normalized lockup duration exceeds N
watchdog threshold periods.  This provides finer-grained control to
distinguish between transient delays and persistent system failures.

The accepted values are:
- 0: Don't panic (unchanged)
- 1: Panic when duration >= 1 * threshold (20s default, original behavior)
- N > 1: Panic when duration >= N * threshold (e.g., 2 = 40s, 3 = 60s.)

The original behavior is preserved for values 0 and 1, maintaining full
backward compatibility while allowing systems to tolerate brief lockups
while still catching severe, persistent hangs.

[lirongqing@baidu.com: v2]
  Link: https://lkml.kernel.org/r/20251218074300.4080-1-lirongqing@baidu.com
Link: https://lkml.kernel.org/r/20251216074521.2796-1-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:44:20 -08:00
Matt Bobrowski
dd341eacdb selftests/bpf: update verifier test for default trusted pointer semantics
Replace the verifier test for default trusted pointer semantics, which
previously relied on BPF kfunc bpf_get_root_mem_cgroup(), with a new
test utilizing dedicated BPF kfuncs defined within the bpf_testmod.

bpf_get_root_mem_cgroup() was modified such that it again relies on
KF_ACQUIRE semantics, therefore no longer making it a suitable
candidate to test BPF verifier default trusted pointer semantics
against.

Link: https://lore.kernel.org/bpf/20260113083949.2502978-2-mattbobrowski@google.com
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20260120091630.3420452-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 17:11:24 -08:00
Yazhou Tang
c9e440bf25 selftests/bpf: Add tests for BPF_DIV and BPF_MOD range tracking
Now BPF_DIV has range tracking support via interval analysis. This patch
adds selftests to cover various cases of BPF_DIV and BPF_MOD operations
when the divisor is a constant, also covering both signed and unsigned variants.

This patch includes several types of tests in 32-bit and 64-bit variants:

1. For UDIV
   - positive divisor
   - zero divisor

2. For SDIV
   - positive divisor, positive dividend
   - positive divisor, negative dividend
   - positive divisor, mixed sign dividend
   - negative divisor, positive dividend
   - negative divisor, negative dividend
   - negative divisor, mixed sign dividend
   - zero divisor
   - overflow (SIGNED_MIN/-1), normal dividend
   - overflow (SIGNED_MIN/-1), constant dividend

3. For UMOD
   - positive divisor
   - positive divisor, small dividend
   - zero divisor

4. For SMOD
   - positive divisor, positive dividend
   - positive divisor, negative dividend
   - positive divisor, mixed sign dividend
   - positive divisor, mixed sign dividend, small dividend
   - negative divisor, positive dividend
   - negative divisor, negative dividend
   - negative divisor, mixed sign dividend
   - negative divisor, mixed sign dividend, small dividend
   - zero divisor
   - overflow (SIGNED_MIN/-1), normal dividend
   - overflow (SIGNED_MIN/-1), constant dividend

Specifically, these selftests are based on dead code elimination:
If the BPF verifier can precisely analyze the result of BPF_DIV/BPF_MOD
instruction, it can prune the path that leads to an error (here we use
invalid memory access as the error case), allowing the program to pass
verification.

Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Co-developed-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Link: https://lore.kernel.org/r/20260119085458.182221-3-tangyazhou@zju.edu.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:41:54 -08:00
Yazhou Tang
44fdd581d2 bpf: Add range tracking for BPF_DIV and BPF_MOD
This patch implements range tracking (interval analysis) for BPF_DIV and
BPF_MOD operations when the divisor is a constant, covering both signed
and unsigned variants.

While LLVM typically optimizes integer division and modulo by constants
into multiplication and shift sequences, this optimization is less
effective for the BPF target when dealing with 64-bit arithmetic.

Currently, the verifier does not track bounds for scalar division or
modulo, treating the result as "unbounded". This leads to false positive
rejections for safe code patterns.

For example, the following code (compiled with -O2):

```c
int test(struct pt_regs *ctx) {
    char buffer[6] = {1};
    __u64 x = bpf_ktime_get_ns();
    __u64 res = x % sizeof(buffer);
    char value = buffer[res];
    bpf_printk("res = %llu, val = %d", res, value);
    return 0;
}
```

Generates a raw `BPF_MOD64` instruction:

```asm
;     __u64 res = x % sizeof(buffer);
       1:	97 00 00 00 06 00 00 00	r0 %= 0x6
;     char value = buffer[res];
       2:	18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00	r1 = 0x0 ll
       4:	0f 01 00 00 00 00 00 00	r1 += r0
       5:	91 14 00 00 00 00 00 00	r4 = *(s8 *)(r1 + 0x0)
```

Without this patch, the verifier fails with "math between map_value
pointer and register with unbounded min value is not allowed" because
it cannot deduce that `r0` is within [0, 5].

According to the BPF instruction set[1], the instruction's offset field
(`insn->off`) is used to distinguish between signed (`off == 1`) and
unsigned division (`off == 0`). Moreover, we also follow the BPF division
and modulo runtime behavior (semantics) to handle special cases, such as
division by zero and signed division overflow.

- UDIV: dst = (src != 0) ? (dst / src) : 0
- SDIV: dst = (src == 0) ? 0 : ((src == -1 && dst == LLONG_MIN) ? LLONG_MIN : (dst / src))
- UMOD: dst = (src != 0) ? (dst % src) : dst
- SMOD: dst = (src == 0) ? dst : ((src == -1 && dst == LLONG_MIN) ? 0: (dst s% src))

Here is the overview of the changes made in this patch (See the code comments
for more details and examples):

1. For BPF_DIV: Firstly check whether the divisor is zero. If so, set the
   destination register to zero (matching runtime behavior).

   For non-zero constant divisors: goto `scalar(32)?_min_max_(u|s)div` functions.
   - General cases: compute the new range by dividing max_dividend and
     min_dividend by the constant divisor.
   - Overflow case (SIGNED_MIN / -1) in signed division: mark the result
     as unbounded if the dividend is not a single number.

2. For BPF_MOD: Firstly check whether the divisor is zero. If so, leave the
   destination register unchanged (matching runtime behavior).

   For non-zero constant divisors: goto `scalar(32)?_min_max_(u|s)mod` functions.
   - General case: For signed modulo, the result's sign matches the
     dividend's sign. And the result's absolute value is strictly bounded
     by `min(abs(dividend), abs(divisor) - 1)`.
     - Special care is taken when the divisor is SIGNED_MIN. By casting
       to unsigned before negation and subtracting 1, we avoid signed
       overflow and correctly calculate the maximum possible magnitude
       (`res_max_abs` in the code).
   - "Small dividend" case: If the dividend is already within the possible
     result range (e.g., [-2, 5] % 10), the operation is an identity
     function, and the destination register remains unchanged.

3. In `scalar(32)?_min_max_(u|s)(div|mod)` functions: After updating current
   range, reset other ranges and tnum to unbounded/unknown.

   e.g., in `scalar_min_max_sdiv`, signed 64-bit range is updated. Then reset
   unsigned 64-bit range and 32-bit range to unbounded, and tnum to unknown.

   Exception: in BPF_MOD's "small dividend" case, since the result remains
   unchanged, we do not reset other ranges/tnum.

4. Also updated existing selftests based on the expected BPF_DIV and
   BPF_MOD behavior.

[1] https://www.kernel.org/doc/Documentation/bpf/standardization/instruction-set.rst

Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Co-developed-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Tested-by: syzbot@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20260119085458.182221-2-tangyazhou@zju.edu.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:41:53 -08:00
Ihor Solodrai
bd06b977e0 selftests/bpf: Migrate struct_ops_assoc test to KF_IMPLICIT_ARGS
A test kfunc named bpf_kfunc_multi_st_ops_test_1_impl() is a user of
__prog suffix. Subsequent patch removes __prog support in favor of
KF_IMPLICIT_ARGS, so migrate this kfunc to use implicit argument.

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-12-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:22:38 -08:00
Ihor Solodrai
d806f31012 bpf: Migrate bpf_stream_vprintk() to KF_IMPLICIT_ARGS
Implement bpf_stream_vprintk with an implicit bpf_prog_aux argument,
and remote bpf_stream_vprintk_impl from the kernel.

Update the selftests to use the new API with implicit argument.

bpf_stream_vprintk macro is changed to use the new bpf_stream_vprintk
kfunc, and the extern definition of bpf_stream_vprintk_impl is
replaced accordingly.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-11-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:22:38 -08:00
Ihor Solodrai
6e663ffdf7 bpf: Migrate bpf_task_work_schedule_* kfuncs to KF_IMPLICIT_ARGS
Implement bpf_task_work_schedule_* with an implicit bpf_prog_aux
argument, and remove corresponding _impl funcs from the kernel.

Update special kfunc checks in the verifier accordingly.

Update the selftests to use the new API with implicit argument.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-10-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:22:20 -08:00
Ihor Solodrai
b97931a25a bpf: Migrate bpf_wq_set_callback_impl() to KF_IMPLICIT_ARGS
Implement bpf_wq_set_callback() with an implicit bpf_prog_aux
argument, and remove bpf_wq_set_callback_impl().

Update special kfunc checks in the verifier accordingly.

Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-8-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:15:57 -08:00
Ihor Solodrai
e939f3d16d selftests/bpf: Add tests for KF_IMPLICIT_ARGS
Add trivial end-to-end tests to validate that KF_IMPLICIT_ARGS flag is
properly handled by both resolve_btfids and the verifier.

Declare kfuncs in bpf_testmod. Check that bpf_prog_aux pointer is set
in the kfunc implementation. Verify that calls with implicit args and
a legacy case all work.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-7-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 16:15:57 -08:00
Gyutae Bae
2e6690d4f7 selftests/bpf: Add perfbuf multi-producer benchmark
Add a multi-producer benchmark for perfbuf to complement the existing
ringbuf multi-producer test. Unlike ringbuf which uses a shared buffer
and experiences contention, perfbuf uses per-CPU buffers so the test
measures scaling behavior rather than contention.

This allows developers to compare perfbuf vs ringbuf performance under
multi-producer workloads when choosing between the two for their systems.

Signed-off-by: Gyutae Bae <gyutae.bae@navercorp.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260120090716.82927-1-gyutae.opensource@navercorp.com
2026-01-20 11:37:25 -08:00
Yonghong Song
efad162f5a selftests/bpf: Fix map_kptr test failure
On my arm64 machine, I get the following failure:
  ...
  tester_init:PASS:tester_log_buf 0 nsec
  process_subtest:PASS:obj_open_mem 0 nsec
  process_subtest:PASS:specs_alloc 0 nsec
  serial_test_map_kptr:PASS:rcu_tasks_trace_gp__open_and_load 0 nsec
  ...
  test_map_kptr_success:PASS:map_kptr__open_and_load 0 nsec
  test_map_kptr_success:PASS:test_map_kptr_ref1 refcount 0 nsec
  test_map_kptr_success:FAIL:test_map_kptr_ref1 retval unexpected error: 2 (errno 2)
  test_map_kptr_success:PASS:test_map_kptr_ref2 refcount 0 nsec
  test_map_kptr_success:FAIL:test_map_kptr_ref2 retval unexpected error: 1 (errno 2)
  ...
  #201/21  map_kptr/success-map:FAIL

In serial_test_map_kptr(), before test_map_kptr_success(), one
kern_sync_rcu() is used to have some delay for freeing the map.
But in my environment, one kern_sync_rcu() seems not enough and
caused the test failure.

In bpf_map_free_in_work() in syscall.c, the queue time for
  queue_work(system_dfl_wq, &map->work)
may be longer than expected. This may cause the test failure
since test_map_kptr_success() expects all previous maps having been freed.

Since it is not clear how long queue_work() time takes, a bpf prog
is added to count the reference after bpf_kfunc_call_test_acquire().
If the number of references is 2 (for initial ref and the one just
acquired), all previous maps should have been released. This will
resolve the above 'retval unexpected error' issue.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20260116052245.3692405-1-yonghong.song@linux.dev
2026-01-16 14:59:57 -08:00
Alan Maguire
47d440d0a5 selftests/bpf: Support when CONFIG_VXLAN=m
If CONFIG_VXLAN is 'm', struct vxlanhdr will not be in vmlinux.h.
Add a ___local variant to support cases where vxlan is a module.

Fixes: 8517b1abe5 ("selftests/bpf: Integrate test_tc_tunnel.sh tests into test_progs")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260115163457.146267-1-alan.maguire@oracle.com
2026-01-16 14:54:10 -08:00
Puranjay Mohan
086c99fbe4 selftests: bpf: Add test for multiple syncs from linked register
Before the last commit, sync_linked_regs() corrupted the register whose
bounds are being updated by copying known_reg's id to it. The ids are
the same in value but known_reg has the BPF_ADD_CONST flag which is
wrongly copied to reg.

This later causes issues when creating new links to this reg.
assign_scalar_id_before_mov() sees this BPF_ADD_CONST and gives a new id
to this register and breaks the old links. This is exposed by the added
selftest.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260115151143.1344724-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-16 10:08:59 -08:00
Jiri Olsa
934d9746ed selftests/bpf: Add test for bpf_override_return helper
We do not actually test the bpf_override_return helper functionality
itself at the moment, only the bpf program being able to attach it.

Adding test that override prctl syscall return value on top of
kprobe and kprobe.multi.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20260112121157.854473-2-jolsa@kernel.org
2026-01-15 16:15:26 -08:00
Anton Protopopov
7c8e817e44 selftests/bpf: Extend live regs tests with a test for gotox
Add a test which checks that the destination register of a gotox
instruction is marked as used and that the union of jump targets
is considered as live.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20260114162544.83253-3-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-14 19:08:09 -08:00
Alexei Starovoitov
e3d0dbb3b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc5
Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Adjacent:
Auto-merging MAINTAINERS
Auto-merging Makefile
Auto-merging kernel/bpf/verifier.c
Auto-merging kernel/sched/ext.c
Auto-merging mm/memcontrol.c

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-14 15:22:01 -08:00
Anton Protopopov
c656807675 selftests/bpf: Add tests for loading insn array values with offsets
The ldimm64 instruction for map value supports an offset.
For insn array maps it wasn't tested before, as normally
such instructions aren't generated. However, this is still
possible to pass such instructions, so add a few tests to
check that correct offsets work properly and incorrect
offsets are rejected.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20260111153047.8388-4-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 19:36:47 -08:00
Matt Bobrowski
bbdbed193b selftests/bpf: assert BPF kfunc default trusted pointer semantics
The BPF verifier was recently updated to treat pointers to struct types
returned from BPF kfuncs as implicitly trusted by default. Add a new
test case to exercise this new implicit trust semantic.

The KF_ACQUIRE flag was dropped from the bpf_get_root_mem_cgroup()
kfunc because it returns a global pointer to root_mem_cgroup without
performing any explicit reference counting. This makes it an ideal
candidate to verify the new implicit trusted pointer semantics.

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260113083949.2502978-3-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 19:19:13 -08:00
Matt Bobrowski
f8ade2342e bpf: return PTR_TO_BTF_ID | PTR_TRUSTED from BPF kfuncs by default
Teach the BPF verifier to treat pointers to struct types returned from
BPF kfuncs as implicitly trusted (PTR_TO_BTF_ID | PTR_TRUSTED) by
default. Returning untrusted pointers to struct types from BPF kfuncs
should be considered an exception only, and certainly not the norm.

Update existing selftests to reflect the change in register type
printing (e.g. `ptr_` becoming `trusted_ptr_` in verifier error
messages).

Link: https://lore.kernel.org/bpf/aV4nbCaMfIoM0awM@google.com/
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260113083949.2502978-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 19:19:13 -08:00
Donglin Peng
a3acd7d434 selftests/bpf: Add test cases for btf__permute functionality
This patch introduces test cases for the btf__permute function to ensure
it works correctly with both base BTF and split BTF scenarios.

The test suite includes:
- test_permute_base: Validates permutation on base BTF
- test_permute_split: Tests permutation on split BTF

Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20260109130003.3313716-3-dolinux.peng@gmail.com
2026-01-13 16:10:39 -08:00
Alexei Starovoitov
9160335317 selftests/bpf: Add tests for s>>=31 and s>>=63
Add tests for special arithmetic shift right.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Co-developed-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260112201424.816836-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 09:33:38 -08:00
Yonghong Song
951d79017e selftests/bpf: Fix verifier_arena_globals1 failure with 64K page
With 64K page on arm64, verifier_arena_globals1 failed like below:
  ...
  libbpf: map 'arena': failed to create: -E2BIG
  ...
  #509/1   verifier_arena_globals1/check_reserve1:FAIL
  ...

For 64K page, if the number of arena pages is (1UL << 20), the total
memory will exceed 4G and this will cause map creation failure.
Adjusting ARENA_PAGES based on the actual page size fixed the problem.

Cc: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260113061033.3798549-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 09:32:16 -08:00
Yonghong Song
d2f7cd20a7 selftests/bpf: Fix sk_bypass_prot_mem failure with 64K page
The current selftest sk_bypass_prot_mem only supports 4K page.
When running with 64K page on arm64, the following failure happens:
  ...
  check_bypass:FAIL:no bypass unexpected no bypass: actual 3 <= expected 32
  ...
  #385/1   sk_bypass_prot_mem/TCP  :FAIL
  ...
  check_bypass:FAIL:no bypass unexpected no bypass: actual 4 <= expected 32
  ...
  #385/2   sk_bypass_prot_mem/UDP  :FAIL
  ...

Adding support to 64K page as well fixed the failure.

Cc: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260113061028.3798326-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 09:32:16 -08:00
Yonghong Song
2465a08d43 selftests/bpf: Fix dmabuf_iter/lots_of_buffers failure with 64K page
On arm64 with 64K page , I observed the following test failure:
  ...
  subtest_dmabuf_iter_check_lots_of_buffers:FAIL:total_bytes_read unexpected total_bytes_read:
      actual 4696 <= expected 65536
  #97/3    dmabuf_iter/lots_of_buffers:FAIL

With 4K page on x86, the total_bytes_read is 4593.
With 64K page on arm64, the total_byte_read is 4696.

In progs/dmabuf_iter.c, for each iteration, the output is
  BPF_SEQ_PRINTF(seq, "%lu\n%llu\n%s\n%s\n", inode, size, name, exporter);

The only difference between 4K and 64K page is 'size' in
the above BPF_SEQ_PRINTF. The 4K page will output '4096' and
the 64K page will output '65536'. So the total_bytes_read with 64K page
is slighter greater than 4K page.

Adjusting the total_bytes_read from 65536 to 4096 fixed the issue.

Cc: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260113061023.3798085-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-13 09:32:16 -08:00
Sami Tolvanen
ba7f1024a1 selftests/bpf: Use the correct destructor kfunc type
With CONFIG_CFI enabled, the kernel strictly enforces that indirect
function calls use a function pointer type that matches the target
function. As bpf_testmod_ctx_release() signature differs from the
btf_dtor_kfunc_t pointer type used for the destructor calls in
bpf_obj_free_fields(), add a stub function with the correct type to
fix the type mismatch.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260110082548.113748-9-samitolvanen@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-12 18:53:57 -08:00
Jose E. Marchesi
681600647c bpf: GCC requires function attributes before the declarator
GCC insists in placing attributes before the declarators in function
declarations.  Now that GCC supports btf_decl_tag and therefore __tag1
and __tag2 expand to actual attributes, the compiler is complaining
about it for

  static __noinline int foo(int x __tag1 __tag2) __tag1 __tag2

progs/test_btf_decl_tag.c:36:1: error: attributes should be specified \
before the declarator in a function definition

This patch simply places the tags before the declarator.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260106173650.18191-3-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06 21:04:11 -08:00
Jose E. Marchesi
97fb54d86d bpf: adapt selftests to GCC 16 -Wunused-but-set-variable
GCC 16 has changed the semantics of -Wunused-but-set-variable, as well
as introducing new options -Wunused-but-set-variable={0,1,2,3} to
adjust the level of support.

One of the changes is that GCC now treats 'sum += 1' and 'sum++' as
non-usage, whereas clang (and GCC < 16) considers the first as usage
and the second as non-usage, which is sort of inconsistent.

The GCC 16 -Wunused-but-set-variable=2 option implements the previous
semantics of -Wunused-but-set-variable, but since it is a new option,
it cannot be used unconditionally for forward-compatibility, just for
backwards-compatibility.

So this patch adds pragmas to the two self-tests impacted by this,
progs/free_timer.c and progs/rcu_read_lock.c, to make gcc to ignore
-Wunused-but-set-variable warnings when compiling them with GCC > 15.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44677#c25 for details
on why this regression got introduced in GCC upstream.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260106173650.18191-2-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06 21:04:11 -08:00
Leon Hwang
07bf7aa58e selftests/bpf: Add cases to test BPF_F_CPU and BPF_F_ALL_CPUS flags
Add test coverage for the new BPF_F_CPU and BPF_F_ALL_CPUS flags support
in percpu maps. The following APIs are exercised:

* bpf_map_update_batch()
* bpf_map_lookup_batch()
* bpf_map_update_elem()
* bpf_map__update_elem()
* bpf_map_lookup_elem_flags()
* bpf_map__lookup_elem()

For lru_percpu_hash map, set max_entries to
'libbpf_num_possible_cpus() + 1' and only use the first
'libbpf_num_possible_cpus()' entries. This ensures a spare entry is always
available in the LRU free list, avoiding eviction.

When updating an existing key in lru_percpu_hash map:

1. l_new = prealloc_lru_pop();  /* Borrow from free list */
2. l_old = lookup_elem_raw();   /* Found, key exists */
3. pcpu_copy_value();           /* In-place update */
4. bpf_lru_push_free();         /* Return l_new to free list */

Also add negative tests to verify that non-percpu array and hash maps
reject the BPF_F_CPU and BPF_F_ALL_CPUS flags.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260107022022.12843-8-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06 20:48:32 -08:00
Emil Tsalapatis
b81d5e9d96 selftests/bpf: add tests for arena kfuncs under lock
Add selftests to ensure the verifier permits calling the arena
kfunc API while holding a lock.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260106-arena-under-lock-v2-3-378e9eab3066@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06 17:44:13 -08:00
Toke Høiland-Jørgensen
ab86d0bf01 selftests/bpf: Update xdp_context_test_run test to check maximum metadata size
Update the selftest to check that the metadata size check takes the
xdp_frame size into account in bpf_prog_test_run. The original
check (for meta size 256) was broken because the data frame supplied was
smaller than this, triggering a different EINVAL return. So supply a
larger data frame for this test to make sure we actually exercise the
check we think we are.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260105114747.1358750-2-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-06 11:41:41 -08:00
Puranjay Mohan
cf503eb2c6 selftests: bpf: Fix test_bpf_nf for trusted args becoming default
With trusted args now being the default, passing NULL to kfunc
parameters that are pointers causes verifier rejection rather than a
runtime error. The test_bpf_nf test was failing because it attempted to
pass NULL to bpf_xdp_ct_lookup() to verify runtime error handling.

Since the NULL check now happens at verification time, remove the
runtime test case that passed NULL to the bpf_tuple parameter and
instead add verification-time tests to ensure the verifier correctly
rejects programs that pass NULL to trusted arguments.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-11-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02 12:04:29 -08:00
Puranjay Mohan
cf82580c86 selftests: bpf: fix cgroup_hierarchical_stats
The cgroup_hierarchical_stats selftests uses an fentry program attached
to cgroup_attach_task and then passes the received &dst_cgrp->self to
the css_rstat_updated() kfunc. The verifier now assumes that all kfuncs
only takes trusted pointer arguments, and pointers received by fentry
are not marked trustes by default.

Use a tp_btf program in place for fentry for this test, pointers
received by tp_btf programs are marked trusted by the verifier.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-10-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02 12:04:29 -08:00
Puranjay Mohan
230b0118e4 selftests: bpf: fix test_kfunc_dynptr_param
As verifier now assumes that all kfuncs only takes trusted pointer
arguments, passing 0 (NULL) to a kfunc that doesn't mark the argument as
__nullable or __opt will be rejected with a failure message of: Possibly
NULL pointer passed to trusted arg<n>

Pass a non-null value to the kfunc to test the expected failure mode.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-9-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02 12:04:29 -08:00
Puranjay Mohan
03cc77b10e selftests: bpf: Update failure message for rbtree_fail
The rbtree_api_use_unchecked_remove_retval() selftest passes a pointer
received from bpf_rbtree_remove() to bpf_rbtree_add() without checking
for NULL, this was earlier caught by __check_ptr_off_reg() in the
verifier. Now the verifier assumes every kfunc only takes trusted pointer
arguments, so it catches this NULL pointer earlier in the path and
provides a more accurate failure message.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-8-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02 12:04:29 -08:00
Puranjay Mohan
df5004579b selftests: bpf: Update kfunc_param_nullable test for new error message
With trusted args now being the default, the NULL pointer check runs
before type-specific validation. Update test3 to expect the new error
message "Possibly NULL pointer passed to trusted arg0" instead of the
old dynptr-specific error message.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-7-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02 12:04:29 -08:00
Puranjay Mohan
7646c7afd9 bpf: Remove redundant KF_TRUSTED_ARGS flag from all kfuncs
Now that KF_TRUSTED_ARGS is the default for all kfuncs, remove the
explicit KF_TRUSTED_ARGS flag from all kfunc definitions and remove the
flag itself.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-02 12:04:28 -08:00
Puranjay Mohan
c286e7e9d1 selftests/bpf: veristat: fix printing order in output_stats()
The order of the variables in the printf() doesn't match the text and
therefore veristat prints something like this:

Done. Processed 24 files, 0 programs. Skipped 62 files, 0 programs.

When it should print:

Done. Processed 24 files, 62 programs. Skipped 0 files, 0 programs.

Fix the order of variables in the printf() call.

Fixes: 518fee8bfa ("selftests/bpf: make veristat skip non-BPF and failing-to-open BPF objects")
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251231221052.759396-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-31 16:11:49 -08:00
Ihor Solodrai
1a8fa7faf4 resolve_btfids: Implement --patch_btfids
Recent changes in BTF generation [1] rely on ${OBJCOPY} command to
update .BTF_ids section data in target ELF files.

This exposed a bug in llvm-objcopy --update-section code path, that
may lead to corruption of a target ELF file. Specifically, because of
the bug st_shndx of some symbols may be (incorrectly) set to 0xffff
(SHN_XINDEX) [2][3].

While there is a pending fix for LLVM, it'll take some time before it
lands (likely in 22.x). And the kernel build must keep working with
older LLVM toolchains in the foreseeable future.

Using GNU objcopy for .BTF_ids update would work, but it would require
changes to LLVM-based build process, likely breaking existing build
environments as discussed in [2].

To work around llvm-objcopy bug, implement --patch_btfids code path in
resolve_btfids as a drop-in replacement for:

    ${OBJCOPY} --update-section .BTF_ids=${btf_ids} ${elf}

Which works specifically for .BTF_ids section:

    ${RESOLVE_BTFIDS} --patch_btfids ${btf_ids} ${elf}

This feature in resolve_btfids can be removed at some point in the
future, when llvm-objcopy with a relevant bugfix becomes common.

[1] https://lore.kernel.org/bpf/20251219181321.1283664-1-ihor.solodrai@linux.dev/
[2] https://lore.kernel.org/bpf/20251224005752.201911-1-ihor.solodrai@linux.dev/
[3] https://github.com/llvm/llvm-project/issues/168060#issuecomment-3533552952

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20251231012558.1699758-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-31 09:04:09 -08:00
Eduard Zingerman
4fd99103ee selftests/bpf: iterator based loop and STACK_MISC states pruning
The test case first initializes 9 stack slots as STACK_MISC,
then conditionally updates each of them to SCALAR spill inside an
iterator based loop. This leads to 2**9 combinations of MISC/SPILL
marks for these slots at the iterator next call.
The loop converges only if the verifier treats such states as
equivalent, otherwise visited states are evicted from the states cache
too quickly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251230-loop-stack-misc-pruning-v1-2-585cfd6cec51@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-31 09:01:13 -08:00
Eduard Zingerman
e6f2612f0e selftests/bpf: test cases for bpf_loop SCC and state graph backedges
Test for state graph backedges accumulation for SCCs formed by
bpf_loop(). Equivalent to the following C program:

  int main(void) {
    1: fp[-8] = bpf_get_prandom_u32();
    2: fp[-16] = -32;                       // used in a memory access below
    3: bpf_loop(7, loop_cb4, fp, 0);
    4: return 0;
  }

  int loop_cb4(int i, void *ctx) {
    5: if (unlikely(ctx[-8] > bpf_get_prandom_u32()))
    6:   *(u64 *)(fp + ctx[-16]) = 42;      // aligned access expected
    7: if (unlikely(fp[-8] > bpf_get_prandom_u32()))
    8:   ctx[-16] = -31;                    // makes said access unaligned
    9: return 0;
  }

If state graph backedges are not accumulated properly at the SCC
formed by loop_cb4() call from bpf_loop(), the state {ctx[-16]=-32}
injected at instruction 9 on verification path 1,2,3,5,7,9,4 would be
considered fully verified and would lack precision mark for ctx[-16].
This would lead to early pruning of verification path 1,2,3,5,7,8,9 in
state {ctx[-16]=-31}, which in turn leads to the incorrect assumption
that the above program is safe.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251229-scc-for-callbacks-v1-2-ceadfe679900@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-30 15:42:42 -08:00
Puranjay Mohan
317a5df78f selftests/bpf: Fix verifier_arena_large/big_alloc3 test
The big_alloc3() test tries to allocate 2051 pages at once in
non-sleepable context and this can fail sporadically on resource
contrained systems, so skip this test in case of such failures.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251230195134.599463-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-30 12:35:23 -08:00
Puranjay Mohan
efecc9e825 selftests: bpf: test non-sleepable arena allocations
As arena kfuncs can now be called from non-sleepable contexts, test this
by adding non-sleepable copies of tests in verifier_arena, this is done
by using a socket program instead of syscall.

Add a new test case in verifier_arena_large to check that the
bpf_arena_alloc_pages() works for more than 1024 pages.
1024 * sizeof(struct page *) is the upper limit of kmalloc_nolock() but
bpf_arena_alloc_pages() should still succeed because it re-uses this
array in a loop.

Augment the arena_list selftest to also run in non-sleepable context by
taking rcu_read_lock.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251222195022.431211-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-23 11:30:00 -08:00
Puranjay Mohan
83dd46ecb6 selftests: bpf: fix tests with raw_tp calling kfuncs
As the previous commit allowed raw_tp programs to call kfuncs, so of the
selftests that were expected to fail will now succeed.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251222133250.1890587-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22 22:23:38 -08:00
JP Kobryn
6bce6ddbe6 bpf: selftests: selftests for memcg stat kfuncs
Add test coverage for the kfuncs that fetch memcg stats. Using some common
stats, test scenarios ensuring that the given stat increases by some
arbitrary amount. The stats selected cover the three categories represented
by the enums: node_stat_item, memcg_stat_item, vm_event_item.

Since only a subset of all stats are queried, use a static struct made up
of fields for each stat. Write to the struct with the fetched values when
the bpf program is invoked and read the fields in the user mode program for
verification.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://lore.kernel.org/r/20251223044156.208250-6-roman.gushchin@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22 22:20:22 -08:00
Matt Bobrowski
d2749ae85a selftests/bpf: add test case for BPF LSM hook bpf_lsm_mmap_file
Add a trivial test case asserting that the BPF verifier enforces
PTR_MAYBE_NULL semantics on the struct file pointer argument of BPF
LSM hook bpf_lsm_mmap_file().

Dereferencing the struct file pointer passed into bpf_lsm_mmap_file()
without explicitly performing a NULL check first should not be
permitted by the BPF verifier as it can lead to NULL pointer
dereferences and a kernel crash.

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20251216133000.3690723-2-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-21 10:56:33 -08:00
Ihor Solodrai
522397d05e resolve_btfids: Change in-place update with raw binary output
Currently resolve_btfids updates .BTF_ids section of an ELF file
in-place, based on the contents of provided BTF, usually within the
same input file, and optionally a BTF base.

Change resolve_btfids behavior to enable BTF transformations as part
of its main operation. To achieve this, in-place ELF write in
resolve_btfids is replaced with generation of the following binaries:
  * ${1}.BTF with .BTF section data
  * ${1}.BTF_ids with .BTF_ids section data if it existed in ${1}
  * ${1}.BTF.base with .BTF.base section data for out-of-tree modules

The execution of resolve_btfids and consumption of its output is
orchestrated by scripts/gen-btf.sh introduced in this patch.

The motivation for emitting binary data is that it allows simplifying
resolve_btfids implementation by delegating ELF update to the $OBJCOPY
tool [1], which is already widely used across the codebase.

There are two distinct paths for BTF generation and resolve_btfids
application in the kernel build: for vmlinux and for kernel modules.

For the vmlinux binary a .BTF section is added in a roundabout way to
ensure correct linking. The patch doesn't change this approach, only
the implementation is a little different.

Before this patch it worked as follows:

  * pahole consumed .tmp_vmlinux1 [2] and added .BTF section with
    llvm-objcopy [3] to it
  * then everything except the .BTF section was stripped from .tmp_vmlinux1
    into a .tmp_vmlinux1.bpf.o object [2], later linked into vmlinux
  * resolve_btfids was executed later on vmlinux.unstripped [4],
    updating it in-place

After this patch gen-btf.sh implements the following:

  * pahole consumes .tmp_vmlinux1 and produces a *detached* file with
    raw BTF data
  * resolve_btfids consumes .tmp_vmlinux1 and detached BTF to produce
    (potentially modified) .BTF, and .BTF_ids sections data
  * a .tmp_vmlinux1.bpf.o object is then produced with objcopy copying
    BTF output of resolve_btfids
  * .BTF_ids data gets embedded into vmlinux.unstripped in
    link-vmlinux.sh by objcopy --update-section

For kernel modules, creating a special .bpf.o file is not necessary,
and so embedding of sections data produced by resolve_btfids is
straightforward with objcopy.

With this patch an ELF file becomes effectively read-only within
resolve_btfids, which allows deleting elf_update() call and satellite
code (like compressed_section_fix [5]).

Endianness handling of .BTF_ids data is also changed. Previously the
"flags" part of the section was bswapped in sets_patch() [6], and then
Elf_Type was modified before elf_update() to signal to libelf that
bswap may be necessary. With this patch we explicitly bswap entire
data buffer on load and on dump.

[1] https://lore.kernel.org/bpf/131b4190-9c49-4f79-a99d-c00fac97fa44@linux.dev/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/link-vmlinux.sh?h=v6.18#n110
[3] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/tree/btf_encoder.c?h=v1.31#n1803
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/link-vmlinux.sh?h=v6.18#n284
[5] https://lore.kernel.org/bpf/20200819092342.259004-1-jolsa@kernel.org/
[6] https://lore.kernel.org/bpf/cover.1707223196.git.vmalik@redhat.com/

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251219181825.1289460-3-ihor.solodrai@linux.dev
2025-12-19 10:55:40 -08:00
Ihor Solodrai
014e1cdb5f selftests/bpf: Run resolve_btfids only for relevant .test.o objects
A selftest targeting resolve_btfids functionality relies on a resolved
.BTF_ids section to be available in the TRUNNER_BINARY. The underlying
BTF data is taken from a special BPF program (btf_data.c), and so
resolve_btfids is executed as a part of a TRUNNER_BINARY build recipe
on the final binary.

Subsequent patches in this series allow resolve_btfids to modify BTF
before resolving the symbols, which means that the test needs access
to that modified BTF [1]. Currently the test simply reads in
btf_data.bpf.o on the assumption that BTF hasn't changed.

Implement resolve_btfids call only for particular test objects (just
resolve_btfids.test.o for now). The test objects are linked into the
TRUNNER_BINARY, and so .BTF_ids section will be available there.

This will make it trivial for the resolve_btfids test to access BTF
modified by resolve_btfids.

[1] https://lore.kernel.org/bpf/CAErzpmvsgSDe-QcWH8SFFErL6y3p3zrqNri5-UHJ9iK2ChyiBw@mail.gmail.com/

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251219181825.1289460-2-ihor.solodrai@linux.dev
2025-12-19 10:55:40 -08:00
Alexei Starovoitov
ec439c3801 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.19-rc1
Cross-merge BPF and other fixes after downstream PR.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-16 21:29:38 -08:00
Linus Torvalds
ea1013c153 bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmlCBmwACgkQ6rmadz2v
 bToUZA//ZY0IE1x1nCixEAqGF/nGpDzVX4YQQfjrUoXQOD4ykzt35yTNXl6B1IVA
 dliVSI6kUtdoThUa7xJUxMSkDsVBsEMT/zYXQEXJG1zXvJANCB9wTzsC3OCBWbXt
 BRczcEkq0OHC9/l5CrILR6ocwxKGDIMIysfeOSABgfqckSEhylWy3+EWZQCk08ka
 gNpXlDJUG7dYpcZD/zhuC7e5Rg1uNvN7WiTv+Biig8xZCsEtYOq+qC5C/sOnsypI
 nqfECfbx48cVl49SjatdgquuHn/INESdLRCHisshkurA2Mp5PQuCmrwlXbv4JG59
 v9b7lsFQlkpvEXMdo9VYe6K2gjfkOPRdWsVPu2oXA1qISRmrDqX8cKOpapUIwRhL
 p3ASruMOnz0KFqVaET8+5u2SwtALeW+c+1p1aHMfVGF/qbXuyG05qBkLoGFJR+Xr
 WznXUXY80Z7pjD57SpA6U3DigAkGqKCBXUwdifaOq8HQonwsnQGqkW/3NngNULGP
 IC4u0JXn61VgQsM/kAw+ucc4bdKI0g4oKJR56lT48elrj6Yxrjpde4oOqzZ0IQKu
 VQ0YnzWqqT2tjh4YNMOwkNPbFR4ALd329zI6TUkWib/jByEBNcfjSj9BRANd1KSx
 JgSHAE6agrbl6h3nOx584YCasX3Zq+nfv1Sj4Z/5GaHKKW3q/Vw=
 =wHLt
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix BPF builds due to -fms-extensions. selftests (Alexei
   Starovoitov), bpftool (Quentin Monnet).

 - Fix build of net/smc when CONFIG_BPF_SYSCALL=y, but CONFIG_BPF_JIT=n
   (Geert Uytterhoeven)

 - Fix livepatch/BPF interaction and support reliable unwinding through
   BPF stack frames (Josh Poimboeuf)

 - Do not audit capability check in arm64 JIT (Ondrej Mosnacek)

 - Fix truncated dmabuf BPF iterator reads (T.J. Mercier)

 - Fix verifier assumptions of bpf_d_path's output buffer (Shuran Liu)

 - Fix warnings in libbpf when built with -Wdiscarded-qualifiers under
   C23 (Mikhail Gavrilov)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: add regression test for bpf_d_path()
  bpf: Fix verifier assumptions of bpf_d_path's output buffer
  selftests/bpf: Add test for truncated dmabuf_iter reads
  bpf: Fix truncated dmabuf iterator reads
  x86/unwind/orc: Support reliable unwinding through BPF stack frames
  bpf: Add bpf_has_frame_pointer()
  bpf, arm64: Do not audit capability check in do_jit()
  libbpf: Fix -Wdiscarded-qualifiers under C23
  bpftool: Fix build warnings due to MS extensions
  net: smc: SMC_HS_CTRL_BPF should depend on BPF_JIT
  selftests/bpf: Add -fms-extensions to bpf build flags
2025-12-17 15:54:58 +12:00
Emil Tsalapatis
19f12431b6 selftests/bpf: Add tests for the arena offset of globals
Add tests for the new libbpf globals arena offset logic. The
tests cover the case of globals being as large as the arena
itself, and being smaller than the arena. In that case, the
data is placed at the end of the arena, and the beginning
of the arena is free.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251216173325.98465-6-emil@etsalapatis.com
2025-12-16 10:42:55 -08:00
Emil Tsalapatis
c1f61171d4 libbpf: Move arena globals to the end of the arena
Arena globals are currently placed at the beginning of the arena
by libbpf. This is convenient, but prevents users from reserving
guard pages in the beginning of the arena to identify NULL pointer
dereferences. Adjust the load logic to place the globals at the
end of the arena instead.

Also modify bpftool to set the arena pointer in the program's BPF
skeleton to point to the globals. Users now call bpf_map__initial_value()
to find the beginning of the arena mapping and use the arena pointer
in the skeleton to determine which part of the mapping holds the
arena globals and which part is free.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251216173325.98465-5-emil@etsalapatis.com
2025-12-16 10:42:55 -08:00
Emil Tsalapatis
12a1fe6e12 bpf/verifier: Do not limit maximum direct offset into arena map
The verifier currently limits direct offsets into a map to 512MiB
to avoid overflow during pointer arithmetic. However, this prevents
arena maps from using direct addressing instructions to access data
at the end of > 512MiB arena maps. This is necessary when moving
arena globals to the end of the arena instead of the front.

Refactor the verifier code to remove the offset calculation during
direct value access calculations. This is possible because the only
two map types that implement .map_direct_value_addr() are arrays and
arenas, and they both do their own internal checks to ensure the
offset is within bounds.

Adjust selftests that expect the old error. These tests still fail
because the verifier identifies the access as out of bounds for the
map, so change them to expect an "invalid access to map value pointer"
error instead.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251216173325.98465-3-emil@etsalapatis.com
2025-12-16 10:42:55 -08:00
Emil Tsalapatis
0355911ac0 selftests/bpf: Explicitly account for globals in verifier_arena_large
The big_alloc1 test in verifier_arena_large assumes that the arena base
and the first page allocated by bpf_arena_alloc_pages are identical.
This is not the case, because the first page in the arena is populated
by global arena data. The test still passes because the code makes the
tacit assumption that the first page is on offset PAGE_SIZE instead of
0.

Make this distinction explicit in the code, and adjust the page offsets
requested during the test to count from the beginning of the arena
instead of using the address of the first allocated page.

Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251216173325.98465-2-emil@etsalapatis.com
2025-12-16 10:42:55 -08:00
Shuran Liu
79e247d660 selftests/bpf: add regression test for bpf_d_path()
Add a regression test for bpf_d_path() to cover incorrect verifier
assumptions caused by an incorrect function prototype. The test
attaches to the fallocate hook, calls bpf_d_path() and verifies that
a simple prefix comparison on the returned pathname behaves correctly
after the fix in patch 1. It ensures the verifier does not assume
the buffer remains unwritten.

Co-developed-by: Zesen Liu <ftyg@live.com>
Signed-off-by: Zesen Liu <ftyg@live.com>
Co-developed-by: Peili Gao <gplhust955@gmail.com>
Signed-off-by: Peili Gao <gplhust955@gmail.com>
Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com>
Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com>
Signed-off-by: Shuran Liu <electronlsr@gmail.com>
Link: https://lore.kernel.org/r/20251206141210.3148-3-electronlsr@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-10 01:36:26 -08:00
Cupertino Miranda
a5b4867fad selftests/bpf: add verifier sign extension bound computation tests.
This commit adds 3 tests to verify a common compiler generated
pattern for sign extension (r1 <<= 32; r1 s>>= 32).
The tests make sure the register bounds are correctly computed both for
positive and negative register values.

Signed-off-by: Cupertino Miranda  <cupertino.miranda@oracle.com>
Signed-off-by: Andrew Pinski  <andrew.pinski@oss.qualcomm.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Cc: David Faust  <david.faust@oracle.com>
Cc: Jose Marchesi  <jose.marchesi@oracle.com>
Cc: Elena Zannoni  <elena.zannoni@oracle.com>
Link: https://lore.kernel.org/r/20251202180220.11128-3-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-10 00:12:09 -08:00
Kohei Enju
18352f8fae selftests/bpf: add tests for attaching invalid fd
Add test cases for situations where adding the following types of file
descriptors to a cpumap entry should fail:
- Non-BPF file descriptor (expect -EINVAL)
- Nonexistent file descriptor (expect -EBADF)

Also tighten the assertion for the expected error when adding a
non-BPF_XDP_CPUMAP program to a cpumap entry.

Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20251208131449.73036-3-enjuk@amazon.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-09 23:53:27 -08:00
T.J. Mercier
9489d457d4 selftests/bpf: Add test for truncated dmabuf_iter reads
If many dmabufs are present, reads of the dmabuf iterator can be
truncated at PAGE_SIZE or user buffer size boundaries before the fix in
"bpf: Fix truncated dmabuf iterator reads". Add a test to
confirm truncation does not occur.

Signed-off-by: T.J. Mercier <tjmercier@google.com>
Link: https://lore.kernel.org/r/20251204000348.1413593-2-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-09 23:49:04 -08:00
H. Peter Anvin
c278d72b99 selftests/bpf: replace "__auto_type" with "auto"
Replace instances of "__auto_type" with "auto" in:

	tools/testing/selftests/bpf/prog_tests/socket_helpers.h

This file does not seem to be including <linux/compiler_types.h>
directly or indirectly, so copy the definition but guard it with
!defined(auto).

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2025-12-08 15:32:15 -08:00
Linus Torvalds
509d3f4584 Significant patch series in this pull request:
- The 6 patch series "panic: sys_info: Refactor and fix a potential
   issue" from Andy Shevchenko fixes a build issue and does some cleanup in
   ib/sys_info.c.
 
 - The 9 patch series "Implement mul_u64_u64_div_u64_roundup()" from
   David Laight enhances the 64-bit math code on behalf of a PWM driver and
   beefs up the test module for these library functions.
 
 - The 2 patch series "scripts/gdb/symbols: make BPF debug info available
   to GDB" from Ilya Leoshkevich makes BPF symbol names, sizes, and line
   numbers available to the GDB debugger.
 
 - The 4 patch series "Enable hung_task and lockup cases to dump system
   info on demand" from Feng Tang adds a sysctl which can be used to cause
   additional info dumping when the hung-task and lockup detectors fire.
 
 - The 6 patch series "lib/base64: add generic encoder/decoder, migrate
   users" from Kuan-Wei Chiu adds a general base64 encoder/decoder to lib/
   and migrates several users away from their private implementations.
 
 - The 2 patch series "rbree: inline rb_first() and rb_last()" from Eric
   Dumazet makes TCP a little faster.
 
 - The 9 patch series "liveupdate: Rework KHO for in-kernel users" from
   Pasha Tatashin reworks the KEXEC Handover interfaces in preparation for
   Live Update Orchestrator (LUO), and possibly for other future clients.
 
 - The 13 patch series "kho: simplify state machine and enable dynamic
   updates" from Pasha Tatashin increases the flexibility of KEXEC
   Handover.  Also preparation for LUO.
 
 - The 18 patch series "Live Update Orchestrator" from Pasha Tatashin is
   a major new feature targeted at cloud environments.  Quoting the [0/N]:
 
     This series introduces the Live Update Orchestrator, a kernel subsystem
     designed to facilitate live kernel updates using a kexec-based reboot.
     This capability is critical for cloud environments, allowing hypervisors
     to be updated with minimal downtime for running virtual machines.  LUO
     achieves this by preserving the state of selected resources, such as
     memory, devices and their dependencies, across the kernel transition.
 
     As a key feature, this series includes support for preserving memfd file
     descriptors, which allows critical in-memory data, such as guest RAM or
     any other large memory region, to be maintained in RAM across the kexec
     reboot.
 
   Mike Rappaport merits a mention here, for his extensive review and
   testing work.
 
 - The 3 patch series "kexec: reorganize kexec and kdump sysfs" from
   Sourabh Jain moves the kexec and kdump sysfs entries from /sys/kernel/
   to /sys/kernel/kexec/ and adds back-compatibility symlinks which can
   hopefully be removed one day.
 
 - The 2 patch series "kho: fixes for vmalloc restoration" from Mike
   Rapoport fixes a BUG which was being hit during KHO restoration of
   vmalloc() regions.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTSAkQAKCRDdBJ7gKXxA
 jrkiAP9QKfsRv46XZaM5raScjY1ayjP+gqb2rgt6BQ/gZvb2+wD/cPAYOR6BiX52
 n0pVpQmG5P/KyOmpLztn96ejL4heKwQ=
 =JY96
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
   fixes a build issue and does some cleanup in ib/sys_info.c

 - "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
   enhances the 64-bit math code on behalf of a PWM driver and beefs up
   the test module for these library functions

 - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
   makes BPF symbol names, sizes, and line numbers available to the GDB
   debugger

 - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
   adds a sysctl which can be used to cause additional info dumping when
   the hung-task and lockup detectors fire

 - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
   adds a general base64 encoder/decoder to lib/ and migrates several
   users away from their private implementations

 - "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
   makes TCP a little faster

 - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
   reworks the KEXEC Handover interfaces in preparation for Live Update
   Orchestrator (LUO), and possibly for other future clients

 - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
   increases the flexibility of KEXEC Handover. Also preparation for LUO

 - "Live Update Orchestrator" (Pasha Tatashin)
   is a major new feature targeted at cloud environments. Quoting the
   cover letter:

      This series introduces the Live Update Orchestrator, a kernel
      subsystem designed to facilitate live kernel updates using a
      kexec-based reboot. This capability is critical for cloud
      environments, allowing hypervisors to be updated with minimal
      downtime for running virtual machines. LUO achieves this by
      preserving the state of selected resources, such as memory,
      devices and their dependencies, across the kernel transition.

      As a key feature, this series includes support for preserving
      memfd file descriptors, which allows critical in-memory data, such
      as guest RAM or any other large memory region, to be maintained in
      RAM across the kexec reboot.

   Mike Rappaport merits a mention here, for his extensive review and
   testing work.

 - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
   moves the kexec and kdump sysfs entries from /sys/kernel/ to
   /sys/kernel/kexec/ and adds back-compatibility symlinks which can
   hopefully be removed one day

 - "kho: fixes for vmalloc restoration" (Mike Rapoport)
   fixes a BUG which was being hit during KHO restoration of vmalloc()
   regions

* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
  calibrate: update header inclusion
  Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
  vmcoreinfo: track and log recoverable hardware errors
  kho: fix restoring of contiguous ranges of order-0 pages
  kho: kho_restore_vmalloc: fix initialization of pages array
  MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
  init: replace simple_strtoul with kstrtoul to improve lpj_setup
  KHO: fix boot failure due to kmemleak access to non-PRESENT pages
  Documentation/ABI: new kexec and kdump sysfs interface
  Documentation/ABI: mark old kexec sysfs deprecated
  kexec: move sysfs entries to /sys/kernel/kexec
  test_kho: always print restore status
  kho: free chunks using free_page() instead of kfree()
  selftests/liveupdate: add kexec test for multiple and empty sessions
  selftests/liveupdate: add simple kexec-based selftest for LUO
  selftests/liveupdate: add userspace API selftests
  docs: add documentation for memfd preservation via LUO
  mm: memfd_luo: allow preserving memfd
  liveupdate: luo_file: add private argument to store runtime state
  mm: shmem: export some functions to internal.h
  ...
2025-12-06 14:01:20 -08:00
Amery Hung
0e841d1926 selftests/bpf: Test getting associated struct_ops in timer callback
Make sure 1) a timer callback can also reference the associated
struct_ops, and then make sure 2) the timer callback cannot get a
dangled pointer to the struct_ops when the map is freed.

The test schedules a timer callback from a struct_ops program since
struct_ops programs do not pin the map. It is possible for the timer
callback to run after the map is freed. The timer callback calls a
kfunc that runs .test_1() of the associated struct_ops, which should
return MAP_MAGIC when the map is still alive or -1 when the map is
gone.

The first subtest added in this patch schedules the timer callback to
run immediately, while the map is still alive. The second subtest added
schedules the callback to run 500ms after syscall_prog runs and then
frees the map right after syscall_prog runs. Both subtests then wait
until the callback runs to check the return of the kfunc.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251203233748.668365-7-ameryhung@gmail.com
2025-12-05 16:17:58 -08:00
Amery Hung
04fd12df4e selftests/bpf: Test ambiguous associated struct_ops
Add a test to make sure implicit struct_ops association does not
break backward compatibility nor return incorrect struct_ops.
struct_ops programs should still be allowed to be reused in
different struct_ops map. The associated struct_ops map set implicitly
however will be poisoned. Trying to read it through the helper
bpf_prog_get_assoc_struct_ops() should result in a NULL pointer.

While recursion of test_1() cannot happen due to the associated
struct_ops being ambiguois, explicitly check for it to prevent stack
overflow if the test regresses.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251203233748.668365-6-ameryhung@gmail.com
2025-12-05 16:17:58 -08:00
Amery Hung
33a165f9c2 selftests/bpf: Test BPF_PROG_ASSOC_STRUCT_OPS command
Test BPF_PROG_ASSOC_STRUCT_OPS command that associates a BPF program
with a struct_ops. The test follows the same logic in commit
ba7000f1c3 ("selftests/bpf: Test multi_st_ops and calling kfuncs from
different programs"), but instead of using map id to identify a specific
struct_ops, this test uses the new BPF command to associate a struct_ops
with a program.

The test consists of two sets of almost identical struct_ops maps and BPF
programs associated with the map. Their only difference is the unique
value returned by bpf_testmod_multi_st_ops::test_1().

The test first loads the programs and associates them with struct_ops
maps. Then, it exercises the BPF programs. They will in turn call kfunc
bpf_kfunc_multi_st_ops_test_1_prog_arg() to trigger test_1() of the
associated struct_ops map, and then check if the right unique value is
returned.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251203233748.668365-5-ameryhung@gmail.com
2025-12-05 16:17:58 -08:00
Alexei Starovoitov
835a507535 selftests/bpf: Add -fms-extensions to bpf build flags
The kernel is now built with -fms-extensions, therefore
generated vmlinux.h contains types like:
struct slab {
   ..
   struct freelist_counters;
};

Use -fms-extensions and -Wno-microsoft-anon-tag flags
to build bpf programs that #include "vmlinux.h"

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-03 20:20:11 -08:00
Linus Torvalds
8f7aa3d3c7 Networking changes for 6.19.
Core & protocols
 ----------------
 
  - Replace busylock at the Tx queuing layer with a lockless list. Resulting
    in a 300% (4x) improvement on heavy TX workloads, sending twice the
    number of packets per second, for half the cpu cycles.
 
  - Allow constantly busy flows to migrate to a more suitable CPU/NIC
    queue. Normally we perform queue re-selection when flow comes out
    of idle, but under extreme circumstances the flows may be constantly
    busy. Add sysctl to allow periodic rehashing even if it'd risk packet
    reordering.
 
  - Optimize the NAPI skb cache, make it larger, use it in more paths.
 
  - Attempt returning Tx skbs to the originating CPU (like we already did
    for Rx skbs).
 
  - Various data structure layout and prefetch optimizations from Eric.
 
  - Remove ktime_get() from the recvmsg() fast path, ktime_get() is sadly
    quite expensive on recent AMD machines.
 
  - Extend threaded NAPI polling to allow the kthread busy poll for packets.
 
  - Make MPTCP use Rx backlog processing. This lowers the lock pressure,
    improving the Rx performance.
 
  - Support memcg accounting of MPTCP socket memory.
 
  - Allow admin to opt sockets out of global protocol memory accounting
    (using a sysctl or BPF-based policy). The global limits are a poor fit
    for modern container workloads, where limits are imposed using cgroups.
 
  - Improve heuristics for when to kick off AF_UNIX garbage collection.
 
  - Allow users to control TCP SACK compression, and default to 33% of RTT.
 
  - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid unnecessarily
    aggressive rcvbuf growth and overshot when the connection RTT is low.
 
  - Preserve skb metadata space across skb_push / skb_pull operations.
 
  - Support for IPIP encapsulation in the nftables flowtable offload.
 
  - Support appending IP interface information to ICMP messages (RFC 5837).
 
  - Support setting max record size in TLS (RFC 8449).
 
  - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
 
  - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
 
  - Let users configure the number of write buffers in SMC.
 
  - Add new struct sockaddr_unsized for sockaddr of unknown length,
    from Kees.
 
  - Some conversions away from the crypto_ahash API, from Eric Biggers.
 
  - Some preparations for slimming down struct page.
 
  - YAML Netlink protocol spec for WireGuard.
 
  - Add a tool on top of YAML Netlink specs/lib for reporting commonly
    computed derived statistics and summarized system state.
 
 Driver API
 ----------
 
  - Add CAN XL support to the CAN Netlink interface.
 
  - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics,
    as defined by the OPEN Alliance's "Advanced diagnostic features
    for 100BASE-T1 automotive Ethernet PHYs" specification.
 
  - Add DPLL phase-adjust-gran pin attribute (and implement it in zl3073x).
 
  - Refactor xfrm_input lock to reduce contention when NIC offloads IPsec
    and performs RSS.
 
  - Add info to devlink params whether the current setting is the default
    or a user override. Allow resetting back to default.
 
  - Add standard device stats for PSP crypto offload.
 
  - Leverage DSA frame broadcast to implement simple HSR frame duplication
    for a lot of switches without dedicated HSR offload.
 
  - Add uAPI defines for 1.6Tbps link modes.
 
 Device drivers
 --------------
 
  - Add Motorcomm YT921x gigabit Ethernet switch support.
 
  - Add MUCSE driver for N500/N210 1GbE NIC series.
 
  - Convert drivers to support dedicated ops for timestamping control,
    and away from the direct IOCTL handling. While at it support GET
    operations for PHY timestamping.
 
  - Add (and convert most drivers to) a dedicated ethtool callback
    for reading the Rx ring count.
 
  - Significant refactoring efforts in the STMMAC driver, which supports
    Synopsys turn-key MAC IP integrated into a ton of SoCs.
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
      - support PPS in/out on all pins
    - Intel (100G, ice, idpf):
      - ice: implement standard ethtool and timestamping stats
      - i40e: support setting the max number of MAC addresses per VF
      - iavf: support RSS of GTP tunnels for 5G and LTE deployments
    - nVidia/Mellanox (mlx5):
      - reduce downtime on interface reconfiguration
      - disable being an XDP redirect target by default (same as other
        drivers) to avoid wasting resources if feature is unused
    - Meta (fbnic):
      - add support for Linux-managed PCS on 25G, 50G, and 100G links
    - Wangxun:
      - support Rx descriptor merge, and Tx head writeback
      - support Rx coalescing offload
      - support 25G SPF and 40G QSFP modules
 
  - Ethernet virtual:
    - Google (gve):
      - allow ethtool to configure rx_buf_len
      - implement XDP HW RX Timestamping support for DQ descriptor format
    - Microsoft vNIC (mana):
      - support HW link state events
      - handle hardware recovery events when probing the device
 
  - Ethernet NICs consumer, and embedded:
    - usbnet: add support for Byte Queue Limits (BQL)
    - AMD (amd-xgbe):
      - add device selftests
    - NXP (enetc):
      - add i.MX94 support
    - Broadcom integrated MACs (bcmgenet, bcmasp):
      - bcmasp: add support for PHY-based Wake-on-LAN
    - Broadcom switches (b53):
      - support port isolation
      - support BCM5389/97/98 and BCM63XX ARL formats
    - Lantiq/MaxLinear switches:
      - support bridge FDB entries on the CPU port
      - use regmap for register access
      - allow user to enable/disable learning
      - support Energy Efficient Ethernet
      - support configuring RMII clock delays
      - add tagging driver for MaxLinear GSW1xx switches
    - Synopsys (stmmac):
      - support using the HW clock in free running mode
      - add Eswin EIC7700 support
      - add Rockchip RK3506 support
      - add Altera Agilex5 support
    - Cadence (macb):
      - cleanup and consolidate descriptor and DMA address handling
      - add EyeQ5 support
    - TI:
      - icssg-prueth: support AF_XDP
    - Airoha access points:
      - add missing Ethernet stats and link state callback
      - add AN7583 support
      - support out-of-order Tx completion processing
    - Power over Ethernet:
      - pd692x0: preserve PSE configuration across reboots
      - add support for TPS23881B devices
 
  - Ethernet PHYs:
    - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
    - Support 50G SerDes and 100G interfaces in Linux-managed PHYs
    - micrel:
      - support for non PTP SKUs of lan8814
      - enable in-band auto-negotiation on lan8814
    - realtek:
      - cable testing support on RTL8224
      - interrupt support on RTL8221B
    - motorcomm: support for PHY LEDs on YT853
    - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
    - mscc: support for PHY LED control
 
  - CAN drivers:
    - m_can: add support for optional reset and system wake up
    - remove can_change_mtu() obsoleted by core handling
    - mcp251xfd: support GPIO controller functionality
 
  - Bluetooth:
    - add initial support for PASTa
 
  - WiFi:
    - split ieee80211.h file, it's way too big
    - improvements in VHT radiotap reporting, S1G, Channel Switch
      Announcement handling, rate tracking in mesh networks
    - improve multi-radio monitor mode support, and add a cfg80211 debugfs
      interface for it
    - HT action frame handling on 6 GHz
    - initial chanctx work towards NAN
    - MU-MIMO sniffer improvements
 
  - WiFi drivers:
    - RealTek (rtw89):
      - support USB devices RTL8852AU and RTL8852CU
      - initial work for RTL8922DE
      - improved injection support
    - Intel:
      - iwlwifi: new sniffer API support
    - MediaTek (mt76):
      - WED support for >32-bit DMA
      - airoha NPU support
      - regdomain improvements
      - continued WiFi7/MLO work
    - Qualcomm/Atheros:
      - ath10k: factory test support
      - ath11k: TX power insertion support
      - ath12k: BSS color change support
      - ath12k: statistics improvements
    - brcmfmac: Acer A1 840 tablet quirk
    - rtl8xxxu: 40 MHz connection fixes/support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmkveRQACgkQMUZtbf5S
 IrvY7A/+Nb0o4BxLHjPkAl1m3t3q2d0Y29B7SNkwnwEtxAV8EkNeZ3GWrdtDnTQY
 MYhmc7LEzvz8/lihapr7UJkcokzSASUV54hbez5jDBKC8EEoyUk8FdWDPerwlcRI
 zmCFNAVFyh9GX8i7wcrzKbDTHT5+GZLbSlGl9U5mhLsDdRlJgH7d8PJ7vWcmtLFY
 XN0paDyaeHfCl8wReWNAYx4C/I0ODOvlscpO0tnAKhB0ngJbQCKY2t6tn3rOYdif
 ZSQ5KwVRnJtQ4fYOFMOy9+FSCjVXtyrxF8KLxD+mqom2ZhmO00UpOMl09tqhq3uT
 WnvwoHUVBt6F+iITHwg5kMgIDPUq1kpUvL4S4UbVSuUm9ZKD+4KRU2ZHRBYMx+MU
 bsqmtY8/IULClUoRz+tZhltA8eb0NEqNZE2JPOFDiJHn1YiCCkFwxibhir893oM3
 sB7x65D7LQI2ty2BBGVGYnwYDPtyaxOA/s3WTwPvLEi3+Y/TGNIIrS9lBLA4U+Yr
 Gi93WQGVjttMmVyaHgXBUGmi3L52hvolm0AZ8zSRGrnIEpecjhly2KfYuaOzuxXC
 IHEQ6AFLdRh6JzafXGb/mQwGCHNmhwsY8A49i94fakWQamaL/L6A+1dyPu4LXMqi
 NwqCmlVb/LKGlfNG+V4wT27srJ+yBA2Vk3tpR1sZQQytFh0LKHI=
 =UoDR
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Replace busylock at the Tx queuing layer with a lockless list.

     Resulting in a 300% (4x) improvement on heavy TX workloads, sending
     twice the number of packets per second, for half the cpu cycles.

   - Allow constantly busy flows to migrate to a more suitable CPU/NIC
     queue.

     Normally we perform queue re-selection when flow comes out of idle,
     but under extreme circumstances the flows may be constantly busy.

     Add sysctl to allow periodic rehashing even if it'd risk packet
     reordering.

   - Optimize the NAPI skb cache, make it larger, use it in more paths.

   - Attempt returning Tx skbs to the originating CPU (like we already
     did for Rx skbs).

   - Various data structure layout and prefetch optimizations from Eric.

   - Remove ktime_get() from the recvmsg() fast path, ktime_get() is
     sadly quite expensive on recent AMD machines.

   - Extend threaded NAPI polling to allow the kthread busy poll for
     packets.

   - Make MPTCP use Rx backlog processing. This lowers the lock
     pressure, improving the Rx performance.

   - Support memcg accounting of MPTCP socket memory.

   - Allow admin to opt sockets out of global protocol memory accounting
     (using a sysctl or BPF-based policy). The global limits are a poor
     fit for modern container workloads, where limits are imposed using
     cgroups.

   - Improve heuristics for when to kick off AF_UNIX garbage collection.

   - Allow users to control TCP SACK compression, and default to 33% of
     RTT.

   - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
     unnecessarily aggressive rcvbuf growth and overshot when the
     connection RTT is low.

   - Preserve skb metadata space across skb_push / skb_pull operations.

   - Support for IPIP encapsulation in the nftables flowtable offload.

   - Support appending IP interface information to ICMP messages (RFC
     5837).

   - Support setting max record size in TLS (RFC 8449).

   - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.

   - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.

   - Let users configure the number of write buffers in SMC.

   - Add new struct sockaddr_unsized for sockaddr of unknown length,
     from Kees.

   - Some conversions away from the crypto_ahash API, from Eric Biggers.

   - Some preparations for slimming down struct page.

   - YAML Netlink protocol spec for WireGuard.

   - Add a tool on top of YAML Netlink specs/lib for reporting commonly
     computed derived statistics and summarized system state.

  Driver API:

   - Add CAN XL support to the CAN Netlink interface.

   - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
     defined by the OPEN Alliance's "Advanced diagnostic features for
     100BASE-T1 automotive Ethernet PHYs" specification.

   - Add DPLL phase-adjust-gran pin attribute (and implement it in
     zl3073x).

   - Refactor xfrm_input lock to reduce contention when NIC offloads
     IPsec and performs RSS.

   - Add info to devlink params whether the current setting is the
     default or a user override. Allow resetting back to default.

   - Add standard device stats for PSP crypto offload.

   - Leverage DSA frame broadcast to implement simple HSR frame
     duplication for a lot of switches without dedicated HSR offload.

   - Add uAPI defines for 1.6Tbps link modes.

  Device drivers:

   - Add Motorcomm YT921x gigabit Ethernet switch support.

   - Add MUCSE driver for N500/N210 1GbE NIC series.

   - Convert drivers to support dedicated ops for timestamping control,
     and away from the direct IOCTL handling. While at it support GET
     operations for PHY timestamping.

   - Add (and convert most drivers to) a dedicated ethtool callback for
     reading the Rx ring count.

   - Significant refactoring efforts in the STMMAC driver, which
     supports Synopsys turn-key MAC IP integrated into a ton of SoCs.

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support PPS in/out on all pins
      - Intel (100G, ice, idpf):
         - ice: implement standard ethtool and timestamping stats
         - i40e: support setting the max number of MAC addresses per VF
         - iavf: support RSS of GTP tunnels for 5G and LTE deployments
      - nVidia/Mellanox (mlx5):
         - reduce downtime on interface reconfiguration
         - disable being an XDP redirect target by default (same as
           other drivers) to avoid wasting resources if feature is
           unused
      - Meta (fbnic):
         - add support for Linux-managed PCS on 25G, 50G, and 100G links
      - Wangxun:
         - support Rx descriptor merge, and Tx head writeback
         - support Rx coalescing offload
         - support 25G SPF and 40G QSFP modules

   - Ethernet virtual:
      - Google (gve):
         - allow ethtool to configure rx_buf_len
         - implement XDP HW RX Timestamping support for DQ descriptor
           format
      - Microsoft vNIC (mana):
         - support HW link state events
         - handle hardware recovery events when probing the device

   - Ethernet NICs consumer, and embedded:
      - usbnet: add support for Byte Queue Limits (BQL)
      - AMD (amd-xgbe):
         - add device selftests
      - NXP (enetc):
         - add i.MX94 support
      - Broadcom integrated MACs (bcmgenet, bcmasp):
         - bcmasp: add support for PHY-based Wake-on-LAN
      - Broadcom switches (b53):
         - support port isolation
         - support BCM5389/97/98 and BCM63XX ARL formats
      - Lantiq/MaxLinear switches:
         - support bridge FDB entries on the CPU port
         - use regmap for register access
         - allow user to enable/disable learning
         - support Energy Efficient Ethernet
         - support configuring RMII clock delays
         - add tagging driver for MaxLinear GSW1xx switches
      - Synopsys (stmmac):
         - support using the HW clock in free running mode
         - add Eswin EIC7700 support
         - add Rockchip RK3506 support
         - add Altera Agilex5 support
      - Cadence (macb):
         - cleanup and consolidate descriptor and DMA address handling
         - add EyeQ5 support
      - TI:
         - icssg-prueth: support AF_XDP
      - Airoha access points:
         - add missing Ethernet stats and link state callback
         - add AN7583 support
         - support out-of-order Tx completion processing
      - Power over Ethernet:
         - pd692x0: preserve PSE configuration across reboots
         - add support for TPS23881B devices

   - Ethernet PHYs:
      - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
      - Support 50G SerDes and 100G interfaces in Linux-managed PHYs
      - micrel:
         - support for non PTP SKUs of lan8814
         - enable in-band auto-negotiation on lan8814
      - realtek:
         - cable testing support on RTL8224
         - interrupt support on RTL8221B
      - motorcomm: support for PHY LEDs on YT853
      - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
      - mscc: support for PHY LED control

   - CAN drivers:
      - m_can: add support for optional reset and system wake up
      - remove can_change_mtu() obsoleted by core handling
      - mcp251xfd: support GPIO controller functionality

   - Bluetooth:
      - add initial support for PASTa

   - WiFi:
      - split ieee80211.h file, it's way too big
      - improvements in VHT radiotap reporting, S1G, Channel Switch
        Announcement handling, rate tracking in mesh networks
      - improve multi-radio monitor mode support, and add a cfg80211
        debugfs interface for it
      - HT action frame handling on 6 GHz
      - initial chanctx work towards NAN
      - MU-MIMO sniffer improvements

   - WiFi drivers:
      - RealTek (rtw89):
         - support USB devices RTL8852AU and RTL8852CU
         - initial work for RTL8922DE
         - improved injection support
      - Intel:
         - iwlwifi: new sniffer API support
      - MediaTek (mt76):
         - WED support for >32-bit DMA
         - airoha NPU support
         - regdomain improvements
         - continued WiFi7/MLO work
      - Qualcomm/Atheros:
         - ath10k: factory test support
         - ath11k: TX power insertion support
         - ath12k: BSS color change support
         - ath12k: statistics improvements
      - brcmfmac: Acer A1 840 tablet quirk
      - rtl8xxxu: 40 MHz connection fixes/support"

* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
  net: page_pool: sanitise allocation order
  net: page pool: xa init with destroy on pp init
  net/mlx5e: Support XDP target xmit with dummy program
  net/mlx5e: Update XDP features in switch channels
  selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
  net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
  wireguard: netlink: generate netlink code
  wireguard: uapi: generate header with ynl-gen
  wireguard: uapi: move flag enums
  wireguard: uapi: move enum wg_cmd
  wireguard: netlink: add YNL specification
  selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
  selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
  selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
  selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
  selftests: drv-net: introduce Iperf3Runner for measurement use cases
  selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
  net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
  Documentation: net: dsa: mention simple HSR offload helpers
  Documentation: net: dsa: mention availability of RedBox
  ...
2025-12-03 17:24:33 -08:00
Alexis Lothoré (eBPF Foundation)
1d17bcce6a selftests/bpf: do not hardcode target rate in test_tc_edt BPF program
test_tc_edt currently defines the target rate in both the userspace and
BPF parts. This value could be defined once in the userspace part if we
make it able to configure the BPF program before starting the test.

Add a target_rate variable in the BPF part, and make the userspace part
set it to the desired rate before attaching the shaping program.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-4-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29 09:37:41 -08:00
Alexis Lothoré (eBPF Foundation)
50ce5ea5f7 selftests/bpf: remove test_tc_edt.sh
Now that test_tc_edt has been integrated in test_progs, remove the
legacy shell script.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-3-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29 09:37:41 -08:00
Alexis Lothoré (eBPF Foundation)
b0f82e7ab6 selftests/bpf: integrate test_tc_edt into test_progs
test_tc_edt.sh uses a pair of veth and a BPF program attached to the TX
veth to shape the traffic to 5MBps. It then checks that the amount of
received bytes (at interface level), compared to the TX duration, indeed
matches 5Mbps.

Convert this test script to the test_progs framework:
- keep the double veth setup, isolated in two veths
- run a small tcp server, and connect client to server
- push a pre-configured amount of bytes, and measure how much time has
  been needed to push those
- ensure that this rate is in a 2% error margin around the target rate

This two percent value, while being tight, is hopefully large enough to
not make the test too flaky in CI, while also turning it into a small
example of BPF-based shaping.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-2-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29 09:37:41 -08:00
Alexis Lothoré (eBPF Foundation)
4b4833acc6 selftests/bpf: rename test_tc_edt.bpf.c section to expose program type
The test_tc_edt BPF program uses a custom section name, which works fine
when manually loading it with tc, but prevents it from being loaded with
libbpf.

Update the program section name to "tc" to be able to manipulate it with
a libbpf-based C test.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-1-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29 09:37:41 -08:00
Kumar Kartikeya Dwivedi
3448375e71 selftests/bpf: Add success stats to rqspinlock stress test
Add stats to observe the success and failure rate of lock acquisition
attempts in various contexts.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251128232802.1031906-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29 09:35:36 -08:00
Hoyeon Lee
bd5bdd200c bpf: Remove runqslower tool
runqslower was added in commit 9c01546d26 "tools/bpf: Add runqslower
tool to tools/bpf" as a BCC port to showcase early BPF CO-RE + libbpf
workflows. runqslower continues to live in BCC (libbpf-tools), so there
is no need to keep building and maintaining it.

Drop tools/bpf/runqslower and remove all build hooks in tools/bpf and
selftests accordingly.

Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Link: https://lore.kernel.org/r/20251126093821.373291-1-hoyeon.lee@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-28 15:20:16 -08:00
Amery Hung
a3a60cc120 selftests/bpf: Remove usage of lsm/file_alloc_security in selftest
file_alloc_security hook is disabled. Use other LSM hooks in selftests
instead.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20251126202927.2584874-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-28 15:18:28 -08:00
Anton Protopopov
7feff23cdf bpf: force BPF_F_RDONLY_PROG on insn array creation
The original implementation added a hack to check_mem_access()
to prevent programs from writing into insn arrays. To get rid
of this hack, enforce BPF_F_RDONLY_PROG on map creation.

Also fix the corresponding selftest, as the error message changes
with this patch.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251128063224.1305482-2-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-28 15:15:43 -08:00
Bala-Vignesh-Reddy
e6fbd1759c selftests: complete kselftest include centralization
This follow-up patch completes centralization of kselftest.h and
ksefltest_harness.h includes in remaining seltests files, replacing all
relative paths with a non-relative paths using shared -I include path in
lib.mk

Tested with gcc-13.3 and clang-18.1, and cross-compiled successfully on
riscv, arm64, x86_64 and powerpc arch.

[reddybalavignesh9979@gmail.com: add selftests include path for kselftest.h]
  Link: https://lkml.kernel.org/r/20251017090201.317521-1-reddybalavignesh9979@gmail.com
Link: https://lkml.kernel.org/r/20251016104409.68985-1-reddybalavignesh9979@gmail.com
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/lkml/20250820143954.33d95635e504e94df01930d0@linux-foundation.org/
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Günther Noack <gnoack@google.com>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mickael Salaun <mic@digikod.net>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27 14:24:31 -08:00
Eric Dumazet
9a5e5334ad tcp: remove icsk->icsk_retransmit_timer
Now sk->sk_timer is no longer used by TCP keepalive, we can use
its storage for TCP and MPTCP retransmit timers for better
cache locality.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:28:29 -08:00
Eric Dumazet
08dfe37023 tcp: introduce icsk->icsk_keepalive_timer
sk->sk_timer has been used for TCP keepalives.

Keepalive timers are not in fast path, we want to use sk->sk_timer
storage for retransmit timers, for better cache locality.

Create icsk->icsk_keepalive_timer and change keepalive
code to no longer use sk->sk_timer.

Added space is reclaimed in the following patch.

This includes changes to MPTCP, which was also using sk_timer.

Alias icsk->mptcp_tout_timer and icsk->icsk_keepalive_timer
for inet_sk_diag_fill() sake.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:28:29 -08:00
Kumar Kartikeya Dwivedi
88337b587b selftests/bpf: Make CS length configurable for rqspinlock stress test
Allow users to configure the critical section delay for both task/normal
and NMI contexts, and set to 20ms and 10ms as before by default.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251125020749.2421610-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-25 15:30:14 -08:00
Kumar Kartikeya Dwivedi
6173c1d620 selftests/bpf: Add lock wait time stats to rqspinlock stress test
Add statistics per-CPU broken down by context and various timing windows
for the time taken to acquire an rqspinlock. Cases where all
acquisitions fit into the 10ms window are skipped from printing,
otherwise the full breakdown is displayed when printing the summary.
This allows capturing precisely the number of times outlier attempts
happened for a given lock in a given context.

A critical detail is that time is captured regardless of success or
failure, which is important to capture events for failed but long
waiting timeout attempts.

Output:

[   64.279459] rqspinlock acquisition latency histogram (ms):
[   64.279472]  cpu1: total 528426 (normal 526559, nmi 1867)
[   64.279477]    0-1ms: total 524697 (normal 524697, nmi 0)
[   64.279480]    2-2ms: total 3652 (normal 1811, nmi 1841)
[   64.279482]    3-3ms: total 66 (normal 47, nmi 19)
[   64.279485]    4-4ms: total 2 (normal 1, nmi 1)
[   64.279487]    5-5ms: total 1 (normal 1, nmi 0)
[   64.279489]    6-6ms: total 1 (normal 0, nmi 1)
[   64.279490]    101-150ms: total 1 (normal 0, nmi 1)
[   64.279492]    >= 251ms: total 6 (normal 2, nmi 4)
...

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251125020749.2421610-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-25 15:30:14 -08:00
Kumar Kartikeya Dwivedi
224de8d5a3 selftests/bpf: Relax CPU requirements for rqspinlock stress test
Only require 2 CPUs for AA, 3 for ABBA, 4 for ABBCCA, which is
calculated nicely by adding to the mode enum. Enables running single CPU
AA tests.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251125020749.2421610-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-25 15:30:13 -08:00
Menglong Dong
f2cb0660ac selftests/bpf: Call bpf_get_numa_node_id() in trigger_count()
The bench test "trig-kernel-count" can be used as a baseline comparison
for fentry and other benchmarks, and the calling to bpf_get_numa_node_id()
should be considered as composition of the baseline. So, let's call it in
trigger_count(). Meanwhile, rename trigger_count() to
trigger_kernel_count() to make it easier understand.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251116014242.151110-1-dongml2@chinatelecom.cn
2025-11-25 14:32:50 -08:00