linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-04 12:35:52 +02:00

Author	SHA1	Message	Date
Amery Hung	bb6d9f5cf1	selftests/bpf: Simplify task_local_data memory allocation Simplify data allocation by always using aligned_alloc() and passing size_pot, the size rounded up to the nearest power of two, as the alignment. Currently, aligned_alloc(page_size, size) is only intended for memory allocators that can fulfill the request without rounding size up to page_size, to conserve memory. This is enabled by defining TLD_DATA_USE_ALIGNED_ALLOC. Data is aligned to page_size because of a UPTR limitation: only a single page can be pinned to the kernel. Otherwise, malloc(size * 2) is used to allocate memory for data. However, we don't need aligned_alloc(page_size, size) to get size bytes of contiguous memory within a page; aligned_alloc(size_pot, ...) also does the trick. Therefore, just use aligned_alloc(size_pot, ...) universally. As for the size argument, create a new option, TLD_DONT_ROUND_UP_DATA_SIZE, to skip rounding up the size. This preserves the current TLD_DATA_USE_ALIGNED_ALLOC behavior, allowing memory allocators with a low-overhead aligned_alloc() to avoid wasting memory. To enable this, users need to make sure that a size that is not an integral multiple of the alignment is not undefined behavior for their memory allocator. Compared to the current implementation, where !TLD_DATA_USE_ALIGNED_ALLOC always wasted size bytes of memory due to malloc(size * 2), the worst case now becomes size - 1 and the best case is 0 when the size is already a power of two. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260331213555.1993883-3-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-02 15:11:08 -07:00
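A minimal C sketch of the allocation scheme described above, assuming a C11 aligned_alloc(). The helper names (round_up_pow2, tld_alloc_sketch, tld_alloc_waste) are hypothetical and not taken from task_local_data.h; the dont_round_up flag only mimics TLD_DONT_ROUND_UP_DATA_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* size_pot: size rounded up to the next power of two.
 * Assumes size <= SIZE_MAX / 2 + 1 so the loop terminates. */
size_t round_up_pow2(size_t size)
{
	size_t pot = 1;

	while (pot < size)
		pot <<= 1;
	return pot;
}

/*
 * Hypothetical helper: an object of size <= size_pot, aligned to size_pot,
 * never straddles a size_pot boundary, so if size_pot <= page_size it lies
 * within a single page (the UPTR constraint).
 * dont_round_up mimics TLD_DONT_ROUND_UP_DATA_SIZE: the caller must know
 * that a size that is not a multiple of the alignment is acceptable to
 * its allocator.
 */
void *tld_alloc_sketch(size_t size, int dont_round_up)
{
	size_t size_pot = round_up_pow2(size);

	/* aligned_alloc() needs a power-of-two alignment. */
	assert((size_pot & (size_pot - 1)) == 0);
	return aligned_alloc(size_pot, dont_round_up ? size : size_pot);
}

/* Bytes wasted by rounding up: 0 for powers of two, always below size. */
size_t tld_alloc_waste(size_t size)
{
	return round_up_pow2(size) - size;
}
```

Compared with malloc(size * 2), which always wastes size bytes, rounding to a power of two wastes nothing when size is already a power of two.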
Amery Hung	7c8ca532a7	selftests/bpf: Fix task_local_data data allocation size Currently, when allocating memory for data, size of tld_data_u->start is not taken into account. This may cause OOB access. Fixed it by adding the non-flexible array part of tld_data_u. Besides, explicitly align tld_data_u->data to 8 bytes in case some fields are added before data in the future. It could break the assumption that every data field is 8 byte aligned and sizeof(tld_data_u) will no longer be equal to offsetof(struct tld_data_u, data), which we use interchangeably. Signed-off-by: Amery Hung <ameryhung@gmail.com> Acked-by: Sun Jian <sun.jian.kdev@gmail.com> Link: https://lore.kernel.org/r/20260331213555.1993883-2-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-02 15:11:08 -07:00
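A sketch of the size fix, using a hypothetical struct that only mirrors the shape described in the message (a start field followed by a flexible data array). The extra flags member stands in for "some fields added before data in the future"; the real struct tld_data_u may differ. Forcing 8-byte alignment on data keeps sizeof() equal to offsetof(..., data), so either can be used for the header size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; not the real struct tld_data_u. */
struct tld_data_u_sketch {
	uint64_t start;
	uint16_t flags;          /* hypothetical field added before data */
	_Alignas(8) char data[]; /* every data field stays 8-byte aligned */
};

/* With the explicit alignment, header size == offset of data. */
_Static_assert(sizeof(struct tld_data_u_sketch) ==
	       offsetof(struct tld_data_u_sketch, data),
	       "header size must equal offset of data");

/* Bytes to allocate: the non-flexible header part plus the payload. */
size_t tld_data_alloc_size(size_t payload)
{
	return offsetof(struct tld_data_u_sketch, data) + payload;
}
```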
Andrii Nakryiko	e8aec1058c	Merge branch 'libbpf-clarify-raw-address-single-kprobe-attach-behavior' Hoyeon Lee says: ==================== libbpf: clarify raw-address single kprobe attach behavior Today libbpf documents single-kprobe attach through func_name, with an optional offset. For the PMU-based path, func_name = NULL with an absolute address in offset already works as well, but that is not described in the API. This patchset clarifies this behavior. First commit fixes kprobe and uprobe attach error handling to use direct error codes. Next adds kprobe API comments for the raw-address form and rejects it explicitly for legacy tracefs/debugfs kprobes. Last adds PERF and LINK selftests for the raw-address form, and checks that LEGACY rejects it. --- Changes in v7: - Change selftest line wrapping and assertions Changes in v6: - Split the kprobe/uprobe direct error-code fix into a separate patch Changes in v5: - Add kprobe API docs, use -EOPNOTSUPP, and switch selftests to LIBBPF_OPTS Changes in v4: - Inline raw-address error formatting and remove the probe_target buffer Changes in v3: - Drop bpf_kprobe_opts.addr and reuse offset when func_name is NULL - Make legacy tracefs/debugfs kprobes reject the raw-address form - Update selftests to cover PERF/LINK raw-address attach and LEGACY reject Changes in v2: - Fix line wrapping and indentation ==================== Link: https://patch.msgid.link/20260401143116.185049-1-hoyeon.lee@suse.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2026-04-02 13:23:19 -07:00
Hoyeon Lee	9d77cefe8f	selftests/bpf: Add test for raw-address single kprobe attach Currently, attach_probe covers manual single-kprobe attaches by func_name, but not the raw-address form that the PMU-based single-kprobe path can accept. This commit adds PERF and LINK raw-address coverage. It resolves SYS_NANOSLEEP_KPROBE_NAME through kallsyms, passes the absolute address in bpf_kprobe_opts.offset with func_name = NULL, and verifies that kprobe and kretprobe are still triggered. It also verifies that LEGACY rejects the same form. Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20260401143116.185049-4-hoyeon.lee@suse.com	2026-04-02 13:23:19 -07:00
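The selftest resolves SYS_NANOSLEEP_KPROBE_NAME through kallsyms before attaching. A hypothetical helper for that lookup step could look like the following; it parses text in /proc/kallsyms format and assumes a 64-bit unsigned long, as on 64-bit Linux. The symbol name in the test is only an x86-64 style example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical lookup: scan "<hex addr> <type> <name> [module]" lines and
 * return the address of sym, or 0 if it is not found. Without sufficient
 * privileges the real /proc/kallsyms reports all-zero addresses.
 */
unsigned long kallsyms_lookup_sketch(FILE *f, const char *sym)
{
	char line[512], name[256];
	unsigned long addr;
	char type;

	assert(f != NULL && sym != NULL);
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%lx %c %255s", &addr, &type, name) != 3)
			continue;
		if (strcmp(name, sym) == 0)
			return addr;
	}
	return 0;
}
```

Per the libbpf change in this series, the resolved address would then be passed in bpf_kprobe_opts.offset with func_name = NULL for a PMU-based attach, while legacy tracefs/debugfs kprobes reject that form.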
Hoyeon Lee	e1621c7528	libbpf: Clarify raw-address single kprobe attach behavior bpf_program__attach_kprobe_opts() documents single-kprobe attach through func_name, with an optional offset. For the PMU-based path, func_name = NULL with an absolute address in offset already works as well, but that is not described in the API. This commit clarifies this existing non-legacy behavior. For PMU-based attach, callers can use func_name = NULL with an absolute address in offset as the raw-address form. For legacy tracefs/debugfs kprobes, reject this form explicitly. Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20260401143116.185049-3-hoyeon.lee@suse.com	2026-04-02 13:23:19 -07:00
Hoyeon Lee	f547cf7947	libbpf: Use direct error codes for kprobe/uprobe attach perf_event_open_probe() and perf_event_{k,u}probe_open_legacy() helpers are returning negative error codes directly on failure. This commit changes bpf_program__attach_{k,u}probe_opts() to use those return values directly instead of re-reading possibly changed errno. Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20260401143116.185049-2-hoyeon.lee@suse.com	2026-04-02 13:23:19 -07:00
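A small illustration of why the return value is preferred over errno, with hypothetical helper names: any libc call made between the failure and the error report may overwrite errno, while the helper's own negative return code stays intact.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a helper that returns -errno on failure. */
int open_probe_sketch(void)
{
	errno = ENOENT;
	return -ENOENT;
}

/* Hypothetical cleanup step that clobbers errno, as libc calls may. */
void cleanup_sketch(void)
{
	errno = EBADF;
}

int attach_sketch(void)
{
	int pfd = open_probe_sketch();

	if (pfd < 0) {
		int err = pfd; /* keep the helper's own error code... */

		cleanup_sketch();
		return err;    /* ...instead of re-reading -errno here */
	}
	return 0;
}
```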
Mykyta Yatsenko	1cc96e0e20	libbpf: Fix BTF handling in bpf_program__clone() Align bpf_program__clone() with bpf_object_load_prog() by gating BTF func/line info on FEAT_BTF_FUNC kernel support, and resolve caller-provided prog_btf_fd before checking obj->btf so that callers with their own BTF can use clone() even when the object has no BTF loaded. While at it, treat func_info and line_info fields as atomic groups to prevent mismatches between pointer and count from different sources. Move bpf_program__clone() to libbpf 1.8. Fixes: `970bd2dced` ("libbpf: Introduce bpf_program__clone()") Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260401151640.356419-1-mykyta.yatsenko5@gmail.com	2026-04-02 13:02:46 -07:00
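A sketch of the "atomic group" rule under assumed names: pointer and count always come from the same source, and nothing is passed when the kernel lacks BTF func info support (FEAT_BTF_FUNC in libbpf). This is not libbpf's code, only the selection logic the message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair: a func_info (or line_info) pointer and its count. */
struct info_group {
	const void *ptr;
	unsigned int cnt;
};

/*
 * Take the caller's group if it provides a pointer, otherwise the
 * object's, never mixing a pointer from one with a count from the other.
 */
struct info_group pick_info_group(struct info_group caller,
				  struct info_group obj,
				  int kernel_has_btf_func)
{
	struct info_group none = { NULL, 0 };

	if (!kernel_has_btf_func)
		return none;
	return caller.ptr ? caller : obj;
}
```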
Alexei Starovoitov	e25cfbec08	Merge branch 'bpf-migrate-bpf_task_work-and-file-dynptr-to-kmalloc_nolock' Mykyta Yatsenko says: ==================== bpf: Migrate bpf_task_work and file dynptr to kmalloc_nolock Now that kmalloc can be used from NMI context via kmalloc_nolock(), migrate BPF internal allocations away from bpf_mem_alloc to use the standard slab allocator. Use kfree_rcu() for deferred freeing, which waits for a regular RCU grace period before the memory is reclaimed. Sleepable BPF programs hold rcu_read_lock_trace but not regular rcu_read_lock, so patch 1 adds explicit rcu_read_lock/unlock around the pointer-to-refcount window to prevent kfree_rcu from freeing memory while a sleepable program is still between reading the pointer and acquiring a reference. Patch 1 migrates bpf_task_work_ctx from bpf_mem_alloc/bpf_mem_free to kmalloc_nolock/kfree_rcu. Patch 2 migrates bpf_dynptr_file_impl from bpf_mem_alloc/bpf_mem_free to kmalloc_nolock/kfree. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> --- Changes in v2: - Switch to scoped_guard in patch 1 (Kumar) - Remove rcu gp wait in patch 2 (Kumar) - Defer to irq_work when irqs disabled in patch 1 - use bpf_map_kmalloc_nolock() for bpf_task_work - use kmalloc_nolock() for file dynptr - Link to v1: https://lore.kernel.org/all/20260325-kmalloc_special-v1-0-269666afb1ea@meta.com/ ==================== Link: https://patch.msgid.link/20260330-kmalloc_special-v2-0-c90403f92ff0@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-02 09:31:49 -07:00
Mykyta Yatsenko	cc878b4144	bpf: Migrate dynptr file to kmalloc_nolock Replace bpf_mem_alloc/bpf_mem_free with kmalloc_nolock/kfree_nolock for bpf_dynptr_file_impl, continuing the migration away from bpf_mem_alloc now that kmalloc can be used from NMI context. freader_cleanup() runs before kfree_nolock() while the dynptr still holds exclusive access, so plain kfree_nolock() is safe — no concurrent readers can access the object. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260330-kmalloc_special-v2-2-c90403f92ff0@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-02 09:31:42 -07:00
Mykyta Yatsenko	90f51ebff2	bpf: Migrate bpf_task_work to kmalloc_nolock Replace bpf_mem_alloc/bpf_mem_free with kmalloc_nolock/kfree_rcu for bpf_task_work_ctx. Replace guard(rcu_tasks_trace)() with guard(rcu)() in bpf_task_work_irq(). The function only accesses ctx struct members (not map values), so tasks trace protection is not needed - regular RCU is sufficient since ctx is freed via kfree_rcu. The guard in bpf_task_work_callback() remains as tasks trace since it accesses map values from process context. Sleepable BPF programs hold rcu_read_lock_trace but not regular rcu_read_lock. Since kfree_rcu waits for a regular RCU grace period, the ctx memory can be freed while a sleepable program is still running. Add scoped_guard(rcu) around the pointer read and refcount tryget in bpf_task_work_acquire_ctx to close this race window. Since kfree_rcu uses call_rcu internally which is not safe from NMI context, defer destruction via irq_work when irqs are disabled. For the lost-cmpxchg path the ctx was never published, so kfree_nolock is safe. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260330-kmalloc_special-v2-1-c90403f92ff0@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-02 09:31:42 -07:00
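The race closed here lies between reading the ctx pointer and taking a reference. A userspace model of the tryget half, using C11 atomics; the kernel uses its own refcount helpers, and the RCU read-side section (scoped_guard(rcu)) is described only in a comment because it is not modeled here. The function name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the count is still non-zero. In the kernel
 * patch, the pointer read plus this tryget run under scoped_guard(rcu),
 * so kfree_rcu() cannot free the object in between.
 */
bool refcount_tryget(atomic_uint *refs)
{
	unsigned int old = atomic_load(refs);

	do {
		if (old == 0)
			return false; /* object is already being torn down */
	} while (!atomic_compare_exchange_weak(refs, &old, old + 1));
	return true;
}
```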
Alexei Starovoitov	f760104402	Merge branch 'bpf-fix-abuse-of-kprobe_write_ctx-via-freplace' Leon Hwang says: ==================== bpf: Fix abuse of kprobe_write_ctx via freplace The potential issue of kprobe_write_ctx+freplace was mentioned in "bpf: Disallow !kprobe_write_ctx progs tail-calling kprobe_write_ctx progs" [1]. It is a real issue: the test in patch #2 verifies that kprobe_write_ctx=false kprobe progs can be abused to modify struct pt_regs via kprobe_write_ctx=true freplace progs. When struct pt_regs is modified, bpf_prog_test_run_opts() gets -EFAULT instead of 0. test_freplace_kprobe_write_ctx:FAIL:bpf_prog_test_run_opts unexpected error: -14 (errno 14) We will disallow attaching freplace programs on kprobe programs with different kprobe_write_ctx values. Links: [1] https://lore.kernel.org/bpf/CAP01T74w4KVMn9bEwpQXrk+bqcUxzb6VW1SQ_QvNy0A4EY-9Jg@mail.gmail.com/ Changes: v2 -> v3: * Add comment to the rejection of kprobe_write_ctx (per Jiri). * Use libbpf_get_error() instead of errno in test (per Jiri). * Collect Acked-by tags from Jiri and Song, thanks. v2: https://lore.kernel.org/bpf/20260326141718.17731-1-leon.hwang@linux.dev/ v1 -> v2: * Drop patch #1 in v1, as it wasn't an issue (per Toke). * Check kprobe_write_ctx value at attach time instead of at load time, to prevent attaching kprobe_write_ctx=true freplace progs on kprobe_write_ctx=false kprobe progs (per Gemini/sashiko). * Move kprobe_write_ctx test code to attach_probe.c and kprobe_write_ctx.c. v1: https://lore.kernel.org/bpf/20260324150444.68166-1-leon.hwang@linux.dev/ ==================== Link: https://patch.msgid.link/20260331145353.87606-1-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-02 09:29:49 -07:00
Leon Hwang	da77f3a9aa	selftests/bpf: Add test to verify the fix of kprobe_write_ctx abuse Add a test to verify the issue: kprobe_write_ctx can be abused to modify struct pt_regs of kernel functions via kprobe_write_ctx=true freplace progs. Without the fix, the issue is reproduced: a kprobe_write_ctx=true freplace prog is allowed to attach to a kprobe_write_ctx=false kprobe prog. Then, the first arg of bpf_fentry_test1 is set to 0, and bpf_prog_test_run_opts() gets -EFAULT instead of 0. With the fix, such an attachment is rejected at attach time. Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260331145353.87606-3-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-02 09:29:49 -07:00
Leon Hwang	611fe4b79a	bpf: Fix abuse of kprobe_write_ctx via freplace uprobe programs are allowed to modify struct pt_regs. Since the actual program type of uprobe is KPROBE, this can be abused to modify struct pt_regs via kprobe+freplace when the kprobe attaches to kernel functions. For example, SEC("?kprobe") int kprobe(struct pt_regs *regs) { return 0; } SEC("?freplace") int freplace_kprobe(struct pt_regs *regs) { regs->di = 0; return 0; } The freplace_kprobe prog attaches to the kprobe prog, and the kprobe prog attaches to a kernel function. Without this patch, when the kernel function runs, its first arg is always set to 0 via the freplace_kprobe prog. To fix the abuse of kprobe_write_ctx=true via kprobe+freplace, disallow attaching freplace programs on kprobe programs with different kprobe_write_ctx values. Fixes: `7384893d97` ("bpf: Allow uprobe program to change context registers") Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260331145353.87606-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-02 09:29:49 -07:00
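The attach-time rule reduces to a single comparison. A hypothetical sketch (struct and function names invented for illustration; the -EINVAL return is an assumption, since the message does not name the error code):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical view of the one program property this check cares about. */
struct prog_sketch {
	bool kprobe_write_ctx; /* may the program write struct pt_regs? */
};

/* An freplace prog may only replace a kprobe prog with the same value. */
int check_freplace_attach(const struct prog_sketch *tgt,
			  const struct prog_sketch *repl)
{
	if (tgt->kprobe_write_ctx != repl->kprobe_write_ctx)
		return -EINVAL;
	return 0;
}
```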
Mykyta Yatsenko	0eeb0094ba	selftests/bpf: Suppress veristat error messages in non-verbose mode When running veristat across many BPF objects, expected load failures produce noisy stderr output that obscures actual issues. Gate these diagnostic messages behind --verbose. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260331172634.57402-2-mykyta.yatsenko5@gmail.com	2026-03-31 15:55:47 -07:00
Menglong Dong	3e6475dc60	selftests/bpf: Test access to ringbuf position with map pointer Add a test that accesses the bpf_ringbuf through the map pointer. "consumer_pos" and "producer_pos" are accessed in this test. We reserve 128 bytes in the ringbuf to test producer_pos, which should be "128 + BPF_RINGBUF_HDR_SZ". This will be helpful if we want to evaluate ringbuf usage in a bpf prog via the consumer and producer positions. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/bpf/20260331070434.10037-1-dongml2@chinatelecom.cn	2026-03-31 15:47:14 -07:00
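Why producer_pos is expected to be 128 + BPF_RINGBUF_HDR_SZ after a 128-byte reservation can be shown with a little arithmetic. This sketch assumes BPF_RINGBUF_HDR_SZ is 8, as in the UAPI header, and that each record (payload plus header) is rounded up to a multiple of 8; the macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror BPF_RINGBUF_HDR_SZ from <linux/bpf.h>. */
#define RINGBUF_HDR_SZ_SKETCH 8UL

/* How far producer_pos advances for one reserved record. */
unsigned long ringbuf_record_footprint(unsigned long size)
{
	unsigned long len = size + RINGBUF_HDR_SZ_SKETCH;

	return (len + 7) & ~7UL; /* round up to 8 bytes */
}
```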
Eyal Birger	f9a80c7ce4	bpf: Clarify BPF_RB_NO_WAKEUP behavior for bpf_ringbuf_discard() Clarify bpf_ringbuf_discard() documentation for BPF_RB_NO_WAKEUP. Discarded ring buffer records are still left in the ring buffer and are only skipped when user space consumes them. This can matter when BPF_RB_NO_WAKEUP is used: a later submit relying on adaptive wakeup might not wake the consumer, because the discarded record still needs to be consumed first. Scenario: epoll_wait(rb_fd); // blocks rec = bpf_ringbuf_reserve(&rb, ...); bpf_ringbuf_discard(rec, BPF_RB_NO_WAKEUP); rec = bpf_ringbuf_reserve(&rb, ...); bpf_ringbuf_submit(rec, 0); // valid record, but no wakeup Document this in bpf_ringbuf_discard() to make the interaction between discarded records, user-space consumption, and adaptive wakeups explicit. Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260331130612.3762433-1-eyal.birger@gmail.com ---- v2: adapt wording per feedback from Andrii.	2026-03-31 15:46:34 -07:00
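A simplified model of the adaptive-wakeup decision, under the assumption that without flags the consumer is woken only when it has already caught up to the committed record (consumer_pos equals the record's position). The flag macros are stand-ins for BPF_RB_NO_WAKEUP and BPF_RB_FORCE_WAKEUP, and the 16-byte footprint of the discarded record in the test is an assumed value.

```c
#include <assert.h>
#include <stdbool.h>

#define RB_NO_WAKEUP    1 /* stand-in for BPF_RB_NO_WAKEUP */
#define RB_FORCE_WAKEUP 2 /* stand-in for BPF_RB_FORCE_WAKEUP */

/* Decide whether committing (submit or discard) a record wakes the consumer. */
bool rb_should_wakeup(unsigned long consumer_pos, unsigned long rec_pos,
		      unsigned int flags)
{
	if (flags & RB_FORCE_WAKEUP)
		return true;
	if (flags & RB_NO_WAKEUP)
		return false;
	/* Adaptive: wake only if this record is the next one to consume. */
	return consumer_pos == rec_pos;
}
```

Replaying the scenario: the discarded record at position 0 is still unconsumed, so the later submit at position 16 sees consumer_pos == 0 and does not wake the consumer.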
Jiri Olsa	9eccdd38fb	bpf: Fix block device hooks names Use proper names for block device hooks names. Fixes: `46df585fcf` ("bpf: classify block device hooks appropriately") Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/bpf/acrVKUy_EPiFFmV9@krava/T/#m7c7906a1ff4029e29185aec3266dbf5c8996dbf7 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20260330210344.3073712-1-jolsa@kernel.org	2026-03-31 11:11:42 +02:00
haoyu.lu	b6b5e0ebd4	bpf,arc_jit: Fix missing newline in pr_err messages Add missing newline to pr_err messages in ARC JIT. Fixes: `f122668ddc` ("ARC: Add eBPF JIT support") Signed-off-by: haoyu.lu <hechushiguitu666@gmail.com> Link: https://lore.kernel.org/r/20260324122703.641-1-hechushiguitu666@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-29 09:59:00 -07:00
Daniel Borkmann	398ad123e8	selftests/bpf: Add few tests for alu32 shift value tracking and zext Add a few more alu32 shift tests using div-by-zero on provably dead paths to check both verifier correctness and JIT translation (i.e. runtime) correctness. If the verifier mistracks the result, it rejects the program due to the div by 0; if the JIT computes a wrong value, the runtime hits the dead path and retval changes. # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_subreg [...] #644/76 verifier_subreg/arsh32_imm1_value:OK #644/77 verifier_subreg/lsh32_reg0_zero_extend_check:OK #644/78 verifier_subreg/rsh32_reg0_zero_extend_check:OK #644/79 verifier_subreg/arsh32_reg0_zero_extend_check:OK #644/80 verifier_subreg/lsh32_imm31_value:OK #644/81 verifier_subreg/rsh32_imm31_value:OK #644/82 verifier_subreg/arsh32_imm31_value:OK #644/83 verifier_subreg/lsh32_unknown_precise_bounds:OK #644/84 verifier_subreg/rsh32_unknown_bounds:OK #644 verifier_subreg:OK Summary: 1/84 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260327220629.343327-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-29 09:57:39 -07:00
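For checking expected values by hand, a reference model of alu32 shift semantics: the shift operates on the low 32 bits of the register and the result is zero-extended into the upper 32 bits. Function names are hypothetical, and shift amounts are masked to 0..31 only to keep the C well-defined.

```c
#include <assert.h>
#include <stdint.h>

uint64_t alu32_lsh(uint64_t reg, unsigned int sh)
{
	return (uint32_t)((uint32_t)reg << (sh & 31)); /* zero-extended */
}

uint64_t alu32_rsh(uint64_t reg, unsigned int sh)
{
	return (uint32_t)reg >> (sh & 31);
}

uint64_t alu32_arsh(uint64_t reg, unsigned int sh)
{
	uint32_t v = (uint32_t)reg;

	sh &= 31;
	/* Arithmetic shift without relying on >> of negative signed values. */
	if (v & 0x80000000u)
		return (uint32_t)~(~v >> sh);
	return v >> sh;
}
```

For example, an arsh32 by 31 of a value with bit 31 set yields 0xffffffff in the low half and zeros above it, not a 64-bit -1.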
Ihor Solodrai	101a9d9df8	selftests/bpf: Update kfuncs using btf_struct_meta to new variants Update selftests to use the new non-_impl kfuncs marked with KF_IMPLICIT_ARGS by removing redundant declarations and macros from bpf_experimental.h (the new kfuncs are present in the vmlinux.h) and updating relevant callsites. Fix spin_lock verifier-log matching for lock_id_kptr_preserve by accepting variable instruction numbers. The calls to kfuncs with implicit arguments do not have register moves (e.g. r5 = 0) corresponding to dummy arguments anymore, so the order of instructions has shifted. Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260327203241.3365046-2-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-29 09:56:06 -07:00
Ihor Solodrai	d457072576	bpf: Support struct btf_struct_meta via KF_IMPLICIT_ARGS The following kfuncs currently accept a void *meta__ign argument: bpf_obj_new_impl, bpf_obj_drop_impl, bpf_percpu_obj_new_impl, bpf_percpu_obj_drop_impl, bpf_refcount_acquire_impl, bpf_list_push_back_impl, bpf_list_push_front_impl, bpf_rbtree_add_impl. The __ign suffix tells the verifier to skip the argument in check_kfunc_args(). Then, in fixup_kfunc_call(), the verifier may set the value of this argument to struct btf_struct_meta *kptr_struct_meta from insn_aux_data. BPF programs must pass a dummy NULL value when calling these kfuncs. Additionally, the list and rbtree _impl kfuncs also accept an implicit u64 argument, which doesn't require the __ign suffix because it's a scalar, and BPF programs explicitly pass 0. Add new kfuncs with KF_IMPLICIT_ARGS [1] that correspond to each _impl kfunc accepting meta__ign. The existing _impl kfuncs remain unchanged for backwards compatibility. To support this, add "btf_struct_meta" to the list of recognized implicit argument types in resolve_btfids. Implement is_kfunc_arg_implicit() in the verifier, which determines implicit args by inspecting both a non-_impl BTF prototype of the kfunc. Update the special_kfunc_list in the verifier and relevant checks to support both the old _impl and the new KF_IMPLICIT_ARGS variants of btf_struct_meta users. [1] https://lore.kernel.org/bpf/20260120222638.3976562-1-ihor.solodrai@linux.dev/ Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260327203241.3365046-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-29 09:56:06 -07:00
Alexei Starovoitov	5e961eebef	Merge branch 'bpf-classify-block-device-hooks-and-add-selftests' Christian Brauner says: ==================== bpf: classify block device hooks and add selftests A bunch of new hooks for managing block devices were added a while ago but they weren't appropriately classified. Classify them and add a test program so we catch regressions. Note that, for whatever reason, building the bpf selftests locally seems to fail for me for all kinds of arcane reasons. That might just be my fault. I added a PR against the CI to have the selftests run, but to test this meaningfully it needs veritysetup and dm-verity support. I'm not sure if that's available already. Signed-off-by: Christian Brauner <brauner@kernel.org> --- Changes in v2: - No changes. - Link to v1: https://patch.msgid.link/20260220-work-bpf-bdev-v1-0-c53e852c4702@kernel.org --- ==================== Link: https://patch.msgid.link/20260326-work-bpf-bdev-v2-0-5e3c58963987@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-27 09:05:13 -07:00
Christian Brauner	96f4c251a0	selftests/bpf: add block device management selftests Add selftests to test block device tracking for bpf lsm programs. Signed-off-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20260326-work-bpf-bdev-v2-2-5e3c58963987@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-27 09:05:13 -07:00
Christian Brauner	46df585fcf	bpf: classify block device hooks appropriately A bunch of new hooks for managing block devices were added a while ago but they weren't actually appropriately classified. * bpf_lsm_bdev_alloc() is called when the inode for the block device is allocated. This happens from a sleepable context, so mark the function as sleepable. When this function is called, the memory for the block device storage embedded into the inode is zeroed. That block device cannot be meaningfully referenced or interacted with at this point, so mark it as untrusted for now. * bpf_lsm_bdev_free() is called when the inode for the block device is freed. A bunch of memory associated with the block device has already been freed and there are dangling pointers in there, so mark it as untrusted. It cannot be meaningfully referenced or interacted with anymore. It is also called from sb->s_op->free_inode(), which means it runs in RCU context (most of the time), so leave it as non-sleepable. * bpf_lsm_bdev_setintegrity() is called when a dm-verity device is instantiated (glossing over details for simplicity of the commit message). The block device is very much alive, so it remains a trusted hook. It's also called with device mapper's suspend lock held, so the hook is able to sleep; mark it sleepable. Signed-off-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20260326-work-bpf-bdev-v2-1-5e3c58963987@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-27 09:05:13 -07:00
Andrii Nakryiko	01504da43e	Merge branch 'add-btf-layout-to-btf' Alan Maguire says: ==================== Add BTF layout to BTF Update struct btf_header to add a new "layout" section containing a description of how to parse the BTF kinds known about at BTF encoding time. This provides the opportunity for tools that might not know all of these kinds - as is the case when older tools run on more newly-generated BTF - to still parse the BTF provided, even if it cannot all be used. The ideas here were discussed at [1], with further discussion at [2]. Patches for pahole that enable the layout addition during BTF generation are at [3], but even absent these, the addition of the layout feature in the final patch of this series should not break anything, since such unknown features are simply ignored during pahole BTF generation. Separately tested sanitization of BTF location info with a separate small series that simulates feature absence to support testing of features for older kernels; will follow up with that shortly. Changes since v15 [4]: - Fixed endian issues for layout section by swapping flags fields where needed (sashiko.dev, patch 2) - Fixed string size issue with swapped endian case, use btf->magic for comparison to determine endian mismatch (bpf review bot, sashiko.dev, patch 6) Changes since v14 [5]: - Fix potential overflow for swapped endian case (BPF review bot, patch 2) - Add global: keyword to libbpf.map (sashiko.dev, patch 4) - Fix endian issues in sanitization; we use the endian-safe btf->hdr and check for endian mismatch between it and the raw original BTF header to inform how we write the changed str_off.
Also fix potential truncation issues due to not including hdr->type_off (sashiko.dev, patch 6) - Fix issues with selftests raw BTF file interactions (sashiko.dev, patch 8) - Drop feature test test since it will be covered by another series Changes since v13: [6]: - add feature check/sanitization of BTF with layout info (Andrii, patch 6) - added feature check test for layout support (patch 9) Changes since v12: [7]: - add logging of layout off/len to kernel header logging (review bot, patch 6) - add mode to open() in selftest (review bot, patch 7) Changes since v11 [8]: - Revert unneeded changes to btf__new_empty() (Eduard, review bot, patch 4) - Reorder btf_parse_layout_sec() checks to ensure min size check occurs before multiple check (review bot, patch 6) Changes since v10 [9]: - deal with read overflow with small header (review bot, patch 2) - validate layout length is a multiple of sizeof(struct btf_layout) (review bot, patch 6) - fix comment style (Alexei, patches 4,7) - remove bpftool BTF metadata subcommands for now (Alexei) Changes since v9: [10]: - fix memcpy header size overrun (review bot, patch 2) - return size computation directly (Andrii, patch 333) - revert to original unknown kind logging (Alexei/review bot, patch 6) - gap-checking logic can be simplified now that we have 4-byte aligned types and layout together (patch 6) - fix naming of layout offset, unconditionally emit a layouts array in json (Quentin, review bot, patch 8) - fix metadata output in man page to include flags (review bot, patch 9) Changes since v8: [11]: - updated name from "kind_layout" to "layout" (Andrii) - moved layout info in between types and strings since both types and layout info align on 4 bytes (Andrii) - use embedded btf_header (Eduard) - when consulting layout, fall back to base BTF if none found in split BTF; this will allow us to only encode layout info in vmlinux rather than repeating it for each module.
Changes since v7: [12]: - Fixed comment style in UAPI headers (Mykyta, patch 1) - Simplify calculation of header size using min() (Mykyta, patch 2) - simplify computation of bounds for kind (Mykyta, patch 3) - Added utility functions for updating type, string offsets when data is added; this simplifies the code and encapsulates such updates more clearly (patch 2) Changes since v6: [13]: - BPF review bot caught some memory leaks around freeing of kind layout; more importantly, it noted that we were breaking with the contiguous BTF representation for btf_new_empty_opts(). Doing so meant that freeing kind_layout could not be predicated on having btf->modifiable set, so adopted the contiguous raw data layout for BTF to be consistent with type/string storage (patches 2,4) - Moved checks for kind overflow prior to referencing kinds to avoid any risk of overrun (patches 3, 8) - Tightened up kind layout header offset/len header validation to catch invalid combinations early in btf_parse_hdr() (patch 2) - Fixed selftest to verify calloc success (patch 7) Changes since v5: [14]: - removed flags field from kind layout; it is not really workable since we would have to define semantics of all possible future flags today to be usable.
Instead stick to parsing only, which means each kind just needs the length of the singular and vlen-specified objects (Alexei) - added documentation for bpftool BTF metadata dump (Quentin, patch 9) Changes since v4: [15]: - removed CRC generation since it is not needed to handle modules built at different time than kernel; distilled base BTF supports this now - fixed up bpftool display of empty kind names, comment/documentation indentation (Quentin, patches 8, 9) Changes since v3 [16]: - fixed mismerge issues with kbuild changes for BTF generation (patches 9, 14) - fixed a few small issues in libbpf with kind layout representation (patches 2, 4) Changes since v2 [17]: - drop "optional" kind flag (Andrii, patch 1) - allocate "struct btf_header" for struct btf to ensure we can always access new fields (Andrii, patch 2) - use an internal BTF kind array in btf.c to simplify kind encoding (Andrii, patch 2) - drop use of kind layout information for in-kernel parsing, since the kernel needs to be strict in what it accepts (Andrii, patch 6) - added CRC verification for BTF objects and for matching with base object (Alexei, patches 7,8) - fixed bpftool json output (Quentin, patch 10) - added standalone module BTF support, tests (patches 13-17) Changes since RFC - Terminology change from meta -> kind_layout (Alexei and Andrii) - Simplify representation, removing meta header and just having kind layout section (Alexei) - Fixed bpftool to have JSON support, support prefix match, documented changes (Quentin) - Separated metadata opts into add_kind_layout and add_crc - Added additional positive/negative tests to cover basic unknown kind, one with an info_sz object following it and one with N elem_sz elements following it. 
- Updated pahole-flags to use help output rather than version to see if features are present [1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/ [2] https://lore.kernel.org/bpf/20230531201936.1992188-1-alan.maguire@oracle.com/ [3] https://lore.kernel.org/dwarves/20260226085240.1908874-1-alan.maguire@oracle.com/ [4] https://lore.kernel.org/bpf/20260324174450.1570809-1-alan.maguire@oracle.com/ [5] https://lore.kernel.org/bpf/20260318132927.1142388-1-alan.maguire@oracle.com/ [6] https://lore.kernel.org/bpf/20260306113630.1281527-1-alan.maguire@oracle.com/ [7] https://lore.kernel.org/bpf/20260303182003.117483-1-alan.maguire@oracle.com/ [8] https://lore.kernel.org/bpf/20260302114059.3697879-1-alan.maguire@oracle.com/ [9] https://lore.kernel.org/bpf/20260227100426.2585191-1-alan.maguire@oracle.com/ [10] https://lore.kernel.org/bpf/20260226085624.1909682-1-alan.maguire@oracle.com/ [11] https://lore.kernel.org/bpf/20251215091730.1188790-1-alan.maguire@oracle.com/ [12] https://lore.kernel.org/dwarves/20251211164646.1219122-1-alan.maguire@oracle.com/ [13] https://lore.kernel.org/bpf/20251210203243.814529-1-alan.maguire@oracle.com/ [14] https://lore.kernel.org/bpf/20250528095743.791722-1-alan.maguire@oracle.com/ [15] https://lore.kernel.org/bpf/20231112124834.388735-1-alan.maguire@oracle.com/ [16] https://lore.kernel.org/bpf/20231110110304.63910-1-alan.maguire@oracle.com/ [17] https://lore.kernel.org/bpf/20230616171728.530116-1-alan.maguire@oracle.com/ ==================== Link: https://patch.msgid.link/20260326145444.2076244-1-alan.maguire@oracle.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2026-03-26 13:53:57 -07:00
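The core idea of the layout section is that a parser can compute the size of a type record even for a kind it does not know, from "the length of the singular and vlen-specified objects". A sketch assuming a 12-byte struct btf_type and the standard kind/vlen bit fields of btf_type.info; the per-kind entry below is hypothetical and need not match the real struct btf_layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BTF_TYPE_SZ 12 /* sizeof(struct btf_type): name_off, info, size/type */

/* Hypothetical per-kind layout entry; field names are illustrative. */
struct layout_sketch {
	uint8_t info_sz;  /* bytes of the single object after btf_type */
	uint8_t elem_sz;  /* bytes of each of the vlen trailing elements */
	uint16_t flags;
};

/* A layout section is valid only if it holds whole entries. */
int layout_len_valid(size_t layout_len)
{
	return layout_len % sizeof(struct layout_sketch) == 0;
}

/* Size of one type record from the layout table alone; -1 if unknown. */
long btf_type_size_sketch(uint32_t info, const struct layout_sketch *layouts,
			  size_t nr_layouts)
{
	uint32_t kind = (info >> 24) & 0x1f; /* as BTF_INFO_KIND() */
	uint32_t vlen = info & 0xffff;       /* as BTF_INFO_VLEN() */

	if (kind >= nr_layouts)
		return -1;
	return BTF_TYPE_SZ + layouts[kind].info_sz +
	       (long)vlen * layouts[kind].elem_sz;
}
```

With the record size known, a tool can skip over an unrecognized kind and keep parsing the types that follow it.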
Alan Maguire	5e1942eb1c	kbuild, bpf: Specify "layout" optional feature The "layout" feature will add metadata about BTF kinds to the generated BTF; its absence in pahole will not trigger an error so it is safe to add unconditionally as it will simply be ignored if pahole does not support it. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260326145444.2076244-10-alan.maguire@oracle.com	2026-03-26 13:53:57 -07:00
Alan Maguire	0467491617	selftests/bpf: Test kind encoding/decoding verify btf__new_empty_opts() adds layouts for all kinds supported, and after adding kind-related types for an unknown kind, ensure that parsing uses this info when that kind is encountered rather than giving up. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260326145444.2076244-9-alan.maguire@oracle.com	2026-03-26 13:53:57 -07:00
Alan Maguire	626e88c070	btf: Support kernel parsing of BTF with layout info Validate layout if present, but because the kernel must be strict in what it accepts, reject BTF with unsupported kinds, even if they are in the layout information. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260326145444.2076244-8-alan.maguire@oracle.com	2026-03-26 13:53:56 -07:00
Alan Maguire	081677d03d	libbpf: Support sanitization of BTF layout for older kernels Add a FEAT_BTF_LAYOUT feature check which checks if the kernel supports BTF layout information. Also sanitize BTF if it contains layout data but the kernel does not support it. The sanitization requires rewriting raw BTF data to update the header and eliminate the layout section (since it lies between the types and strings), so refactor sanitization to do the raw BTF retrieval and creation of updated BTF, returning that new BTF on success. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260326145444.2076244-7-alan.maguire@oracle.com	2026-03-26 13:53:56 -07:00
Alan Maguire	6ad8928599	libbpf: BTF validation can use layout for unknown kinds BTF parsing can use layout to navigate unknown kinds, so btf_validate_type() should take layout information into account to avoid failure when an unrecognized kind is met. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260326145444.2076244-6-alan.maguire@oracle.com	2026-03-26 13:53:56 -07:00
Alan Maguire	d686d92c40	libbpf: Add layout encoding support Support encoding of BTF layout data via btf__new_empty_opts(). Current supported opts are base_btf and add_layout. Layout information is maintained in btf.c in the layouts[] array; when BTF is created with the add_layout option it represents the current view of supported BTF kinds. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260326145444.2076244-5-alan.maguire@oracle.com	2026-03-26 13:53:56 -07:00
Alan Maguire	2ecbe53e0e	libbpf: Use layout to compute an unknown kind size This allows BTF parsing to proceed even if we do not know the kind. Fall back to base BTF layout if layout information is not in split BTF. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260326145444.2076244-4-alan.maguire@oracle.com	2026-03-26 13:53:56 -07:00
Alan Maguire	087f3964f4	libbpf: Support layout section handling in BTF Support reading in layout fixing endian issues on reading; also support writing layout section to raw BTF object. There is not yet an API to populate the layout with meaningful information. As part of this, we need to consider multiple valid BTF header sizes; the original or the layout-extended headers. So to support this, the "struct btf" representation is modified to contain a "struct btf_header" and we copy the valid portion from the raw data to it; this means we can always safely check fields like btf->hdr.layout_len . Note if parsed-in BTF has extra header information beyond sizeof(struct btf_header) - if so we make that BTF ineligible for modification by setting btf->has_hdr_extra . Ensure that we handle endianness issues for BTF layout section, though currently only field that needs this (flags) is unused. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260326145444.2076244-3-alan.maguire@oracle.com	2026-03-26 13:53:56 -07:00
Alan Maguire	222edc843c	btf: Add BTF kind layout encoding to UAPI BTF kind layouts provide information to parse BTF kinds. By separating parsing BTF from using all the information it provides, we allow BTF to encode new features even if they cannot be used by readers. This will be helpful in particular for cases where older tools are used to parse newer BTF with kinds the older tools do not recognize; the BTF can still be parsed in such cases using kind layout. The intent is to support encoding of kind layouts optionally so that tools like pahole can add this information. For each kind, we record - length of singular element following struct btf_type - length of each of the btf_vlen() elements following - a (currently unused) flags field The ideas here were discussed at [1], [2]; hence Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260326145444.2076244-2-alan.maguire@oracle.com [1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/ [2] https://lore.kernel.org/bpf/20230531201936.1992188-1-alan.maguire@oracle.com/	2026-03-26 13:53:56 -07:00
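The three per-kind fields are enough for a reader to step over a type it does not recognize. Below is a minimal sketch of that size computation. It assumes the existing 12-byte `struct btf_type` header, with vlen in bits 0-15 and kind in bits 24-28 of `info`. `struct kind_layout`, its field widths, and `btf_type_len()` are hypothetical illustrations. They are not the UAPI definitions or the libbpf code added by this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-kind layout record; field widths are illustrative only. */
struct kind_layout {
	uint8_t info_sz;	/* bytes of the singular object after the type header */
	uint8_t elem_sz;	/* bytes of each of the vlen objects that follow it */
	uint16_t flags;		/* currently unused */
};

/* Same shape as the 12-byte BTF type header. */
struct btf_type_hdr {
	uint32_t name_off;
	uint32_t info;		/* vlen: bits 0-15, kind: bits 24-28 */
	uint32_t size_or_type;
};

#define HDR_KIND(info) (((info) >> 24) & 0x1f)
#define HDR_VLEN(info) ((info) & 0xffff)

/* Bytes occupied by type t (header + singular object + vlen objects),
 * or 0 if the caller has no layout entry for its kind. */
static size_t btf_type_len(const struct btf_type_hdr *t,
			   const struct kind_layout *layouts,
			   unsigned int nr_layouts)
{
	unsigned int kind;

	assert(t != NULL && layouts != NULL);
	kind = HDR_KIND(t->info);
	if (kind >= nr_layouts)
		return 0;
	return sizeof(*t) + layouts[kind].info_sz +
	       (size_t)HDR_VLEN(t->info) * layouts[kind].elem_sz;
}
```

Take a layout of info_sz = 4 and elem_sz = 8 registered for a kind the reader otherwise does not know. A type of that kind with vlen 3 then occupies 12 + 4 + 3 * 8 = 40 bytes, so parsing can continue with the next type.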
Alexei Starovoitov	400ff899c3	selftests/bpf: Make reg_bounds test more robust The verifier log output may contain multiple lines that start with 18: (bf) r0 = r6 teach reg_bounds to look for lines that have ';' in them, since reg_bounds test is looking for: 18: (bf) r0 = r6 ; R0=... R6=... Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260325012242.45606-1-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-25 08:50:33 -07:00
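A small C helper can illustrate the matching rule. This sketch assumes that a line qualifies when it starts with the instruction text and contains a ';' after that text. `is_state_line()` is a hypothetical name, and the selftest's actual parsing may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if `line` starts with `insn_prefix` and carries register state,
 * i.e. has a ';' after the instruction text, as in
 * "18: (bf) r0 = r6 ; R0=... R6=...". A bare "18: (bf) r0 = r6" is skipped. */
static bool is_state_line(const char *line, const char *insn_prefix)
{
	size_t n;

	assert(line != NULL && insn_prefix != NULL);
	n = strlen(insn_prefix);
	if (strncmp(line, insn_prefix, n) != 0)
		return false;
	return strchr(line + n, ';') != NULL;
}
```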
Alexei Starovoitov	9f7d8fa681	selftests/bpf: Test variable length stack write Add a test to make sure that variable length stack writes scrub STACK_SPILL into STACK_MISC. Tested-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260324215938.81733-2-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 17:00:16 -07:00
Alexei Starovoitov	4639eb9e30	bpf: Fix variable length stack write over spilled pointers Scrub slots if variable-offset stack write goes over spilled pointers. Otherwise is_spilled_reg() may == true && spilled_ptr.type == NOT_INIT and valid program is rejected by check_stack_read_fixed_off() with obscure "invalid size of register fill" message. Fixes: `01f810ace9` ("bpf: Allow variable-offset stack access") Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260324215938.81733-1-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 17:00:11 -07:00
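Here is a toy model of the scrubbing rule in this fix. It is a sketch, not the verifier's code. Stack bytes are indexed from 0 here, whereas the real BPF stack is addressed by negative offsets from r10. Only three slot types are modelled, and `toy_var_off_write()` is a hypothetical helper. The scrubbing is conservative: any 8-byte slot that overlaps the possible write window loses its spill marking.

```c
#include <assert.h>

enum toy_slot { SLOT_INVALID, SLOT_MISC, SLOT_SPILL };

#define TOY_SLOT_SIZE   8
#define TOY_STACK_BYTES 64

/* A write of `size` bytes may start anywhere in [min_off, max_off], so any
 * byte in [min_off, max_off + size) may be clobbered. Every slot overlapping
 * that window has its SPILL bytes downgraded to MISC, so the slot is no
 * longer treated as holding a spilled register. */
static void toy_var_off_write(enum toy_slot *stack, int min_off, int max_off, int size)
{
	int first, last;

	assert(min_off >= 0 && max_off >= min_off && size > 0);
	assert(max_off + size <= TOY_STACK_BYTES);

	first = min_off / TOY_SLOT_SIZE;
	last = (max_off + size - 1) / TOY_SLOT_SIZE;
	for (int slot = first; slot <= last; slot++)
		for (int j = slot * TOY_SLOT_SIZE; j < (slot + 1) * TOY_SLOT_SIZE; j++)
			if (stack[j] == SLOT_SPILL)
				stack[j] = SLOT_MISC;
}
```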
David Carlier	8ed82f807b	bpf: Use RCU-safe iteration in dev_map_redirect_multi() SKB path The DEVMAP_HASH branch in dev_map_redirect_multi() uses hlist_for_each_entry_safe() to iterate hash buckets, but this function runs under RCU protection (called from xdp_do_generic_redirect_map() in softirq context). Concurrent writers (__dev_map_hash_update_elem, dev_map_hash_delete_elem) modify the list using RCU primitives (hlist_add_head_rcu, hlist_del_rcu). hlist_for_each_entry_safe() performs plain pointer dereferences without rcu_dereference(), missing the acquire barrier needed to pair with writers' rcu_assign_pointer(). On weakly-ordered architectures (ARM64, POWER), a reader can observe a partially-constructed node. It also defeats CONFIG_PROVE_RCU lockdep validation and KCSAN data-race detection. Replace with hlist_for_each_entry_rcu() using rcu_read_lock_bh_held() as the lockdep condition, consistent with the rcu_dereference_check() used in the DEVMAP (non-hash) branch of the same functions. Also fix the same incorrect lockdep_is_held(&dtab->index_lock) condition in dev_map_enqueue_multi(), where the lock is not held either. Fixes: `e624d4ed4a` ("xdp: Extend xdp_redirect_map with broadcast support") Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260320072645.16731-1-devnexen@gmail.com	2026-03-24 15:17:20 -07:00
Sun Jian	7f5b0a60a8	selftests/bpf: move trampoline_count to dedicated bpf_testmod target trampoline_count fills all trampoline attachment slots for a single target function and verifies that one extra attach fails with -E2BIG. It currently targets bpf_modify_return_test, which is also used by other selftests such as modify_return, get_func_ip_test, and get_func_args_test. When such tests run in parallel, they can contend for the same per-function trampoline quota and cause unexpected attach failures. This issue is currently masked by harness serialization. Move trampoline_count to a dedicated bpf_testmod target and register it for fmod_ret attachment. Also route the final trigger through trigger_module_test_read(), so the execution path exercises the same dedicated target. This keeps the test semantics unchanged while isolating it from other selftests, so it no longer needs to run in serial mode. Remove the TODO comment as well. Tested: ./test_progs -t trampoline_count -vv ./test_progs -j$(nproc) -t trampoline_count -vv ./test_progs -j$(nproc) -t \ trampoline_count,modify_return,get_func_ip_test,get_func_args_test -vv 20 runs of: ./test_progs -j$(nproc) -t \ trampoline_count,modify_return,get_func_ip_test,get_func_args_test Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260324044949.869801-1-sun.jian.kdev@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 13:39:32 -07:00
Jiayuan Chen	d9d7125e44	selftests/bpf: Fix sockmap_multi_channels reliability Previously I added a FIONREAD test for sockmap, but it can occasionally fail in CI [1]. The test sends 10 bytes in two segments (2 + 8). For UDP, FIONREAD only reports the length of the first datagram, not the total queued data. The original code used recv_timeout() expecting all 10 bytes, but under high system load, the second datagram may not yet be processed by the protocol stack, so recv would only return the first 2-byte datagram, causing a size mismatch failure. Fix this by receiving exactly the expected bytes (matching FIONREAD) in the first recv. The remaining datagram is then consumed in a second recv block, which is only reachable for UDP since TCP's expected already equals sizeof(buf). Test: ./test_progs -a sockmap_basic 410/1 sockmap_basic/sockmap create_update_free:OK ... Summary: 1/35 PASSED, 0 SKIPPED, 0 FAILED [1] https://github.com/kernel-patches/bpf/actions/runs/22919385910/job/66515395423 Cc: Jiayuan Chen <jiayuan.chen@linux.dev> Fixes: `17e2ce02bf` ("selftests/bpf: Add tests for FIONREAD and copied_seq") Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Link: https://lore.kernel.org/r/20260312072549.6766-1-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 13:38:43 -07:00
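Below is a minimal userspace sketch of the UDP behaviour this fix relies on. It assumes Linux and a working loopback interface, and it uses a plain UDP socket rather than the sockmap setup of the real selftest. `fionread_udp_demo()` is a hypothetical helper. FIONREAD sizes the first recv(), and the second datagram is drained by its own recv() after poll() waits for it.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send a 2-byte and then an 8-byte datagram over loopback UDP, then read
 * them back: the first recv() is sized by FIONREAD, the second datagram is
 * drained by a separate recv(). Returns 0 on success, -1 on any failure. */
static int fionread_udp_demo(int *first_len, int *second_len)
{
	struct sockaddr_in addr = { .sin_family = AF_INET };
	socklen_t alen = sizeof(addr);
	const char *payload = "0123456789";
	char rbuf[16];
	struct pollfd pfd = { .events = POLLIN };
	int rx, tx, avail = 0, ret = -1;

	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);	/* port 0: kernel picks one */
	rx = socket(AF_INET, SOCK_DGRAM, 0);
	tx = socket(AF_INET, SOCK_DGRAM, 0);
	if (rx < 0 || tx < 0)
		goto out;
	if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) ||
	    getsockname(rx, (struct sockaddr *)&addr, &alen))
		goto out;
	if (sendto(tx, payload, 2, 0, (struct sockaddr *)&addr, sizeof(addr)) != 2 ||
	    sendto(tx, payload + 2, 8, 0, (struct sockaddr *)&addr, sizeof(addr)) != 8)
		goto out;

	pfd.fd = rx;
	if (poll(&pfd, 1, 1000) != 1)	/* wait for the first datagram */
		goto out;
	/* On a UDP socket, FIONREAD reports only the next datagram's size. */
	if (ioctl(rx, FIONREAD, &avail) || avail <= 0 || avail > (int)sizeof(rbuf))
		goto out;
	*first_len = (int)recv(rx, rbuf, (size_t)avail, 0);
	if (poll(&pfd, 1, 1000) != 1)	/* the second datagram may arrive later */
		goto out;
	*second_len = (int)recv(rx, rbuf, sizeof(rbuf), 0);
	ret = 0;
out:
	if (rx >= 0)
		close(rx);
	if (tx >= 0)
		close(tx);
	return ret;
}
```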
Jiayuan Chen	2790db208b	selftests/bpf: Improve tc_tunnel test reliability A test failure was discovered in BPF CI [1] caused by connection timeout. The current test timeout of 500ms is insufficient for CI environments, particularly under high load. While the optimal timeout is unclear, this test was converted from the original test_tc_tunnel.sh script. The original script used nc with "-w 1" to specify a 1-second timeout [2]. Therefore, this test restores the timeout to 1s. Test: ./test_progs -a tc_tunnel #478/1 tc_tunnel/ipip_none:OK #478/2 tc_tunnel/ipip6_none:OK #478/3 tc_tunnel/ip6tnl_none:OK #478/4 tc_tunnel/sit_none:OK #478/5 tc_tunnel/vxlan_eth:OK #478/6 tc_tunnel/ip6vxlan_eth:OK #478/7 tc_tunnel/gre_none:OK #478/8 tc_tunnel/gre_eth:OK #478/9 tc_tunnel/gre_mpls:OK #478/10 tc_tunnel/ip6gre_none:OK #478/11 tc_tunnel/ip6gre_eth:OK #478/12 tc_tunnel/ip6gre_mpls:OK #478/13 tc_tunnel/udp_none:OK #478/14 tc_tunnel/udp_eth:OK #478/15 tc_tunnel/udp_mpls:OK #478/16 tc_tunnel/ip6udp_none:OK #478/17 tc_tunnel/ip6udp_eth:OK #478/18 tc_tunnel/ip6udp_mpls:OK #478 tc_tunnel:OK Summary: 1/18 PASSED, 0 SKIPPED, 0 FAILED [1] https://github.com/kernel-patches/bpf/actions/runs/22674350732/job/65728072723 [2] https://lore.kernel.org/all/20251027-tc_tunnel-v3-4-505c12019f9d@bootlin.com/ Cc: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Link: https://lore.kernel.org/r/20260312083615.31835-1-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 13:38:04 -07:00
Kexin Sun	70b5f3f782	bpf: update outdated comment for refactored btf_check_kfunc_arg_match() The function btf_check_kfunc_arg_match() was refactored into check_kfunc_args() by commit `00b85860fe` ("bpf: Rewrite kfunc argument handling"). Update the comment accordingly. Assisted-by: unnamed:deepseek-v3.2 coccinelle Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn> Acked-by: Yonghong Song <yonghong.song@linux.dev> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20260321105658.6006-1-kexinsun@smail.nju.edu.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 13:37:29 -07:00
Alexei Starovoitov	70275388ae	Merge branch 'bpf-add-multi-level-pointer-parameter-support-for-trampolines' Slava Imameev says: ==================== bpf: Add multi-level pointer parameter support for trampolines This is v6 of a series adding support for new pointer types for trampoline parameters. Originally, only support for multi-level pointers was proposed. As suggested during review, it was extended to single-level pointers. During discussion, it was proposed to add support for any single or multi-level pointer type that is not a single-level pointer to a structure, with the condition if (!btf_type_is_struct(t)). The safety of this condition is based on BTF data verification performed for modules and programs, and vmlinux BTF being trusted to not contain invalid types, so it is not possible for invalid types, like PTR->DATASEC, PTR->FUNC, PTR->VAR and corresponding multi-level pointers, to reach btf_ctx_access. These changes appear to be a safe extension since any future support for arrays and output values would require annotation (similar to Microsoft SAL), which differentiates between current unannotated scalar cases and new annotated cases. This series adds BPF verifier support for single- and multi-level pointer parameters and return values in BPF trampolines. The implementation treats these parameters as SCALAR_VALUE. This is consistent with existing pointers to int and void that are already treated as SCALAR. This provides consistent logic for single- and multi-level pointers: if the type is treated as SCALAR for a single-level pointer, the same applies to multi-level pointers, except for pointers to structs which are currently PTR_TO_BTF_ID. However, in the case of multi-level pointers, they are treated as scalar since the verifier lacks the context to infer the size of their target memory regions. 
Background: Prior to these changes, accessing multi-level pointer parameters or return values through BPF trampoline context arrays resulted in verification failures in btf_ctx_access, producing errors such as: func '%s' arg%d type %s is not a struct For example, consider a BPF program that logs an input parameter of type struct posix_acl **: SEC("fentry/__posix_acl_chmod") int BPF_PROG(trace_posix_acl_chmod, struct posix_acl **ppacl, gfp_t gfp, umode_t mode) { bpf_printk("__posix_acl_chmod ppacl = %px\n", ppacl); return 0; } This program failed BPF verification with the following error: libbpf: prog 'trace_posix_acl_chmod': -- BEGIN PROG LOAD LOG -- 0: R1=ctx() R10=fp0 ; int BPF_PROG(trace_posix_acl_chmod, struct posix_acl **ppacl, gfp_t gfp, umode_t mode) @ posix_acl_monitor.bpf.c:23 0: (79) r6 = *(u64 *)(r1 +16) ; R1=ctx() R6_w=scalar() 1: (79) r1 = *(u64 *)(r1 +0) func '__posix_acl_chmod' arg0 type PTR is not a struct invalid bpf_context access off=0 size=8 processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 -- END PROG LOAD LOG -- The common workaround involved using helper functions to fetch parameter values by passing the address of the context array entry: SEC("fentry/__posix_acl_chmod") int BPF_PROG(trace_posix_acl_chmod, struct posix_acl **ppacl, gfp_t gfp, umode_t mode) { struct posix_acl **pp; bpf_probe_read_kernel(&pp, sizeof(ppacl), &ctx[0]); bpf_printk("__posix_acl_chmod %px\n", pp); return 0; } This approach introduced helper call overhead and created inconsistency with parameter access patterns. Improvements: With this patch, trampoline programs can directly access multi-level pointer parameters, eliminating helper call overhead and explicit ctx access while ensuring consistent parameter handling.
For example, the following ctx access with a helper call: SEC("fentry/__posix_acl_chmod") int BPF_PROG(trace_posix_acl_chmod, struct posix_acl **ppacl, gfp_t gfp, umode_t mode) { struct posix_acl **pp; bpf_probe_read_kernel(&pp, sizeof(pp), &ctx[0]); bpf_printk("__posix_acl_chmod %px\n", pp); ... } is replaced by a load instruction: SEC("fentry/__posix_acl_chmod") int BPF_PROG(trace_posix_acl_chmod, struct posix_acl **ppacl, gfp_t gfp, umode_t mode) { bpf_printk("__posix_acl_chmod %px\n", ppacl); ... } The bpf_core_cast macro can be used for deeper level dereferences. v1 -> v2: corrected maintainer's email v2 -> v3: * Addressed reviewers' feedback: * Changed the register type from PTR_TO_MEM to SCALAR_VALUE. * Modified tests to accommodate SCALAR_VALUE handling. * Fixed a compilation error for loongarch * https://lore.kernel.org/oe-kbuild-all/202602181710.tEK6nOl6-lkp@intel.com/ * Addressed AI bot review * Added a commentary to address a NULL pointer case * Removed WARN_ON * Fixed a commentary v3 -> v4: * Added more consistent support for single and multi-level pointers as suggested by reviewers. * added single level pointers to enum 32 and 64 * added single level pointers to functions * harmonized support for single and multi-level pointer types * added new tests to support the above changes * Removed create_bad_kaddr that allocated and invalidated kernel VA for tests, and replaced it with hardcoded values similar to bpf_testmod_return_ptr as suggested by reviewers. v4 -> v5: * As suggested, extended support to single-level pointers and covered all supported valid pointer (single- and multi-level) types with a wider condition if (!btf_type_is_struct(t)). * As requested, simplified tests by keeping only tests that check the verifier log for scalar(). v5 -> v6: * Fixed the test message based on the bot's feedback.
==================== Link: https://patch.msgid.link/20260314082127.7939-1-slava.imameev@crowdstrike.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 13:36:32 -07:00
Slava Imameev	e8571de534	selftests/bpf: Add trampolines single and multi-level pointer params test coverage Add single and multi-level pointer parameters and return value test coverage for BPF trampolines. Includes verifier tests for single and multi-level pointers. The tests check verifier logs for pointers inferred as scalar() type. Signed-off-by: Slava Imameev <slava.imameev@crowdstrike.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260314082127.7939-3-slava.imameev@crowdstrike.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 13:36:32 -07:00
Slava Imameev	4145203841	bpf: Support pointer param types via SCALAR_VALUE for trampolines Add BPF verifier support for single- and multi-level pointer parameters and return values in BPF trampolines by treating these parameters as SCALAR_VALUE. This extends the existing support for int and void pointers that are already treated as SCALAR_VALUE. This provides consistent logic for single and multi-level pointers: if a type is treated as SCALAR for a single-level pointer, the same applies to multi-level pointers. The exception is pointer-to-struct, which is currently PTR_TO_BTF_ID for single-level but treated as scalar for multi-level pointers since the verifier lacks context to infer the size of target memory regions. Safety is ensured by existing BTF verification, which rejects invalid pointer types at the BTF verification stage. Signed-off-by: Slava Imameev <slava.imameev@crowdstrike.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260314082127.7939-2-slava.imameev@crowdstrike.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 13:36:31 -07:00
Alexei Starovoitov	7b4f1a29c7	selftests/bpf: Test 32-bit scalar spill pruning in stacksafe() Add a test verifying that stacksafe() correctly handles 32-bit scalar spills when comparing stack states for equivalence during state pruning. A 32-bit scalar spill creates slot[0-3] = STACK_INVALID and slot[4-7] = STACK_SPILL. Without the im=4 check in stacksafe(), the STACK_SPILL vs STACK_MISC mismatch at byte 4 causes pruning to fail, forcing the verifier to re-explore a path that is provably safe. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260323022410.75444-2-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 12:10:38 -07:00
Alexei Starovoitov	596bef1d71	bpf: Support 32-bit scalar spills in stacksafe() v1->v2: updated comments v1: https://lore.kernel.org/bpf/20260322225124.14005-1-alexei.starovoitov@gmail.com/ The commit `6efbde200b` ("bpf: Handle scalar spill vs all MISC in stacksafe()") in stacksafe() only recognized full 64-bit scalar spills when comparing stack states for equivalence during state pruning and missed 32-bit scalar spill. When 32-bit scalar is spilled the check_stack_write_fixed_off() -> save_register_state() calls mark_stack_slot_misc() for slot[0-3], which preserves STACK_INVALID and STACK_ZERO (on a fresh stack slot[0-3] remain STACK_INVALID), sets slot[4-7] = STACK_SPILL, and updates spilled_ptr. The im=4 path is only reached when im=0 fails: The loop at im=0 already attempts the 64-bit scalar-spill/all-MISC check. If it matches, i advances by 7, skipping the entire 8-byte slot. So im=4 is only reached when bytes 0-3 are neither a scalar spill nor all-MISC — they must pass individual byte-by-byte comparison first. Then bytes 4-7 get the scalar-unit treatment. is_spilled_scalar_after(stack, 4): slot_type[4] == STACK_SPILL from a 64-bit spill would have been caught at im=0 (unless it's a pointer spill, in which case spilled_ptr.type != SCALAR_VALUE -> returns false at im=4 too). A partial overwrite of a 64-bit spill invalidates the entire slot in check_stack_write_fixed_off(). is_stack_misc_after(stack, 4): Only checks bytes 4-7 are MISC/INVALID, returns &unbound_reg. Comparing two unbound regs via regsafe() is safe. 
Changes to cilium programs: File Program Insns (A) Insns (B) Insns (DIFF) _______________ _________________________________ _________ _________ ________________ bpf_host.o cil_host_policy 49351 45811 -3540 (-7.17%) bpf_host.o cil_to_host 2384 2270 -114 (-4.78%) bpf_host.o cil_to_netdev 112051 100269 -11782 (-10.51%) bpf_host.o tail_handle_ipv4_cont_from_host 61175 60910 -265 (-0.43%) bpf_host.o tail_handle_ipv4_cont_from_netdev 9381 8873 -508 (-5.42%) bpf_host.o tail_handle_ipv4_from_host 12994 7066 -5928 (-45.62%) bpf_host.o tail_handle_ipv4_from_netdev 85015 59875 -25140 (-29.57%) bpf_host.o tail_handle_ipv6_cont_from_host 24732 23527 -1205 (-4.87%) bpf_host.o tail_handle_ipv6_cont_from_netdev 9463 8953 -510 (-5.39%) bpf_host.o tail_handle_ipv6_from_host 12477 11787 -690 (-5.53%) bpf_host.o tail_handle_ipv6_from_netdev 30814 30017 -797 (-2.59%) bpf_host.o tail_handle_nat_fwd_ipv4 8943 8860 -83 (-0.93%) bpf_host.o tail_handle_snat_fwd_ipv4 64716 61625 -3091 (-4.78%) bpf_host.o tail_handle_snat_fwd_ipv6 48299 30797 -17502 (-36.24%) bpf_host.o tail_ipv4_host_policy_ingress 21591 20017 -1574 (-7.29%) bpf_host.o tail_ipv6_host_policy_ingress 21177 20693 -484 (-2.29%) bpf_host.o tail_nodeport_nat_egress_ipv4 16588 16543 -45 (-0.27%) bpf_host.o tail_nodeport_nat_ingress_ipv4 39200 36116 -3084 (-7.87%) bpf_host.o tail_nodeport_nat_ingress_ipv6 50102 48003 -2099 (-4.19%) bpf_lxc.o tail_handle_ipv4_cont 113092 96891 -16201 (-14.33%) bpf_lxc.o tail_handle_ipv6 6727 6701 -26 (-0.39%) bpf_lxc.o tail_handle_ipv6_cont 25567 21805 -3762 (-14.71%) bpf_lxc.o tail_ipv4_ct_egress 28843 15970 -12873 (-44.63%) bpf_lxc.o tail_ipv4_ct_ingress 16691 10213 -6478 (-38.81%) bpf_lxc.o tail_ipv4_ct_ingress_policy_only 16691 10213 -6478 (-38.81%) bpf_lxc.o tail_ipv4_policy 6776 6622 -154 (-2.27%) bpf_lxc.o tail_ipv4_to_endpoint 7523 7219 -304 (-4.04%) bpf_lxc.o tail_ipv6_ct_egress 10275 9999 -276 (-2.69%) bpf_lxc.o tail_ipv6_ct_ingress 6466 6438 -28 (-0.43%) bpf_lxc.o 
tail_ipv6_ct_ingress_policy_only 6466 6438 -28 (-0.43%) bpf_lxc.o tail_ipv6_policy 6859 5159 -1700 (-24.78%) bpf_lxc.o tail_ipv6_to_endpoint 7039 4427 -2612 (-37.11%) bpf_lxc.o tail_nodeport_ipv6_dsr 1175 1033 -142 (-12.09%) bpf_lxc.o tail_nodeport_nat_egress_ipv4 16318 16292 -26 (-0.16%) bpf_lxc.o tail_nodeport_nat_ingress_ipv4 18907 18490 -417 (-2.21%) bpf_lxc.o tail_nodeport_nat_ingress_ipv6 14624 14556 -68 (-0.46%) bpf_lxc.o tail_nodeport_rev_dnat_ipv4 4776 4588 -188 (-3.94%) bpf_overlay.o tail_handle_inter_cluster_revsnat 15733 15498 -235 (-1.49%) bpf_overlay.o tail_handle_ipv4 124682 105717 -18965 (-15.21%) bpf_overlay.o tail_handle_ipv6 16201 15801 -400 (-2.47%) bpf_overlay.o tail_handle_snat_fwd_ipv4 21280 19323 -1957 (-9.20%) bpf_overlay.o tail_handle_snat_fwd_ipv6 20824 20822 -2 (-0.01%) bpf_overlay.o tail_nodeport_ipv6_dsr 1175 1033 -142 (-12.09%) bpf_overlay.o tail_nodeport_nat_egress_ipv4 16293 16267 -26 (-0.16%) bpf_overlay.o tail_nodeport_nat_ingress_ipv4 20841 20737 -104 (-0.50%) bpf_overlay.o tail_nodeport_nat_ingress_ipv6 14678 14629 -49 (-0.33%) bpf_sock.o cil_sock4_connect 1678 1623 -55 (-3.28%) bpf_sock.o cil_sock4_sendmsg 1791 1736 -55 (-3.07%) bpf_sock.o cil_sock6_connect 3641 3600 -41 (-1.13%) bpf_sock.o cil_sock6_recvmsg 2048 1899 -149 (-7.28%) bpf_sock.o cil_sock6_sendmsg 3755 3721 -34 (-0.91%) bpf_wireguard.o tail_handle_ipv4 31180 27484 -3696 (-11.85%) bpf_wireguard.o tail_handle_ipv6 12095 11760 -335 (-2.77%) bpf_wireguard.o tail_nodeport_ipv6_dsr 1232 1094 -138 (-11.20%) bpf_wireguard.o tail_nodeport_nat_egress_ipv4 16071 16061 -10 (-0.06%) bpf_wireguard.o tail_nodeport_nat_ingress_ipv4 20804 20565 -239 (-1.15%) bpf_wireguard.o tail_nodeport_nat_ingress_ipv6 13490 12224 -1266 (-9.38%) bpf_xdp.o tail_lb_ipv4 49695 42673 -7022 (-14.13%) bpf_xdp.o tail_lb_ipv6 122683 87896 -34787 (-28.36%) bpf_xdp.o tail_nodeport_ipv6_dsr 1833 1862 +29 (+1.58%) bpf_xdp.o tail_nodeport_nat_egress_ipv4 6999 6990 -9 (-0.13%) bpf_xdp.o 
tail_nodeport_nat_ingress_ipv4 28903 28780 -123 (-0.43%) bpf_xdp.o tail_nodeport_nat_ingress_ipv6 200361 197771 -2590 (-1.29%) bpf_xdp.o tail_nodeport_rev_dnat_ipv4 4606 4454 -152 (-3.30%) Changes to sched-ext: File Program Insns (A) Insns (B) Insns (DIFF) _________________________ ________________ _________ _________ _______________ scx_arena_selftests.bpf.o arena_selftest 236305 236251 -54 (-0.02%) scx_chaos.bpf.o chaos_dispatch 12282 8013 -4269 (-34.76%) scx_chaos.bpf.o chaos_enqueue 11398 7126 -4272 (-37.48%) scx_chaos.bpf.o chaos_init 3854 3828 -26 (-0.67%) scx_flash.bpf.o flash_init 1015 979 -36 (-3.55%) scx_flatcg.bpf.o fcg_dispatch 1143 1100 -43 (-3.76%) scx_lavd.bpf.o lavd_enqueue 35487 35472 -15 (-0.04%) scx_lavd.bpf.o lavd_init 21127 21107 -20 (-0.09%) scx_p2dq.bpf.o p2dq_enqueue 10210 7854 -2356 (-23.08%) scx_p2dq.bpf.o p2dq_init 3233 3207 -26 (-0.80%) scx_qmap.bpf.o qmap_init 20285 20230 -55 (-0.27%) scx_rusty.bpf.o rusty_select_cpu 1165 1148 -17 (-1.46%) scxtop.bpf.o on_sched_switch 2369 2355 -14 (-0.59%) Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260323022410.75444-1-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 11:59:52 -07:00
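A toy model that tracks only slot types can approximate the pruning logic. This is a sketch, not the verifier's stacksafe(). Every SPILL byte is assumed to hold a scalar, and register value ranges are not modelled. As a result, `toy_regsafe()` only lets an unbound old scalar cover any current scalar, and it treats two spilled scalars as equivalent. All names are hypothetical. The `check_im4` flag toggles the extra scalar-unit check at byte 4 that this commit adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy slot types; the real verifier also has STACK_ZERO and others. */
enum toy_slot { T_INVALID, T_MISC, T_SPILL };

struct toy_slot8 { enum toy_slot t[8]; };

enum unit_kind { UNIT_NONE, UNIT_UNBOUND, UNIT_SPILLED };

/* Classify bytes [im, 8) as one scalar unit, loosely following
 * is_spilled_scalar_after() / is_stack_misc_after(). */
static enum unit_kind unit_after(const struct toy_slot8 *s, int im)
{
	bool all_spill = true, all_misc = true;

	for (int i = im; i < 8; i++) {
		all_spill = all_spill && (s->t[i] == T_SPILL);
		all_misc = all_misc && (s->t[i] == T_MISC || s->t[i] == T_INVALID);
	}
	if (all_spill)
		return UNIT_SPILLED;
	if (all_misc)
		return UNIT_UNBOUND;
	return UNIT_NONE;
}

/* Stand-in for regsafe(): an unbound old scalar covers any current scalar;
 * two spilled scalars are assumed equivalent since values are not modelled. */
static bool toy_regsafe(enum unit_kind old, enum unit_kind cur)
{
	return old == UNIT_UNBOUND || (old == UNIT_SPILLED && cur == UNIT_SPILLED);
}

static bool toy_stacksafe(const struct toy_slot8 *old, const struct toy_slot8 *cur,
			  bool check_im4)
{
	for (int i = 0; i < 8; i++) {
		int im = i % 8;

		if (im == 0 || (check_im4 && im == 4)) {
			enum unit_kind o = unit_after(old, im);
			enum unit_kind c = unit_after(cur, im);

			if (o != UNIT_NONE && c != UNIT_NONE) {
				if (!toy_regsafe(o, c))
					return false;
				i += 7 - im;	/* skip the rest of this unit */
				continue;
			}
		}
		if (old->t[i] == T_INVALID)	/* old state never relied on this byte */
			continue;
		if (old->t[i] != cur->t[i])
			return false;
	}
	return true;
}
```

Consider an old state whose bytes 4-7 are MISC and a current state holding a 32-bit scalar spill in bytes 4-7. `toy_stacksafe()` returns false without the byte-4 check and true with it. This mirrors the missed pruning described above.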
Kumar Kartikeya Dwivedi	02bcf8ef26	bpf: Update MAINTAINERS file for general BPF entry Per discussion with Alexei, add Eduard and myself as maintainers under BPF [GENERAL]. While at it, drop R entries for reviewers who have been inactive. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260324152230.2916217-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 09:05:11 -07:00
Varun R Mallya	b43d574c00	selftests/bpf: Add test for struct_ops __ref argument in any position Add a selftest to verify that the verifier correctly identifies refcounted arguments in struct_ops programs, even when they are not the first argument. This ensures that the restriction on tail calls for programs with __ref arguments is properly enforced regardless of which argument they appear in. This test verifies the fix for check_struct_ops_btf_id() proposed by Keisuke Nishimura [0], which corrected a bug where only the first argument was checked for the refcounted flag. The test includes: - An update to bpf_testmod to add 'test_refcounted_multi', an operator with three arguments where the third is tagged with "__ref". - A BPF program 'test_refcounted_multi' that attempts a tail call. - A test runner that asserts the verifier rejects the program with "program with __ref argument cannot tail call". [0]: https://lore.kernel.org/bpf/20260320130219.63711-1-keisuke.nishimura@inria.fr/ Signed-off-by: Varun R Mallya <varunrmallya@gmail.com> Link: https://lore.kernel.org/r/20260321214038.80479-1-varunrmallya@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 08:51:23 -07:00
Keisuke Nishimura	25e3e1f109	bpf: Fix refcount check in check_struct_ops_btf_id() The current implementation only checks whether the first argument is refcounted. Fix this by iterating over all arguments. Signed-off-by: Keisuke Nishimura <keisuke.nishimura@inria.fr> Fixes: `38f1e66abd` ("bpf: Do not allow tail call in strcut_ops program with __ref argument") Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260320130219.63711-1-keisuke.nishimura@inria.fr Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-03-24 08:50:20 -07:00
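A minimal sketch shows the shape of the fix. `struct toy_arg_info`, `has_ref_arg_first_only()` and `has_ref_arg()` are hypothetical stand-ins, not the code in check_struct_ops_btf_id(). In the verifier, finding a refcounted argument is what leads to the "program with __ref argument cannot tail call" rejection that the selftest exercises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-argument info for a struct_ops operator. */
struct toy_arg_info {
	bool refcounted;	/* argument tagged "__ref" */
};

/* Buggy shape: only the first argument is inspected. */
static bool has_ref_arg_first_only(const struct toy_arg_info *args, size_t nr_args)
{
	return nr_args > 0 && args[0].refcounted;
}

/* Fixed shape: a refcounted argument in any position counts. */
static bool has_ref_arg(const struct toy_arg_info *args, size_t nr_args)
{
	for (size_t i = 0; i < nr_args; i++)
		if (args[i].refcounted)
			return true;
	return false;
}
```

Consider an operator with three arguments where only the third is tagged `__ref`, like the `test_refcounted_multi` operator added to bpf_testmod. The first-only check returns false, while the loop returns true.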


1428970 Commits