Simplify data allocation by always using aligned_alloc() and passing
size_pot, size rounded up to the closest power of two to alignment.
Currently, aligned_alloc(page_size, size) is only intended to be used
with memory allocators that can fulfill the request without rounding
size up to page_size to conserve memory. This is enabled by defining
TLD_DATA_USE_ALIGNED_ALLOC. The reason to align to page_size is due to
the limitation of UPTR where only a page can be pinned to the kernel.
Otherwise, malloc(size * 2) is used to allocate memory for data.
However, we don't need to call aligned_alloc(page_size, size) to get
a contiguous memory of size bytes within a page. aligned_alloc(size_pot,
...) will also do the trick. Therefore, just use aligned_alloc(size_pot,
...) universally.
As for the size argument, create a new option,
TLD_DONT_ROUND_UP_DATA_SIZE, to specify not rounding up the size.
This preserves the current TLD_DATA_USE_ALIGNED_ALLOC behavior, allowing
memory allocators with low overhead aligned_alloc() to not waste memory.
To enable this, users need to make sure it is not an undefined behavior
for the memory allocator to have size not being an integral multiple of
alignment.
Compared to the current implementation, !TLD_DATA_USE_ALIGNED_ALLOC
used to always waste size-byte of memory due to malloc(size * 2).
Now the worst case becomes size - 1 and the best case is 0 when the size
is already a power of two.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-3-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, when allocating memory for data, size of tld_data_u->start
is not taken into account. This may cause OOB access. Fixed it by adding
the non-flexible array part of tld_data_u.
Besides, explicitly align tld_data_u->data to 8 bytes in case some
fields are added before data in the future. It could break the
assumption that every data field is 8 byte aligned and
sizeof(tld_data_u) will no longer be equal to
offsetof(struct tld_data_u, data), which we use interchangeably.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hoyeon Lee says:
====================
libbpf: clarify raw-address single kprobe attach behavior
Today libbpf documents single-kprobe attach through func_name, with an
optional offset. For the PMU-based path, func_name = NULL with an
absolute address in offset already works as well, but that is not
described in the API.
This patchset clarifies this behavior. First commit fixes kprobe
and uprobe attach error handling to use direct error codes. Next adds
kprobe API comments for the raw-address form and rejects it explicitly
for legacy tracefs/debugfs kprobes. Last adds PERF and LINK selftests
for the raw-address form, and checks that LEGACY rejects it.
---
Changes in v7:
- Change selftest line wrapping and assertions
Changes in v6:
- Split the kprobe/uprobe direct error-code fix into a separate patch
Changes in v5:
- Add kprobe API docs, use -EOPNOTSUPP, and switch selftests to LIBBPF_OPTS
Changes in v4:
- Inline raw-address error formatting and remove the probe_target buffer
Changes in v3:
- Drop bpf_kprobe_opts.addr and reuse offset when func_name is NULL
- Make legacy tracefs/debugfs kprobes reject the raw-address form
- Update selftests to cover PERF/LINK raw-address attach and LEGACY reject
Changes in v2:
- Fix line wrapping and indentation
====================
Link: https://patch.msgid.link/20260401143116.185049-1-hoyeon.lee@suse.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Currently, attach_probe covers manual single-kprobe attaches by
func_name, but not the raw-address form that the PMU-based
single-kprobe path can accept.
This commit adds PERF and LINK raw-address coverage. It resolves
SYS_NANOSLEEP_KPROBE_NAME through kallsyms, passes the absolute address
in bpf_kprobe_opts.offset with func_name = NULL, and verifies that
kprobe and kretprobe are still triggered. It also verifies that LEGACY
rejects the same form.
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260401143116.185049-4-hoyeon.lee@suse.com
bpf_program__attach_kprobe_opts() documents single-kprobe attach
through func_name, with an optional offset. For the PMU-based path,
func_name = NULL with an absolute address in offset already works as
well, but that is not described in the API.
This commit clarifies this existing non-legacy behavior. For PMU-based
attach, callers can use func_name = NULL with an absolute address in
offset as the raw-address form. For legacy tracefs/debugfs kprobes,
reject this form explicitly.
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260401143116.185049-3-hoyeon.lee@suse.com
perf_event_open_probe() and perf_event_{k,u}probe_open_legacy() helpers
are returning negative error codes directly on failure. This commit
changes bpf_program__attach_{k,u}probe_opts() to use those return
values directly instead of re-reading possibly changed errno.
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260401143116.185049-2-hoyeon.lee@suse.com
Align bpf_program__clone() with bpf_object_load_prog() by gating
BTF func/line info on FEAT_BTF_FUNC kernel support, and resolve
caller-provided prog_btf_fd before checking obj->btf so that callers
with their own BTF can use clone() even when the object has no BTF
loaded.
While at it, treat func_info and line_info fields as atomic groups
to prevent mismatches between pointer and count from different sources.
Move bpf_program__clone() to libbpf 1.8.
Fixes: 970bd2dced ("libbpf: Introduce bpf_program__clone()")
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260401151640.356419-1-mykyta.yatsenko5@gmail.com
Mykyta Yatsenko says:
====================
bpf: Migrate bpf_task_work and file dynptr to kmalloc_nolock
Now that kmalloc can be used from NMI context via kmalloc_nolock(),
migrate BPF internal allocations away from bpf_mem_alloc to use the
standard slab allocator.
Use kfree_rcu() for deferred freeing, which waits for a regular RCU
grace period before the memory is reclaimed. Sleepable BPF programs
hold rcu_read_lock_trace but not regular rcu_read_lock, so patch 1
adds explicit rcu_read_lock/unlock around the pointer-to-refcount
window to prevent kfree_rcu from freeing memory while a sleepable
program is still between reading the pointer and acquiring a
reference.
Patch 1 migrates bpf_task_work_ctx from bpf_mem_alloc/bpf_mem_free to
kmalloc_nolock/kfree_rcu.
Patch 2 migrates bpf_dynptr_file_impl from bpf_mem_alloc/bpf_mem_free
to kmalloc_nolock/kfree.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v2:
- Switch to scoped_guard in patch 1 (Kumar)
- Remove rcu gp wait in patch 2 (Kumar)
- Defer to irq_work when irqs disabled in patch 1
- use bpf_map_kmalloc_nolock() for bpf_task_work
- use kmalloc_nolock() for file dynptr
- Link to v1: https://lore.kernel.org/all/20260325-kmalloc_special-v1-0-269666afb1ea@meta.com/
====================
Link: https://patch.msgid.link/20260330-kmalloc_special-v2-0-c90403f92ff0@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Replace bpf_mem_alloc/bpf_mem_free with kmalloc_nolock/kfree_nolock for
bpf_dynptr_file_impl, continuing the migration away from bpf_mem_alloc
now that kmalloc can be used from NMI context.
freader_cleanup() runs before kfree_nolock() while the dynptr still
holds exclusive access, so plain kfree_nolock() is safe — no concurrent
readers can access the object.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260330-kmalloc_special-v2-2-c90403f92ff0@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Replace bpf_mem_alloc/bpf_mem_free with
kmalloc_nolock/kfree_rcu for bpf_task_work_ctx.
Replace guard(rcu_tasks_trace)() with guard(rcu)() in
bpf_task_work_irq(). The function only accesses ctx struct members
(not map values), so tasks trace protection is not needed - regular
RCU is sufficient since ctx is freed via kfree_rcu. The guard in
bpf_task_work_callback() remains as tasks trace since it accesses map
values from process context.
Sleepable BPF programs hold rcu_read_lock_trace but not
regular rcu_read_lock. Since kfree_rcu
waits for a regular RCU grace period, the ctx memory can be freed
while a sleepable program is still running. Add scoped_guard(rcu)
around the pointer read and refcount tryget in
bpf_task_work_acquire_ctx to close this race window.
Since kfree_rcu uses call_rcu internally which is not safe from
NMI context, defer destruction via irq_work when irqs are disabled.
For the lost-cmpxchg path the ctx was never published, so
kfree_nolock is safe.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260330-kmalloc_special-v2-1-c90403f92ff0@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Leon Hwang says:
====================
bpf: Fix abuse of kprobe_write_ctx via freplace
The potential issue of kprobe_write_ctx+freplace was mentioned in
"bpf: Disallow !kprobe_write_ctx progs tail-calling kprobe_write_ctx progs" [1].
It is true issue, that the test in patch #2 verifies that kprobe_write_ctx=false
kprobe progs can be abused to modify struct pt_regs via kprobe_write_ctx=true
freplace progs.
When struct pt_regs is modified, bpf_prog_test_run_opts() gets -EFAULT instead
of 0.
test_freplace_kprobe_write_ctx:FAIL:bpf_prog_test_run_opts unexpected error: -14 (errno 14)
We will disallow attaching freplace programs on kprobe programs with different
kprobe_write_ctx values.
Links:
[1] https://lore.kernel.org/bpf/CAP01T74w4KVMn9bEwpQXrk+bqcUxzb6VW1SQ_QvNy0A4EY-9Jg@mail.gmail.com/
Changes:
v2 -> v3:
* Add comment to the rejection of kprobe_write_ctx (per Jiri).
* Use libbpf_get_error() instead of errno in test (per Jiri).
* Collect Acked-by tags from Jiri and Song, thanks.
v2: https://lore.kernel.org/bpf/20260326141718.17731-1-leon.hwang@linux.dev/
v1 -> v2:
* Drop patch #1 in v1, as it wasn't an issue (per Toke).
* Check kprobe_write_ctx value at attach time instead of at load time, to
prevent attaching kprobe_write_ctx=true freplace progs on
kprobe_write_ctx=false kprobe progs (per Gemini/sashiko).
* Move kprobe_write_ctx test code to attach_probe.c and kprobe_write_ctx.c.
v1: https://lore.kernel.org/bpf/20260324150444.68166-1-leon.hwang@linux.dev/
====================
Link: https://patch.msgid.link/20260331145353.87606-1-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test to verify the issue: kprobe_write_ctx can be abused to modify
struct pt_regs of kernel functions via kprobe_write_ctx=true freplace
progs.
Without the fix, the issue is verified:
kprobe_write_ctx=true freplace prog is allowed to attach to
kprobe_write_ctx=false kprobe prog. Then, the first arg of
bpf_fentry_test1 will be set as 0, and bpf_prog_test_run_opts() gets
-EFAULT instead of 0.
With the fix, the issue is rejected at attach time.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260331145353.87606-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
uprobe programs are allowed to modify struct pt_regs.
Since the actual program type of uprobe is KPROBE, it can be abused to
modify struct pt_regs via kprobe+freplace when the kprobe attaches to
kernel functions.
For example,
SEC("?kprobe")
int kprobe(struct pt_regs *regs)
{
return 0;
}
SEC("?freplace")
int freplace_kprobe(struct pt_regs *regs)
{
regs->di = 0;
return 0;
}
freplace_kprobe prog will attach to kprobe prog.
kprobe prog will attach to a kernel function.
Without this patch, when the kernel function runs, its first arg will
always be set as 0 via the freplace_kprobe prog.
To fix the abuse of kprobe_write_ctx=true via kprobe+freplace, disallow
attaching freplace programs on kprobe programs with different
kprobe_write_ctx values.
Fixes: 7384893d97 ("bpf: Allow uprobe program to change context registers")
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260331145353.87606-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When running veristat across many BPF objects, expected load failures
produce noisy stderr output that obscures actual issues. Gate these
diagnostic messages behind --verbose.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260331172634.57402-2-mykyta.yatsenko5@gmail.com
Add the testing to access the bpf_ringbuf with the map pointer.
"consumer_pos" and "producer_pos" is accessed in this testing. We reserve
128 bytes in the ringbuf to test the producer_pos, which should be
"128 + BPF_RINGBUF_HDR_SZ".
It will be helpful if we want to evaluate the usage of the ringbuf in bpf
prog with the consumer and producer position.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/bpf/20260331070434.10037-1-dongml2@chinatelecom.cn
Clarify bpf_ringbuf_discard() documentation for BPF_RB_NO_WAKEUP.
Discarded ring buffer records are still left in the ring buffer and are
only skipped when user space consumes them. This can matter when
BPF_RB_NO_WAKEUP is used: a later submit relying on adaptive wakeup
might not wake the consumer, because the discarded record still needs to
be consumed first.
Scenario:
epoll_wait(rb_fd); // blocks
rec = bpf_ringbuf_reserve(&rb, ...);
bpf_ringbuf_discard(rec, BPF_RB_NO_WAKEUP);
rec = bpf_ringbuf_reserve(&rb, ...);
bpf_ringbuf_submit(rec, 0); // valid record, but no wakeup
Document this in bpf_ringbuf_discard() to make the interaction between
discarded records, user-space consumption, and adaptive wakeups explicit.
Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260331130612.3762433-1-eyal.birger@gmail.com
----
v2: adapt wording per feedback from Andrii.
Add few more alu32 shift tests using div-by-zero on provably dead paths
to check both verifier and JIT xlation resp. runtime correctness.
If the verifier mistracks the result, it rejects due to the div by 0;
if the JIT computes a wrong value, then runtime hits the dead path and
retval changes.
# LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_subreg
[...]
#644/76 verifier_subreg/arsh32_imm1_value:OK
#644/77 verifier_subreg/lsh32_reg0_zero_extend_check:OK
#644/78 verifier_subreg/rsh32_reg0_zero_extend_check:OK
#644/79 verifier_subreg/arsh32_reg0_zero_extend_check:OK
#644/80 verifier_subreg/lsh32_imm31_value:OK
#644/81 verifier_subreg/rsh32_imm31_value:OK
#644/82 verifier_subreg/arsh32_imm31_value:OK
#644/83 verifier_subreg/lsh32_unknown_precise_bounds:OK
#644/84 verifier_subreg/rsh32_unknown_bounds:OK
#644 verifier_subreg:OK
Summary: 1/84 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260327220629.343327-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Update selftests to use the new non-_impl kfuncs marked with
KF_IMPLICIT_ARGS by removing redundant declarations and macros from
bpf_experimental.h (the new kfuncs are present in the vmlinux.h) and
updating relevant callsites.
Fix spin_lock verifier-log matching for lock_id_kptr_preserve by
accepting variable instruction numbers. The calls to kfuncs with
implicit arguments do not have register moves (e.g. r5 = 0)
corresponding to dummy arguments anymore, so the order of instructions
has shifted.
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260327203241.3365046-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The following kfuncs currently accept void *meta__ign argument:
* bpf_obj_new_impl
* bpf_obj_drop_impl
* bpf_percpu_obj_new_impl
* bpf_percpu_obj_drop_impl
* bpf_refcount_acquire_impl
* bpf_list_push_back_impl
* bpf_list_push_front_impl
* bpf_rbtree_add_impl
The __ign suffix is an indicator for the verifier to skip the argument
in check_kfunc_args(). Then, in fixup_kfunc_call() the verifier may
set the value of this argument to struct btf_struct_meta *
kptr_struct_meta from insn_aux_data.
BPF programs must pass a dummy NULL value when calling these kfuncs.
Additionally, the list and rbtree _impl kfuncs also accept an implicit
u64 argument, which doesn't require __ign suffix because it's a
scalar, and BPF programs explicitly pass 0.
Add new kfuncs with KF_IMPLICIT_ARGS [1], that correspond to each
_impl kfunc accepting meta__ign. The existing _impl kfuncs remain
unchanged for backwards compatibility.
To support this, add "btf_struct_meta" to the list of recognized
implicit argument types in resolve_btfids.
Implement is_kfunc_arg_implicit() in the verifier, that determines
implicit args by inspecting both a non-_impl BTF prototype of the
kfunc.
Update the special_kfunc_list in the verifier and relevant checks to
support both the old _impl and the new KF_IMPLICIT_ARGS variants of
btf_struct_meta users.
[1] https://lore.kernel.org/bpf/20260120222638.3976562-1-ihor.solodrai@linux.dev/
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260327203241.3365046-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Christian Brauner says:
====================
bpf: classify block device hooks and add selftests
A bunch of new hooks for managing block devices were added a while ago
but they weren't appropriately classified. Classify them and add a test
program so we catch regressions.
Note that for whatever reason building the bpf selftests locally seems
to fail for all kinds of arcane reasons for me. That might just be my
fault. I added a pr against the ci to have the selftests run but to test
this meaningfully it needs veritysetup and dmverity support. I'm not
sure if that's available already.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Changes in v2:
- No changes.
- Link to v1: https://patch.msgid.link/20260220-work-bpf-bdev-v1-0-c53e852c4702@kernel.org
---
====================
Link: https://patch.msgid.link/20260326-work-bpf-bdev-v2-0-5e3c58963987@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A bunch of new hooks for managing block devices were added a while ago
but they weren't actually appropriately classified.
* bpf_lsm_bdev_alloc() is called when the inode for the block
device is allocated. This happens from a sleepable context so mark the
function as sleepable. When this function is called the memory for the
block device storage embedded into the inode is zeroed. That block
device cannot be meaningfully reference or interacted with at this
point. So mark it as untrusted for now.
* bpf_lsm_bdev_free() is called when the inode for the block
device is freed. A bunch of memory associated with the block device
has already been freed and there's dangling pointers in there. So mark
it as untrusted. It cannot be meaningfully referenced or interacted
with anymore. It is also called from sb->s_op->free_inode:: which
means it runs in rcu context (most of the times). So leave it as
non-sleepable.
* bpf_lsm_bdev_setintegrity() is called when a dm-verity device
is instantiated (glossing over details for simplicity of the commit
message). The block device is very much alive so it remains a trusted
hook. It's also called with device mapper's suspend lock held and so
the hook is able to sleep so mark it sleepable.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20260326-work-bpf-bdev-v2-1-5e3c58963987@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alan Maguire says:
====================
Add BTF layout to BTF
Update struct btf_header to add a new "layout" section containing
a description of how to parse the BTF kinds known about at BTF
encoding time. This provides the opportunity for tools that might
not know all of these kinds - as is the case when older tools run
on more newly-generated BTF - to still parse the BTF provided,
even if it cannot all be used.
The ideas here were discussed at [1], with further discussion
at [2].
Patches for pahole will enable the layout addition during BTF
generation are at [3], but even absent these the addition of the
layout feature in the final patch in this series should not break
anything since such unknown features are simply ignored during pahole
BTF generation.
Separately tested sanitization of BTF location info with separate
small series which simulates feature absence to support testing of
features for older kernels; will follow up with that shortly.
Changes since v15 [4]:
- Fixed endian issues for layout section by swapping flags fields
where needed (sashiko.dev, patch 2)
- Fixed string size issue with swapped endian case, use btf->magic
for comparison to determine endian mismatch (bpf review bot,
sashiko.dev, patch 6)
Changes since v14 [5]:
- Fix potential overflow for swapped endian case (BPF review bot,
patch 2)
- Add global: keyword to libbpf.map (sashiko.dev, patch 4)
- Fix endian issues in sanitization; we use the endian safe
btf->hdr and check for endian mismatch between it and raw original
BTF header to inform how we write the change str_off. Also fix
potential truncation issues due to not including hdr->type_off
(sashiko.dev, patch 6)
- Fix issues with selftests raw BTF file interactions (sashiko.dev,
patch 8)
- Drop feature test test since it will be covered by another series
Changes since v13: [6]:
- add feature check/sanitization of BTF with layout info (Andrii,
patch 6)
- added feature check test for layout support (patch 9)
Changes since v12: [7]:
- add logging of layout off/len to kernel header logging (review bot,
patch 6)
- add mode to open() in selftest (review bot, patch 7)
Changes since v11 [8]:
- Revert unneeded changes to btf__new_empty() (Eduard, review bot,
patch 4)
- Reorder btf_parse_layout_sec() checks to ensure min size check
occurs before multiple check (review bot, patch 6)
Changes since v10 [9]:
- deal with read overflow with small header (review bot, patch 2)
- validate layout length is a multiple of sizeof(struct btf_layout)
(review bot, patch 6)
- fix comment style (Alexei, patches 4,7)
- remove bpftool BTF metadata subcommands for now (Alexei)
Changes since v9: [10]:
- fix memcpy header size overrun (review bot, patch 2)
- return size computation directly (Andrii, patch 333)
- revert to original unknown kind logging (Alexei/review bot,
patch 6)
- gap-checking logic can be simplified now that we have
4-byte aligned types and layout together (patch 6)
- fix naming of layout offset, unconditionally emit a layouts
array in json (Quentin, review bot, patch 8)
- fix metadata output in man page to include flags (review bot,
patch 9)
Changes since v8: [11]:
- updated name from "kind_layout" to "layout" (Andrii)
- moved layout info to inbetween types and strings since
both types and layout info align on 4 bytes (Andrii)
- use embedded btf_header (Eduard)
- when consulting layout, fall back to base BTF if none found in
split BTF; this will allow us to only encode layout info in
vmlinux rather than repeating it for each module.
Changes since v7: [12]:
- Fixed comment style in UAPI headers (Mykyta, patch 1)
- Simplify calcuation of header size using min() (Mykyta, patch 2)
- simplify computation of bounds for kind (Mykyta, patch 3)
- Added utility functions for updating type, string offsets when
data is added; this simplifies the code and encapsulates such
updates more clearly (patch 2)
Changes since v6: [13]:
- BPF review bot caught some memory leaks around freeing
of kind layout; more importantly, it noted that we were
breaking with the contiguous BTF representation for
btf_new_empty_opts(). Doing so meant that freeing kind_layout
could not be predicated on having btf->modifiable set, so
adpoted the contiguous raw data layout for BTF to be
consistent with type/string storage (patches 2,4)
- Moved checks for kind overflow prior to referencing kinds
to avoid any risk of overrun (patches 3, 8)
- Tightened up kind layout header offset/len header validation
to catch invalid combinations early in btf_parse_hdr()
(patch 2)
- Fixed selftest to verify calloc success (patch 7)
Changes since v5: [14]:
- removed flags field from kind layout; it is not really workable
since we would have to define semantics of all possible future
flags today to be usable. Instead stick to parsing only, which
means each kind just needs the length of the singular and
vlen-specified objects (Alexei)
- added documentation for bpftool BTF metadata dump (Quentin, patch 9)
Changes since v4: [15]:
- removed CRC generation since it is not needed to handle modules
built at different time than kernel; distilled base BTF supports
this now
- fixed up bpftool display of empty kind names, comment/documentation
indentation (Quentin, patches 8, 9)
Changes since v3 [16]:
- fixed mismerge issues with kbuild changes for BTF generation
(patches 9, 14)
- fixed a few small issues in libbpf with kind layout representation
(patches 2, 4)
Changes since v2 [17]:
- drop "optional" kind flag (Andrii, patch 1)
- allocate "struct btf_header" for struct btf to ensure
we can always access new fields (Andrii, patch 2)
- use an internal BTF kind array in btf.c to simplify
kind encoding (Andrii, patch 2)
- drop use of kind layout information for in-kernel parsing,
since the kernel needs to be strict in what it accepts
(Andrii, patch 6)
- added CRC verification for BTF objects and for matching
with base object (Alexei, patches 7,8)
- fixed bpftool json output (Quentin, patch 10)
- added standalone module BTF support, tests (patches 13-17)
Changes since RFC
- Terminology change from meta -> kind_layout
(Alexei and Andrii)
- Simplify representation, removing meta header
and just having kind layout section (Alexei)
- Fixed bpftool to have JSON support, support
prefix match, documented changes (Quentin)
- Separated metadata opts into add_kind_layout
and add_crc
- Added additional positive/negative tests
to cover basic unknown kind, one with an
info_sz object following it and one with
N elem_sz elements following it.
- Updated pahole-flags to use help output
rather than version to see if features
are present
[1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/
[2] https://lore.kernel.org/bpf/20230531201936.1992188-1-alan.maguire@oracle.com/
[3] https://lore.kernel.org/dwarves/20260226085240.1908874-1-alan.maguire@oracle.com/
[4] https://lore.kernel.org/bpf/20260324174450.1570809-1-alan.maguire@oracle.com/
[5] https://lore.kernel.org/bpf/20260318132927.1142388-1-alan.maguire@oracle.com/
[6] https://lore.kernel.org/bpf/20260306113630.1281527-1-alan.maguire@oracle.com/
[7] https://lore.kernel.org/bpf/20260303182003.117483-1-alan.maguire@oracle.com/
[8] https://lore.kernel.org/bpf/20260302114059.3697879-1-alan.maguire@oracle.com/
[9] https://lore.kernel.org/bpf/20260227100426.2585191-1-alan.maguire@oracle.com/
[10] https://lore.kernel.org/bpf/20260226085624.1909682-1-alan.maguire@oracle.com/
[11] https://lore.kernel.org/bpf/20251215091730.1188790-1-alan.maguire@oracle.com/
[12] https://lore.kernel.org/dwarves/20251211164646.1219122-1-alan.maguire@oracle.com/
[13] https://lore.kernel.org/bpf/20251210203243.814529-1-alan.maguire@oracle.com/
[14] https://lore.kernel.org/bpf/20250528095743.791722-1-alan.maguire@oracle.com/
[15] https://lore.kernel.org/bpf/20231112124834.388735-1-alan.maguire@oracle.com/
[16] https://lore.kernel.org/bpf/20231110110304.63910-1-alan.maguire@oracle.com/
[17] https://lore.kernel.org/bpf/20230616171728.530116-1-alan.maguire@oracle.com/
====================
Link: https://patch.msgid.link/20260326145444.2076244-1-alan.maguire@oracle.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
The "layout" feature will add metadata about BTF kinds to the
generated BTF; its absence in pahole will not trigger an error so it
is safe to add unconditionally as it will simply be ignored if pahole
does not support it.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260326145444.2076244-10-alan.maguire@oracle.com
verify btf__new_empty_opts() adds layouts for all kinds supported,
and after adding kind-related types for an unknown kind, ensure that
parsing uses this info when that kind is encountered rather than
giving up.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260326145444.2076244-9-alan.maguire@oracle.com
Validate layout if present, but because the kernel must be
strict in what it accepts, reject BTF with unsupported kinds,
even if they are in the layout information.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260326145444.2076244-8-alan.maguire@oracle.com
Add a FEAT_BTF_LAYOUT feature check which checks if the
kernel supports BTF layout information. Also sanitize
BTF if it contains layout data but the kernel does not
support it. The sanitization requires rewriting raw
BTF data to update the header and eliminate the layout
section (since it lies between the types and strings),
so refactor sanitization to do the raw BTF retrieval
and creation of updated BTF, returning that new BTF
on success.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260326145444.2076244-7-alan.maguire@oracle.com
BTF parsing can use layout to navigate unknown kinds, so
btf_validate_type() should take layout information into
account to avoid failure when an unrecognized kind is met.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260326145444.2076244-6-alan.maguire@oracle.com
Support encoding of BTF layout data via btf__new_empty_opts().
Current supported opts are base_btf and add_layout.
Layout information is maintained in btf.c in the layouts[] array;
when BTF is created with the add_layout option it represents the
current view of supported BTF kinds.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260326145444.2076244-5-alan.maguire@oracle.com
This allows BTF parsing to proceed even if we do not know the
kind. Fall back to base BTF layout if layout information is
not in split BTF.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260326145444.2076244-4-alan.maguire@oracle.com
Support reading in layout fixing endian issues on reading;
also support writing layout section to raw BTF object.
There is not yet an API to populate the layout with meaningful
information.
As part of this, we need to consider multiple valid BTF header
sizes; the original or the layout-extended headers.
So to support this, the "struct btf" representation is modified
to contain a "struct btf_header" and we copy the valid
portion from the raw data to it; this means we can always safely
check fields like btf->hdr.layout_len .
Note if parsed-in BTF has extra header information beyond
sizeof(struct btf_header) - if so we make that BTF ineligible
for modification by setting btf->has_hdr_extra .
Ensure that we handle endianness issues for BTF layout section,
though currently only field that needs this (flags) is unused.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260326145444.2076244-3-alan.maguire@oracle.com
BTF kind layouts provide information to parse BTF kinds. By separating
parsing BTF from using all the information it provides, we allow BTF
to encode new features even if they cannot be used by readers. This
will be helpful in particular for cases where older tools are used
to parse newer BTF with kinds the older tools do not recognize;
the BTF can still be parsed in such cases using kind layout.
The intent is to support encoding of kind layouts optionally so that
tools like pahole can add this information. For each kind, we record
- length of singular element following struct btf_type
- length of each of the btf_vlen() elements following
- a (currently unused) flags field
The ideas here were discussed at [1], [2]; hence
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260326145444.2076244-2-alan.maguire@oracle.com
[1] https://lore.kernel.org/bpf/CAEf4BzYjWHRdNNw4B=eOXOs_ONrDwrgX4bn=Nuc1g8JPFC34MA@mail.gmail.com/
[2] https://lore.kernel.org/bpf/20230531201936.1992188-1-alan.maguire@oracle.com/
The verifier log output may contain multiple lines that start with
18: (bf) r0 = r6
teach reg_bounds to look for lines that have ';' in them,
since reg_bounds test is looking for:
18: (bf) r0 = r6 ; R0=... R6=...
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260325012242.45606-1-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test to make sure that variable length stack writes
scrubs STACK_SPILL into STACK_MISC.
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260324215938.81733-2-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The DEVMAP_HASH branch in dev_map_redirect_multi() uses
hlist_for_each_entry_safe() to iterate hash buckets, but this function
runs under RCU protection (called from xdp_do_generic_redirect_map()
in softirq context). Concurrent writers (__dev_map_hash_update_elem,
dev_map_hash_delete_elem) modify the list using RCU primitives
(hlist_add_head_rcu, hlist_del_rcu).
hlist_for_each_entry_safe() performs plain pointer dereferences without
rcu_dereference(), missing the acquire barrier needed to pair with
writers' rcu_assign_pointer(). On weakly-ordered architectures (ARM64,
POWER), a reader can observe a partially-constructed node. It also
defeats CONFIG_PROVE_RCU lockdep validation and KCSAN data-race
detection.
Replace with hlist_for_each_entry_rcu() using rcu_read_lock_bh_held()
as the lockdep condition, consistent with the rcu_dereference_check()
used in the DEVMAP (non-hash) branch of the same functions. Also fix
the same incorrect lockdep_is_held(&dtab->index_lock) condition in
dev_map_enqueue_multi(), where the lock is not held either.
Fixes: e624d4ed4a ("xdp: Extend xdp_redirect_map with broadcast support")
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260320072645.16731-1-devnexen@gmail.com
trampoline_count fills all trampoline attachment slots for a single
target function and verifies that one extra attach fails with -E2BIG.
It currently targets bpf_modify_return_test, which is also used by
other selftests such as modify_return, get_func_ip_test, and
get_func_args_test. When such tests run in parallel, they can contend
for the same per-function trampoline quota and cause unexpected attach
failures. This issue is currently masked by harness serialization.
Move trampoline_count to a dedicated bpf_testmod target and register it
for fmod_ret attachment. Also route the final trigger through
trigger_module_test_read(), so the execution path exercises the same
dedicated target.
This keeps the test semantics unchanged while isolating it from other
selftests, so it no longer needs to run in serial mode. Remove the
TODO comment as well.
Tested:
./test_progs -t trampoline_count -vv
./test_progs -j$(nproc) -t trampoline_count -vv
./test_progs -j$(nproc) -t \
trampoline_count,modify_return,get_func_ip_test,get_func_args_test -vv
20 runs of:
./test_progs -j$(nproc) -t \
trampoline_count,modify_return,get_func_ip_test,get_func_args_test
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260324044949.869801-1-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Previously I added a FIONREAD test for sockmap, but it can occasionally
fail in CI [1].
The test sends 10 bytes in two segments (2 + 8). For UDP, FIONREAD only
reports the length of the first datagram, not the total queued data.
The original code used recv_timeout() expecting all 10 bytes, but under
high system load, the second datagram may not yet be processed by the
protocol stack, so recv would only return the first 2-byte datagram,
causing a size mismatch failure.
Fix this by receiving exactly the expected bytes (matching FIONREAD) in
the first recv. The remaining datagram is then consumed in a second recv
block, which is only reachable for UDP since TCP's expected already
equals sizeof(buf).
Test:
./test_progs -a sockmap_basic
410/1 sockmap_basic/sockmap create_update_free:OK
...
Summary: 1/35 PASSED, 0 SKIPPED, 0 FAILED
[1] https://github.com/kernel-patches/bpf/actions/runs/22919385910/job/66515395423
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
Fixes: 17e2ce02bf ("selftests/bpf: Add tests for FIONREAD and copied_seq")
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://lore.kernel.org/r/20260312072549.6766-1-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A test failure was discovered in BPF CI [1] caused by connection timeout.
The current test timeout of 500ms is insufficient for CI environments,
particularly under high load.
While the optimal timeout is unclear, this test was converted from the
original test_tc_tunnel.sh script. The original script used nc with "-w 1"
to specify a 1-second timeout [2]. Therefore, this test restores the
timeout to 1s.
Test:
./test_progs -a tc_tunnel
#478/1 tc_tunnel/ipip_none:OK
#478/2 tc_tunnel/ipip6_none:OK
#478/3 tc_tunnel/ip6tnl_none:OK
#478/4 tc_tunnel/sit_none:OK
#478/5 tc_tunnel/vxlan_eth:OK
#478/6 tc_tunnel/ip6vxlan_eth:OK
#478/7 tc_tunnel/gre_none:OK
#478/8 tc_tunnel/gre_eth:OK
#478/9 tc_tunnel/gre_mpls:OK
#478/10 tc_tunnel/ip6gre_none:OK
#478/11 tc_tunnel/ip6gre_eth:OK
#478/12 tc_tunnel/ip6gre_mpls:OK
#478/13 tc_tunnel/udp_none:OK
#478/14 tc_tunnel/udp_eth:OK
#478/15 tc_tunnel/udp_mpls:OK
#478/16 tc_tunnel/ip6udp_none:OK
#478/17 tc_tunnel/ip6udp_eth:OK
#478/18 tc_tunnel/ip6udp_mpls:OK
#478 tc_tunnel:OK
Summary: 1/18 PASSED, 0 SKIPPED, 0 FAILED
[1] https://github.com/kernel-patches/bpf/actions/runs/22674350732/job/65728072723
[2] https://lore.kernel.org/all/20251027-tc_tunnel-v3-4-505c12019f9d@bootlin.com/
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://lore.kernel.org/r/20260312083615.31835-1-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The function btf_check_kfunc_arg_match() was refactored into
check_kfunc_args() by commit 00b85860fe ("bpf: Rewrite kfunc
argument handling"). Update the comment accordingly.
Assisted-by: unnamed:deepseek-v3.2 coccinelle
Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20260321105658.6006-1-kexinsun@smail.nju.edu.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Slava Imameev says:
====================
bpf: Add multi-level pointer parameter support for trampolines
This is v6 of a series adding support for new pointer types for
trampoline parameters.
Originally, only support for multi-level pointers was proposed.
As suggested during review, it was extended to single-level pointers.
During discussion, it was proposed to add support for any single or
multi-level pointer type that is not a single-level pointer to a
structure, with the condition if (!btf_type_is_struct(t)). The safety
of this condition is based on BTF data verification performed for
modules and programs, and vmlinux BTF being trusted to not contain
invalid types, so it is not possible for invalid types, like
PTR->DATASEC, PTR->FUNC, PTR->VAR and corresponding multi-level
pointers, to reach btf_ctx_access.
These changes appear to be a safe extension since any future support
for arrays and output values would require annotation (similar to
Microsoft SAL), which differentiates between current unannotated
scalar cases and new annotated cases.
This series adds BPF verifier support for single- and multi-level
pointer parameters and return values in BPF trampolines. The
implementation treats these parameters as SCALAR_VALUE.
This is consistent with existing pointers to int and void that are
already treated as SCALAR.
This provides consistent logic for single- and multi-level pointers:
if the type is treated as SCALAR for a single-level pointer, the
same applies to multi-level pointers, except for pointers to structs
which are currently PTR_TO_BTF_ID. However, in the case of
multi-level pointers, they are treated as scalar since the verifier
lacks the context to infer the size of their target memory regions.
Background:
Prior to these changes, accessing multi-level pointer parameters or
return values through BPF trampoline context arrays resulted in
verification failures in btf_ctx_access, producing errors such as:
func '%s' arg%d type %s is not a struct
For example, consider a BPF program that logs an input parameter of
type struct posix_acl **:
SEC("fentry/__posix_acl_chmod")
int BPF_PROG(trace_posix_acl_chmod, struct posix_acl **ppacl, gfp_t gfp,
umode_t mode)
{
bpf_printk("__posix_acl_chmod ppacl = %px\n", ppacl);
return 0;
}
This program failed BPF verification with the following error:
libbpf: prog 'trace_posix_acl_chmod': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; int BPF_PROG(trace_posix_acl_chmod, struct posix_acl **ppacl,
gfp_t gfp, umode_t mode) @ posix_acl_monitor.bpf.c:23
0: (79) r6 = *(u64 *)(r1 +16) ; R1=ctx() R6_w=scalar()
1: (79) r1 = *(u64 *)(r1 +0)
func '__posix_acl_chmod' arg0 type PTR is not a struct
invalid bpf_context access off=0 size=8
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0
-- END PROG LOAD LOG --
The common workaround involved using helper functions to fetch
parameter values by passing the address of the context array entry:
SEC("fentry/__posix_acl_chmod")
int BPF_PROG(trace_posix_acl_chmod, struct posix_acl **ppacl, gfp_t gfp,
umode_t mode)
{
struct posix_acl **pp;
bpf_probe_read_kernel(&pp, sizeof(ppacl), &ctx[0]);
bpf_printk("__posix_acl_chmod %px\n", pp);
return 0;
}
This approach introduced helper call overhead and created
inconsistency with parameter access patterns.
Improvements:
With this patch, trampoline programs can directly access multi-level
pointer parameters, eliminating helper call overhead and explicit ctx
access while ensuring consistent parameter handling. For example, the
following ctx access with a helper call:
SEC("fentry/__posix_acl_chmod")
int BPF_PROG(trace_posix_acl_chmod, struct posix_acl **ppacl, gfp_t gfp,
umode_t mode)
{
struct posix_acl **pp;
bpf_probe_read_kernel(&pp, sizeof(pp), &ctx[0]);
bpf_printk("__posix_acl_chmod %px\n", pp);
...
}
is replaced by a load instruction:
SEC("fentry/__posix_acl_chmod")
int BPF_PROG(trace_posix_acl_chmod, struct posix_acl **ppacl, gfp_t gfp,
umode_t mode)
{
bpf_printk("__posix_acl_chmod %px\n", ppacl);
...
}
The bpf_core_cast macro can be used for deeper level dereferences.
v1 -> v2:
* corrected maintainer's email
v2 -> v3:
* Addressed reviewers' feedback:
* Changed the register type from PTR_TO_MEM to SCALAR_VALUE.
* Modified tests to accommodate SCALAR_VALUE handling.
* Fixed a compilation error for loongarch
* https://lore.kernel.org/oe-kbuild-all/202602181710.tEK6nOl6-lkp@intel.com/
* Addressed AI bot review
* Added a commentary to address a NULL pointer case
* Removed WARN_ON
* Fixed a commentary
v3 -> v4:
* Added more consistent support for single and multi-level pointers
as suggested by reviewers.
* added single level pointers to enum 32 and 64
* added single level pointers to functions
* harmonized support for single and multi-level pointer types
* added new tests to support the above changes
* Removed create_bad_kaddr that allocated and invalidated kernel VA
for tests, and replaced it with hardcoded values similar to
bpf_testmod_return_ptr as suggested by reviewers.
v4 -> v5:
* As suggested, extended support to single-level pointers and
covered all supported valid pointer (single- and multi-level) types
with a wider condition if (!btf_type_is_struct(t)).
* As requested, simplified tests by keeping only tests that check
the verifier log for scalar().
v5 -> v6:
* Fixed the test message based on the bot's feedback.
====================
Link: https://patch.msgid.link/20260314082127.7939-1-slava.imameev@crowdstrike.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add single and multi-level pointer parameters and return value test
coverage for BPF trampolines. Includes verifier tests for single and
multi-level pointers. The tests check verifier logs for pointers
inferred as scalar() type.
Signed-off-by: Slava Imameev <slava.imameev@crowdstrike.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260314082127.7939-3-slava.imameev@crowdstrike.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add BPF verifier support for single- and multi-level pointer
parameters and return values in BPF trampolines by treating these
parameters as SCALAR_VALUE.
This extends the existing support for int and void pointers that are
already treated as SCALAR_VALUE.
This provides consistent logic for single and multi-level pointers:
if a type is treated as SCALAR for a single-level pointer, the same
applies to multi-level pointers. The exception is pointer-to-struct,
which is currently PTR_TO_BTF_ID for single-level but treated as
scalar for multi-level pointers since the verifier lacks context
to infer the size of target memory regions.
Safety is ensured by existing BTF verification, which rejects invalid
pointer types at the BTF verification stage.
Signed-off-by: Slava Imameev <slava.imameev@crowdstrike.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260314082127.7939-2-slava.imameev@crowdstrike.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a test verifying that stacksafe() correctly handles 32-bit scalar
spills when comparing stack states for equivalence during state pruning.
A 32-bit scalar spill creates slot[0-3] = STACK_INVALID and
slot[4-7] = STACK_SPILL. Without the im=4 check in stacksafe(), the
STACK_SPILL vs STACK_MISC mismatch at byte 4 causes pruning to fail,
forcing the verifier to re-explore a path that is provably safe.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260323022410.75444-2-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Per discussion with Alexei, add Eduard and myself as maintainers under
BPF [GENERAL]. While at it, drop R entries for reviewers who have been
inactive.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260324152230.2916217-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a selftest to verify that the verifier correctly identifies refcounted
arguments in struct_ops programs, even when they are not the first
argument. This ensures that the restriction on tail calls for programs
with __ref arguments is properly enforced regardless of which argument
they appear in.
This test verifies the fix for check_struct_ops_btf_id() proposed by
Keisuke Nishimura [0], which corrected a bug where only the first
argument was checked for the refcounted flag.
The test includes:
- An update to bpf_testmod to add 'test_refcounted_multi', an operator with
three arguments where the third is tagged with "__ref".
- A BPF program 'test_refcounted_multi' that attempts a tail call.
- A test runner that asserts the verifier rejects the program with
"program with __ref argument cannot tail call".
[0]: https://lore.kernel.org/bpf/20260320130219.63711-1-keisuke.nishimura@inria.fr/
Signed-off-by: Varun R Mallya <varunrmallya@gmail.com>
Link: https://lore.kernel.org/r/20260321214038.80479-1-varunrmallya@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The current implementation only checks whether the first argument is
refcounted. Fix this by iterating over all arguments.
Signed-off-by: Keisuke Nishimura <keisuke.nishimura@inria.fr>
Fixes: 38f1e66abd ("bpf: Do not allow tail call in strcut_ops program with __ref argument")
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260320130219.63711-1-keisuke.nishimura@inria.fr
Signed-off-by: Alexei Starovoitov <ast@kernel.org>