linux/kernel
Kumar Kartikeya Dwivedi 61df10c779 bpf: Allow storing unreferenced kptr in map
This commit introduces a new pointer type 'kptr' which can be embedded
in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during
its invocation. When storing such a kptr, BPF program's PTR_TO_BTF_ID
register must have the same type as in the map value's BTF, and loading
a kptr marks the destination register as PTR_TO_BTF_ID with the correct
kernel BTF and BTF ID.

Such kptr are unreferenced, i.e. by the time another invocation of the
BPF program loads this pointer, the object which the pointer points to
may not longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
patched to PROBE_MEM loads by the verifier, it would safe to allow user
to still access such invalid pointer, but passing such pointers into
BPF helpers and kfuncs should not be permitted. A future patch in this
series will close this gap.

The flexibility offered by allowing programs to dereference such invalid
pointers while being safe at runtime frees the verifier from doing
complex lifetime tracking. As long as the user may ensure that the
object remains valid, it can ensure data read by it from the kernel
object is valid.

The user indicates that a certain pointer must be treated as kptr
capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this
information is recorded in the object BTF which will be passed into the
kernel by way of map's BTF information. The name and kind from the map
value BTF is used to look up the in-kernel type, and the actual BTF and
BTF ID is recorded in the map struct in a new kptr_off_tab member. For
now, only storing pointers to structs is permitted.

An example of this specification is shown below:

	#define __kptr __attribute__((btf_type_tag("kptr")))

	struct map_value {
		...
		struct task_struct __kptr *task;
		...
	};

Then, in a BPF program, user may store PTR_TO_BTF_ID with the type
task_struct into the map, and then load it later.

Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as
the verifier cannot know whether the value is NULL or not statically, it
must treat all potential loads at that map value offset as loading a
possibly NULL pointer.

Only BPF_LDX, BPF_STX, and BPF_ST (with insn->imm = 0 to denote NULL)
are allowed instructions that can access such a pointer. On BPF_LDX, the
destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
it is checked whether the source register type is a PTR_TO_BTF_ID with
same BTF type as specified in the map BTF. The access size must always
be BPF_DW.

For the map in map support, the kptr_off_tab for outer map is copied
from the inner map's kptr_off_tab. It was chosen to do a deep copy
instead of introducing a refcount to kptr_off_tab, because the copy only
needs to be done when paramterizing using inner_map_fd in the map in map
case, hence would be unnecessary for all other users.

It is not permitted to use MAP_FREEZE command and mmap for BPF map
having kptrs, similar to the bpf_timer case. A kptr also requires that
BPF program has both read and write access to the map (hence both
BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG are disallowed).

Note that check_map_access must be called from both
check_helper_mem_access and for the BPF instructions, hence the kptr
check must distinguish between ACCESS_DIRECT and ACCESS_HELPER, and
reject ACCESS_HELPER cases. We rename stack_access_src to bpf_access_src
and reuse it for this purpose.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-2-memxor@gmail.com
2022-04-25 17:31:35 -07:00
..
bpf bpf: Allow storing unreferenced kptr in map 2022-04-25 17:31:35 -07:00
cgroup Driver core changes for 5.18-rc1 2022-03-28 12:41:28 -07:00
configs Char/Misc and other driver updates for 5.18-rc1 2022-03-28 12:27:35 -07:00
debug kdb: Fix the putarea helper function 2022-03-24 16:39:47 +00:00
dma dma-mapping: move pgprot_decrypted out of dma_pgprot 2022-04-01 06:46:51 +02:00
entry Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels" 2022-03-31 10:36:55 +02:00
events asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
futex mm/truncate: Inline invalidate_complete_page() into its one caller 2022-03-21 12:59:01 -04:00
gcov gcov: Remove compiler version check 2021-12-02 17:25:21 +09:00
irq Changes in this cycle were: 2022-03-22 14:39:12 -07:00
kcsan KCSAN updates for v5.17 2022-01-11 09:51:26 -08:00
livepatch ptrace: Cleanups for v5.18 2022-03-28 17:29:53 -07:00
locking Changes in this cycle were: 2022-03-22 13:44:21 -07:00
power for-5.18/block-2022-03-18 2022-03-21 16:48:55 -07:00
printk printk changes for 5.18 2022-03-23 10:54:27 -07:00
rcu Changes in this cycle were: 2022-03-22 14:39:12 -07:00
sched ptrace: Cleanups for v5.18 2022-03-28 17:29:53 -07:00
time ptrace: Cleanups for v5.18 2022-03-28 17:29:53 -07:00
trace bpf: Move rcu lock management out of BPF_PROG_RUN routines 2022-04-19 09:45:47 -07:00
.gitignore
acct.c kernel: remove spurious blkdev.h includes 2021-10-18 06:17:01 -06:00
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-03 11:20:34 -08:00
audit_fsnotify.c fsnotify: clarify contract for create event hooks 2021-10-27 12:32:34 +02:00
audit_tree.c audit: use struct_size() helper in kmalloc() 2021-12-14 17:39:42 -05:00
audit_watch.c \n 2021-11-06 16:43:20 -07:00
audit.c audit: improve audit queue handling when "audit=1" on cmdline 2022-01-25 13:22:51 -05:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-02-22 13:51:40 -05:00
auditfilter.c audit/stable-5.17 PR 20220110 2022-01-11 13:08:21 -08:00
auditsc.c audit/stable-5.18 PR 20220321 2022-03-21 20:53:11 -07:00
backtracetest.c
bounds.c
capability.c xfs: don't generate selinux audit messages for capability testing 2022-03-09 10:32:06 -08:00
cfi.c cfi: Use rcu_read_{un}lock_sched_notrace 2021-08-11 13:11:12 -07:00
compat.c arch: remove compat_alloc_user_space 2021-09-08 15:32:35 -07:00
configs.c
context_tracking.c
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-08-16 18:55:32 +02:00
cpu.c Changes in this cycle were: 2022-03-22 14:39:12 -07:00
crash_core.c kernel/crash_core: suppress unknown crashkernel parameter warning 2021-12-25 12:20:55 -08:00
crash_dump.c
cred.c x86: Mark __invalid_creds() __noreturn 2022-03-15 10:32:44 +01:00
delayacct.c delayacct: track delays from memory compact 2022-01-20 08:52:55 +02:00
dma.c
exec_domain.c
exit.c ptrace: Cleanups for v5.18 2022-03-28 17:29:53 -07:00
extable.c lkdtm: Really write into kernel text in WRITE_KERN 2022-02-16 23:25:12 +11:00
fail_function.c
fork.c kasan, arm64: reset pointer tags of vmapped stacks 2022-03-24 19:06:47 -07:00
freezer.c sched: Add get_current_state() 2021-06-18 11:43:08 +02:00
gen_kheaders.sh kbuild: clean up ${quiet} checks in shell scripts 2021-05-27 04:01:50 +09:00
groups.c
hung_task.c hung_task: move hung_task sysctl interface to hung_task.c 2022-01-22 08:33:34 +02:00
iomem.c
irq_work.c irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT 2021-10-15 11:25:18 +02:00
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-05 10:46:20 +02:00
kallsyms.c kallsyms: Skip the name search for empty string 2022-03-17 20:17:18 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/rwlock: Provide RT variant 2021-08-17 17:50:51 +02:00
Kconfig.preempt Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels" 2022-03-31 10:36:55 +02:00
kcov.c kcov: properly handle subsequent mmap calls 2022-03-23 19:00:35 -07:00
kexec_core.c exit: Move oops specific logic from do_exit into make_task_dead 2021-12-13 12:04:45 -06:00
kexec_elf.c
kexec_file.c memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED 2021-11-06 13:30:42 -07:00
kexec_internal.h
kexec.c kexec: avoid compat_alloc_user_space 2021-09-08 15:32:34 -07:00
kheaders.c
kmod.c
kprobes.c kprobes: Use rethook for kretprobe if possible 2022-03-28 19:38:09 -07:00
ksysfs.c kernel/ksysfs.c: use helper macro __ATTR_RW 2022-03-23 19:00:33 -07:00
kthread.c asm-generic updates for 5.18 2022-03-23 18:03:08 -07:00
latencytop.c
Makefile kprobes: Use rethook for kretprobe if possible 2022-03-28 19:38:09 -07:00
module_decompress.c module: fix building with sysfs disabled 2022-02-16 12:51:32 -08:00
module_signature.c
module_signing.c
module-internal.h module: add in-kernel support for decompressing 2022-01-11 18:45:02 -08:00
module.c NFSD: Remove svc_serv_ops::svo_module 2022-02-28 10:26:40 -05:00
notifier.c notifier: Return an error when a callback has already been registered 2021-12-29 10:37:33 +01:00
nsproxy.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
padata.c padata: replace cpumask_weight with cpumask_empty in padata.c 2022-01-31 11:21:46 +11:00
panic.c panic: move panic_print before kmsg dumpers 2022-03-23 19:00:35 -07:00
params.c kobject: remove kset from struct kset_uevent_ops callbacks 2021-12-28 11:26:18 +01:00
pid_namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
pid.c pid: add pidfd_get_task() helper 2021-10-14 13:29:18 +02:00
profile.c exit: Remove profile_handoff_task 2022-01-08 12:43:57 -06:00
ptrace.c ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE 2022-03-22 13:06:05 -05:00
range.c
reboot.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2021-11-12 11:53:16 -08:00
regset.c
relay.c
resource_kunit.c
resource.c kernel/resource: fix kfree() of bootmem memory again 2022-03-23 19:00:35 -07:00
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-02-02 13:11:34 +01:00
scftorture.c scftorture: Always log error message 2021-12-07 16:36:17 -08:00
scs.c kasan, vmalloc: only tag normal vmalloc allocations 2022-03-24 19:06:48 -07:00
seccomp.c ptrace: Cleanups for v5.18 2022-03-28 17:29:53 -07:00
signal.c Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels" 2022-03-31 10:36:55 +02:00
smp.c sched: Improve wake_up_all_idle_cpus() take #2 2021-10-22 15:32:46 +02:00
smpboot.c smpboot: Replace deprecated CPU-hotplug functions. 2021-08-10 14:57:42 +02:00
smpboot.h
softirq.c genirq, softirq: Use in_hardirq() instead of in_irq() 2022-02-02 21:34:19 +01:00
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-03 17:02:21 -08:00
stacktrace.c uaccess: remove CONFIG_SET_FS 2022-02-25 09:36:06 +01:00
static_call.c KVM: x86: allow defining return-0 static calls 2022-02-18 12:44:22 -05:00
stop_machine.c
sys_ni.c mm/mempolicy: wire up syscall set_mempolicy_home_node 2022-01-15 16:30:30 +02:00
sys.c prlimit: do not grab the tasklist_lock 2022-03-08 14:33:36 -06:00
sysctl-test.c kernel/sysctl-test: Remove some casts which are no-longer required 2021-06-23 16:41:24 -06:00
sysctl.c bpf: Move BPF sysctls from kernel/sysctl.c to BPF core 2022-04-13 21:36:56 +02:00
task_work.c resume_user_mode: Move to resume_user_mode.h 2022-03-10 16:51:50 -06:00
taskstats.c taskstats: remove unneeded dead assignment 2022-03-23 19:00:35 -07:00
torture.c torture: Wake up kthreads after storing task_struct pointer 2022-02-01 17:24:39 -08:00
tracepoint.c tracepoint: Fix kerneldoc comments 2021-08-16 11:39:51 -04:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-08 12:43:57 -06:00
ucount.c ucounts: Handle wrapping in is_ucounts_overlimit 2022-02-17 09:11:57 -06:00
uid16.c
uid16.h
umh.c
up.c
user_namespace.c ucounts: Fix systemd LimitNPROC with private users regression 2022-02-25 10:40:14 -06:00
user-return-notifier.c
user.c fs/epoll: use a per-cpu counter for user's watches count 2021-09-08 11:50:27 -07:00
usermode_driver.c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
utsname_sysctl.c
utsname.c
watch_queue.c watch_queue: Free the page array when watch_queue is dismantled 2022-04-02 10:37:39 -07:00
watchdog_hld.c
watchdog.c sched/isolation: Use single feature type while referring to housekeeping cpumask 2022-02-16 15:57:55 +01:00
workqueue_internal.h workqueue: Assign a color to barrier work items 2021-08-17 07:49:10 -10:00
workqueue.c Merge branch 'for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2022-03-23 12:40:51 -07:00