linux/kernel
Daniel Borkmann 54e8cf41b2 bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K
[ Upstream commit fdadd04931 ]

Michael and Sandipan report:

  Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
  JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
  and is adjusted again at init time if MODULES_VADDR is defined.

  For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
  the compile-time default at boot-time, which is 0x9c400000 when
  using 64K page size. This overflows the signed 32-bit bpf_jit_limit
  value:

  root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
  -1673527296

  and can cause various unexpected failures throughout the network
  stack. In one case `strace dhclient eth0` reported:

  setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
             16) = -1 ENOTSUPP (Unknown error 524)

  and similar failures can be seen with tools like tcpdump. This doesn't
  always reproduce however, and I'm not sure why. The more consistent
  failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
  host would time out on systemd/netplan configuring a virtio-net NIC
  with no noticeable errors in the logs.

Given this and also given that in near future some architectures like
arm64 will have a custom area for BPF JIT image allocations we should
get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec()
so therefore add another overridable bpf_jit_alloc_exec_limit() helper
function which returns the possible size of the memory area for deriving
the default heuristic in bpf_jit_charge_init().

Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
JIT memory provider, and therefore in case archs implement their custom
module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for
vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}.

Additionally, for archs supporting large page sizes, we should change
the sysctl to be handled as long to not run into sysctl restrictions
in future.

Fixes: ede95a63b5 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
Reported-by: Sandipan Das <sandipan@linux.ibm.com>
Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10 09:53:47 +02:00
..
bpf bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K 2019-07-10 09:53:47 +02:00
cgroup cpuset: restore sanity to cpuset_cpus_allowed_fallback() 2019-07-10 09:53:39 +02:00
configs
debug kdb: Don't back trace on a cpu that didn't round up 2019-02-12 19:47:19 +01:00
dma dma-direct: do not include SME mask in the DMA supported check 2019-01-13 09:51:05 +01:00
events perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data 2019-06-22 08:15:16 +02:00
gcov
irq genirq: Prevent use-after-free and work list corruption 2019-05-10 17:54:10 +02:00
livepatch module: Fix livepatch/ftrace module text permissions race 2019-07-10 09:53:40 +02:00
locking locking/rwsem: Prevent decrement of reader count before increment 2019-05-22 07:37:34 +02:00
power x86/power: Fix 'nosmt' vs hibernation triple fault during resume 2019-06-11 12:20:52 +02:00
printk printk: Fix panic caused by passing log_buf_len to command line 2018-11-13 11:08:48 -08:00
rcu rcuperf: Fix cleanup path for invalid perf_type strings 2019-05-31 06:46:30 -07:00
sched jump_label: move 'asm goto' support test to Kconfig 2019-06-04 08:02:34 +02:00
time timekeeping: Repair ktime_get_coarse*() granularity 2019-06-19 08:18:06 +02:00
trace ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code() 2019-07-10 09:53:44 +02:00
.gitignore
acct.c acct_on(): don't mess with freeze protection 2019-05-31 06:46:05 -07:00
async.c
audit_fsnotify.c
audit_tree.c
audit_watch.c
audit.c
audit.h
auditfilter.c audit: fix a memory leak bug 2019-05-31 06:46:17 -07:00
auditsc.c
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:08:47 -08:00
capability.c
compat.c
configs.c
context_tracking.c
cpu_pm.c
cpu.c cpu/speculation: Warn on unsupported mitigations= parameter 2019-07-03 13:14:46 +02:00
crash_core.c
crash_dump.c
cred.c ptrace: restore smp_rmb() in __ptrace_may_access() 2019-06-19 08:18:00 +02:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting 2019-04-05 22:33:13 +02:00
extable.c
fail_function.c
fork.c userfaultfd: use RCU to free the task struct when fork fails 2019-05-22 07:37:41 +02:00
freezer.c
futex_compat.c
futex.c locking/futex: Allow low-level atomic operations to return -EAGAIN 2019-05-10 17:54:11 +02:00
groups.c
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:16:02 +02:00
iomem.c
irq_work.c irq_work: Do not raise an IPI when queueing work on the local CPU 2019-05-31 06:46:19 -07:00
jump_label.c jump_label: move 'asm goto' support test to Kconfig 2019-06-04 08:02:34 +02:00
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kernel/kcov.c: mark write_comp_data() as notrace 2019-02-12 19:47:20 +01:00
kexec_core.c
kexec_file.c
kexec_internal.h
kexec.c
kmod.c
kprobes.c kprobes: Fix error check when reusing optimized probes 2019-04-27 09:36:37 +02:00
ksysfs.c
kthread.c
latencytop.c
Makefile x86/uaccess, kcov: Disable stack protector 2019-06-19 08:18:01 +02:00
memremap.c mm, devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support 2019-01-13 09:51:04 +01:00
module_signing.c
module-internal.h
module.c jump_label: move 'asm goto' support test to Kconfig 2019-06-04 08:02:34 +02:00
notifier.c
nsproxy.c
padata.c
panic.c panic: avoid deadlocks in re-entrant console drivers 2018-12-29 13:37:57 +01:00
params.c
pid_namespace.c
pid.c Fix failure path in alloc_pid() 2019-01-13 09:51:06 +01:00
profile.c
ptrace.c ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME 2019-07-10 09:53:41 +02:00
range.c
reboot.c
relay.c relay: check return of create_buf_file() properly 2019-03-13 14:02:35 -07:00
resource.c
rseq.c
seccomp.c
signal.c kernel/signal.c: trace_signal_deliver when signal_group_exit 2019-06-09 09:17:20 +02:00
smp.c cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM 2019-02-12 19:47:25 +01:00
smpboot.c
smpboot.h
softirq.c
stacktrace.c
stop_machine.c
sys_ni.c
sys.c kernel/sys.c: prctl: fix false positive in validate_prctl_map() 2019-06-15 11:54:01 +02:00
sysctl_binary.c
sysctl.c sysctl: return -EINVAL if val violates minmax 2019-06-15 11:53:59 +02:00
task_work.c
taskstats.c
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c
up.c
user_namespace.c userns: also map extents in the reverse map to kernel IDs 2018-11-13 11:09:00 -08:00
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
watchdog_hld.c
watchdog.c watchdog: Respect watchdog cpumask on CPU hotplug 2019-04-03 06:26:29 +02:00
workqueue_internal.h
workqueue.c workqueue: Try to catch flush_work() without INIT_WORK(). 2019-05-02 09:58:56 +02:00