linux/kernel/bpf
Daniel Borkmann 54e8cf41b2 bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K
[ Upstream commit fdadd04931 ]

Michael and Sandipan report:

  Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
  JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
  and is adjusted again at init time if MODULES_VADDR is defined.

  For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
  the compile-time default at boot-time, which is 0x9c400000 when
  using 64K page size. This overflows the signed 32-bit bpf_jit_limit
  value:

  root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
  -1673527296

  and can cause various unexpected failures throughout the network
  stack. In one case `strace dhclient eth0` reported:

  setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
             16) = -1 ENOTSUPP (Unknown error 524)

  and similar failures can be seen with tools like tcpdump. This doesn't
  always reproduce however, and I'm not sure why. The more consistent
  failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
  host would time out on systemd/netplan configuring a virtio-net NIC
  with no noticeable errors in the logs.

Given this and also given that in near future some architectures like
arm64 will have a custom area for BPF JIT image allocations we should
get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec()
so therefore add another overridable bpf_jit_alloc_exec_limit() helper
function which returns the possible size of the memory area for deriving
the default heuristic in bpf_jit_charge_init().

Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
JIT memory provider, and therefore in case archs implement their custom
module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for
vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}.

Additionally, for archs supporting large page sizes, we should change
the sysctl to be handled as long to not run into sysctl restrictions
in future.

Fixes: ede95a63b5 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
Reported-by: Sandipan Das <sandipan@linux.ibm.com>
Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10 09:53:47 +02:00
..
arraymap.c bpf: decouple btf from seq bpf fs dump and enable more maps 2018-08-13 00:52:45 +02:00
bpf_lru_list.c bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4 2017-04-17 13:55:52 -04:00
bpf_lru_list.h bpf: Only set node->ref = 1 if it has not been set 2017-09-01 09:57:39 -07:00
btf.c bpf: btf: Fix end boundary calculation for type section 2018-09-12 22:00:23 +02:00
cgroup.c bpf: introduce update_effective_progs() 2018-08-07 14:29:55 +02:00
core.c bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K 2019-07-10 09:53:47 +02:00
cpumap.c bpf: fix redirect to map under tail calls 2018-08-17 15:56:23 -07:00
devmap.c bpf: devmap: fix use-after-free Read in __dev_map_entry_free 2019-05-31 06:46:04 -07:00
disasm.c bpf: Remove struct bpf_verifier_env argument from print_bpf_insn 2018-03-23 17:38:57 +01:00
disasm.h bpf: Remove struct bpf_verifier_env argument from print_bpf_insn 2018-03-23 17:38:57 +01:00
hashtab.c bpf, lru: avoid messing with eviction heuristics upon syscall lookup 2019-05-25 18:23:48 +02:00
helpers.c bpf: introduce the bpf_get_local_storage() helper function 2018-08-03 00:47:32 +02:00
inode.c bpf: relax inode permission check for retrieving bpf program 2019-05-25 18:23:47 +02:00
local_storage.c bpf: allocate local storage buffers using GFP_ATOMIC 2018-12-17 09:24:33 +01:00
lpm_trie.c bpf: lpm_trie: check left child of last leftmost node for NULL 2019-07-03 13:14:48 +02:00
Makefile bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY 2018-08-11 01:58:46 +02:00
map_in_map.c bpf: fix inner map masking to prevent oob under speculation 2019-01-31 08:14:41 +01:00
map_in_map.h bpf: Add syscall lookup support for fd array and htab 2017-06-29 13:13:25 -04:00
offload.c bpf: offload: allow program and map sharing per-ASIC 2018-07-18 15:10:34 +02:00
percpu_freelist.c bpf: fix lockdep false positive in percpu_freelist 2019-03-13 14:02:36 -07:00
percpu_freelist.h bpf: fix lockdep false positive in percpu_freelist 2019-03-13 14:02:36 -07:00
reuseport_array.c bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY 2018-08-11 01:58:46 +02:00
sockmap.c bpf: sockmap, fix transition through disconnect without close 2018-09-22 02:46:41 +02:00
stackmap.c bpf: fix lockdep false positive in stackmap 2019-03-23 20:09:48 +01:00
syscall.c bpf: fix unconnected udp hooks 2019-07-03 13:14:48 +02:00
tnum.c bpf/verifier: improve register value range tracking with ARSH 2018-04-29 08:45:53 -07:00
verifier.c bpf: fix unconnected udp hooks 2019-07-03 13:14:48 +02:00
xskmap.c xsk: do not call synchronize_net() under RCU read lock 2018-10-11 10:19:01 +02:00