linux/net/core
John Fastabend fd09af0107 bpf: sock_ops ctx access may stomp registers in corner case
I had a sockmap program that after doing some refactoring started spewing
this splat at me:

[18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[...]
[18610.807359] Call Trace:
[18610.807370]  ? 0xffffffffc114d0d5
[18610.807382]  __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0
[18610.807391]  tcp_connect+0x895/0xd50
[18610.807400]  tcp_v4_connect+0x465/0x4e0
[18610.807407]  __inet_stream_connect+0xd6/0x3a0
[18610.807412]  ? __inet_stream_connect+0x5/0x3a0
[18610.807417]  inet_stream_connect+0x3b/0x60
[18610.807425]  __sys_connect+0xed/0x120

After some debugging I was able to build this simple reproducer,

 __section("sockops/reproducer_bad")
 int bpf_reproducer_bad(struct bpf_sock_ops *skops)
 {
        volatile __maybe_unused __u32 i = skops->snd_ssthresh;
        return 0;
 }

And along the way noticed that below program ran without splat,

__section("sockops/reproducer_good")
int bpf_reproducer_good(struct bpf_sock_ops *skops)
{
        volatile __maybe_unused __u32 i = skops->snd_ssthresh;
        volatile __maybe_unused __u32 family;

        compiler_barrier();

        family = skops->family;
        return 0;
}

So I decided to check out the code we generate for the above two
programs and noticed each generates the BPF code you would expect,

0000000000000000 <bpf_reproducer_bad>:
;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
       0:       r1 = *(u32 *)(r1 + 96)
       1:       *(u32 *)(r10 - 4) = r1
;       return 0;
       2:       r0 = 0
       3:       exit

0000000000000000 <bpf_reproducer_good>:
;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
       0:       r2 = *(u32 *)(r1 + 96)
       1:       *(u32 *)(r10 - 4) = r2
;       family = skops->family;
       2:       r1 = *(u32 *)(r1 + 20)
       3:       *(u32 *)(r10 - 8) = r1
;       return 0;
       4:       r0 = 0
       5:       exit

So we get reasonable assembly, but still something was causing the null
pointer dereference. So, we load the programs and dump the xlated version
observing that line 0 above 'r* = *(u32 *)(r1 +96)' is going to be
translated by the skops access helpers.

int bpf_reproducer_bad(struct bpf_sock_ops * skops):
; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
   0: (61) r1 = *(u32 *)(r1 +28)
   1: (15) if r1 == 0x0 goto pc+2
   2: (79) r1 = *(u64 *)(r1 +0)
   3: (61) r1 = *(u32 *)(r1 +2340)
; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
   4: (63) *(u32 *)(r10 -4) = r1
; return 0;
   5: (b7) r0 = 0
   6: (95) exit

int bpf_reproducer_good(struct bpf_sock_ops * skops):
; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
   0: (61) r2 = *(u32 *)(r1 +28)
   1: (15) if r2 == 0x0 goto pc+2
   2: (79) r2 = *(u64 *)(r1 +0)
   3: (61) r2 = *(u32 *)(r2 +2340)
; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
   4: (63) *(u32 *)(r10 -4) = r2
; family = skops->family;
   5: (79) r1 = *(u64 *)(r1 +0)
   6: (69) r1 = *(u16 *)(r1 +16)
; family = skops->family;
   7: (63) *(u32 *)(r10 -8) = r1
; return 0;
   8: (b7) r0 = 0
   9: (95) exit

Then we look at lines 0 and 2 above. In the good case we do the zero
check in r2 and then load 'r1 + 0' at line 2. Do a quick cross-check
into the bpf_sock_ops check and we can confirm that is the 'struct
sock *sk' pointer field. But, in the bad case,

   0: (61) r1 = *(u32 *)(r1 +28)
   1: (15) if r1 == 0x0 goto pc+2
   2: (79) r1 = *(u64 *)(r1 +0)

Oh no, we read 'r1 +28' into r1, this is skops->fullsock and then in
line 2 we read the 'r1 +0' as a pointer. Now jumping back to our spat,

[18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001

The 0x01 makes sense because that is exactly the fullsock value. And
its not a valid dereference so we splat.

To fix we need to guard the case when a program is doing a sock_ops field
access with src_reg == dst_reg. This is already handled in the load case
where the ctx_access handler uses a tmp register being careful to
store the old value and restore it. To fix the get case test if
src_reg == dst_reg and in this case do the is_fullsock test in the
temporary register. Remembering to restore the temporary register before
writing to either dst_reg or src_reg to avoid smashing the pointer into
the struct holding the tmp variable.

Adding this inline code to test_tcpbpf_kern will now be generated
correctly from,

  9: r2 = *(u32 *)(r2 + 96)

to xlated code,

  12: (7b) *(u64 *)(r2 +32) = r9
  13: (61) r9 = *(u32 *)(r2 +28)
  14: (15) if r9 == 0x0 goto pc+4
  15: (79) r9 = *(u64 *)(r2 +32)
  16: (79) r2 = *(u64 *)(r2 +0)
  17: (61) r2 = *(u32 *)(r2 +2348)
  18: (05) goto pc+1
  19: (79) r9 = *(u64 *)(r2 +32)

And in the normal case we keep the original code, because really this
is an edge case. From this,

  9: r2 = *(u32 *)(r6 + 96)

to xlated code,

  22: (61) r2 = *(u32 *)(r6 +28)
  23: (15) if r2 == 0x0 goto pc+2
  24: (79) r2 = *(u64 *)(r6 +0)
  25: (61) r2 = *(u32 *)(r2 +2348)

So three additional instructions if dst == src register, but I scanned
my current code base and did not see this pattern anywhere so should
not be a big deal. Further, it seems no one else has hit this or at
least reported it so it must a fairly rare pattern.

Fixes: 9b1f3d6e5a ("bpf: Refactor sock_ops_convert_ctx_access")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/159718347772.4728.2781381670567919577.stgit@john-Precision-5820-Tower
2020-08-13 22:40:36 +02:00
..
bpf_sk_storage.c bpf: Change uapi for bpf iterator map elements 2020-08-06 16:39:14 -07:00
datagram.c net: use indirect call wrappers for skb_copy_datagram_iter() 2020-03-25 11:30:40 -07:00
datagram.h
dev_addr_lists.c net: explain the lockdep annotations for dev_uc_unsync() 2020-06-28 21:38:27 -07:00
dev_ioctl.c net: Call into DSA netdevice_ops wrappers 2020-07-20 16:48:22 -07:00
dev.c bpf: Fix XDP FD-based attach/detach logic around XDP_FLAGS_UPDATE_IF_NOEXIST 2020-08-12 18:00:49 -07:00
devlink.c devlink: Pass extack when setting trap's action and group's parameters 2020-08-03 18:06:46 -07:00
drop_monitor.c net: Add MODULE_DESCRIPTION entries to network modules 2020-06-20 21:33:57 -07:00
dst_cache.c
dst.c net/dst: use a smaller percpu_counter batch for dst entries accounting 2020-05-08 21:33:33 -07:00
failover.c
fib_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib_rules.c fib: Fix undef compile warning 2020-08-03 18:01:49 -07:00
filter.c bpf: sock_ops ctx access may stomp registers in corner case 2020-08-13 22:40:36 +02:00
flow_dissector.c net/flow_dissector: add packet hash dissection 2020-07-24 15:23:31 -07:00
flow_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-25 17:49:04 -07:00
gen_estimator.c net_sched: gen_estimator: extend packet counter to 64bit 2019-11-06 21:51:36 -08:00
gen_stats.c docs: networking: convert gen_stats.txt to ReST 2020-04-28 14:39:46 -07:00
gro_cells.c
hwbm.c net: hwbm: Make the hwbm_pool lock a mutex 2019-06-09 19:40:10 -07:00
link_watch.c net: Add IF_OPER_TESTING 2020-04-20 12:43:24 -07:00
lwt_bpf.c net: add net available in build_state 2020-03-29 22:30:57 -07:00
lwtunnel.c net: ipv6: add rpl sr tunnel 2020-03-29 22:30:57 -07:00
Makefile ethtool: move to its own directory 2019-12-12 17:07:05 -08:00
neighbour.c net: neighbor: add fdb extended attribute 2020-06-24 14:36:33 -07:00
net_namespace.c nsproxy: add struct nsset 2020-05-09 13:57:12 +02:00
net-procfs.c net: procfs: use index hashlist instead of name hashlist 2019-10-01 14:47:19 -07:00
net-sysfs.c The main changes in this cycle were: 2020-08-03 14:58:38 -07:00
net-sysfs.h net-sysfs: add netdev_change_owner() 2020-02-26 20:07:25 -08:00
net-traces.c page_pool: add tracepoints for page_pool with details need by XDP 2019-06-19 11:23:13 -04:00
netclassid_cgroup.c cgroup, netclassid: remove double cond_resched 2020-04-21 15:44:30 -07:00
netevent.c
netpoll.c netpoll: accept NULL np argument in netpoll_send_skb() 2020-05-07 18:11:07 -07:00
netprio_cgroup.c netprio_cgroup: Fix unlimited memory leak of v2 cgroups 2020-05-09 20:59:21 -07:00
page_pool.c net: page pool: allow to pass zero flags to page_pool_init() 2020-03-29 21:49:20 -07:00
pktgen.c docs: networking: convert pktgen.txt to ReST 2020-04-30 12:56:37 -07:00
ptp_classifier.c
request_sock.c tcp: add rcu protection around tp->fastopen_rsk 2019-10-13 10:13:08 -07:00
rtnetlink.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-08-03 18:27:40 -07:00
scm.c fs: Add receive_fd() wrapper for __receive_fd() 2020-07-13 11:03:44 -07:00
secure_seq.c crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.h 2020-05-08 15:32:17 +10:00
skbuff.c net: Use helper function ip_is_fragment() 2020-08-08 14:24:53 -07:00
skmsg.c bpf, sockmap: RCU dereferenced psock may be used outside RCU block 2020-06-28 08:33:28 -07:00
sock_diag.c sock: make cookie generation global instead of per netns 2019-08-09 13:14:46 -07:00
sock_map.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-11 00:46:00 -07:00
sock_reuseport.c udp: Copy has_conns in reuseport_grow(). 2020-07-21 15:31:02 -07:00
sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-08-05 20:13:21 -07:00
stream.c tcp: make sure EPOLLOUT wont be missed 2019-08-19 13:07:43 -07:00
sysctl_net_core.c bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok() 2020-07-08 16:01:21 -07:00
timestamping.c net: Introduce a new MII time stamping interface. 2019-12-25 19:51:33 -08:00
tso.c net: tso: add UDP segmentation support 2020-06-18 20:46:23 -07:00
utils.c net: Fix skb->csum update in inet_proto_csum_replace16(). 2020-01-24 20:54:30 +01:00
xdp.c bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commands 2020-07-25 20:37:02 -07:00