linux/net
Lorenz Bauer 9c02bec959 bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
sockets. This means we can't use the helper to steer traffic to Envoy,
which configures SO_REUSEPORT on its sockets. In turn, we're blocked
from removing TPROXY from our setup.

The reason that bpf_sk_assign refuses such sockets is that the
bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead,
one of the reuseport sockets is selected by hash. This could cause
dispatch to the "wrong" socket:

    sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
    bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed

Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
helpers unfortunately. In the tc context, L2 headers are at the start
of the skb, while SK_REUSEPORT expects L3 headers instead.

Instead, we execute the SK_REUSEPORT program when the assigned socket
is pulled out of the skb, further up the stack. This creates some
trickiness with regards to refcounting as bpf_sk_assign will put both
refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU
freed. We can infer that the sk_assigned socket is RCU freed if the
reuseport lookup succeeds, but convincing yourself of this fact isn't
straight forward. Therefore we defensively check refcounting on the
sk_assign sock even though it's probably not required in practice.

Fixes: 8e368dc72e ("bpf: Fix use of sk->sk_reuseport from sk_assign")
Fixes: cf7fbe660f ("bpf: Add socket assign support")
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@cilium.io>
Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-7-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-07-25 13:55:55 -07:00
..
6lowpan 6lowpan: Remove redundant initialisation. 2023-03-29 08:22:52 +01:00
9p Including fixes from netfilter. 2023-05-05 19:12:01 -07:00
802
8021q vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit() 2023-05-17 12:55:39 +01:00
appletalk sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
atm sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
ax25 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
batman-adv batman-adv: Broken sync while rescheduling delayed work 2023-05-26 23:14:49 +02:00
bluetooth Bluetooth: MGMT: Use correct address for memcpy() 2023-07-20 11:27:22 -07:00
bpf selftests/bpf: add testcase for TRACING with 6+ arguments 2023-07-13 16:04:56 -07:00
bpfilter net: Use umd_cleanup_helper() 2023-05-31 13:06:57 +02:00
bridge net: switchdev: Add a helper to replay objects on a bridge port 2023-07-21 08:54:03 +01:00
caif sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
can can: bcm: Fix UAF in bcm_proc_show() 2023-07-17 09:46:26 +02:00
ceph libceph: harden msgr2.1 frame segment length checks 2023-07-13 13:18:57 +02:00
core bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign 2023-07-25 13:55:55 -07:00
dcb
dccp ipv6: remove hard coded limitation on ipv6_pinfo 2023-07-24 09:39:31 +01:00
devlink devlink: remove reload failed checks in params get/set callbacks 2023-07-14 09:08:09 +01:00
dns_resolver
dsa net: dsa: remove deprecated strncpy 2023-07-23 11:45:46 +01:00
ethernet
ethtool net: create device lookup API with reference tracking 2023-06-15 08:21:11 +01:00
handshake Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-15 22:19:41 -07:00
hsr net: hsr: Disable promiscuous mode in offload mode 2023-06-21 16:47:05 -07:00
ieee802154 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
ife
ipv4 bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign 2023-07-25 13:55:55 -07:00
ipv6 bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign 2023-07-25 13:55:55 -07:00
iucv net/iucv: Fix size of interrupt data 2023-03-16 17:34:40 -07:00
kcm sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
key ipsec-next-2023-07-19 2023-07-19 20:36:16 -07:00
l2tp ipv6: remove hard coded limitation on ipv6_pinfo 2023-07-24 09:39:31 +01:00
l3mdev
lapb
llc llc: Don't drop packet from non-root netns. 2023-07-20 10:46:28 +02:00
mac80211 - New Drivers 2023-07-03 11:26:05 -07:00
mac802154 Core WPAN changes: 2023-06-24 15:41:46 -07:00
mctp sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
mpls net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
mptcp ipv6: remove hard coded limitation on ipv6_pinfo 2023-07-24 09:39:31 +01:00
ncsi net/ncsi: change from ndo_set_mac_address to dev_set_mac_address 2023-06-09 10:32:51 +01:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-07-20 15:52:55 -07:00
netlabel netlabel: Reorder fields in 'struct netlbl_domaddr6_map' 2023-06-20 20:06:56 -07:00
netlink netlink: Add new netlink_release function 2023-07-23 11:34:22 +01:00
netrom sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-06-27 09:45:22 -07:00
nsh net: move gso declarations and functions to their own files 2023-06-10 00:11:41 -07:00
openvswitch openvswitch: set IPS_CONFIRMED in tmpl status only when commit is set in conntrack 2023-07-20 10:06:36 +02:00
packet sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
phonet sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
psample
qrtr net: qrtr: Handle IPCR control port format of older targets 2023-07-17 09:02:30 +01:00
rds sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
rfkill net: rfkill-gpio: Add explicit include for of.h 2023-04-06 20:36:27 +02:00
rose sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
rxrpc Networking changes for 6.5. 2023-06-28 16:43:10 -07:00
sched tcx: Fix splat in ingress_destroy upon tcx_entry_free 2023-07-24 11:42:35 -07:00
sctp ipv6: remove hard coded limitation on ipv6_pinfo 2023-07-24 09:39:31 +01:00
smc smc: Drop smc_sendpage() in favour of smc_sendmsg() + MSG_SPLICE_PAGES 2023-06-24 15:50:12 -07:00
strparser
sunrpc NFS client updates for Linux 6.5 2023-07-01 14:38:25 -07:00
switchdev net: switchdev: Add a helper to replay objects on a bridge port 2023-07-21 08:54:03 +01:00
tipc sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
tls net: Kill MSG_SENDPAGE_NOTLAST 2023-06-24 15:50:13 -07:00
unix net: scm: introduce and use scm_recv_unix helper 2023-06-27 10:50:22 -07:00
vmw_vsock sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
wireless wifi: cfg80211: fix receiving mesh packets without RFC1042 header 2023-07-12 18:03:40 -07:00
x25 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) 2023-06-24 15:50:13 -07:00
xdp xsk: support ZC Tx multi-buffer in batch API 2023-07-19 09:56:50 -07:00
xfrm tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage 2023-06-24 15:50:12 -07:00
compat.c net/compat: Update msg_control_is_user when setting a kernel pointer 2023-04-14 11:09:27 +01:00
devres.c
Kconfig bpf: Add fd-based tcx multi-prog infra with link support 2023-07-19 10:07:27 -07:00
Kconfig.debug
Makefile net/handshake: Create a NETLINK service for handling handshake requests 2023-04-19 18:48:48 -07:00
socket.c Networking changes for 6.5. 2023-06-28 16:43:10 -07:00
sysctl_net.c