linux/net
Wen Gu 0ef6049f66 net/smc: Forward wakeup to smc socket waitqueue after fallback
commit 341adeec9a upstream.

When we replace TCP with SMC and a fallback occurs, there may be
some socket waitqueue entries remaining in smc socket->wq, such
as eppoll_entries inserted by userspace applications.

After the fallback, data flows over TCP/IP and only clcsocket->wq
will be woken up. Applications can't be notified by the entries
which were inserted in smc socket->wq before fallback. So we need
a mechanism to wake up smc socket->wq at the same time if some
entries remaining in it.

The current workaround is to transfer the entries from smc socket->wq
to clcsock->wq during the fallback. But this may cause a crash
like this:

 general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP PTI
 CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G E     5.16.0+ #107
 RIP: 0010:__wake_up_common+0x65/0x170
 Call Trace:
  <IRQ>
  __wake_up_common_lock+0x7a/0xc0
  sock_def_readable+0x3c/0x70
  tcp_data_queue+0x4a7/0xc40
  tcp_rcv_established+0x32f/0x660
  ? sk_filter_trim_cap+0xcb/0x2e0
  tcp_v4_do_rcv+0x10b/0x260
  tcp_v4_rcv+0xd2a/0xde0
  ip_protocol_deliver_rcu+0x3b/0x1d0
  ip_local_deliver_finish+0x54/0x60
  ip_local_deliver+0x6a/0x110
  ? tcp_v4_early_demux+0xa2/0x140
  ? tcp_v4_early_demux+0x10d/0x140
  ip_sublist_rcv_finish+0x49/0x60
  ip_sublist_rcv+0x19d/0x230
  ip_list_rcv+0x13e/0x170
  __netif_receive_skb_list_core+0x1c2/0x240
  netif_receive_skb_list_internal+0x1e6/0x320
  napi_complete_done+0x11d/0x190
  mlx5e_napi_poll+0x163/0x6b0 [mlx5_core]
  __napi_poll+0x3c/0x1b0
  net_rx_action+0x27c/0x300
  __do_softirq+0x114/0x2d2
  irq_exit_rcu+0xb4/0xe0
  common_interrupt+0xba/0xe0
  </IRQ>
  <TASK>

The crash is caused by privately transferring waitqueue entries from
smc socket->wq to clcsock->wq. The owners of these entries, such as
epoll, have no idea that the entries have been transferred to a
different socket wait queue and still use original waitqueue spinlock
(smc socket->wq.wait.lock) to make the entries operation exclusive,
but it doesn't work. The operations to the entries, such as removing
from the waitqueue (now is clcsock->wq after fallback), may cause a
crash when clcsock waitqueue is being iterated over at the moment.

This patch tries to fix this by no longer transferring wait queue
entries privately, but introducing own implementations of clcsock's
callback functions in fallback situation. The callback functions will
forward the wakeup to smc socket->wq if clcsock->wq is actually woken
up and smc socket->wq has remaining entries.

Fixes: 2153bd1e3d ("net/smc: Transfer remaining wait queue entries during fallback")
Suggested-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08 18:34:09 +01:00
..
6lowpan
9p 9p/net: fix missing error check in p9_check_errors 2021-11-18 19:17:16 +01:00
802 net: 802: remove dead leftover after ipx driver removal 2021-08-13 16:30:35 -07:00
8021q net: vlan: fix underflow for the real_dev refcnt 2021-12-01 09:04:53 +01:00
appletalk
atm
ax25 ax25: uninitialized variable in ax25_setsockopt() 2022-01-27 11:03:59 +01:00
batman-adv batman-adv: allow netlink usage in unprivileged containers 2022-01-27 11:04:25 +01:00
bluetooth Bluetooth: refactor malicious adv data check 2022-02-01 17:27:14 +01:00
bpf bpf, test, cgroup: Use sk_{alloc,free} for test cases 2021-09-28 09:29:28 +02:00
bpfilter
bridge netfilter: nft_reject_bridge: Fix for missing reply from prerouting 2022-02-08 18:34:08 +01:00
caif net-caif: avoid user-triggerable WARN_ON(1) 2021-09-14 12:51:15 +01:00
can can: isotp: convert struct tpcon::{idx,len} to unsigned int 2022-01-16 09:12:44 +01:00
ceph
core rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink() 2022-02-05 12:38:59 +01:00
dcb
dccp tcp: switch orphan_count to bare per-cpu counters 2021-11-18 19:16:33 +01:00
decnet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
dns_resolver
dsa net: dsa: fix incorrect function pointer check for MRP ring roles 2022-01-27 11:03:50 +01:00
ethernet move netdev_boot_setup into Space.c 2021-08-03 13:05:26 +01:00
ethtool ethtool: do not perform operations on net devices being unregistered 2021-12-14 10:57:09 +01:00
hsr
ieee802154 net: ieee802154: Return meaningful error codes from the netlink helpers 2022-02-08 18:34:09 +01:00
ife
ipv4 tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data() 2022-02-05 12:38:59 +01:00
ipv6 Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values" 2022-02-01 17:27:14 +01:00
iucv net/iucv: Replace deprecated CPU-hotplug functions. 2021-08-09 10:13:32 +01:00
kcm
key
l2tp net/l2tp: Fix reference count leak in l2tp_udp_recv_core 2021-09-09 11:00:20 +01:00
l3mdev
lapb
llc net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
mac80211 mac80211: allow non-standard VHT MCS-10/11 2022-01-27 11:04:52 +01:00
mac802154 ieee802154: Remove redundant initialization of variable ret 2021-09-07 14:06:08 +01:00
mctp mctp: Don't let RTM_DELROUTE delete local routes 2021-12-08 09:04:53 +01:00
mpls net: mpls: Fix notifications when deleting a device 2021-12-08 09:04:47 +01:00
mptcp mptcp: fix msk traversal in mptcp_nl_cmd_set_flags() 2022-02-08 18:34:06 +01:00
ncsi net/ncsi: check for error return from call to nla_put_u32 2022-01-05 12:42:37 +01:00
netfilter netfilter: conntrack: don't increment invalid counter on NF_REPEAT 2022-02-01 17:27:09 +01:00
netlabel net: fix NULL pointer reference in cipso_v4_doi_free 2021-08-30 12:23:18 +01:00
netlink net: netlink: af_netlink: Prevent empty skb by adding a check on len. 2021-12-17 10:30:15 +01:00
netrom netrom: fix api breakage in nr_setsockopt() 2022-01-27 11:04:00 +01:00
nfc nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind() 2022-01-27 11:02:48 +01:00
nsh
openvswitch net: openvswitch: Fix ct_state nat flags for conns arriving from tc 2022-01-27 11:04:02 +01:00
packet af_packet: fix data-race in packet_setsockopt / packet_setsockopt 2022-02-05 12:38:59 +01:00
phonet phonet: refcount leak in pep_sock_accep 2022-01-11 15:35:16 +01:00
psample
qrtr net: qrtr: revert check in qrtr_endpoint_post() 2021-09-02 11:37:02 +01:00
rds rds: memory leak in __rds_conn_create() 2021-12-22 09:32:42 +01:00
rfkill
rose
rxrpc rxrpc: Adjust retransmission backoff 2022-02-01 17:27:11 +01:00
sched net: sched: fix use-after-free in tc_new_tfilter() 2022-02-05 12:38:59 +01:00
sctp sctp: hold endpoint before calling cb in sctp_transport_lookup_process 2022-01-11 15:35:14 +01:00
smc net/smc: Forward wakeup to smc socket waitqueue after fallback 2022-02-08 18:34:09 +01:00
strparser bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding 2021-11-18 19:17:11 +01:00
sunrpc fsnotify: fix fsnotify hooks in pseudo filesystems 2022-02-01 17:27:01 +01:00
switchdev net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge 2021-08-04 12:35:07 +01:00
tipc net ticp:fix a kernel-infoleak in __tipc_sendmsg() 2022-01-11 15:35:16 +01:00
tls net/tls: Fix authentication failure in CCM mode 2021-12-08 09:04:41 +01:00
unix af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress 2022-01-27 11:05:30 +01:00
vmw_vsock virtio/vsock: fix the transport to work with VMADDR_CID_ANY 2021-12-22 09:32:39 +01:00
wireless cfg80211: Acquire wiphy mutex on regulatory work 2021-12-22 09:32:42 +01:00
x25
xdp Revert "xsk: Do not sleep in poll() when need_wakeup set" 2021-12-22 09:32:51 +01:00
xfrm xfrm: Don't accidentally set RTO_ONLINK in decode_session4() 2022-01-27 11:05:36 +01:00
compat.c
devres.c
Kconfig mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
Makefile mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
socket.c net: fix SOF_TIMESTAMPING_BIND_PHC to work with multiple sockets 2022-01-27 11:03:52 +01:00
sysctl_net.c