linux/drivers/net
Yufeng Mo f710323dcd bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler()
[ Upstream commit 220ade7745 ]

Some time ago, I reported a call trace with the message
"did not find a suitable aggregator"; please see [1].
After a period of analysis and reproduction, I found
that this problem is caused by a concurrency issue.

Before the problem occurs, the bond structure is as follows:

bond0 - slaver0(eth0) - agg0.lag_ports -> port0 - port1
                      \
                        port0
      \
        slaver1(eth1) - agg1.lag_ports -> NULL
                      \
                        port1

If we run 'ifenslave bond0 -d eth1', the process is as follows:

executing __bond_release_one()
|
bond_upper_dev_unlink()[step1]
|                       |                       |
|                       |                       bond_3ad_lacpdu_recv()
|                       |                       ->bond_3ad_rx_indication()
|                       |                       spin_lock_bh()
|                       |                       ->ad_rx_machine()
|                       |                       ->__record_pdu()[step2]
|                       |                       spin_unlock_bh()
|                       |                       |
|                       bond_3ad_state_machine_handler()
|                       spin_lock_bh()
|                       ->ad_port_selection_logic()
|                       ->try to find free aggregator[step3]
|                       ->try to find suitable aggregator[step4]
|                       ->did not find a suitable aggregator[step5]
|                       spin_unlock_bh()
|                       |
|                       |
bond_3ad_unbind_slave() |
spin_lock_bh()
spin_unlock_bh()

step1: slaver1(eth1) has already been removed from the list, but
       port1 remains.
step2: a LACPDU is received and port0 is updated.
step3: port0 will be removed from agg0.lag_ports. The structure is
       now "agg0.lag_ports -> port1", so agg0 is not free. At the
       same time, slaver1/agg1 has been removed from the list by
       step1, so no free aggregator can be found.
step4: no suitable aggregator can be found because of step2.
step5: a call trace is triggered because port->aggregator is NULL.

To solve this concurrency problem, call bond_upper_dev_unlink()
after bond_3ad_unbind_slave(). This way, the port is invalidated
first and is then skipped in bond_3ad_state_machine_handler(). This
eliminates the window in which the slaver has been removed from the
list but the port is still valid.

[1] https://lore.kernel.org/netdev/10374.1611947473@famine/

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18 13:40:24 +02:00
appletalk net: appletalk: cops: Fix data race in cops_probe1 2021-06-16 12:01:37 +02:00
arcnet
bonding bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler() 2021-09-18 13:40:24 +02:00
caif net: caif: fix memory leak in ldisc_open 2021-06-30 08:47:21 -04:00
can can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters 2021-09-03 10:09:22 +02:00
dsa net: dsa: mt7530: fix VLAN traffic leaks again 2021-09-03 10:09:30 +02:00
ethernet nfp: fix return statement in nfp_net_parse_meta() 2021-09-18 13:40:21 +02:00
fddi net: fddi: fix UAF in fza_probe 2021-07-25 14:36:20 +02:00
fjes fjes: check return value after calling platform_get_resource() 2021-07-19 09:44:49 +02:00
hamradio net: 6pack: fix slab-out-of-bounds in decode_data 2021-08-26 08:35:45 -04:00
hippi
hyperv hv_netvsc: Reset the RSC count if NVSP_STAT_FAIL in netvsc_receive() 2021-02-17 11:02:26 +01:00
ieee802154 ieee802154: hwsim: fix GPF in hwsim_new_edge_nl 2021-08-18 08:59:07 +02:00
ipa net: ipa: Add missing of_node_put() in ipa_firmware_load() 2021-07-19 09:44:51 +02:00
ipvlan
mdio net: mdio-mux: Handle -EPROBE_DEFER correctly 2021-08-26 08:35:49 -04:00
netdevsim net: netdevsim: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops 2021-07-25 14:36:19 +02:00
pcs
phy net: phy: Fix data type in DP83822 dp8382x_disable_wol() 2021-09-18 13:40:18 +02:00
plip
ppp ppp: Fix generating ifname when empty IFLA_IFNAME is specified 2021-08-18 08:59:10 +02:00
slip
team team: protect features update by RCU to avoid deadlock 2021-02-03 23:28:51 +01:00
usb net: usb: pegasus: fixes of set_register(s) return value evaluation; 2021-09-03 10:09:23 +02:00
vmxnet3 vmxnet3: fix cksum offload issues for tunnels with non-default udp ports 2021-07-25 14:36:19 +02:00
wan net: lapbether: Prevent racing when checking whether the netif is running 2021-05-14 09:50:29 +02:00
wimax staging: wimax/i2400m: fix byte-order issue 2021-05-11 14:47:16 +02:00
wireguard wireguard: allowedips: free empty intermediate nodes when removing single node 2021-06-10 13:39:24 +02:00
wireless wcn36xx: Ensure finish scan is not requested before start scan 2021-09-18 13:40:08 +02:00
xen-netback xen-netback: take a reference to the RX task thread 2021-06-10 13:39:29 +02:00
bareudp.c bareudp: Fix invalid read beyond skb's linear data 2021-08-18 08:59:11 +02:00
dummy.c
eql.c
geneve.c net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb 2021-05-14 09:50:43 +02:00
gtp.c net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending 2021-03-04 11:38:46 +01:00
ifb.c
Kconfig crypto: mips/poly1305 - enable for all MIPS processors 2021-03-17 17:06:10 +01:00
LICENSE.SRC
loopback.c
macsec.c net: macsec: fix the length used to copy the key for offloading 2021-07-14 16:56:28 +02:00
macvlan.c
macvtap.c
Makefile
mdio.c
mii.c
net_failover.c
netconsole.c
nlmon.c
ntb_netdev.c
rionet.c
sb1000.c
Space.c
sungem_phy.c
tap.c net: fix dev_ifsioc_locked() race condition 2021-03-07 12:34:07 +01:00
thunderbolt.c
tun.c net: tun: set tun->dev->addr_len during TUNSETLINK processing 2021-04-14 08:42:13 +02:00
veth.c veth: Store queue_mapping independently of XDP prog presence 2021-03-30 14:31:56 +02:00
virtio_net.c virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO 2021-08-26 08:35:48 -04:00
vrf.c vrf: Reset skb conntrack connection on VRF rcv 2021-08-26 08:35:47 -04:00
vsockmon.c
vxlan.c vxlan: add missing rcu_read_lock() in neigh_reduce() 2021-07-14 16:56:25 +02:00
xen-netfront.c