linux/net
Nikolay Aleksandrov 6652101175 net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group
[ Upstream commit 1005f19b93 ]

When replacing a nexthop group, we must release the IPv6 per-cpu dsts of
the removed nexthop entries after an RCU grace period because they
contain references to the nexthop's net device and to the fib6 info.
With specific series of events[1] we can reach net device refcount
imbalance which is unrecoverable. IPv4 is not affected because dsts
don't take a refcount on the route.

[1]
 $ ip nexthop list
  id 200 via 2002:db8::2 dev bridge.10 scope link onlink
  id 201 via 2002:db8::3 dev bridge scope link onlink
  id 203 group 201/200
 $ ip -6 route
  2001:db8::10 nhid 203 metric 1024 pref medium
     nexthop via 2002:db8::3 dev bridge weight 1 onlink
     nexthop via 2002:db8::2 dev bridge.10 weight 1 onlink

Create rt6_info through one of the multipath legs, e.g.:
 $ taskset -a -c 1  ./pkt_inj 24 bridge.10 2001:db8::10
 (pkt_inj is just a custom packet generator, nothing special)

Then remove that leg from the group by replace (let's assume it is id
200 in this case):
 $ ip nexthop replace id 203 group 201

Now remove the IPv6 route:
 $ ip -6 route del 2001:db8::10/128

The route won't be really deleted due to the stale rt6_info holding 1
refcnt in nexthop id 200.
At this point we have the following reference count dependency:
 (deleted) IPv6 route holds 1 reference over nhid 203
 nh 203 holds 1 ref over id 201
 nh 200 holds 1 ref over the net device and the route due to the stale
 rt6_info

Now to create circular dependency between nh 200 and the IPv6 route, and
also to get a reference over nh 200, restore nhid 200 in the group:
 $ ip nexthop replace id 203 group 201/200

And now we have a permanent circular dependncy because nhid 203 holds a
reference over nh 200 and 201, but the route holds a ref over nh 203 and
is deleted.

To trigger the bug just delete the group (nhid 203):
 $ ip nexthop del id 203

It won't really be deleted due to the IPv6 route dependency, and now we
have 2 unlinked and deleted objects that reference each other: the group
and the IPv6 route. Since the group drops the reference it holds over its
entries at free time (i.e. its own refcount needs to drop to 0) that will
never happen and we get a permanent ref on them, since one of the entries
holds a reference over the IPv6 route it will also never be released.

At this point the dependencies are:
 (deleted, only unlinked) IPv6 route holds reference over group nh 203
 (deleted, only unlinked) group nh 203 holds reference over nh 201 and 200
 nh 200 holds 1 ref over the net device and the route due to the stale
 rt6_info

This is the last point where it can be fixed by running traffic through
nh 200, and specifically through the same CPU so the rt6_info (dst) will
get released due to the IPv6 genid, that in turn will free the IPv6
route, which in turn will free the ref count over the group nh 203.

If nh 200 is deleted at this point, it will never be released due to the
ref from the unlinked group 203, it will only be unlinked:
 $ ip nexthop del id 200
 $ ip nexthop
 $

Now we can never release that stale rt6_info, we have IPv6 route with ref
over group nh 203, group nh 203 with ref over nh 200 and 201, nh 200 with
rt6_info (dst) with ref over the net device and the IPv6 route. All of
these objects are only unlinked, and cannot be released, thus they can't
release their ref counts.

 Message from syslogd@dev at Nov 19 14:04:10 ...
  kernel:[73501.828730] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3
 Message from syslogd@dev at Nov 19 14:04:20 ...
  kernel:[73512.068811] unregister_netdevice: waiting for bridge.10 to become free. Usage count = 3

Fixes: 7bf4796dd0 ("nexthops: add support for replace")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01 09:04:49 +01:00
..
6lowpan 6lowpan: iphc: Fix an off-by-one check of array index 2021-07-22 16:19:03 +02:00
9p 9p/net: fix missing error check in p9_check_errors 2021-11-18 19:17:16 +01:00
802 net: 802: remove dead leftover after ipx driver removal 2021-08-13 16:30:35 -07:00
8021q net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-18 19:17:06 +01:00
appletalk net: socket: rework compat_ifreq_ioctl() 2021-07-23 14:20:25 +01:00
atm
ax25 ax25: use skb_expand_head 2021-08-03 11:21:39 +01:00
batman-adv net: batman-adv: fix error handling 2021-10-26 14:47:12 +01:00
bluetooth Bluetooth: fix init and cleanup of sco_conn.timeout_work 2021-11-18 19:16:22 +01:00
bpf bpf, test, cgroup: Use sk_{alloc,free} for test cases 2021-09-28 09:29:28 +02:00
bpfilter
bridge net: bridge: fix uninitialized variables when BRIDGE_CFM is disabled 2021-11-18 19:16:44 +01:00
caif net-caif: avoid user-triggerable WARN_ON(1) 2021-09-14 12:51:15 +01:00
can can: j1939: j1939_tp_cmd_recv(): check the dst address of TP.CM_BAM 2021-11-18 19:16:03 +01:00
ceph
core net: add and use skb_unclone_keeptruesize() helper 2021-11-25 09:49:08 +01:00
dcb
dccp tcp: switch orphan_count to bare per-cpu counters 2021-11-18 19:16:33 +01:00
decnet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
dns_resolver
dsa net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridge 2021-11-18 19:17:06 +01:00
ethernet move netdev_boot_setup into Space.c 2021-08-03 13:05:26 +01:00
ethtool ethtool: fix ethtool msg len calculation for pause stats 2021-11-18 19:17:06 +01:00
hsr
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-13 06:41:22 -07:00
ife
ipv4 net: nexthop: release IPv6 per-cpu dsts when replacing a nexthop group 2021-12-01 09:04:49 +01:00
ipv6 net: ipv6: add fib6_nh_release_dsts stub 2021-12-01 09:04:49 +01:00
iucv net/iucv: Replace deprecated CPU-hotplug functions. 2021-08-09 10:13:32 +01:00
kcm
key
l2tp net/l2tp: Fix reference count leak in l2tp_udp_recv_core 2021-09-09 11:00:20 +01:00
l3mdev
lapb
llc net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
mac80211 mac80211: drop check for DONT_REORDER in __ieee80211_select_queue 2021-11-25 09:48:47 +01:00
mac802154 ieee802154: Remove redundant initialization of variable ret 2021-09-07 14:06:08 +01:00
mctp mctp: handle the struct sockaddr_mctp padding fields 2021-11-18 19:16:02 +01:00
mpls mpls: defer ttl decrement in mpls_forward() 2021-07-23 17:17:56 +01:00
mptcp mptcp: use delegate action to schedule 3rd ack retrans 2021-12-01 09:04:49 +01:00
ncsi net/ncsi: add get MAC address command to get Intel i210 MAC address 2021-09-01 17:18:56 -07:00
netfilter netfilter: flowtable: fix IPv6 tunnel addr match 2021-12-01 09:04:45 +01:00
netlabel net: fix NULL pointer reference in cipso_v4_doi_free 2021-08-30 12:23:18 +01:00
netlink netlink: annotate data races around nlk->bound 2021-10-05 13:11:09 +01:00
netrom net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
nfc NFC: add NCI_UNREG flag to eliminate the race 2021-11-25 09:48:40 +01:00
nsh
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-08-19 18:09:18 -07:00
packet net/packet: clarify source of pr_*() messages 2021-09-10 10:00:59 +01:00
phonet net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
psample
qrtr net: qrtr: revert check in qrtr_endpoint_post() 2021-09-02 11:37:02 +01:00
rds net/rds: dma_map_sg is entitled to merge entries 2021-08-18 15:35:50 -07:00
rfkill
rose
rxrpc rxrpc: Fix _usecs_to_jiffies() by using usecs_to_jiffies() 2021-11-18 19:16:25 +01:00
sched net: sched: act_mirred: drop dst for the direction from egress to ingress 2021-11-25 09:48:38 +01:00
sctp sctp: return true only for pathmtu update in sctp_transport_pl_toobig 2021-11-18 19:16:43 +01:00
smc net/smc: Make sure the link_id is unique 2021-11-25 09:48:35 +01:00
strparser bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding 2021-11-18 19:17:11 +01:00
sunrpc SUNRPC: Partial revert of commit 6f9f17287e 2021-11-18 19:17:20 +01:00
switchdev net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge 2021-08-04 12:35:07 +01:00
tipc tipc: check for null after calling kmemdup 2021-11-25 09:48:42 +01:00
tls net/tls: Fix flipped sign in async_wait.err assignment 2021-10-28 14:41:20 +01:00
unix af_unix: fix regression in read after shutdown 2021-12-01 09:04:49 +01:00
vmw_vsock vsock: prevent unnecessary refcnt inc for nonblocking connect 2021-11-18 19:17:13 +01:00
wireless cfg80211: call cfg80211_stop_ap when switch from P2P_GO type 2021-11-25 09:48:46 +01:00
x25
xdp
xfrm xfrm: fix rcu lock in xfrm_notify_userpolicy() 2021-09-23 10:11:12 +02:00
compat.c
devres.c
Kconfig mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
Makefile mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
socket.c Core: 2021-08-31 16:43:06 -07:00
sysctl_net.c