-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH7ZpcWbFyOOp6OJbrB3Eaf9PW7cFAmgtYgEACgkQrB3Eaf9P
W7e1ag//da84UIRyJwMfDO4Y3MXDNPslNSDuq0HuvwRtdLIBLFtwitSzU1uhKsxY
yn5v7RSsxvp6lXW2RT+Ycor2qZ/mGHJsHcVfG7m0YjxH6unw7yzjqn5LNNzRbYN4
NcD8P0skuX6d80EFPUB3Hsnmdj1VKR62OsWyk3rAPb4CLBVKJt9OsseVfN4bn1R0
TaZSIkdh5EDGYXTBKb49jc8LFfQo7+uVg/AjtZ/2ZsWt+Qgw3XevTIcwLokH00rt
GzXcLjC1g+b6TeVncOuD1oiNJUtQVGYV23t2yQlk9k2HFzCdNnq0YM9pzawwiI+l
icBV2X/QFjhdCRkvJRF4dkXq/4tnnEmYoY/1vSOoWR9VmY2u8Lr3VRiDD/h0gYJT
KXd8YPMtZLDnLgmH+DwWbv4vdLtHvQTmB8XFzb/4VN6Ikucenry3loJsUsLnS+Je
t1/7unLrg9yyJC6UPzweqjAx+6VgZvem/M5kejIVxHpk+Wg2dXGZ2jz4fsVuZYPB
dMLj1h1MLn4gOt2b/bdI2do0C+p2R1axrTNw+RiqwCrb1h5Ey+7RAhWyXyaHUEs3
1brMAgOcvdbaaeSIpoHJ8eJx/PgRxDrxRnUC3HjCGPNApYQXC3FM3POk7wwJ9C0i
odlHrq+yOdzLCZyU+YKdR1q3kPq9AWpUSmc4Olg359OQ9IxDGQw=
=bgyq
-----END PGP SIGNATURE-----
Merge tag 'ipsec-2025-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2025-05-21
1) Fix some missing kfree_skb in the error paths of espintcp.
From Sabrina Dubroca.
2) Fix a reference leak in espintcp.
From Sabrina Dubroca.
3) Fix UDP GRO handling for ESPINUDP.
From Tobias Brunner.
4) Fix ipcomp truesize computation on the receive path.
From Sabrina Dubroca.
5) Sanitize marks before policy/state insertation.
From Paul Chaignon.
* tag 'ipsec-2025-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: Sanitize marks before insert
xfrm: ipcomp: fix truesize computation on receive
xfrm: Fix UDP GRO handling for some corner cases
espintcp: remove encap socket caching to avoid reference leak
espintcp: fix skb leaks
====================
Link: https://patch.msgid.link/20250521054348.4057269-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Syzbot reported a slab-use-after-free with the following call trace:
==================================================================
BUG: KASAN: slab-use-after-free in tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840
Read of size 8 at addr ffff88807a733000 by task kworker/1:0/25
Call Trace:
kasan_report+0xd9/0x110 mm/kasan/report.c:601
tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840
crypto_request_complete include/crypto/algapi.h:266
aead_request_complete include/crypto/internal/aead.h:85
cryptd_aead_crypt+0x3b8/0x750 crypto/cryptd.c:772
crypto_request_complete include/crypto/algapi.h:266
cryptd_queue_worker+0x131/0x200 crypto/cryptd.c:181
process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
Allocated by task 8355:
kzalloc_noprof include/linux/slab.h:778
tipc_crypto_start+0xcc/0x9e0 net/tipc/crypto.c:1466
tipc_init_net+0x2dd/0x430 net/tipc/core.c:72
ops_init+0xb9/0x650 net/core/net_namespace.c:139
setup_net+0x435/0xb40 net/core/net_namespace.c:343
copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
ksys_unshare+0x419/0x970 kernel/fork.c:3323
__do_sys_unshare kernel/fork.c:3394
Freed by task 63:
kfree+0x12a/0x3b0 mm/slub.c:4557
tipc_crypto_stop+0x23c/0x500 net/tipc/crypto.c:1539
tipc_exit_net+0x8c/0x110 net/tipc/core.c:119
ops_exit_list+0xb0/0x180 net/core/net_namespace.c:173
cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640
process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
After freed the tipc_crypto tx by delete namespace, tipc_aead_encrypt_done
may still visit it in cryptd_queue_worker workqueue.
I reproduce this issue by:
ip netns add ns1
ip link add veth1 type veth peer name veth2
ip link set veth1 netns ns1
ip netns exec ns1 tipc bearer enable media eth dev veth1
ip netns exec ns1 tipc node set key this_is_a_master_key master
ip netns exec ns1 tipc bearer disable media eth dev veth1
ip netns del ns1
The key of reproduction is that, simd_aead_encrypt is interrupted, leading
to crypto_simd_usable() return false. Thus, the cryptd_queue_worker is
triggered, and the tipc_crypto tx will be visited.
tipc_disc_timeout
tipc_bearer_xmit_skb
tipc_crypto_xmit
tipc_aead_encrypt
crypto_aead_encrypt
// encrypt()
simd_aead_encrypt
// crypto_simd_usable() is false
child = &ctx->cryptd_tfm->base;
simd_aead_encrypt
crypto_aead_encrypt
// encrypt()
cryptd_aead_encrypt_enqueue
cryptd_aead_enqueue
cryptd_enqueue_request
// trigger cryptd_queue_worker
queue_work_on(smp_processor_id(), cryptd_wq, &cpu_queue->work)
Fix this by holding net reference count before encrypt.
Reported-by: syzbot+55c12726619ff85ce1f6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=55c12726619ff85ce1f6
Fixes: fc1b6d6de2 ("tipc: introduce TIPC encryption & authentication")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Link: https://patch.msgid.link/20250520101404.1341730-1-wangliang74@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When enqueuing the first packet to an HFSC class, hfsc_enqueue() calls the
child qdisc's peek() operation before incrementing sch->q.qlen and
sch->qstats.backlog. If the child qdisc uses qdisc_peek_dequeued(), this may
trigger an immediate dequeue and potential packet drop. In such cases,
qdisc_tree_reduce_backlog() is called, but the HFSC qdisc's qlen and backlog
have not yet been updated, leading to inconsistent queue accounting. This
can leave an empty HFSC class in the active list, causing further
consequences like use-after-free.
This patch fixes the bug by moving the increment of sch->q.qlen and
sch->qstats.backlog before the call to the child qdisc's peek() operation.
This ensures that queue length and backlog are always accurate when packet
drops or dequeues are triggered during the peek.
Fixes: 12d0ad3be9 ("net/sched/sch_hfsc.c: handle corner cases where head may change invalidating calculated deadline")
Reported-by: Mingi Cho <mincho@theori.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250518222038.58538-2-xiyou.wangcong@gmail.com
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit 5ef44b3cb4 ("xsk: Bring back busy polling support") fixed the
busy polling support in xsk for XDP_ZEROCOPY after it was broken in
commit 86e25f40aa ("net: napi: Add napi_config"). The busy polling
support with XDP_COPY remained broken since the napi_id setup in
xsk_rcv_check was removed.
Bring back the setup of napi_id for XDP_COPY so socket level SO_BUSYPOLL
can be used to poll the underlying napi.
Do the setup of napi_id for XDP_COPY in xsk_bind, as it is done
currently for XDP_ZEROCOPY. The setup of napi_id for XDP_COPY in
xsk_bind is safe because xsk_rcv_check checks that the rx queue at which
the packet arrives is equal to the queue_id that was supplied in bind.
This is done for both XDP_COPY and XDP_ZEROCOPY mode.
Tested using AF_XDP support in virtio-net by running the xsk_rr AF_XDP
benchmarking tool shared here:
https://lore.kernel.org/all/20250320163523.3501305-1-skhawaja@google.com/T/
Enabled socket busy polling using following commands in qemu,
```
sudo ethtool -L eth0 combined 1
echo 400 | sudo tee /proc/sys/net/core/busy_read
echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
```
Fixes: 5ef44b3cb4 ("xsk: Bring back busy polling support")
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the procfs content is generated for a bcm_op which is in the process
to be removed the procfs output might show unreliable data (UAF).
As the removal of bcm_op's is already implemented with rcu handling this
patch adds the missing rcu_read_lock() and makes sure the list entries
are properly removed under rcu protection.
Fixes: f1b4e32aca ("can: bcm: use call_rcu() instead of costly synchronize_rcu()")
Reported-by: Anderson Nascimento <anderson@allelesecurity.com>
Suggested-by: Anderson Nascimento <anderson@allelesecurity.com>
Tested-by: Anderson Nascimento <anderson@allelesecurity.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20250519125027.11900-2-socketcan@hartkopp.net
Cc: stable@vger.kernel.org # >= 5.4
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The CAN broadcast manager (CAN BCM) can send a sequence of CAN frames via
hrtimer. The content and also the length of the sequence can be changed
resp reduced at runtime where the 'currframe' counter is then set to zero.
Although this appeared to be a safe operation the updates of 'currframe'
can be triggered from user space and hrtimer context in bcm_can_tx().
Anderson Nascimento created a proof of concept that triggered a KASAN
slab-out-of-bounds read access which can be prevented with a spin_lock_bh.
At the rework of bcm_can_tx() the 'count' variable has been moved into
the protected section as this variable can be modified from both contexts
too.
Fixes: ffd980f976 ("[CAN]: Add broadcast manager (bcm) protocol")
Reported-by: Anderson Nascimento <anderson@allelesecurity.com>
Tested-by: Anderson Nascimento <anderson@allelesecurity.com>
Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20250519125027.11900-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
For SOCK_STREAM sockets, if user buffer size (len) is less
than skb size (skb->len), the remaining data from skb
will be lost after calling kfree_skb().
To fix this, move the statement for partial reading
above skb deletion.
Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org)
Fixes: 30a584d944 ("[LLX]: SOCK_DGRAM interface fixes")
Cc: stable@vger.kernel.org
Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
When netfilter defrag hooks are loaded (due to the presence of conntrack
rules, for example), fragmented packets entering the bridge will be
defragged by the bridge's pre-routing hook (br_nf_pre_routing() ->
ipv4_conntrack_defrag()).
Later on, in the bridge's post-routing hook, the defragged packet will
be fragmented again. If the size of the largest fragment is larger than
what the kernel has determined as the destination MTU (using
ip_skb_dst_mtu()), the defragged packet will be dropped.
Before commit ac6627a28d ("net: ipv4: Consolidate ipv4_mtu and
ip_dst_mtu_maybe_forward"), ip_skb_dst_mtu() would return dst_mtu() as
the destination MTU. Assuming the dst entry attached to the packet is
the bridge's fake rtable one, this would simply be the bridge's MTU (see
fake_mtu()).
However, after above mentioned commit, ip_skb_dst_mtu() ends up
returning the route's MTU stored in the dst entry's metrics. Ideally, in
case the dst entry is the bridge's fake rtable one, this should be the
bridge's MTU as the bridge takes care of updating this metric when its
MTU changes (see br_change_mtu()).
Unfortunately, the last operation is a no-op given the metrics attached
to the fake rtable entry are marked as read-only. Therefore,
ip_skb_dst_mtu() ends up returning 1500 (the initial MTU value) and
defragged packets are dropped during fragmentation when dealing with
large fragments and high MTU (e.g., 9k).
Fix by moving the fake rtable entry's metrics to be per-bridge (in a
similar fashion to the fake rtable entry itself) and marking them as
writable, thereby allowing MTU changes to be reflected.
Fixes: 62fa8a846d ("net: Implement read-only protection and COW'ing of metrics.")
Fixes: 33eb9873a2 ("bridge: initialize fake_rtable metrics")
Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Closes: https://lore.kernel.org/netdev/PH0PR10MB4504888284FF4CBA648197D0ACB82@PH0PR10MB4504.namprd10.prod.outlook.com/
Tested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250515084848.727706-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
l2cap_check_enc_key_size shall check the security level of the
l2cap_chan rather than the hci_conn since for incoming connection
request that may be different as hci_conn may already been
encrypted using a different security level.
Fixes: 522e9ed157 ("Bluetooth: l2cap: Check encryption key size on incoming connection")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Kernel panic occurs when a devmem TCP socket is closed after NIC module
is unloaded.
This is Devmem TCP unregistration scenarios. number is an order.
(a)netlink socket close (b)pp destroy (c)uninstall result
1 2 3 OK
1 3 2 (d)Impossible
2 1 3 OK
3 1 2 (e)Kernel panic
2 3 1 (d)Impossible
3 2 1 (d)Impossible
(a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is
closed.
(b) page_pool_destroy() is called when the interface is down.
(c) mp_ops->uninstall() is called when an interface is unregistered.
(d) There is no scenario in mp_ops->uninstall() is called before
page_pool_destroy().
Because unregister_netdevice_many_notify() closes interfaces first
and then calls mp_ops->uninstall().
(e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire
netdev_lock().
But if the interface module has already been removed, net_device
pointer is invalid, so it causes kernel panic.
In summary, there are only 3 possible scenarios.
A. sk close -> pp destroy -> uninstall.
B. pp destroy -> sk close -> uninstall.
C. pp destroy -> uninstall -> sk close.
Case C is a kernel panic scenario.
In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set
binding->dev to NULL.
It indicates an bound net_device was unregistered.
It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock()
if binding->dev is NULL.
A new binding->lock is added to protect a dev of a binding.
So, lock ordering is like below.
priv->lock
netdev_lock(dev)
binding->lock
Tests:
Scenario A:
./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
-v 7 -t 1 -q 1 &
pid=$!
sleep 10
kill $pid
ip link set $interface down
modprobe -rv $module
Scenario B:
./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
-v 7 -t 1 -q 1 &
pid=$!
sleep 10
ip link set $interface down
kill $pid
modprobe -rv $module
Scenario C:
./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \
-v 7 -t 1 -q 1 &
pid=$!
sleep 10
modprobe -rv $module
sleep 5
kill $pid
Splat looks like:
Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf]
CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G B W 6.15.0-rc1+ #2 PREEMPT(undef) 0947ec89efa0fd68838b78e36aa1617e97ff5d7f
Tainted: [B]=BAD_PAGE, [W]=WARN
RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746)
Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f
RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811
RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc
RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4
R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e
R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88
FS: 0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<TASK>
...
netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3))
genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705)
...
netlink_release (net/netlink/af_netlink.c:737)
...
__sock_release (net/socket.c:647)
sock_close (net/socket.c:1393)
Fixes: 1d22d3060b ("net: drop rtnl_lock for queue_mgmt operations")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250514154028.1062909-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We cannot set frag_list to NULL pointer when alloc_page failed.
It will be used in tls_strp_check_queue_ok when the next time
tls_strp_read_sock is called.
This is because we don't reset full_len in tls_strp_flush_anchor_copy()
so the recv path will try to continue handling the partial record
on the next call but we dettached the rcvq from the frag list.
Alternative fix would be to reset full_len.
Unable to handle kernel NULL pointer dereference
at virtual address 0000000000000028
Call trace:
tls_strp_check_rcv+0x128/0x27c
tls_strp_data_ready+0x34/0x44
tls_data_ready+0x3c/0x1f0
tcp_data_ready+0x9c/0xe4
tcp_data_queue+0xf6c/0x12d0
tcp_rcv_established+0x52c/0x798
Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Signed-off-by: Pengtao He <hept.hept.hept@gmail.com>
Link: https://patch.msgid.link/20250514132013.17274-1-hept.hept.hept@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make sure that n_channels is set after allocating the
struct cfg80211_registered_device::int_scan_req member. Seen with
syzkaller:
UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:1208:5
index 0 is out of range for type 'struct ieee80211_channel *[] __counted_by(n_channels)' (aka 'struct ieee80211_channel *[]')
This was missed in the initial conversions because I failed to locate
the allocation likely due to the "sizeof(void *)" not matching the
"channels" array type.
Reported-by: syzbot+4bcdddd48bb6f0be0da1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/680fd171.050a0220.2b69d1.045e.GAE@google.com/
Fixes: e3eac9f32e ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by")
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/20250509184641.work.542-kees@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Prior to this patch, the mark is sanitized (applying the state's mask to
the state's value) only on inserts when checking if a conflicting XFRM
state or policy exists.
We discovered in Cilium that this same sanitization does not occur
in the hot-path __xfrm_state_lookup. In the hot-path, the sk_buff's mark
is simply compared to the state's value:
if ((mark & x->mark.m) != x->mark.v)
continue;
Therefore, users can define unsanitized marks (ex. 0xf42/0xf00) which will
never match any packet.
This commit updates __xfrm_state_insert and xfrm_policy_insert to store
the sanitized marks, thus removing this footgun.
This has the side effect of changing the ip output, as the
returned mark will have the mask applied to it when printed.
Fixes: 3d6acfa764 ("xfrm: SA lookups with mark")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com>
Co-developed-by: Louis DeLosSantos <louis.delos.devel@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
__netdev_update_features() expects the netdevice to be ops-locked, but
it gets called recursively on the lower level netdevices to sync their
features, and nothing locks those.
This commit fixes that, with the assumption that it shouldn't be possible
for both higher-level and lover-level netdevices to require the instance
lock, because that would lead to lock dependency warnings.
Without this, playing with higher level (e.g. vxlan) netdevices on top
of netdevices with instance locking enabled can run into issues:
WARNING: CPU: 59 PID: 206496 at ./include/net/netdev_lock.h:17 netif_napi_add_weight_locked+0x753/0xa60
[...]
Call Trace:
<TASK>
mlx5e_open_channel+0xc09/0x3740 [mlx5_core]
mlx5e_open_channels+0x1f0/0x770 [mlx5_core]
mlx5e_safe_switch_params+0x1b5/0x2e0 [mlx5_core]
set_feature_lro+0x1c2/0x330 [mlx5_core]
mlx5e_handle_feature+0xc8/0x140 [mlx5_core]
mlx5e_set_features+0x233/0x2e0 [mlx5_core]
__netdev_update_features+0x5be/0x1670
__netdev_update_features+0x71f/0x1670
dev_ethtool+0x21c5/0x4aa0
dev_ioctl+0x438/0xae0
sock_ioctl+0x2ba/0x690
__x64_sys_ioctl+0xa78/0x1700
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
Fixes: 7e4d784f58 ("net: hold netdev instance lock during rtnetlink operations")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250509072850.2002821-1-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- fix duplicate MAC address check, by Matthias Schiffer
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmgdxEYWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoSG7D/9UpiHwuCb3FCT9hZVNbRX6XChV
DSv8lYedVflwYgIZCqcKlg4/svygmnNNYy4XOJeLw7kjeegGmdQsol0TouZGkqTB
pNxKnporVTxwA4FDSaRwVxwF1UOlelAzcMRxAk5tpIkGPl1wBFLx7dFzWWuU7wvT
c0mtkJlV0kag0OZJEZqY5fYmQNHmYde9rsTpdP44ie7SPqGHpby6MYjqfGl0ztxy
70XCMgp9ByRsCnL45UQYua17gYavKWalCsAYCikQc/+nZGkgymAaIFv5elFrou9N
Dbw+k9X2ipw3v2atAUP40Bw6dvdIkTRaEpAyO5oDGM1//kqwAfUQazS+IWFYE50v
eN2KRGSNJH/J2RHY0ODRomcFDotHykR8aESbkePJHc5ZFgM9E6vAzy+BPrFJaROX
nRO/XmWXeVwyl9CrxoEtA0UusIEr3OQBMi2p5cV7vbcPHSPzQRuA2iKah0ZFsmB0
pGyaeEh60lO2r84lYsO04bA9q8eyZkNHUxd3pZnCpy4iAzEmgKaD7RAIygwHtRim
UezzP8BO7m6dlox16/eJ2Bev2EmA3m3w/rkyFU2jrE9xv8tTkjIYs04zTifs8+F7
hAU+2WM/P0z+lCVEZzTaguEjFK4UrmnmSZxF1B4mh5AVQL5q4R5np4ncTiBdGJCl
uS34escHTXWW4gHttQ==
=kqur
-----END PGP SIGNATURE-----
Merge tag 'batadv-net-pullrequest-20250509' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- fix duplicate MAC address check, by Matthias Schiffer
* tag 'batadv-net-pullrequest-20250509' of git://git.open-mesh.org/linux-merge:
batman-adv: fix duplicate MAC address check
====================
Link: https://patch.msgid.link/20250509090240.107796-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mctp_flow_prepare_output() is called in mctp_route_output(), which
places outbound packets onto a given interface. The packet may represent
a message fragment, in which case we provoke an unbalanced reference
count to the underlying device. This causes trouble if we ever attempt
to remove the interface:
[ 48.702195] usb 1-1: USB disconnect, device number 2
[ 58.883056] unregister_netdevice: waiting for mctpusb0 to become free. Usage count = 2
[ 69.022548] unregister_netdevice: waiting for mctpusb0 to become free. Usage count = 2
[ 79.172568] unregister_netdevice: waiting for mctpusb0 to become free. Usage count = 2
...
Predicate the invocation of mctp_dev_set_key() in
mctp_flow_prepare_output() on not already having associated the device
with the key. It's not yet realistic to uphold the property that the key
maintains only one device reference earlier in the transmission sequence
as the route (and therefore the device) may not be known at the time the
key is associated with the socket.
Fixes: 67737c4572 ("mctp: Pass flow data & flow release events to drivers")
Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Link: https://patch.msgid.link/20250508-mctp-dev-refcount-v1-1-d4f965c67bb5@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previously, when reducing a qdisc's limit via the ->change() operation, only
the main skb queue was trimmed, potentially leaving packets in the gso_skb
list. This could result in NULL pointer dereference when we only check
sch->limit against sch->q.qlen.
This patch introduces a new helper, qdisc_dequeue_internal(), which ensures
both the gso_skb list and the main queue are properly flushed when trimming
excess packets. All relevant qdiscs (codel, fq, fq_codel, fq_pie, hhf, pie)
are updated to use this helper in their ->change() routines.
Fixes: 76e3cc126b ("codel: Controlled Delay AQM")
Fixes: 4b549a2ef4 ("fq_codel: Fair Queue Codel AQM")
Fixes: afe4fd0624 ("pkt_sched: fq: Fair Queue packet scheduler")
Fixes: ec97ecf1eb ("net: sched: add Flow Queue PIE packet scheduler")
Fixes: 10239edf86 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
Fixes: d4b36210c2 ("net: pkt_sched: PIE AQM scheme")
Reported-by: Will <willsroot@protonmail.com>
Reported-by: Savy <savy@syst3mfailure.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- MGMT: Fix MGMT_OP_ADD_DEVICE invalid device flags
- hci_event: Fix not using key encryption size when its known
-----BEGIN PGP SIGNATURE-----
iQJNBAABCgA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmgcyKYZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKWSqEACGdIL6oDrfUmE2mFUxlT2C
eUSe1RDvXi63pFCxv4a1/JrnAICfIPbuKhvkOhf9g3JoQXjqdtX5REjEPcCndyBg
sSyGGfxcaoYLsSWQY1nSb2p/bttk1R3LfPT78QhKPDsh63FkvRw77P++8anzG3T1
wqgjKU9XZb7zYmMElYCqYbnd0K2PymJtD+Oml1srBRdOpz1ex3s6Qj4Cy6cg07Vb
SlxqWMnWZG0wfnusAFZ48zwYu/7/LQQnSJ6rbHSfrKLQnizNVvFTtsW+xrIfQ2m6
G7/WOX2LVwtP3XMe8VIDV0kdDtkDFtHq0KuNToAtt4DS582RHw0AVe+Xc2x5FFKA
rmcukZLvg6tv/DM/PM5zJdvZW4M+r1IOnBSZvFMAdYb4Af04DHjI1k1ow9On6O3f
oJCeRZ4LoOREljR+xdO/Ewn207za0wGR7IlDXFVpeGEOnLcSAxvqUl6biL3cEQpA
bb97I7fjqwSPqpAGsMV0uhULTJxT7QPhF05rL3bpSzAzZoDpRFhc070TLBBuQvW1
zQmf+SCeg72vDtv1vsekyoXWfM+SpwlsNYuStPBztrKZgBGeRKjtZc2/L6dLPKbo
bATO6Dhidm+outsF59YaqORFteON4yLAXSryLw6uVNIT3ApyUspNICyn4zi0tbYY
8N4rjTclIk5zV1z8zCJ7Lg==
=uyVh
-----END PGP SIGNATURE-----
Merge tag 'for-net-2025-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- MGMT: Fix MGMT_OP_ADD_DEVICE invalid device flags
- hci_event: Fix not using key encryption size when its known
* tag 'for-net-2025-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_event: Fix not using key encryption size when its known
Bluetooth: MGMT: Fix MGMT_OP_ADD_DEVICE invalid device flags
====================
Link: https://patch.msgid.link/20250508150927.385675-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This fixes the regression introduced by 50c1241e6a8a ("Bluetooth: l2cap:
Check encryption key size on incoming connection") introduced a check for
l2cap_check_enc_key_size which checks for hcon->enc_key_size which may
not be initialized if HCI_OP_READ_ENC_KEY_SIZE is still pending.
If the key encryption size is known, due previously reading it using
HCI_OP_READ_ENC_KEY_SIZE, then store it as part of link_key/smp_ltk
structures so the next time the encryption is changed their values are
used as conn->enc_key_size thus avoiding the racing against
HCI_OP_READ_ENC_KEY_SIZE.
Now that the enc_size is stored as part of key the information the code
then attempts to check that there is no downgrade of security if
HCI_OP_READ_ENC_KEY_SIZE returns a value smaller than what has been
previously stored.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220061
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220063
Fixes: 522e9ed157 ("Bluetooth: l2cap: Check encryption key size on incoming connection")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Older drivers and drivers with lower queue counts often have a static
array of queues, rather than allocating structs for each queue on demand.
Add a helper for adding up qstats from a queue range. Expectation is
that driver will pass a queue range [netdev->real_num_*x_queues, MAX).
It was tempting to always use num_*x_queues as the end, but virtio
seems to clamp its queue count after allocating the netdev. And this
way we can trivaly reuse the helper for [0, real_..).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250507003221.823267-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When bpf_redirect_peer is used to redirect packets to a device in
another network namespace, the skb isn't scrubbed. That can lead skb
information from one namespace to be "misused" in another namespace.
As one example, this is causing Cilium to drop traffic when using
bpf_redirect_peer to redirect packets that just went through IPsec
decryption to a container namespace. The following pwru trace shows (1)
the packet path from the host's XFRM layer to the container's XFRM
layer where it's dropped and (2) the number of active skb extensions at
each function.
NETNS MARK IFACE TUPLE FUNC
4026533547 d00 eth0 10.244.3.124:35473->10.244.2.158:53 xfrm_rcv_cb
.active_extensions = (__u8)2,
4026533547 d00 eth0 10.244.3.124:35473->10.244.2.158:53 xfrm4_rcv_cb
.active_extensions = (__u8)2,
4026533547 d00 eth0 10.244.3.124:35473->10.244.2.158:53 gro_cells_receive
.active_extensions = (__u8)2,
[...]
4026533547 0 eth0 10.244.3.124:35473->10.244.2.158:53 skb_do_redirect
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 ip_rcv
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 ip_rcv_core
.active_extensions = (__u8)2,
[...]
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 udp_queue_rcv_one_skb
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 __xfrm_policy_check
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 __xfrm_decode_session
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 security_xfrm_decode_session
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 kfree_skb_reason(SKB_DROP_REASON_XFRM_POLICY)
.active_extensions = (__u8)2,
In this case, there are no XFRM policies in the container's network
namespace so the drop is unexpected. When we decrypt the IPsec packet,
the XFRM state used for decryption is set in the skb extensions. This
information is preserved across the netns switch. When we reach the
XFRM policy check in the container's netns, __xfrm_policy_check drops
the packet with LINUX_MIB_XFRMINNOPOLS because a (container-side) XFRM
policy can't be found that matches the (host-side) XFRM state used for
decryption.
This patch fixes this by scrubbing the packet when using
bpf_redirect_peer, as is done on typical netns switches via veth
devices except skb->mark and skb->tstamp are not zeroed.
Fixes: 9aa1206e8f ("bpf: Add redirect_peer helper")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/1728ead5e0fe45e7a6542c36bd4e3ca07a73b7d6.1746460653.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmgb2D4ACgkQ1w0aZmrP
KyHB4g//SzExpaI5eTduMIg9OwLmWyUNpsxCX8Qk3E5FX8jyGZDU1CAyTnbYZyuJ
VAn+5PwhhswEDs85ZgXiHXMjdOXvmPs1l0IKVVlM4DoJqlRDXpAcuKzPH0m0J46Q
A+nfIIsFfH8roQq+PhLPQ3TEOnIGK1vgNWVICl4oX5sYjkYTIzPnadEaTpNsG26I
gZlvpqM9EH7KLWqhcX1h2/O+JB/6oNaFQTUK77qc9sGijMVMUlGUBtDN40aZ1IEW
RP6l6J1Oea3j3wavmrrQxHSDtvGS4rnC0qTQ4Xs3OIeTrQiou4cmsCqp+ZLmMu7Q
YNBU04o78uEOX9n3ux6Xm4tr2WLizHFNkGBU+m0WQSsEGYx7CWsALXIqnEsZWDLQ
IK4y/d7M0rWLNB4nYTePIBPjKt+qzBe0gTgVr9jMLFI/X60vz3XYHJDBovaBL3dd
skiVohmYxjiQj5Y6WPtZaLvubzVEULfjVGHatYLGheLXHsEO/BjXFJeIfYK6bmfO
eXAx9pPsnqCbtKzh6n+tQzYCeRhlw+Oh9Uw48MDpeIX+cTUNr+eaN1LalQz3C3zX
KqfnK/ZlWUWpQmyLzhMA6DwSUKPBbJ5nzl/NuWSowUPfnldT++rGAuW4lFOTtMPo
DsJDnONuE6FPaNU5Ef3rIivVfrAnCAmRK797psL0buHDY9YRGcM=
=Jwld
-----END PGP SIGNATURE-----
Merge tag 'nf-25-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contain Netfilter/IPVS fixes for net:
1) Fix KMSAN uninit-value in do_output_route4, reported by syzbot.
Patch from Julian Anastasov.
2) ipset hashtable set type breaks up the hashtable into regions of
2^10 buckets. Fix the macro that determines the hashtable lock
region to protect concurrent updates. From Jozsef Kadlecsik.
* tag 'nf-25-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: ipset: fix region locking in hash types
ipvs: fix uninit-value for saddr in do_output_route4
====================
Link: https://patch.msgid.link/20250507221952.86505-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch replaces the manual Netlink attribute iteration in
output_userspace() with nla_for_each_nested(), which ensures that only
well-formed attributes are processed.
Fixes: ccb1352e76 ("net: Add Open vSwitch kernel components.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/0bd65949df61591d9171c0dc13e42cea8941da10.1746541734.git.echaudro@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Region locking introduced in v5.6-rc4 contained three macros to handle
the region locks: ahash_bucket_start(), ahash_bucket_end() which gave
back the start and end hash bucket values belonging to a given region
lock and ahash_region() which should give back the region lock belonging
to a given hash bucket. The latter was incorrect which can lead to a
race condition between the garbage collector and adding new elements
when a hash type of set is defined with timeouts.
Fixes: f66ee0410b ("netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports")
Reported-by: Kota Toda <kota.toda@gmo-cybersecurity.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Device flags could be updated in the meantime while MGMT_OP_ADD_DEVICE
is pending on hci_update_passive_scan_sync so instead of setting the
current_flags as cmd->user_data just do a lookup using
hci_conn_params_lookup and use the latest stored flags.
Fixes: a182d9c84f ("Bluetooth: MGMT: Fix Add Device to responding before completing")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
* iwlwifi: add two missing device entries
* cfg80211: fix a potential out-of-bounds access
* mac80211: fix format of TID to link mapping action frames
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmgacfwACgkQ10qiO8sP
aABEuw//X1RW+5zHLmH3EHW+buxDeqev9C/YStL/1+qtiQG4HSQGVrdr9Vv3PqFt
3MYqPzZEWQrRl4l8GW8ezsNvkGCoYwUBOq92Qa1v7tBrauOQH/zkh3KtgjgT9xql
0ZhIL0JXVmDmhv8WF7JqogwZDJ/wUy/QH006jbRvwhZ9VtQJpSVECr3hxy+pCP88
JNNVD4yQMr8al5zY8hdbgPneAvk/2llmz542BiQ65zoqI7TqERQlCwKFWat7kfFl
7+rTNRDUD0RAPnr5FE1VjIaQfHV7J0OnDpnH70C18mDsRSkU7E0GjHg0SZk3bPgV
WH4OoxmnwQ1vR/vwLvFQ0Ly8oruet6SRJ4R4l0hcDD1Kkv4jpzQZzxojlf5EqUt8
N/oYuff0ju8Kb7ELR/i9PmVIl9bC24mDvzWeOn3xwiCRywGc2MF2MITz8iTzlMQk
lPYe4cBhaSYwn6heyQiT1IpOEYzyfZE1zX83k8vIR3mO+umZOBlh6sv4NvMx5lbW
f1u8W2sgONjehWFjFVp2h4fmj4UYy6N8NRHPt8gjkZkkfuto55/2+QPMFN9Jpg9H
3qqFoXg3uZUhVyNQ3V//R+A1eBbYwCzwE43D1Zf3RlEs5V3T1TUp10sxn1/9Jp46
wiMEY4LUujwKwAckLU2HyQwf3SSSGVKx8/PW76cWZhYjSTwcBLM=
=Bzvu
-----END PGP SIGNATURE-----
Merge tag 'wireless-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Couple of fixes:
* iwlwifi: add two missing device entries
* cfg80211: fix a potential out-of-bounds access
* mac80211: fix format of TID to link mapping action frames
* tag 'wireless-2025-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: add support for Killer on MTL
wifi: mac80211: fix the type of status_code for negotiated TID to Link Mapping
wifi: cfg80211: fix out-of-bounds access during multi-link element defragmentation
====================
Link: https://patch.msgid.link/20250506203506.158818-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEn/sM2K9nqF/8FWzzDHRl3/mQkZwFAmgaFN8THG1rbEBwZW5n
dXRyb25peC5kZQAKCRAMdGXf+ZCRnOZVB/0SLNCKblGAISuevXuFKlXAFXcW2oxN
4g38M6PzZ4zS+XWTHeGYn5XoKuwCTT/9vkD55ADLy4Xo0JDI/4sHxh6y0h9ROGL8
wbyzSTmhiIAA5JftudvrA51Ac1pD2RWU8O6tb8xg0/+wFX0H/U7k9mZnttZBArgM
NnsLGZEjAJdeqKZkcXR1GTveK9aE3Hntpz5dacMO92yMWYts5DMpyc99itfSEa0G
M8HGWI7G9blBgw1BeCIpjnbfdQluitg2B9sbnJlzVZ9+iIHCnoykXo4HpnglksfW
DTYQmbNVhVh0hUQVf9L4eOEgq/gH0cU5RwoMtsaiQjpUTuNHtHDtbE0U
=5sTX
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-6.15-20250506' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-05-06
The first patch is by Antonios Salios and adds a missing
spin_lock_init() to the m_can driver.
The next 3 patches are by me and fix the unregistration order in the
mcp251xfd, rockchip_canfd and m_can driver.
The last patch is by Oliver Hartkopp and fixes RCU and BH
locking/handling in the CAN gw protocol.
* tag 'linux-can-fixes-for-6.15-20250506' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: gw: fix RCU/BH usage in cgw_create_job()
can: mcan: m_can_class_unregister(): fix order of unregistration calls
can: rockchip_canfd: rkcanfd_remove(): fix order of unregistration calls
can: mcp251xfd: mcp251xfd_remove(): fix order of unregistration calls
can: mcp251xfd: fix TDC setting for low data bit rates
can: m_can: m_can_class_allocate_dev(): initialize spin lock on device probe
====================
Link: https://patch.msgid.link/20250506135939.652543-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Accidentally spotted while trying to understand what else needs
to be renamed to netif_ prefix. Most of the calls to dev_set_promiscuity
are adjacent to dev_set_allmulti or dev_disable_lro so it should
be safe to add the lock. Note that new netif_set_promiscuity is
currently unused, the locked paths call __dev_set_promiscuity directly.
Fixes: ad7c7b2172 ("net: hold netdev instance lock during sysfs operations")
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250506011919.2882313-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
__qdisc_destroy() calls into various qdiscs .destroy() op, which in turn
can call .ndo_setup_tc(), which requires the netdev instance lock.
This commit extends the critical section in
unregister_netdevice_many_notify() to cover dev_shutdown() (and
dev_tcx_uninstall() as a side-effect) and acquires the netdev instance
lock in __dev_change_net_namespace() for the other dev_shutdown() call.
This should now guarantee that for all qdisc ops, the netdev instance
lock is held during .ndo_setup_tc().
Fixes: a0527ee2df ("net: hold netdev instance lock during qdisc ndo_setup_tc")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250505194713.1723399-1-cratiu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The status code should be type of __le16.
Fixes: 83e897a961 ("wifi: ieee80211: add definitions for negotiated TID to Link map")
Fixes: 8f500fbc6c ("wifi: mac80211: process and save negotiated TID to Link mapping request")
Signed-off-by: Michael-CY Lee <michael-cy.lee@mediatek.com>
Link: https://patch.msgid.link/20250505081946.3927214-1-michael-cy.lee@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently during the multi-link element defragmentation process, the
multi-link element length added to the total IEs length when calculating
the length of remaining IEs after the multi-link element in
cfg80211_defrag_mle(). This could lead to out-of-bounds access if the
multi-link element or its corresponding fragment elements are the last
elements in the IEs buffer.
To address this issue, correctly calculate the remaining IEs length by
deducting the multi-link element end offset from total IEs end offset.
Cc: stable@vger.kernel.org
Fixes: 2481b5da9c ("wifi: cfg80211: handle BSS data contained in ML probe responses")
Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
Link: https://patch.msgid.link/20250424-fix_mle_defragmentation_oob_access-v1-1-84412a1743fa@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As reported by Sebastian Andrzej Siewior the use of local_bh_disable()
is only feasible in uni processor systems to update the modification rules.
The usual use-case to update the modification rules is to update the data
of the modifications but not the modification types (AND/OR/XOR/SET) or
the checksum functions itself.
To omit additional memory allocations to maintain fast modification
switching times, the modification description space is doubled at gw-job
creation time so that only the reference to the active modification
description is changed under rcu protection.
Rename cgw_job::mod to cf_mod and make it a RCU pointer. Allocate in
cgw_create_job() and free it together with cgw_job in
cgw_job_free_rcu(). Update all users to dereference cgw_job::cf_mod with
a RCU accessor and if possible once.
[bigeasy: Replace mod1/mod2 from the Oliver's original patch with dynamic
allocation, use RCU annotation and accessor]
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Closes: https://lore.kernel.org/linux-can/20231031112349.y0aLoBrz@linutronix.de/
Fixes: dd895d7f21 ("can: cangw: introduce optional uid to reference created routing jobs")
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20250429070555.cs-7b_eZ@linutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Use addrconf_addr_gen() to generate IPv6 link-local addresses on GRE
devices in most cases and fall back to using add_v4_addrs() only in
case the GRE configuration is incompatible with addrconf_addr_gen().
GRE used to use addrconf_addr_gen() until commit e5dd729460 ("ip/ip6_gre:
use the same logic as SIT interfaces when computing v6LL address")
restricted this use to gretap and ip6gretap devices, and created
add_v4_addrs() (borrowed from SIT) for non-Ethernet GRE ones.
The original problem came when commit 9af28511be ("addrconf: refuse
isatap eui64 for INADDR_ANY") made __ipv6_isatap_ifid() fail when its
addr parameter was 0. The commit says that this would create an invalid
address, however, I couldn't find any RFC saying that the generated
interface identifier would be wrong. Anyway, since gre over IPv4
devices pass their local tunnel address to __ipv6_isatap_ifid(), that
commit broke their IPv6 link-local address generation when the local
address was unspecified.
Then commit e5dd729460 ("ip/ip6_gre: use the same logic as SIT
interfaces when computing v6LL address") tried to fix that case by
defining add_v4_addrs() and calling it to generate the IPv6 link-local
address instead of using addrconf_addr_gen() (apart for gretap and
ip6gretap devices, which would still use the regular
addrconf_addr_gen(), since they have a MAC address).
That broke several use cases because add_v4_addrs() isn't properly
integrated into the rest of IPv6 Neighbor Discovery code. Several of
these shortcomings have been fixed over time, but add_v4_addrs()
remains broken on several aspects. In particular, it doesn't send any
Router Sollicitations, so the SLAAC process doesn't start until the
interface receives a Router Advertisement. Also, add_v4_addrs() mostly
ignores the address generation mode of the interface
(/proc/sys/net/ipv6/conf/*/addr_gen_mode), thus breaking the
IN6_ADDR_GEN_MODE_RANDOM and IN6_ADDR_GEN_MODE_STABLE_PRIVACY cases.
Fix the situation by using add_v4_addrs() only in the specific scenario
where the normal method would fail. That is, for interfaces that have
all of the following characteristics:
* run over IPv4,
* transport IP packets directly, not Ethernet (that is, not gretap
interfaces),
* tunnel endpoint is INADDR_ANY (that is, 0),
* device address generation mode is EUI64.
In all other cases, revert back to the regular addrconf_addr_gen().
Also, remove the special case for ip6gre interfaces in add_v4_addrs(),
since ip6gre devices now always use addrconf_addr_gen() instead.
Note:
This patch was originally applied as commit 183185a18f ("gre: Fix
IPv6 link-local address generation."). However, it was then reverted
by commit fc486c2d06 ("Revert "gre: Fix IPv6 link-local address
generation."") because it uncovered another bug that ended up
breaking net/forwarding/ip6gre_custom_multipath_hash.sh. That other
bug has now been fixed by commit 4d0ab3a688 ("ipv6: Start path
selection from the first nexthop"). Therefore we can now revive this
GRE patch (no changes since original commit 183185a18f ("gre: Fix
IPv6 link-local address generation.").
Fixes: e5dd729460 ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/a88cc5c4811af36007645d610c95102dccb360a6.1746225214.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alan reported a NULL pointer dereference in htb_next_rb_node()
after we made htb_qlen_notify() idempotent.
It turns out in the following case it introduced some regression:
htb_dequeue_tree():
|-> fq_codel_dequeue()
|-> qdisc_tree_reduce_backlog()
|-> htb_qlen_notify()
|-> htb_deactivate()
|-> htb_next_rb_node()
|-> htb_deactivate()
For htb_next_rb_node(), after calling the 1st htb_deactivate(), the
clprio[prio]->ptr could be already set to NULL, which means
htb_next_rb_node() is vulnerable here.
For htb_deactivate(), although we checked qlen before calling it, in
case of qlen==0 after qdisc_tree_reduce_backlog(), we may call it again
which triggers the warning inside.
To fix the issues here, we need to:
1) Make htb_deactivate() idempotent, that is, simply return if we
already call it before.
2) Make htb_next_rb_node() safe against ptr==NULL.
Many thanks to Alan for testing and for the reproducer.
Fixes: 5ba8b837b5 ("sch_htb: make htb_qlen_notify() idempotent")
Reported-by: Alan J. Wylie <alan@wylie.me.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250428232955.1740419-2-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Things have calmed down on our end (knock on wood), no outstanding
investigations. Including fixes from Bluetooth and WiFi.
Current release - fix to a fix:
- igc: fix lock order in igc_ptp_reset
Current release - new code bugs:
- Revert "wifi: iwlwifi: make no_160 more generic", fixes regression
to Killer line of devices reported by a number of people
- Revert "wifi: iwlwifi: add support for BE213", initial FW is too buggy
- number of fixes for mld, the new Intel WiFi subdriver
Previous releases - regressions:
- wifi: mac80211: restore monitor for outgoing frames
- drv: vmxnet3: fix malformed packet sizing in vmxnet3_process_xdp
- eth: bnxt_en: fix timestamping FIFO getting out of sync on reset,
delivering stale timestamps
- use sock_gen_put() in the TCP fraglist GRO heuristic, don't assume
every socket is a full socket
Previous releases - always broken:
- sched: adapt qdiscs for reentrant enqueue cases, fix list corruptions
- xsk: fix race condition in AF_XDP generic RX path, shared UMEM
can't be protected by a per-socket lock
- eth: mtk-star-emac: fix spinlock recursion issues on rx/tx poll
- btusb: avoid NULL pointer dereference in skb_dequeue()
- dsa: felix: fix broken taprio gate states after clock jump
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmgTmIoACgkQMUZtbf5S
IruoZhAApd5csY/U9Pl7h5CK6mQek7qk+EbDcLs+/j/0UFXcOayuFKILHT27BIe5
gEVbQBkcEpzj+QKJblWvuczreBXChTmhfQb1zItRMWJktYmlqYtPsPE4kM2inzGo
94iDeSTca/LKPfc4dPQ/rBfXMQb7NX7UhWRR9AFBWhOuM/SDjItyZUFPqhmGDctO
8+7YdftK2qFcwIN18kU4GqbmJEZK9Y4+ePabABR7VBFtw70bpdCptWp5tmRMbq6V
KphxkWbWg8SP1HUnAfRAS4C3TO4Dn6e7CVaAirroqrHWmawiWgHV/o1awc0ErO8k
aAgVhPOB5zzVqbV0ZeoEj5PL5/2rus/KmWkMludRAqGGivcQOu6GOuvo+QPwpYCU
3edxCU95kqRB6XrjsLnZ6HnpwtBr6CUVf9FXMGNt4199cWN/PE+w4qxTz2vxiiH3
At0su5W41s4mvldktait+ilwk+R3v1rIzTR6IDmNPW8pACI8RlQt8/Mgy7smDZp9
FEKtWhXHdww3DosB3vHOGZ5u6gRlGvfjPsJlt2C9fGZIdrOJzzHDZ37BXDYDTYnx
9mejzy/YAYjK826YGA3sqJ3fUDFUoAynZuajDKxErvtIZlRzEDLu+7pG4zAB8Nrx
tHdz5iR2rDdZ87S4jB8CAP7FGVs+YZkekVvY5fIw4dGs1QMc2wE=
=6xNz
-----END PGP SIGNATURE-----
Merge tag 'net-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Happy May Day.
Things have calmed down on our end (knock on wood), no outstanding
investigations. Including fixes from Bluetooth and WiFi.
Current release - fix to a fix:
- igc: fix lock order in igc_ptp_reset
Current release - new code bugs:
- Revert "wifi: iwlwifi: make no_160 more generic", fixes regression
to Killer line of devices reported by a number of people
- Revert "wifi: iwlwifi: add support for BE213", initial FW is too
buggy
- number of fixes for mld, the new Intel WiFi subdriver
Previous releases - regressions:
- wifi: mac80211: restore monitor for outgoing frames
- drv: vmxnet3: fix malformed packet sizing in vmxnet3_process_xdp
- eth: bnxt_en: fix timestamping FIFO getting out of sync on reset,
delivering stale timestamps
- use sock_gen_put() in the TCP fraglist GRO heuristic, don't assume
every socket is a full socket
Previous releases - always broken:
- sched: adapt qdiscs for reentrant enqueue cases, fix list
corruptions
- xsk: fix race condition in AF_XDP generic RX path, shared UMEM
can't be protected by a per-socket lock
- eth: mtk-star-emac: fix spinlock recursion issues on rx/tx poll
- btusb: avoid NULL pointer dereference in skb_dequeue()
- dsa: felix: fix broken taprio gate states after clock jump"
* tag 'net-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
net: vertexcom: mse102x: Fix RX error handling
net: vertexcom: mse102x: Add range check for CMD_RTS
net: vertexcom: mse102x: Fix LEN_MASK
net: vertexcom: mse102x: Fix possible stuck of SPI interrupt
net: hns3: defer calling ptp_clock_register()
net: hns3: fixed debugfs tm_qset size
net: hns3: fix an interrupt residual problem
net: hns3: store rx VLAN tag offload state for VF
octeon_ep: Fix host hang issue during device reboot
net: fec: ERR007885 Workaround for conventional TX
net: lan743x: Fix memleak issue when GSO enabled
ptp: ocp: Fix NULL dereference in Adva board SMA sysfs operations
net: use sock_gen_put() when sk_state is TCP_TIME_WAIT
bnxt_en: fix module unload sequence
bnxt_en: Fix ethtool -d byte order for 32-bit values
bnxt_en: Fix out-of-bound memcpy() during ethtool -w
bnxt_en: Fix coredump logic to free allocated buffer
bnxt_en: delay pci_alloc_irq_vectors() in the AER path
bnxt_en: call pci_alloc_irq_vectors() after bnxt_reserve_rings()
bnxt_en: Add missing skb_mark_for_recycle() in bnxt_rx_vlan()
...
ipcomp_post_acomp currently drops all frags (via pskb_trim_unique(skb,
0)), and then subtracts the old skb->data_len from truesize. This
adjustment has already be done during trimming (in skb_condense), so
we don't need to do it again.
This shows up for example when running fragmented traffic over ipcomp,
we end up hitting the WARN_ON_ONCE in skb_try_coalesce.
Fixes: eb2953d269 ("xfrm: ipcomp: Use crypto_acomp interface")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
If any address or port is changed, update it in all packets and recalculate
checksum.
Fixes: 9fd1ff5d2a ("udp: Support UDP fraglist GRO/GSO.")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250426153210.14044-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As described in Gerrard's report [1], there are use cases where a netem
child qdisc will make the parent qdisc's enqueue callback reentrant.
In the case of qfq, there won't be a UAF, but the code will add the same
classifier to the list twice, which will cause memory corruption.
This patch checks whether the class was already added to the agg->active
list (cl_is_active) before doing the addition to cater for the reentrant
case.
[1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/
Fixes: 37d9cf1a3c ("sched: Fix detection of empty queues in child qdiscs")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250425220710.3964791-5-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As described in Gerrard's report [1], there are use cases where a netem
child qdisc will make the parent qdisc's enqueue callback reentrant.
In the case of ets, there won't be a UAF, but the code will add the same
classifier to the list twice, which will cause memory corruption.
In addition to checking for qlen being zero, this patch checks whether
the class was already added to the active_list (cl_is_active) before
doing the addition to cater for the reentrant case.
[1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/
Fixes: 37d9cf1a3c ("sched: Fix detection of empty queues in child qdiscs")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250425220710.3964791-4-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As described in Gerrard's report [1], we have a UAF case when an hfsc class
has a netem child qdisc. The crux of the issue is that hfsc is assuming
that checking for cl->qdisc->q.qlen == 0 guarantees that it hasn't inserted
the class in the vttree or eltree (which is not true for the netem
duplicate case).
This patch checks the n_active class variable to make sure that the code
won't insert the class in the vttree or eltree twice, catering for the
reentrant case.
[1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/
Fixes: 37d9cf1a3c ("sched: Fix detection of empty queues in child qdiscs")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250425220710.3964791-3-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As described in Gerrard's report [1], there are use cases where a netem
child qdisc will make the parent qdisc's enqueue callback reentrant.
In the case of drr, there won't be a UAF, but the code will add the same
classifier to the list twice, which will cause memory corruption.
In addition to checking for qlen being zero, this patch checks whether the
class was already added to the active_list (cl_is_active) before adding
to the list to cover for the reentrant case.
[1] https://lore.kernel.org/netdev/CAHcdcOm+03OD2j6R0=YHKqmy=VgJ8xEOKuP6c7mSgnp-TEJJbw@mail.gmail.com/
Fixes: 37d9cf1a3c ("sched: Fix detection of empty queues in child qdiscs")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250425220710.3964791-2-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Revert a v6.15 patch due to a report of SELinux test failures
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmgNA7gACgkQM2qzM29m
f5fL+RAAlzI/+eISq6euOz6f56tP5BcqLgLtjOuaM+7d0Ix+cAihj2LUJpGjhbDH
CalF2J3RxdAeK/PlHsbzOt5Aywec0pnYMuUG5+4BDuZyYl8D9ZuzO9zNjU8uf/Mp
sm0O7s/RV4jbNn/8lCrD7keDqqb12Qo5F+4vOD5bE99rs9SMQSXkDSd6lR+Htj/x
M6NGOP3FvgzjdZKuSBM1mRcJcermM76dE1DTdQ1zGswUoglomYC5v/y72Gdt/QRq
UpSMxlX6mmoSSQSVIPVitMlgG+PzpQnGMp207WfupRY+N2euWn+9BGgKtmzQu3NJ
BvctRSqNuc6rqRaH9LcSc5ArNB5UbBxjl385IMHhAs7w4JiLzObP6Shy3xHmDipY
5/X1IHLg8b+CRzyHy2/a2LsBde3kvlhIpkYMkIkzC7WJqJvEWiQQaMk4vKpOvmrW
WOR2wOLgX/Fq/2BZe3Ixe8PgdLck+qAjIYjesc/No/9Tb8Iz4CjnqOiv1WhxkG1P
cFZvO6MP63/AgVwVf6EHUq8L96CwdXo+XJl9P48fcYpUYLRpMn1+K1IrgfF9BC7T
TuQHn1/l427lGIg2yehHeHQ2hwBqQ4uYohJuJquSbKcprKcl2P1UNDCNoSymSoGH
PqwdXd45XGjTkNBWYS9JZMmIFJJGLl1Txypc+P9zbAruWJPQoLg=
=MOAx
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Revert a v6.15 patch due to a report of SELinux test failures
* tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "sunrpc: clean cache_detail immediately when flush is written frequently"
Ondrej reports that certain SELinux tests are failing after commit
fc2a169c56 ("sunrpc: clean cache_detail immediately when flush is
written frequently"), merged during the v6.15 merge window.
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Fixes: fc2a169c56 ("sunrpc: clean cache_detail immediately when flush is written frequently")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmgL1ZUTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi9KbCACTw0bAFjWzuG9EIF6UjFfDF1XVaU7f
teIHfARjveZIhEYC4veM//EZeiHoFxgwdwyN3TP6Q7nwjcxHEG5/z1KC6sPtYzjA
dZj/F4tXJ5h5TICxIIJAD4HSN5YgwHx/JTAq3ZkayVsAlNpUOslkK6BtilmUpPIE
T3dXOdnm1e6QNjXyeaqnJoYgWqjHrENOSjA8gF0yuEUwOfmln9OPIImFRNyMrQ3q
iePJd2tHzC8SKoub5qgMEXvQT6fI8zkELEvjGHk0MYSCfHK5E2LjjcEWlMqTVP4m
3n9uj2LWqKOyx5Viv+g1Ab/CuW/AnYLMfg4NPlyz7MFLts/Y/BcTElyA
=pFi+
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-6.15-rc4' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A small CephFS encryption-related fix and a dead code cleanup"
* tag 'ceph-for-6.15-rc4' of https://github.com/ceph/ceph-client:
ceph: Fix incorrect flush end position calculation
ceph: Remove osd_client deadcode