mirror of
https://github.com/torvalds/linux.git
synced 2026-05-21 13:27:57 +02:00
2ca06a2f65
1324990 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
2ca06a2f65 |
mptcp: be sure to send ack when mptcp-level window re-opens
mptcp_cleanup_rbuf() is responsible to send acks when the user-space reads enough data to update the receive windows significantly. It tries hard to avoid acquiring the subflow sockets locks by checking conditions similar to the ones implemented at the TCP level. To avoid too much code duplication - the MPTCP protocol can't reuse the TCP helpers as part of the relevant status is maintained into the msk socket - and multiple costly window size computation, mptcp_cleanup_rbuf uses a rough estimate for the most recently advertised window size: the MPTCP receive free space, as recorded as at last-ack time. Unfortunately the above does not allow mptcp_cleanup_rbuf() to detect a zero to non-zero win change in some corner cases, skipping the tcp_cleanup_rbuf call and leaving the peer stuck. After commit |
||
|
|
665bcfc982 |
Merge branch 'vsock-some-fixes-due-to-transport-de-assignment'
Stefano Garzarella says: ==================== vsock: some fixes due to transport de-assignment v1: https://lore.kernel.org/netdev/20250108180617.154053-1-sgarzare@redhat.com/ v2: - Added patch 3 to cancel the virtio close delayed work when de-assigning the transport - Added patch 4 to clean the socket state after de-assigning the transport - Added patch 5 as suggested by Michael and Hyunwoo Kim. It's based on Hyunwoo Kim and Wongi Lee patch [1] but using WARN_ON and covering more functions - Added R-b/T-b tags This series includes two patches discussed in the thread started by Hyunwoo Kim a few weeks ago [1], plus 3 more patches added after some discussions on v1 (see changelog). All related to the case where a vsock socket is de-assigned from a transport (e.g., because the connect fails or is interrupted by a signal) and then assigned to another transport or to no-one (NULL). I tested with usual vsock test suite, plus Michal repro [2]. (Note: the repo works only if a G2H transport is not loaded, e.g. virtio-vsock driver). The first patch is a fix more appropriate to the problem reported in that thread, the second patch on the other hand is a related fix but of a different problem highlighted by Michal Luczaj. It's present only in vsock_bpf and already handled in af_vsock.c The third patch is to cancel the virtio close delayed work when de-assigning the transport, the fourth patch is to clean the socket state after de-assigning the transport, the last patch adds warnings and prevents null-ptr-deref in vsock_*[has_data|has_space]. Hyunwoo Kim, Michal, if you can test and report your Tested-by that would be great! [1] https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/ [2] https://lore.kernel.org/netdev/2b3062e3-bdaa-4c94-a3c0-2930595b9670@rbox.co/ ==================== Link: https://patch.msgid.link/20250110083511.30419-1-sgarzare@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
91751e2482 |
vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]
Recent reports have shown how we sometimes call vsock_*_has_data()
when a vsock socket has been de-assigned from a transport (see attached
links), but we shouldn't.
Previous commits should have solved the real problems, but we may have
more in the future, so to avoid null-ptr-deref, we can return 0
(no space, no data available) but with a warning.
This way the code should continue to run in a nearly consistent state
and have a warning that allows us to debug future problems.
Fixes:
|
||
|
|
a24009bc9b |
vsock: reset socket state when de-assigning the transport
Transport's release() and destruct() are called when de-assigning the
vsock transport. These callbacks can touch some socket state like
sock flags, sk_state, and peer_shutdown.
Since we are reassigning the socket to a new transport during
vsock_connect(), let's reset these fields to have a clean state with
the new transport.
Fixes:
|
||
|
|
df137da9d6 |
vsock/virtio: cancel close work in the destructor
During virtio_transport_release() we can schedule a delayed work to
perform the closing of the socket before destruction.
The destructor is called either when the socket is really destroyed
(reference counter to zero), or it can also be called when we are
de-assigning the transport.
In the former case, we are sure the delayed work has completed, because
it holds a reference until it completes, so the destructor will
definitely be called after the delayed work is finished.
But in the latter case, the destructor is called by AF_VSOCK core, just
after the release(), so there may still be delayed work scheduled.
Refactor the code, moving the code to delete the close work already in
the do_close() to a new function. Invoke it during destruction to make
sure we don't leave any pending work.
Fixes:
|
||
|
|
f6abafcd32 |
vsock/bpf: return early if transport is not assigned
Some of the core functions can only be called if the transport
has been assigned.
As Michal reported, a socket might have the transport at NULL,
for example after a failed connect(), causing the following trace:
BUG: kernel NULL pointer dereference, address: 00000000000000a0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+
RIP: 0010:vsock_connectible_has_data+0x1f/0x40
Call Trace:
vsock_bpf_recvmsg+0xca/0x5e0
sock_recvmsg+0xb9/0xc0
__sys_recvfrom+0xb3/0x130
__x64_sys_recvfrom+0x20/0x30
do_syscall_64+0x93/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
So we need to check the `vsk->transport` in vsock_bpf_recvmsg(),
especially for connected sockets (stream/seqpacket) as we already
do in __vsock_connectible_recvmsg().
Fixes:
|
||
|
|
2cb7c756f6 |
vsock/virtio: discard packets if the transport changes
If the socket has been de-assigned or assigned to another transport,
we must discard any packets received because they are not expected
and would cause issues when we access vsk->transport.
A possible scenario is described by Hyunwoo Kim in the attached link,
where after a first connect() interrupted by a signal, and a second
connect() failed, we can find `vsk->transport` at NULL, leading to a
NULL pointer dereference.
Fixes:
|
||
|
|
0865b9fdb2 |
Merge branch 'gtp-pfcp-fix-use-after-free-of-udp-tunnel-socket'
Kuniyuki Iwashima says: ==================== gtp/pfcp: Fix use-after-free of UDP tunnel socket. Xiao Liang pointed out weird netns usages in ->newlink() of gtp and pfcp. This series fixes the issues. Link: https://lore.kernel.org/netdev/20250104125732.17335-1-shaw.leon@gmail.com/ Changes: v2: * Patch 1 * Fix uninit/unused local var v1: https://lore.kernel.org/netdev/20250108062834.11117-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20250110014754.33847-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
|
|
ffc90e9ca6 |
pfcp: Destroy device along with udp socket's netns dismantle.
pfcp_newlink() links the device to a list in dev_net(dev) instead
of net, where a udp tunnel socket is created.
Even when net is removed, the device stays alive on dev_net(dev).
Then, removing net triggers the splat below. [0]
In this example, pfcp0 is created in ns2, but the udp socket is
created in ns1.
ip netns add ns1
ip netns add ns2
ip -n ns1 link add netns ns2 name pfcp0 type pfcp
ip netns del ns1
Let's link the device to the socket's netns instead.
Now, pfcp_net_exit() needs another netdev iteration to remove
all pfcp devices in the netns.
pfcp_dev_list is not used under RCU, so the list API is converted
to the non-RCU variant.
pfcp_net_exit() can be converted to .exit_batch_rtnl() in net-next.
[0]:
ref_tracker: net notrefcnt@00000000128b34dc has 1/1 users at
sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236)
inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252)
__sock_create (net/socket.c:1558)
udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18)
pfcp_create_sock (drivers/net/pfcp.c:168)
pfcp_newlink (drivers/net/pfcp.c:182 drivers/net/pfcp.c:197)
rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6922)
netlink_rcv_skb (net/netlink/af_netlink.c:2542)
netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347)
netlink_sendmsg (net/netlink/af_netlink.c:1891)
____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583)
___sys_sendmsg (net/socket.c:2639)
__sys_sendmsg (net/socket.c:2669)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
WARNING: CPU: 1 PID: 11 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179)
Modules linked in:
CPU: 1 UID: 0 PID: 11 Comm: kworker/u16:0 Not tainted 6.13.0-rc5-00147-g4c1224501e9d #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179)
Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89
RSP: 0018:ff11000007f3fb60 EFLAGS: 00010286
RAX: 00000000000020ef RBX: ff1100000d6481e0 RCX: 1ffffffff0e40d82
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8423ee3c
RBP: ff1100000d648230 R08: 0000000000000001 R09: fffffbfff0e395af
R10: 0000000000000001 R11: 0000000000000000 R12: ff1100000d648230
R13: dead000000000100 R14: ff1100000d648230 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ff1100006ce80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005620e1363990 CR3: 000000000eeb2002 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn (kernel/panic.c:748)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? report_bug (lib/bug.c:201 lib/bug.c:219)
? handle_bug (arch/x86/kernel/traps.c:285)
? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158)
? kfree (mm/slub.c:4613 mm/slub.c:4761)
net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467)
cleanup_net (net/core/net_namespace.c:664 (discriminator 3))
process_one_work (kernel/workqueue.c:3229)
worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
kthread (kernel/kthread.c:389)
ret_from_fork (arch/x86/kernel/process.c:147)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
</TASK>
Fixes:
|
||
|
|
eb28fd76c0 |
gtp: Destroy device along with udp socket's netns dismantle.
gtp_newlink() links the device to a list in dev_net(dev) instead of
src_net, where a udp tunnel socket is created.
Even when src_net is removed, the device stays alive on dev_net(dev).
Then, removing src_net triggers the splat below. [0]
In this example, gtp0 is created in ns2, and the udp socket is created
in ns1.
ip netns add ns1
ip netns add ns2
ip -n ns1 link add netns ns2 name gtp0 type gtp role sgsn
ip netns del ns1
Let's link the device to the socket's netns instead.
Now, gtp_net_exit_batch_rtnl() needs another netdev iteration to remove
all gtp devices in the netns.
[0]:
ref_tracker: net notrefcnt@000000003d6e7d05 has 1/2 users at
sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236)
inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252)
__sock_create (net/socket.c:1558)
udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18)
gtp_create_sock (./include/net/udp_tunnel.h:59 drivers/net/gtp.c:1423)
gtp_create_sockets (drivers/net/gtp.c:1447)
gtp_newlink (drivers/net/gtp.c:1507)
rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6922)
netlink_rcv_skb (net/netlink/af_netlink.c:2542)
netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347)
netlink_sendmsg (net/netlink/af_netlink.c:1891)
____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583)
___sys_sendmsg (net/socket.c:2639)
__sys_sendmsg (net/socket.c:2669)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
WARNING: CPU: 1 PID: 60 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179)
Modules linked in:
CPU: 1 UID: 0 PID: 60 Comm: kworker/u16:2 Not tainted 6.13.0-rc5-00147-g4c1224501e9d #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179)
Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89
RSP: 0018:ff11000009a07b60 EFLAGS: 00010286
RAX: 0000000000002bd3 RBX: ff1100000f4e1aa0 RCX: 1ffffffff0e40ac6
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8423ee3c
RBP: ff1100000f4e1af0 R08: 0000000000000001 R09: fffffbfff0e395ae
R10: 0000000000000001 R11: 0000000000036001 R12: ff1100000f4e1af0
R13: dead000000000100 R14: ff1100000f4e1af0 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ff1100006ce80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9b2464bd98 CR3: 0000000005286005 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn (kernel/panic.c:748)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? report_bug (lib/bug.c:201 lib/bug.c:219)
? handle_bug (arch/x86/kernel/traps.c:285)
? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158)
? kfree (mm/slub.c:4613 mm/slub.c:4761)
net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467)
cleanup_net (net/core/net_namespace.c:664 (discriminator 3))
process_one_work (kernel/workqueue.c:3229)
worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
kthread (kernel/kthread.c:389)
ret_from_fork (arch/x86/kernel/process.c:147)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
</TASK>
Fixes:
|
||
|
|
46841c7053 |
gtp: Use for_each_netdev_rcu() in gtp_genl_dump_pdp().
gtp_newlink() links the gtp device to a list in dev_net(dev).
However, even after the gtp device is moved to another netns,
it stays on the list but should be invisible.
Let's use for_each_netdev_rcu() for netdev traversal in
gtp_genl_dump_pdp().
Note that gtp_dev_list is no longer used under RCU, so list
helpers are converted to the non-RCU variant.
Fixes:
|
||
|
|
644f9108f3 |
udp: Make rehash4 independent in udp_lib_rehash()
As discussed in [0], rehash4 could be missed in udp_lib_rehash() when
udp hash4 changes while hash2 doesn't change. This patch fixes this by
moving rehash4 codes out of rehash2 checking, and then rehash2 and
rehash4 are done separately.
By doing this, we no longer need to call rehash4 explicitly in
udp_lib_hash4(), as the rehash callback in __ip4_datagram_connect takes
it. Thus, now udp_lib_hash4() returns directly if the sk is already
hashed.
Note that uhash4 may fail to work under consecutive connect(<dst
address>) calls because rehash() is not called with every connect(). To
overcome this, connect(<AF_UNSPEC>) needs to be called after the next
connect to a new destination.
[0]
https://lore.kernel.org/all/4761e466ab9f7542c68cdc95f248987d127044d2.1733499715.git.pabeni@redhat.com/
Fixes:
|
||
|
|
1f691a1fc4 |
r8169: remove redundant hwmon support
The temperature sensor is actually part of the integrated PHY and available
also on the standalone versions of the PHY. Therefore hwmon support will
be added to the Realtek PHY driver and can be removed here.
Fixes:
|
||
|
|
9e2bbab94b |
net/ncsi: fix locking in Get MAC Address handling
Obtaining RTNL lock in a response handler is not allowed since it runs
in an atomic softirq context. Postpone setting the MAC address by adding
a dedicated step to the configuration FSM.
Fixes:
|
||
|
|
76201b5979 |
pktgen: Avoid out-of-bounds access in get_imix_entries
Passing a sufficient amount of imix entries leads to invalid access to the
pkt_dev->imix_entries array because of the incorrect boundary check.
UBSAN: array-index-out-of-bounds in net/core/pktgen.c:874:24
index 20 is out of range for type 'imix_pkt [20]'
CPU: 2 PID: 1210 Comm: bash Not tainted 6.10.0-rc1 #121
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
dump_stack_lvl lib/dump_stack.c:117
__ubsan_handle_out_of_bounds lib/ubsan.c:429
get_imix_entries net/core/pktgen.c:874
pktgen_if_write net/core/pktgen.c:1063
pde_write fs/proc/inode.c:334
proc_reg_write fs/proc/inode.c:346
vfs_write fs/read_write.c:593
ksys_write fs/read_write.c:644
do_syscall_64 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe arch/x86/entry/entry_64.S:130
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
|
||
|
|
47e55e4b41 |
openvswitch: fix lockup on tx to unregistering netdev with carrier
Commit in a fixes tag attempted to fix the issue in the following
sequence of calls:
do_output
-> ovs_vport_send
-> dev_queue_xmit
-> __dev_queue_xmit
-> netdev_core_pick_tx
-> skb_tx_hash
When device is unregistering, the 'dev->real_num_tx_queues' goes to
zero and the 'while (unlikely(hash >= qcount))' loop inside the
'skb_tx_hash' becomes infinite, locking up the core forever.
But unfortunately, checking just the carrier status is not enough to
fix the issue, because some devices may still be in unregistering
state while reporting carrier status OK.
One example of such device is a net/dummy. It sets carrier ON
on start, but it doesn't implement .ndo_stop to set the carrier off.
And it makes sense, because dummy doesn't really have a carrier.
Therefore, while this device is unregistering, it's still easy to hit
the infinite loop in the skb_tx_hash() from the OVS datapath. There
might be other drivers that do the same, but dummy by itself is
important for the OVS ecosystem, because it is frequently used as a
packet sink for tcpdump while debugging OVS deployments. And when the
issue is hit, the only way to recover is to reboot.
Fix that by also checking if the device is running. The running
state is handled by the net core during unregistering, so it covers
unregistering case better, and we don't really need to send packets
to devices that are not running anyway.
While only checking the running state might be enough, the carrier
check is preserved. The running and the carrier states seem disjoined
throughout the code and different drivers. And other core functions
like __dev_direct_xmit() check both before attempting to transmit
a packet. So, it seems safer to check both flags in OVS as well.
Fixes:
|
||
|
|
e7e441a410 |
net: ravb: Fix max TX frame size for RZ/V2M
When tx_max_frame_size was added to struct ravb_hw_info, no value was
set in ravb_rzv2m_hw_info so the default value of zero was used.
The maximum MTU is set by subtracting from tx_max_frame_size to allow
space for headers and frame checksums. As ndev->max_mtu is unsigned,
this subtraction wraps around leading to a ridiculously large positive
value that is obviously incorrect.
Before tx_max_frame_size was introduced, the maximum MTU was based on
rx_max_frame_size. So, we can restore the correct maximum MTU by copying
the rx_max_frame_size value into tx_max_frame_size for RZ/V2M.
Fixes:
|
||
|
|
eaeea5028f |
net: mana: Cleanup "mana" debugfs dir after cleanup of all children
In mana_driver_exit(), mana_debugfs_root gets cleanup before any of it's
children (which happens later in the pci_unregister_driver()).
Due to this, when mana driver is configured as a module and rmmod is
invoked, following stack gets printed along with failure in rmmod command.
[ 2399.317651] BUG: kernel NULL pointer dereference, address: 0000000000000098
[ 2399.318657] #PF: supervisor write access in kernel mode
[ 2399.319057] #PF: error_code(0x0002) - not-present page
[ 2399.319528] PGD 10eb68067 P4D 0
[ 2399.319914] Oops: Oops: 0002 [#1] SMP NOPTI
[ 2399.320308] CPU: 72 UID: 0 PID: 5815 Comm: rmmod Not tainted 6.13.0-rc5+ #89
[ 2399.320986] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024
[ 2399.321892] RIP: 0010:down_write+0x1a/0x50
[ 2399.322303] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc e8 9d cd ff ff 31 c0 ba 01 00 00 00 <f0> 49 0f b1 14 24 75 17 65 48 8b 05 f6 84 dd 5f 49 89 44 24 08 4c
[ 2399.323669] RSP: 0018:ff53859d6c663a70 EFLAGS: 00010246
[ 2399.324061] RAX: 0000000000000000 RBX: ff1d4eb505060180 RCX: ffffff8100000000
[ 2399.324620] RDX: 0000000000000001 RSI: 0000000000000064 RDI: 0000000000000098
[ 2399.325167] RBP: ff53859d6c663a78 R08: 00000000000009c4 R09: ff1d4eb4fac90000
[ 2399.325681] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000098
[ 2399.326185] R13: ff1d4e42e1a4a0c8 R14: ff1d4eb538ce0000 R15: 0000000000000098
[ 2399.326755] FS: 00007fe729570000(0000) GS:ff1d4eb2b7200000(0000) knlGS:0000000000000000
[ 2399.327269] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2399.327690] CR2: 0000000000000098 CR3: 00000001c0584005 CR4: 0000000000373ef0
[ 2399.328166] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2399.328623] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 2399.329055] Call Trace:
[ 2399.329243] <TASK>
[ 2399.329379] ? show_regs+0x69/0x80
[ 2399.329602] ? __die+0x25/0x70
[ 2399.329856] ? page_fault_oops+0x271/0x550
[ 2399.330088] ? psi_group_change+0x217/0x470
[ 2399.330341] ? do_user_addr_fault+0x455/0x7b0
[ 2399.330667] ? finish_task_switch.isra.0+0x91/0x2f0
[ 2399.331004] ? exc_page_fault+0x73/0x160
[ 2399.331275] ? asm_exc_page_fault+0x27/0x30
[ 2399.343324] ? down_write+0x1a/0x50
[ 2399.343631] simple_recursive_removal+0x4d/0x2c0
[ 2399.343977] ? __pfx_remove_one+0x10/0x10
[ 2399.344251] debugfs_remove+0x45/0x70
[ 2399.344511] mana_destroy_rxq+0x44/0x400 [mana]
[ 2399.344845] mana_destroy_vport+0x54/0x1c0 [mana]
[ 2399.345229] mana_detach+0x2f1/0x4e0 [mana]
[ 2399.345466] ? ida_free+0x150/0x160
[ 2399.345718] ? __cond_resched+0x1a/0x50
[ 2399.345987] mana_remove+0xf4/0x1a0 [mana]
[ 2399.346243] mana_gd_remove+0x25/0x80 [mana]
[ 2399.346605] pci_device_remove+0x41/0xb0
[ 2399.346878] device_remove+0x46/0x70
[ 2399.347150] device_release_driver_internal+0x1e3/0x250
[ 2399.347831] ? klist_remove+0x81/0xe0
[ 2399.348377] driver_detach+0x4b/0xa0
[ 2399.348906] bus_remove_driver+0x83/0x100
[ 2399.349435] driver_unregister+0x31/0x60
[ 2399.349919] pci_unregister_driver+0x40/0x90
[ 2399.350492] mana_driver_exit+0x1c/0xb50 [mana]
[ 2399.351102] __do_sys_delete_module.constprop.0+0x184/0x320
[ 2399.351664] ? __fput+0x1a9/0x2d0
[ 2399.352200] __x64_sys_delete_module+0x12/0x20
[ 2399.352760] x64_sys_call+0x1e66/0x2140
[ 2399.353316] do_syscall_64+0x79/0x150
[ 2399.353813] ? syscall_exit_to_user_mode+0x49/0x230
[ 2399.354346] ? do_syscall_64+0x85/0x150
[ 2399.354816] ? irqentry_exit+0x1d/0x30
[ 2399.355287] ? exc_page_fault+0x7f/0x160
[ 2399.355756] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 2399.356302] RIP: 0033:0x7fe728d26aeb
[ 2399.356776] Code: 73 01 c3 48 8b 0d 45 33 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 15 33 0f 00 f7 d8 64 89 01 48
[ 2399.358372] RSP: 002b:00007ffff954d6f8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 2399.359066] RAX: ffffffffffffffda RBX: 00005609156cc760 RCX: 00007fe728d26aeb
[ 2399.359779] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005609156cc7c8
[ 2399.360535] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 2399.361261] R10: 00007fe728dbeac0 R11: 0000000000000206 R12: 00007ffff954d950
[ 2399.361952] R13: 00005609156cc2a0 R14: 00007ffff954ee5f R15: 00005609156cc760
[ 2399.362688] </TASK>
Fixes:
|
||
|
|
5ef44b3cb4 |
xsk: Bring back busy polling support
Commit |
||
|
|
f0aa6a37a3 |
eth: bnxt: always recalculate features after XDP clearing, fix null-deref
Recalculate features when XDP is detached. Before: # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp # ip li set dev eth0 xdp off # ethtool -k eth0 | grep gro rx-gro-hw: off [requested on] After: # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp # ip li set dev eth0 xdp off # ethtool -k eth0 | grep gro rx-gro-hw: on The fact that HW-GRO doesn't get re-enabled automatically is just a minor annoyance. The real issue is that the features will randomly come back during another reconfiguration which just happens to invoke netdev_update_features(). The driver doesn't handle reconfiguring two things at a time very robustly. Starting with commit |
||
|
|
b3af60928a |
bpf: Fix bpf_sk_select_reuseport() memory leak
As pointed out in the original comment, lookup in sockmap can return a TCP
ESTABLISHED socket. Such TCP socket may have had SO_ATTACH_REUSEPORT_EBPF
set before it was ESTABLISHED. In other words, a non-NULL sk_reuseport_cb
does not imply a non-refcounted socket.
Drop sk's reference in both error paths.
unreferenced object 0xffff888101911800 (size 2048):
comm "test_progs", pid 44109, jiffies 4297131437
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
80 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 9336483b):
__kmalloc_noprof+0x3bf/0x560
__reuseport_alloc+0x1d/0x40
reuseport_alloc+0xca/0x150
reuseport_attach_prog+0x87/0x140
sk_reuseport_attach_bpf+0xc8/0x100
sk_setsockopt+0x1181/0x1990
do_sock_setsockopt+0x12b/0x160
__sys_setsockopt+0x7b/0xc0
__x64_sys_setsockopt+0x1b/0x30
do_syscall_64+0x93/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes:
|
||
|
|
8c7a6efc01 |
ipv4: route: fix drop reason being overridden in ip_route_input_slow
When jumping to 'martian_destination' a drop reason is always set but
that label falls-through the 'e_nobufs' one, overriding the value.
The behavior was introduced by the mentioned commit. The logic went
from,
goto martian_destination;
...
martian_destination:
...
e_inval:
err = -EINVAL;
goto out;
e_nobufs:
err = -ENOBUFS;
goto out;
to,
reason = ...;
goto martian_destination;
...
martian_destination:
...
e_nobufs:
reason = SKB_DROP_REASON_NOMEM;
goto out;
A 'goto out' is clearly missing now after 'martian_destination' to avoid
overriding the drop reason.
Fixes:
|
||
|
|
03d120f27d |
net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()
CPSW ALE has 75-bit ALE entries stored across three 32-bit words.
The cpsw_ale_get_field() and cpsw_ale_set_field() functions support
ALE field entries spanning up to two words at the most.
The cpsw_ale_get_field() and cpsw_ale_set_field() functions work as
expected when ALE field spanned across word1 and word2, but fails when
ALE field spanned across word2 and word3.
For example, while reading the ALE field spanned across word2 and word3
(i.e. bits 62 to 64), the word3 data shifted to an incorrect position
due to the index becoming zero while flipping.
The same issue occurred when setting an ALE entry.
This issue has not been seen in practice but will be an issue in the future
if the driver supports accessing ALE fields spanning word2 and word3
Fix the methods to handle getting/setting fields spanning up to two words.
Fixes:
|
||
|
|
c77cd47cee |
Including fixes from netfilter, Bluetooth and WPAN.
No outstanding fixes / investigations at this time.
Current release - new code bugs:
- eth: fbnic: revert HWMON support, it doesn't work at all
and revert is similar size as the fixes
Previous releases - regressions:
- tcp: allow a connection when sk_max_ack_backlog is zero
- tls: fix tls_sw_sendmsg error handling
Previous releases - always broken:
- netdev netlink family:
- prevent accessing NAPI instances from another namespace
- don't dump Tx and uninitialized NAPIs
- net: sysctl: avoid using current->nsproxy, fix null-deref if task
is exiting and stick to opener's netns
- sched: sch_cake: add bounds checks to host bulk flow fairness counts
Misc:
- annual cleanup of inactive maintainers
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmeAErUACgkQMUZtbf5S
IrugZBAAs19Ez9VcP9LvqpOxcgHv7xJB4PMsQW8Hu3EWSjwwzrFpiJ4EaLCqamE4
gAAtkwTcHgDZDhUHL5IxxpsmkB9BVPLh8JPIro4s7orBSbV6y4DDDUU4RLWnCB7v
z6x1t2WqRbGc4L12P5tVcyxfBnYatlNmOOLcc7PpXBacJrRiMYJ2pvYhYcuCRLLc
IHmS5viMTZF4v2qVhU3p4/3ZzUH+9H4v7RnZU0j4Jd8sKSIcZQFU3aH9bT9ZYDxd
DUWcIeUNMcQ6K2iFJGK4Cj9PQximE5fju8YT9JYyZ2hu+cJ8tK8bjoJx/7Y7VUUT
5iuwIH5SVjMgU1lXfKfp279Q4DZkxrqxasQkHlk8Ga/WI3GNbp0DJmkW8Ck2v6hJ
ADRB0/Mq1V6RU2YBL7VRPzEiSpAnulXiiFTaVoRmIvZTVNU/QScFzROIvj65WZAQ
D3UK8dxMNltdQ9nKdgVHs3KoGvbUV1Om4jkhBFl1VPAJxvpGAZk4BI6D63LmeCws
oxvW84f2OoAexdIwOlRyllQ21Xvj9RATLBOX4MN3WTOxSQ62UrSbtoOm2CTbpfOW
uUmV645l0nAB3rEyYVi2Tr8ArBRwi09Phjkcss/PJ9RVCJfm6aL/LBrmsn0HngJW
+KjRANUyMiD21+uc95SGduKE6L6vUtl/qncNmnMOt+M4Z9gcTZ4=
=ABI6
-----END PGP SIGNATURE-----
Merge tag 'net-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, Bluetooth and WPAN.
No outstanding fixes / investigations at this time.
Current release - new code bugs:
- eth: fbnic: revert HWMON support, it doesn't work at all and revert
is similar size as the fixes
Previous releases - regressions:
- tcp: allow a connection when sk_max_ack_backlog is zero
- tls: fix tls_sw_sendmsg error handling
Previous releases - always broken:
- netdev netlink family:
- prevent accessing NAPI instances from another namespace
- don't dump Tx and uninitialized NAPIs
- net: sysctl: avoid using current->nsproxy, fix null-deref if task
is exiting and stick to opener's netns
- sched: sch_cake: add bounds checks to host bulk flow fairness
counts
Misc:
- annual cleanup of inactive maintainers"
* tag 'net-6.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy
sctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy
sctp: sysctl: udp_port: avoid using current->nsproxy
sctp: sysctl: auth_enable: avoid using current->nsproxy
sctp: sysctl: rto_min/max: avoid using current->nsproxy
sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy
mptcp: sysctl: blackhole timeout: avoid using current->nsproxy
mptcp: sysctl: sched: avoid using current->nsproxy
mptcp: sysctl: avail sched: remove write access
MAINTAINERS: remove Lars Povlsen from Microchip Sparx5 SoC
MAINTAINERS: remove Noam Dagan from AMAZON ETHERNET
MAINTAINERS: remove Ying Xue from TIPC
MAINTAINERS: remove Mark Lee from MediaTek Ethernet
MAINTAINERS: mark stmmac ethernet as an Orphan
MAINTAINERS: remove Andy Gospodarek from bonding
MAINTAINERS: update maintainers for Microchip LAN78xx
MAINTAINERS: mark Synopsys DW XPCS as Orphan
net/mlx5: Fix variable not being completed when function returns
rtase: Fix a check for error in rtase_alloc_msix()
net: stmmac: dwmac-tegra: Read iommu stream id from device tree
...
|
||
|
|
643e2e259c |
for-6.13-rc6-tag
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmd/7dYACgkQxWXV+ddt
WDuX7Q//UkrNtVh7UEiyNyujLjjvczfMXhpD1fAdVU0zMon6ux3RQ3JSs3xvAGrb
jFFa9c9+Db8/kWzdWp5n1u9Q/+sy4XBaeKGuzPRLPPGT1yXfKEa4mrm1sCrWRJoS
c8b07Kfuepldcim80x8WSa2qhr5gmDmSZBgvjKt63ppp5/jaNKCZg+d3BhwqhHbI
XA9JjIk9j0ZsAYauYflQTwgUpkyvXV1a9YyeKv4U6mYA1r+rXl2aolcndNkS1U/D
dDGuiDpOjKtIUecRi4YbOkt2zvwREDdQCbRV/QLsZajHxqeHV5QH0TBI/URikx2z
1shwYMzLfLtQIW0+PhHCGKiftMIb4NliyMUxxviPdN78nCFmocrR/ZkPx+a5M9Io
d7oqwS/8U3pFGeB4bAey8WvMzQI5BtCCYJY+3HreNTDkiubqcRtTCtJ9dNDTAMFH
FMZ6DA8wTsqSA2e9Q8OwKNjvMCLAKevXn/4wiJi5b75Fiu5ZB/imTfJ+geEMUZCR
3uq9oybFCKti7lestM0z06K19AKtmPWLoq5YJ1Hg69DsafS2aR3CBeYOi7uQ+56D
7uwAFjVrGPrxOgGkCohYpPLCUikJ0y3Nl/k5fnybsnLPWr0cenLroUeP7Rao4fFU
8hLzMSv3ImL+Io0RjH0XBAM8YLN+xO3CLYCv6D8d42AlQTgAIVw=
=QYC1
-----END PGP SIGNATURE-----
Merge tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes.
Besides the one-liners in Btrfs there's fix to the io_uring and
encoded read integration (added in this development cycle). The update
to io_uring provides more space for the ongoing command that is then
used in Btrfs to handle some cases.
- io_uring and encoded read:
- provide stable storage for io_uring command data
- make a copy of encoded read ioctl call, reuse that in case the
call would block and will be called again
- properly initialize zlib context for hardware compression on s390
- fix max extent size calculation on filesystems with non-zoned
devices
- fix crash in scrub on crafted image due to invalid extent tree"
* tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zlib: fix avail_in bytes for s390 zlib HW compression path
btrfs: zoned: calculate max_extent_size properly on non-zoned setup
btrfs: avoid NULL pointer dereference if no valid extent tree
btrfs: don't read from userspace twice in btrfs_uring_encoded_read()
io_uring: add io_uring_cmd_get_async_data helper
io_uring/cmd: add per-op data to struct io_uring_cmd_data
io_uring/cmd: rename struct uring_cache to io_uring_cmd_data
|
||
|
|
b5cf67a8f7 |
netfilter pull request 25-01-09
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmd/wXMACgkQ1w0aZmrP KyHUPRAAolGTWs541hsCAqrtGN+hk4jzm9WKBAVr7A/52TVqxIT4pfzVoeFMkCyI Q+1US3W/PuuZOUcdE1Nh6fGu+FycTVY2HO/hsWTzNt617JmtYYCANkNYnfKBcMrQ o7yCwxCI8KNBdsG/yBByrnHiuxZC1gI0d2SzCLN/2phtW/8nMevHcMZfhUzl6D0s eIbzgNQE4KB2+ntGGgCqfNf2hTJrhQDvIDUKDho5hC+FQOo91tNt1zED+u/N3S5Y ak16JYGBju9JLuMssDqgYLuupKGOzxmUFQokqMEMCkydoPlX8R3fxRSVAXv7DVUq Y+W6Lo+5zj8BduDsCTEIkk3piv7jMcZAc7E8BkXLcR/Wj/XUyP4pKHhX+pDGZjA5 rHsemh17Xozj431XLhxxDhJ8p7n4xx8d9auijoBoeWREpacLIQvBUW4L7HEwfq+p 0wg8a0F1izwJZWQS4+zQis8EcDo3da/0idFSMP0JFVigPVGefHnJkj9iO0l+VfoL PlDY5Qqty+uQbykvV7YFVRt3iD4nttux2hP+r7k/KCcrD2BmJbkgpC9tTej9noiK OjStQWbvyLKFE9bAxvxOYa7NaE3TdOkQFTgJYVny5r0oQU7aClv092WM3ReQSlvr tHN3+ze2hiehKwYKmRvHUSbJPNrC6ISh4qpNBoZKJ2TgGkKsGys= =PG5v -----END PGP SIGNATURE----- Merge tag 'nf-25-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix imbalance between flowtable BIND and UNBIND calls to configure hardware offload, this fixes a possible kmemleak. 2) Clamp maximum conntrack hashtable size to INT_MAX to fix a possible WARN_ON_ONCE splat coming from kvmalloc_array(), only possible from init_netns. * tag 'nf-25-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: conntrack: clamp maximum hashtable size to INT_MAX netfilter: nf_tables: imbalance in flowtable binding ==================== Link: https://patch.msgid.link/20250109123532.41768-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
2664bc9c7d |
Merge branch 'net-sysctl-avoid-using-current-nsproxy'
Matthieu Baerts says: ==================== net: sysctl: avoid using current->nsproxy As pointed out by Al Viro and Eric Dumazet in [1], using the 'net' structure via 'current' is not recommended for different reasons: - Inconsistency: getting info from the reader's/writer's netns vs only from the opener's netns as it is usually done. This could cause unexpected issues when other operations are done on the wrong netns. - current->nsproxy can be NULL in some cases, resulting in an 'Oops' (null-ptr-deref), e.g. when the current task is exiting, as spotted by syzbot [1] using acct(2). The 'net' or 'pernet' structure can be obtained from the table->data using container_of(). Note that table->data could also be used directly in more places, but that would increase the size of this fix to replace all accesses via 'net'. Probably best to avoid that for fixes. Patches 2-9 remove access of net via current->nsproxy in sysfs handlers in MPTCP, SCTP and RDS. There are multiple patches doing almost the same thing, but the reason is to ease the backports. Patch 1 is not directly linked to this, but it is a small fix for MPTCP available_schedulers sysctl knob to explicitly mark it as read-only. Please note that this series does not address Al's comment [2]. In SCTP, some sysctl knobs set other sysfs-exposed variables for the min/max: two processes could then write two linked values at the same time, resulting in new values being outside the new boundaries. It would be great if SCTP developers can look at this problem. Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1] Link: https://lore.kernel.org/20250105211158.GL1977892@ZenIV [2] ==================== Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-0-5df34b2083e8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
7f5611cbc4 |
rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The per-netns structure can be obtained from the table->data using
container_of(), then the 'net' one can be retrieved from the listen
socket (if available).
Fixes:
|
||
|
|
6259d2484d |
sctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.probe_interval' is
used.
Fixes:
|
||
|
|
c10377bbc1 |
sctp: sysctl: udp_port: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.
Fixes:
|
||
|
|
15649fd541 |
sctp: sysctl: auth_enable: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, but that would
increase the size of this fix, while 'sctp.ctl_sock' still needs to be
retrieved from 'net' structure.
Fixes:
|
||
|
|
9fc17b76fc |
sctp: sysctl: rto_min/max: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.rto_min/max' is used.
Fixes:
|
||
|
|
ea62dd1383 |
sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy
As mentioned in a previous commit of this series, using the 'net'
structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using
container_of().
Note that table->data could also be used directly, as this is the only
member needed from the 'net' structure, but that would increase the size
of this fix, to use '*data' everywhere 'net->sctp.sctp_hmac_alg' is
used.
Fixes:
|
||
|
|
92cf7a51bd |
mptcp: sysctl: blackhole timeout: avoid using current->nsproxy
As mentioned in the previous commit, using the 'net' structure via
'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only
from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops'
(null-ptr-deref), e.g. when the current task is exiting, as spotted by
syzbot [1] using acct(2).
The 'pernet' structure can be obtained from the table->data using
container_of().
Fixes:
|
||
|
|
d38e26e362 |
mptcp: sysctl: sched: avoid using current->nsproxy
Using the 'net' structure via 'current' is not recommended for different
reasons.
First, if the goal is to use it to read or write per-netns data, this is
inconsistent with how the "generic" sysctl entries are doing: directly
by only using pointers set to the table entry, e.g. table->data. Linked
to that, the per-netns data should always be obtained from the table
linked to the netns it had been created for, which may not coincide with
the reader's or writer's netns.
Another reason is that access to current->nsproxy->netns can oops if
attempted when current->nsproxy had been dropped when the current task
is exiting. This is what syzbot found, when using acct(2):
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
__kernel_write_iter+0x318/0xa80 fs/read_write.c:612
__kernel_write+0xf6/0x140 fs/read_write.c:632
do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
acct_pin_kill+0x2d/0x100 kernel/acct.c:192
pin_kill+0x194/0x7c0 fs/fs_pin.c:44
mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
task_work_run+0x14e/0x250 kernel/task_work.c:239
exit_task_work include/linux/task_work.h:43 [inline]
do_exit+0xad8/0x2d70 kernel/exit.c:938
do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
get_signal+0x2576/0x2610 kernel/signal.c:3017
arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fee3cb87a6a
Code: Unable to access opcode bytes at 0x7fee3cb87a40.
RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess), 1 bytes skipped:
0: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1)
5: 0f 85 fe 02 00 00 jne 0x309
b: 4d 8b a4 24 08 09 00 mov 0x908(%r12),%r12
12: 00
13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
1a: fc ff df
1d: 49 8d 7c 24 28 lea 0x28(%r12),%rdi
22: 48 89 fa mov %rdi,%rdx
25: 48 c1 ea 03 shr $0x3,%rdx
* 29: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction
2d: 0f 85 cc 02 00 00 jne 0x2ff
33: 4d 8b 7c 24 28 mov 0x28(%r12),%r15
38: 48 rex.W
39: 8d .byte 0x8d
3a: 84 24 c8 test %ah,(%rax,%rcx,8)
Here with 'net.mptcp.scheduler', the 'net' structure is not really
needed, because the table->data already has a pointer to the current
scheduler, the only thing needed from the per-netns data.
Simply use 'data', instead of getting (most of the time) the same thing,
but from a longer and indirect way.
Fixes:
|
||
|
|
771ec78dc8 |
mptcp: sysctl: avail sched: remove write access
'net.mptcp.available_schedulers' sysctl knob is there to list available
schedulers, not to modify this list.
There are then no reasons to give write access to it.
Nothing would have been written anyway, but no errors would have been
returned, which is unexpected.
Fixes:
|
||
|
|
92afd9f3bc |
Merge branch 'maintainers-spring-2025-cleanup-of-networking-maintainers'
Jakub Kicinski says: ==================== MAINTAINERS: spring 2025 cleanup of networking maintainers Annual cleanup of inactive maintainers. To identify inactive maintainers we use Jon Corbet's maintainer analysis script from gitdm, and some manual scanning of lore. v1: https://lore.kernel.org/20250106165404.1832481-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250108155242.2575530-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
d9e03c6ffc |
MAINTAINERS: remove Lars Povlsen from Microchip Sparx5 SoC
We have not seen emails or tags from Lars in almost 4 years.
Steen and Daniel are pretty active, but the review coverage
isn't stellar (35% of changes go in without a review tag).
Subsystem ARM/Microchip Sparx5 SoC support
Changes 28 / 79 (35%)
Last activity: 2024-11-24
Lars Povlsen <lars.povlsen@microchip.com>:
Steen Hegelund <Steen.Hegelund@microchip.com>:
Tags
|
||
|
|
d95e2cc737 |
MAINTAINERS: remove Noam Dagan from AMAZON ETHERNET
Noam Dagan was added to ENA reviewers in 2021, we have not seen a single email from this person to any list, ever (according to lore). Git history mentions the name in 2 SoB tags from 2020. Acked-by: Arthur Kiyanovski <akiyano@amazon.com> Link: https://patch.msgid.link/20250108155242.2575530-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
d4782fbab1 |
MAINTAINERS: remove Ying Xue from TIPC
There is a steady stream of fixes for TIPC, even tho the development
has slowed down a lot. Over last 2 years we have merged almost 70
TIPC patches, but we haven't heard from Ying Xue once:
Subsystem TIPC NETWORK LAYER
Changes 42 / 69 (60%)
Last activity: 2023-10-04
Jon Maloy <jmaloy@redhat.com>:
Tags
|
||
|
|
9d7b1191d0 |
MAINTAINERS: remove Mark Lee from MediaTek Ethernet
The mailing lists have seen no email from Mark Lee in the last 4 years.
gitdm missingmaints says:
Subsystem MEDIATEK ETHERNET DRIVER
Changes 103 / 400 (25%)
Last activity: 2024-12-19
Felix Fietkau <nbd@nbd.name>:
Author
|
||
|
|
03868822c5 |
MAINTAINERS: mark stmmac ethernet as an Orphan
I tried a couple of things to reinvigorate the stmmac maintainers
over the last few years but with little effect. The maintainers
are not active, let the MAINTAINERS file reflect reality.
The Synopsys IP this driver supports is very popular we need
a solid maintainer to deal with the complexity of the driver.
gitdm missingmaints says:
Subsystem STMMAC ETHERNET DRIVER
Changes 344 / 978 (35%)
Last activity: 2020-05-01
Alexandre Torgue <alexandre.torgue@foss.st.com>:
Tags
|
||
|
|
e049fb86d3 |
MAINTAINERS: remove Andy Gospodarek from bonding
Andy does not participate much in bonding reviews, unfortunately.
Move him to CREDITS.
gitdm missingmaint says:
Subsystem BONDING DRIVER
Changes 149 / 336 (44%)
Last activity: 2024-09-05
Jay Vosburgh <jv@jvosburgh.net>:
Tags
|
||
|
|
b506668613 |
MAINTAINERS: update maintainers for Microchip LAN78xx
Woojung Huh seems to have only replied to the list 35 times
in the last 5 years, and didn't provide any reviews in 3 years.
The LAN78XX driver has seen quite a bit of activity lately.
gitdm missingmaints says:
Subsystem USB LAN78XX ETHERNET DRIVER
Changes 35 / 91 (38%)
(No activity)
Top reviewers:
[23]: andrew@lunn.ch
[3]: horms@kernel.org
[2]: mateusz.polchlopek@intel.com
INACTIVE MAINTAINER Woojung Huh <woojung.huh@microchip.com>
Move Woojung to CREDITS and add new maintainers who are more
likely to review LAN78xx patches.
Acked-by: Woojung Huh <woojung.huh@microchip.com>
Acked-by: Rengarajan Sundararajan <rengarajan.s@microchip.com>
Link: https://patch.msgid.link/20250108155242.2575530-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
||
|
|
d58200966e |
MAINTAINERS: mark Synopsys DW XPCS as Orphan
There's not much review support from Jose, there is a sharp
drop in his participation around 4 years ago.
The DW XPCS IP is very popular and the driver requires active
maintenance.
gitdm missingmaints says:
Subsystem SYNOPSYS DESIGNWARE ETHERNET XPCS DRIVER
Changes 33 / 94 (35%)
(No activity)
Top reviewers:
[16]: andrew@lunn.ch
[12]: vladimir.oltean@nxp.com
[2]: f.fainelli@gmail.com
INACTIVE MAINTAINER Jose Abreu <Jose.Abreu@synopsys.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250108155242.2575530-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
||
|
|
0e2909c6be |
net/mlx5: Fix variable not being completed when function returns
When cmd_alloc_index(), fails cmd_work_handler() needs
to complete ent->slotted before returning early.
Otherwise the task which issued the command may hang:
mlx5_core 0000:01:00.0: cmd_work_handler:877:(pid 3880418): failed to allocate command entry
INFO: task kworker/13:2:4055883 blocked for more than 120 seconds.
Not tainted 4.19.90-25.44.v2101.ky10.aarch64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/13:2 D 0 4055883 2 0x00000228
Workqueue: events mlx5e_tx_dim_work [mlx5_core]
Call trace:
__switch_to+0xe8/0x150
__schedule+0x2a8/0x9b8
schedule+0x2c/0x88
schedule_timeout+0x204/0x478
wait_for_common+0x154/0x250
wait_for_completion+0x28/0x38
cmd_exec+0x7a0/0xa00 [mlx5_core]
mlx5_cmd_exec+0x54/0x80 [mlx5_core]
mlx5_core_modify_cq+0x6c/0x80 [mlx5_core]
mlx5_core_modify_cq_moderation+0xa0/0xb8 [mlx5_core]
mlx5e_tx_dim_work+0x54/0x68 [mlx5_core]
process_one_work+0x1b0/0x448
worker_thread+0x54/0x468
kthread+0x134/0x138
ret_from_fork+0x10/0x18
Fixes:
|
||
|
|
2055272e3a |
rtase: Fix a check for error in rtase_alloc_msix()
The pci_irq_vector() function never returns zero. It returns negative
error codes or a positive non-zero IRQ number. Fix the error checking to
test for negatives.
Fixes:
|
||
|
|
426046e2d6 |
net: stmmac: dwmac-tegra: Read iommu stream id from device tree
Nvidia's Tegra MGBE controllers require the IOMMU "Stream ID" (SID) to be
written to the MGBE_WRAP_AXI_ASID0_CTRL register.
The current driver is hard coded to use MGBE0's SID for all controllers.
This causes softirq time outs and kernel panics when using controllers
other than MGBE0.
Example dmesg errors when an ethernet cable is connected to MGBE1:
[ 116.133290] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx
[ 121.851283] tegra-mgbe 6910000.ethernet eth1: NETDEV WATCHDOG: CPU: 5: transmit queue 0 timed out 5690 ms
[ 121.851782] tegra-mgbe 6910000.ethernet eth1: Reset adapter.
[ 121.892464] tegra-mgbe 6910000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0
[ 121.905920] tegra-mgbe 6910000.ethernet eth1: PHY [stmmac-1:00] driver [Aquantia AQR113] (irq=171)
[ 121.907356] tegra-mgbe 6910000.ethernet eth1: Enabling Safety Features
[ 121.907578] tegra-mgbe 6910000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported
[ 121.908399] tegra-mgbe 6910000.ethernet eth1: registered PTP clock
[ 121.908582] tegra-mgbe 6910000.ethernet eth1: configuring for phy/10gbase-r link mode
[ 125.961292] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx
[ 181.921198] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 181.921404] rcu: 7-....: (1 GPs behind) idle=540c/1/0x4000000000000002 softirq=1748/1749 fqs=2337
[ 181.921684] rcu: (detected by 4, t=6002 jiffies, g=1357, q=1254 ncpus=8)
[ 181.921878] Sending NMI from CPU 4 to CPUs 7:
[ 181.921886] NMI backtrace for cpu 7
[ 181.922131] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6
[ 181.922390] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024
[ 181.922658] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 181.922847] pc : handle_softirqs+0x98/0x368
[ 181.922978] lr : __do_softirq+0x18/0x20
[ 181.923095] sp : ffff80008003bf50
[ 181.923189] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000
[ 181.923379] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0
[ 181.924486] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70
[ 181.925568] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000
[ 181.926655] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000
[ 181.931455] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d
[ 181.938628] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160
[ 181.945804] x8 : ffff8000827b3160 x7 : f9157b241586f343 x6 : eeb6502a01c81c74
[ 181.953068] x5 : a4acfcdd2e8096bb x4 : ffffce78ea277340 x3 : 00000000ffffd1e1
[ 181.960329] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000
[ 181.967591] Call trace:
[ 181.970043] handle_softirqs+0x98/0x368 (P)
[ 181.974240] __do_softirq+0x18/0x20
[ 181.977743] ____do_softirq+0x14/0x28
[ 181.981415] call_on_irq_stack+0x24/0x30
[ 181.985180] do_softirq_own_stack+0x20/0x30
[ 181.989379] __irq_exit_rcu+0x114/0x140
[ 181.993142] irq_exit_rcu+0x14/0x28
[ 181.996816] el1_interrupt+0x44/0xb8
[ 182.000316] el1h_64_irq_handler+0x14/0x20
[ 182.004343] el1h_64_irq+0x80/0x88
[ 182.007755] cpuidle_enter_state+0xc4/0x4a8 (P)
[ 182.012305] cpuidle_enter+0x3c/0x58
[ 182.015980] cpuidle_idle_call+0x128/0x1c0
[ 182.020005] do_idle+0xe0/0xf0
[ 182.023155] cpu_startup_entry+0x3c/0x48
[ 182.026917] secondary_start_kernel+0xdc/0x120
[ 182.031379] __secondary_switched+0x74/0x78
[ 212.971162] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6103 jiffies s: 417 root: 0x80/.
[ 212.985935] rcu: blocking rcu_node structures (internal RCU debug):
[ 212.992758] Sending NMI from CPU 0 to CPUs 7:
[ 212.998539] NMI backtrace for cpu 7
[ 213.004304] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6
[ 213.016116] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024
[ 213.030817] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 213.040528] pc : handle_softirqs+0x98/0x368
[ 213.046563] lr : __do_softirq+0x18/0x20
[ 213.051293] sp : ffff80008003bf50
[ 213.055839] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000
[ 213.067304] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0
[ 213.077014] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70
[ 213.087339] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000
[ 213.097313] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000
[ 213.107201] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d
[ 213.116651] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160
[ 213.127500] x8 : ffff8000827b3160 x7 : 0a37b344852820af x6 : 3f049caedd1ff608
[ 213.138002] x5 : cff7cfdbfaf31291 x4 : ffffce78ea277340 x3 : 00000000ffffde04
[ 213.150428] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000
[ 213.162063] Call trace:
[ 213.165494] handle_softirqs+0x98/0x368 (P)
[ 213.171256] __do_softirq+0x18/0x20
[ 213.177291] ____do_softirq+0x14/0x28
[ 213.182017] call_on_irq_stack+0x24/0x30
[ 213.186565] do_softirq_own_stack+0x20/0x30
[ 213.191815] __irq_exit_rcu+0x114/0x140
[ 213.196891] irq_exit_rcu+0x14/0x28
[ 213.202401] el1_interrupt+0x44/0xb8
[ 213.207741] el1h_64_irq_handler+0x14/0x20
[ 213.213519] el1h_64_irq+0x80/0x88
[ 213.217541] cpuidle_enter_state+0xc4/0x4a8 (P)
[ 213.224364] cpuidle_enter+0x3c/0x58
[ 213.228653] cpuidle_idle_call+0x128/0x1c0
[ 213.233993] do_idle+0xe0/0xf0
[ 213.237928] cpu_startup_entry+0x3c/0x48
[ 213.243791] secondary_start_kernel+0xdc/0x120
[ 213.249830] __secondary_switched+0x74/0x78
This bug has existed since the dwmac-tegra driver was added in Dec 2022
(See Fixes tag below for commit hash).
The Tegra234 SOC has 4 MGBE controllers, however Nvidia's Developer Kit
only uses MGBE0 which is why the bug was not found previously. Connect Tech
has many products that use 2 (or more) MGBE controllers.
The solution is to read the controller's SID from the existing "iommus"
device tree property. The 2nd field of the "iommus" device tree property
is the controller's SID.
Device tree snippet from tegra234.dtsi showing MGBE1's "iommus" property:
smmu_niso0: iommu@12000000 {
compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500";
...
}
/* MGBE1 */
ethernet@6900000 {
compatible = "nvidia,tegra234-mgbe";
...
iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>;
...
}
Nvidia's arm-smmu driver reads the "iommus" property and stores the SID in
the MGBE device's "fwspec" struct. The dwmac-tegra driver can access the
SID using the tegra_dev_iommu_get_stream_id() helper function found in
linux/iommu.h.
Calling tegra_dev_iommu_get_stream_id() should not fail unless the "iommus"
property is removed from the device tree or the IOMMU is disabled.
While the Tegra234 SOC technically supports bypassing the IOMMU, it is not
supported by the current firmware, has not been tested and not recommended.
More detailed discussion with Thierry Reding from Nvidia linked below.
Fixes:
|
||
|
|
737d4d91d3 |
sched: sch_cake: add bounds checks to host bulk flow fairness counts
Even though we fixed a logic error in the commit cited below, syzbot
still managed to trigger an underflow of the per-host bulk flow
counters, leading to an out of bounds memory access.
To avoid any such logic errors causing out of bounds memory accesses,
this commit factors out all accesses to the per-host bulk flow counters
to a series of helpers that perform bounds-checking before any
increments and decrements. This also has the benefit of improving
readability by moving the conditional checks for the flow mode into
these helpers, instead of having them spread out throughout the
code (which was the cause of the original logic error).
As part of this change, the flow quantum calculation is consolidated
into a helper function, which means that the dithering applied to the
ost load scaling is now applied both in the DRR rotation and when a
sparse flow's quantum is first initiated. The only user-visible effect
of this is that the maximum packet size that can be sent while a flow
stays sparse will now vary with +/- one byte in some cases. This should
not make a noticeable difference in practice, and thus it's not worth
complicating the code to preserve the old behaviour.
Fixes:
|
||
|
|
b541ba7d1f |
netfilter: conntrack: clamp maximum hashtable size to INT_MAX
Use INT_MAX as maximum size for the conntrack hashtable. Otherwise, it is possible to hit WARN_ON_ONCE in __kvmalloc_node_noprof() when resizing hashtable because __GFP_NOWARN is unset. See: |