linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-10 15:42:19 +02:00

Author	SHA1	Message	Date
Håkon Bugge	236f718ac8	net/rds: Optimize rds_ib_laddr_check rds_ib_laddr_check() creates a CM_ID and attempts to bind the address in question to it. This in order to qualify the allegedly local address as a usable IB/RoCE address. In the field, ExaWatcher runs rds-ping to all ports in the fabric from all local ports. This using all active ToS'es. In a full rack system, we have 14 cell servers and eight db servers. Typically, 6 ToS'es are used. This implies 528 rds-ping invocations per ExaWatcher's "RDSinfo" interval. Adding to this, each rds-ping invocation creates eight sockets and binds the local address to them: socket(AF_RDS, SOCK_SEQPACKET, 0) = 3 bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 4 bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 5 bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 6 bind(6, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 7 bind(7, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 8 bind(8, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 9 bind(9, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 socket(AF_RDS, SOCK_SEQPACKET, 0) = 10 bind(10, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.36.2")}, 16) = 0 So, at every interval ExaWatcher executes rds-ping's, 4224 CM_IDs are allocated, considering this full-rack system. After the a CM_ID has been allocated, rdma_bind_addr() is called, with the port number being zero. This implies that the CMA will attempt to search for an un-used ephemeral port. 
Simplified, the algorithm is to start at a random position in the available port space, and then if needed, iterate until an un-used port is found. The book-keeping of used ports uses the idr system, which again uses slab to allocate new struct idr_layer's. The size is 2092 bytes and slab tries to reduce the wasted space. Hence, it chooses an order:3 allocation, for which 15 idr_layer structs will fit and only 1388 bytes are wasted per the 32KiB order:3 chunk. Although this order:3 allocation seems like a good space/speed trade-off, it does not resonate well with how it used by the CMA. The combination of the randomized starting point in the port space (which has close to zero spatial locality) and the close proximity in time of the 4224 invocations of the rds-ping's, creates a memory hog for order:3 allocations. These costly allocations may need reclaims and/or compaction. At worst, they may fail and produce a stack trace such as (from uek4): [<ffffffff811a72d5>] __inc_zone_page_state+0x35/0x40 [<ffffffff811c2e97>] page_add_file_rmap+0x57/0x60 [<ffffffffa37ca1df>] remove_migration_pte+0x3f/0x3c0 [ksplice_6cn872bt_vmlinux_new] [<ffffffff811c3de8>] rmap_walk+0xd8/0x340 [<ffffffff811e8860>] remove_migration_ptes+0x40/0x50 [<ffffffff811ea83c>] migrate_pages+0x3ec/0x890 [<ffffffff811afa0d>] compact_zone+0x32d/0x9a0 [<ffffffff811b00ed>] compact_zone_order+0x6d/0x90 [<ffffffff811b03b2>] try_to_compact_pages+0x102/0x270 [<ffffffff81190e56>] __alloc_pages_direct_compact+0x46/0x100 [<ffffffff8119165b>] __alloc_pages_nodemask+0x74b/0xaa0 [<ffffffff811d8411>] alloc_pages_current+0x91/0x110 [<ffffffff811e3b0b>] new_slab+0x38b/0x480 [<ffffffffa41323c7>] __slab_alloc+0x3b7/0x4a0 [ksplice_s0dk66a8_vmlinux_new] [<ffffffff811e42ab>] kmem_cache_alloc+0x1fb/0x250 [<ffffffff8131fdd6>] idr_layer_alloc+0x36/0x90 [<ffffffff8132029c>] idr_get_empty_slot+0x28c/0x3d0 [<ffffffff813204ad>] idr_alloc+0x4d/0xf0 [<ffffffffa051727d>] cma_alloc_port+0x4d/0xa0 [rdma_cm] [<ffffffffa0517cbe>] 
rdma_bind_addr+0x2ae/0x5b0 [rdma_cm] [<ffffffffa09d8083>] rds_ib_laddr_check+0x83/0x2c0 [ksplice_6l2xst5i_rds_rdma_new] [<ffffffffa05f892b>] rds_trans_get_preferred+0x5b/0xa0 [rds] [<ffffffffa05f09f2>] rds_bind+0x212/0x280 [rds] [<ffffffff815b4016>] SYSC_bind+0xe6/0x120 [<ffffffff815b4d3e>] SyS_bind+0xe/0x10 [<ffffffff816b031a>] system_call_fastpath+0x18/0xd4 To avoid these excessive calls to rdma_bind_addr(), we optimize rds_ib_laddr_check() by simply checking if the address in question has been used before. The rds_rdma module keeps track of addresses associated with IB devices, and the function rds_ib_get_device() is used to determine if the address already has been qualified as a valid local address. If not found, we call the legacy rds_ib_laddr_check(), now renamed to rds_ib_laddr_check_cm(). Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260408080420.540032-2-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:33:19 -07:00
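The figures quoted in the net/rds commit message above can be re-derived with a few lines of C. This is only a worked-arithmetic sketch: the function names are invented for illustration and a 4 KiB page size is assumed (which is what makes an order:3 chunk 32 KiB).

```c
#include <assert.h>
#include <stddef.h>

/* CM_IDs allocated per interval: one per socket bound by each rds-ping. */
unsigned long cm_ids_per_interval(unsigned long pings,
                                  unsigned long sockets_per_ping)
{
    return pings * sockets_per_ping;
}

/* Size in bytes of an order-N page allocation (assumption: 4 KiB pages). */
size_t order_bytes(unsigned int order)
{
    return (size_t)4096 << order;
}

/* Number of objects of obj_size bytes that fit in one chunk of that order. */
size_t objs_per_chunk(unsigned int order, size_t obj_size)
{
    return order_bytes(order) / obj_size;
}

/* Bytes left over in the chunk after packing as many objects as fit. */
size_t chunk_waste(unsigned int order, size_t obj_size)
{
    return order_bytes(order) % obj_size;
}
```

With 528 rds-ping invocations of eight sockets each, and 2092-byte idr_layer structs in an order:3 chunk, these helpers reproduce the 4224 CM_IDs, 15 structs per chunk, and 1388 wasted bytes stated in the message.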
Jakub Kicinski	2654557112	Merge branch 'net-hamradio-fix-missing-input-validation-in-bpqether-and-scc' Mashiro Chen says: ==================== net: hamradio: fix missing input validation in bpqether and scc This series fixes two missing input validation bugs in the hamradio drivers. Both patches were reviewed by Joerg Reuter (hamradio maintainer). ==================== Link: https://patch.msgid.link/20260409024927.24397-1-mashiro.chen@mailbox.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:19:08 -07:00
Mashiro Chen	8263e484d6	net: hamradio: scc: validate bufsize in SIOCSCCSMEM ioctl The SIOCSCCSMEM ioctl copies a scc_mem_config from user space and assigns its bufsize field directly to scc->stat.bufsize without any range validation: scc->stat.bufsize = memcfg.bufsize; If a privileged user (CAP_SYS_RAWIO) sets bufsize to 0, the receive interrupt handler later calls dev_alloc_skb(0) and immediately writes a KISS type byte via skb_put_u8() into a zero-capacity socket buffer, corrupting the adjacent skb_shared_info region. Reject bufsize values smaller than 16; this is large enough to hold at least one KISS header byte plus useful data. Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org> Acked-by: Joerg Reuter <jreuter@yaina.de> Link: https://patch.msgid.link/20260409024927.24397-3-mashiro.chen@mailbox.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:19:03 -07:00
Mashiro Chen	6183bd8723	net: hamradio: bpqether: validate frame length in bpq_rcv() The BPQ length field is decoded as: len = skb->data[0] + skb->data[1] * 256 - 5; If the sender sets bytes [0..1] to values whose combined value is less than 5, len becomes negative. Passing a negative int to skb_trim() silently converts to a huge unsigned value, causing the function to be a no-op. The frame is then passed up to AX.25 with its original (untrimmed) payload, delivering garbage beyond the declared frame boundary. Additionally, a negative len corrupts the 64-bit rx_bytes counter through implicit sign-extension. Add a bounds check before pulling the length bytes: reject frames where len is negative or exceeds the remaining skb data. Acked-by: Joerg Reuter <jreuter@yaina.de> Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org> Link: https://patch.msgid.link/20260409024927.24397-2-mashiro.chen@mailbox.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:19:03 -07:00
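The length decoding and the bounds check described in the bpqether commit above can be sketched in user-space C as follows. This is not the driver code: the function name, the `remaining` parameter (bytes available after the two length bytes), and the -1 return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Decode the BPQ length field from its two on-wire bytes and validate it.
 * The formula (b0 + b1 * 256 - 5) is taken from the commit message.
 * Returns the payload length, or -1 if the frame should be dropped because
 * the decoded length is negative or larger than the data that remains.
 */
int bpq_payload_len(unsigned char b0, unsigned char b1, size_t remaining)
{
    int len = b0 + b1 * 256 - 5;

    if (len < 0)
        return -1;              /* combined value below 5: reject */
    if ((size_t)len > remaining)
        return -1;              /* claims more data than was received */
    return len;
}
```

Checking the sign before converting to `size_t` matters here: the conversion of a negative `int` to an unsigned type is exactly the silent wrap-around the commit message describes.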
Paul Chaignon	2fefa9c81a	selftests/bpf: Fix reg_bounds to match new tnum-based refinement Commit `efc11a6678` ("bpf: Improve bounds when tnum has a single possible value") improved the bounds refinement to detect when the tnum and u64 range overlap in a single value (and the bounds can thus be set to that value). Eduard then noticed that it broke the slow-mode reg_bounds selftests because they don't have an equivalent logic and are therefore unable to refine the bounds as much as the verifier. The following test case illustrates this. ACTUAL TRUE1: scalar(u64=0xffffffff00000000,u32=0,s64=0xffffffff00000000,s32=0) EXPECTED TRUE1: scalar(u64=[0xfffffffe00000001; 0xffffffff00000000],u32=0,s64=[0xfffffffe00000001; 0xffffffff00000000],s32=0) [...] #323/1007 reg_bounds_gen_consts_s64_s32/(s64)[0xfffffffe00000001; 0xffffffff00000000] (s32)<op> S64_MIN:FAIL with the verifier logs: [...] 19: w0 = w6 ; R0=scalar(smin=0,smax=umax=0xffffffff, var_off=(0x0; 0xffffffff)) R6=scalar(smin=0xfffffffe00000001,smax=0xffffffff00000000, umin=0xfffffffe00000001,umax=0xffffffff00000000, var_off=(0xfffffffe00000000; 0x1ffffffff)) 20: w0 = w7 ; R0=0 R7=0x8000000000000000 21: if w6 == w7 goto pc+3 [...] from 21 to 25: [...] 25: w0 = w6 ; R0=0 R6=0xffffffff00000000 ; ^ ; unexpected refined value 26: w0 = w7 ; R0=0 R7=0x8000000000000000 27: exit When w6 == w7 is true, the verifier can deduce that the R6's tnum is equal to (0xfffffffe00000000; 0x100000000) and then use that information to refine the bounds: the tnum only overlap with the u64 range in 0xffffffff00000000. The reg_bounds selftest doesn't know about tnums and therefore fails to perform the same refinement. This issue happens when the tnum carries information that cannot be represented in the ranges, as otherwise the selftest could reach the same refined value using just the ranges. The tnum thus needs to represent non-contiguous values (ex., R6's tnum above, after the condition). 
The only way this can happen in the reg_bounds selftest is at the boundary between the 32 and 64bit ranges. We therefore only need to handle that case. This patch fixes the selftest refinement logic by checking if the u32 and u64 ranges overlap in a single value. If so, the ranges can be set to that value. We need to handle two cases: either they overlap in umin64... u64 values matching u32 range: xxx xxx xxx xxx \|--------------------------------------\| u64 range: 0 xxxxx UMAX64 or in umax64: u64 values matching u32 range: xxx xxx xxx xxx \|--------------------------------------\| u64 range: 0 xxxxx UMAX64 To detect the first case, we decrease umax64 to the maximum value that matches the u32 range. If that happens to be umin64, then umin64 is the only overlap. We proceed similarly for the second case, increasing umin64 to the minimum value that matches the u32 range. Note this is similar to how the verifier handles the general case using tnum, but we don't need to care about a single-value overlap in the middle of the range. That case is not possible when comparing two ranges. This patch also adds two test cases reproducing this bug as part of the normal test runs (without SLOW_TESTS=1). Fixes: `efc11a6678` ("bpf: Improve bounds when tnum has a single possible value") Reported-by: Eduard Zingerman <eddyz87@gmail.com> Closes: https://lore.kernel.org/bpf/4e6dd64a162b3cab3635706ae6abfdd0be4db5db.camel@gmail.com/ Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/ada9UuSQi2SE2IfB@mail.gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 13:15:31 -07:00
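The two-step refinement described in the reg_bounds commit above (shrink umax64 to the largest value matching the u32 range, grow umin64 to the smallest one, and collapse the range if either lands on the opposite bound) might look roughly like this. It is a standalone sketch, not the selftest's code: the struct and function names are hypothetical, and both ranges are assumed to be non-wrapping (min <= max).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct range64 { uint64_t min, max; };
struct range32 { uint32_t min, max; };

/* Largest value <= x whose low 32 bits lie in r; false if there is none. */
bool max_match_le(uint64_t x, struct range32 r, uint64_t *out)
{
    uint64_t hi = x >> 32;
    uint32_t lo = (uint32_t)x;

    if (lo >= r.min && lo <= r.max) {
        *out = x;
    } else if (lo > r.max) {
        *out = (hi << 32) | r.max;          /* same upper half, clamp down */
    } else {
        if (hi == 0)
            return false;                   /* nothing below x matches */
        *out = ((hi - 1) << 32) | r.max;    /* borrow from the upper half */
    }
    return true;
}

/* Smallest value >= x whose low 32 bits lie in r; false if there is none. */
bool min_match_ge(uint64_t x, struct range32 r, uint64_t *out)
{
    uint64_t hi = x >> 32;
    uint32_t lo = (uint32_t)x;

    if (lo >= r.min && lo <= r.max) {
        *out = x;
    } else if (lo < r.min) {
        *out = (hi << 32) | r.min;          /* same upper half, clamp up */
    } else {
        if (hi == UINT32_MAX)
            return false;                   /* nothing above x matches */
        *out = ((hi + 1) << 32) | r.min;    /* carry into the upper half */
    }
    return true;
}

/* Collapse r64 to a single value if it overlaps r32 only at one endpoint. */
bool refine_single_overlap(struct range64 *r64, struct range32 r32)
{
    uint64_t v;

    if (max_match_le(r64->max, r32, &v) && v == r64->min) {
        r64->max = r64->min;                /* only overlap is umin64 */
        return true;
    }
    if (min_match_ge(r64->min, r32, &v) && v == r64->max) {
        r64->min = r64->max;                /* only overlap is umax64 */
        return true;
    }
    return false;
}
```

For the failing case quoted in the message, u64=[0xfffffffe00000001; 0xffffffff00000000] with u32=0, the second branch fires and the range collapses to 0xffffffff00000000, matching the value the verifier reports.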
Mashiro Chen	2835750dd6	net: rose: reject truncated CLEAR_REQUEST frames in state machines All five ROSE state machines (states 1-5) handle ROSE_CLEAR_REQUEST by reading the cause and diagnostic bytes directly from skb->data[3] and skb->data[4] without verifying that the frame is long enough: rose_disconnect(sk, ..., skb->data[3], skb->data[4]); The entry-point check in rose_route_frame() only enforces ROSE_MIN_LEN (3 bytes), so a remote peer on a ROSE network can send a syntactically valid but truncated CLEAR_REQUEST (3 or 4 bytes) while a connection is open in any state. Processing such a frame causes a one- or two-byte out-of-bounds read past the skb data, leaking uninitialized heap content as the cause/diagnostic values returned to user space via getsockopt(ROSE_GETCAUSE). Add a single length check at the rose_process_rx_frame() dispatch point, before any state machine is entered, to drop frames that carry the CLEAR_REQUEST type code but are too short to contain the required cause and diagnostic fields. Signed-off-by: Mashiro Chen <mashiro.chen@mailbox.org> Link: https://patch.msgid.link/20260408172551.281486-1-mashiro.chen@mailbox.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:09:45 -07:00
Billy Tsai	d35a6db887	i3c: mipi-i3c-hci: fix IBI payload length calculation for final status In DMA mode, the IBI status descriptor encodes the payload using CHUNKS (number of chunks) and DATA_LENGTH (valid bytes in the last chunk). All preceding chunks are implicitly full-sized. The current code accumulates full chunk sizes for non-final status descriptors, but for the final status descriptor it only adds DATA_LENGTH. This ignores the contribution of the preceding full chunks described by the same final status entry. As a result, the computed IBI payload length is truncated whenever the final status spans multiple chunks. For example, with a chunk size of 4 bytes, CHUNKS=2 and DATA_LENGTH=1 should result in a total payload size of 5 bytes, but the current code reports only 1 byte. Fix the calculation by adding the size of (CHUNKS - 1) full chunks plus DATA_LENGTH for the last chunk. Fixes: `9ad9a52cce` ("i3c/master: introduce the mipi-i3c-hci driver") Signed-off-by: Billy Tsai <billy_tsai@aspeedtech.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260407-i3c-hci-dma-v2-1-a583187b9d22@aspeedtech.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>	2026-04-12 22:06:02 +02:00
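The corrected payload-length formula from the i3c commit above, (CHUNKS - 1) full chunks plus DATA_LENGTH, can be written as a small helper. The function name and the handling of CHUNKS == 0 are assumptions for illustration; the commit message does not say how the driver treats that case.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Payload bytes described by a final IBI status descriptor in DMA mode:
 * every chunk except the last is full-sized, and the last chunk holds
 * data_length valid bytes.
 */
size_t ibi_final_payload_len(size_t chunk_size, unsigned int chunks,
                             size_t data_length)
{
    if (chunks == 0)
        return 0;   /* assumption: no chunks means no payload */
    return (size_t)(chunks - 1) * chunk_size + data_length;
}
```

With a 4-byte chunk size, CHUNKS=2 and DATA_LENGTH=1 this yields 5 bytes, the value the commit message gives as correct, where the old calculation reported 1.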
Jakub Kicinski	baf7cebcf9	Merge branch 'net-enetc-improve-statistics-for-v1-and-add-statistics-for-v4' Wei Fang says: ==================== net: enetc: improve statistics for v1 and add statistics for v4 For ENETC v1, some standardized statistics were redundantly included in the unstructured statistics, so remove these duplicated entries. Previously, the unstructured statistics only contained eMAC data and did not include pMAC data; add pMAC statistics to ensure completeness. For ENETC v4, the driver previously reported MAC statistics only for the internal ENETC (Pseudo MAC). Extend the implementation to provide additional statistics for both the internal ENETC and the standalone ENETC. ==================== Link: https://patch.msgid.link/20260408055849.1314033-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:03:52 -07:00
Wei Fang	98a4f3d341	net: enetc: add unstructured counters for ENETC v4 Like ENETC v1, ENETC v4 also has many non-standard counters, so these counters are added to improve statistical coverage. Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260408055849.1314033-6-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:03:48 -07:00
Wei Fang	dbc30b154e	net: enetc: add unstructured pMAC counters for ENETC v1 The ENETC v1 has two MACs (eMAC and pMAC) to support preemption. The existing unstructured counters include the eMAC counters, but not the pMAC counters. So add pMAC counters to improve statistical coverage. Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260408055849.1314033-5-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:03:48 -07:00
Wei Fang	6d78c37a73	net: enetc: remove standardized counters from enetc_pm_counters The standardized counters are already exposed via the get_pause_stats(), get_rmon_stats(), get_eth_ctrl_stats() and get_eth_mac_stats() interfaces. Keeping the same counters in enetc_pm_counters results in redundant output. Remove these standardized counters from enetc_pm_counters and rely on the existing statistics interfaces to report them. Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260408055849.1314033-4-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:03:48 -07:00
Wei Fang	c571d309d4	net: enetc: show RX drop counters only for assigned RX rings For ENETC v1, each SI provides 16 RBDCR registers for RX ring drop counters, but this does not imply that an SI actually owns 16 RX rings. The ENETC hardware supports a total of 16 RX rings, which are assigned to 3 SIs (1 PSI and 2 VSIs), so each SI is assigned fewer than 16 RX rings. The current implementation always reports 16 RX drop counters per SI, leading to redundant output for SIs with fewer RX rings. Update the logic to display drop counters only for the RX rings that are actually assigned to the SI. Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260408055849.1314033-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:03:48 -07:00
Wei Fang	c6c223fd06	net: enetc: add support for the standardized counters ENETC v4 provides 64-bit counters for IEEE 802.3 basic and mandatory managed objects, the IETF Management Information Database (MIB) package (RFC2665), and Remote Network Monitoring (RMON) statistics. In addition, some ENETCs support preemption, so these ENETCs have two MACs: MAC 0 is the express MAC (eMAC), MAC 1 is the preemptible MAC (pMAC). Both MACs support these statistics. Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260408055849.1314033-2-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 13:03:48 -07:00
Emil Tsalapatis	4c5f21d4df	selftests/bpf: Add tests for non-arena/arena operations Add a selftest that ensures instructions with arena source and non-arena destination registers are accepted by the verifier. Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20260412174546.18684-3-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:48:11 -07:00
Emil Tsalapatis	ac61bffe91	bpf: Allow instructions with arena source and non-arena dest registers The compiler sometimes stores the result of a PTR_TO_ARENA and SCALAR operation into the scalar register rather than the pointer register. Relax the verifier to allow operations between a source arena register and a destination non-arena register, marking the destination's value as a PTR_TO_ARENA. Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Song Liu <song@kernel.org> Fixes: `6082b6c328` ("bpf: Recognize addr_space_cast instruction in the verifier.") Link: https://lore.kernel.org/r/20260412174546.18684-2-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:47:39 -07:00
Alexei Starovoitov	9623c3c69e	Merge branch 'bpf-add-the-missing-fsession' Menglong Dong says: ==================== bpf: add the missing fsession Add the missing fsession attach type to the BPF docs, verifier log and bpftool. Changes since v2: - replace "FENTRY/FEXIT/FSESSION" with "Tracing" in the 1st patch - v2: https://lore.kernel.org/all/20260408062109.386083-1-dongml2@chinatelecom.cn/ Changes since v1: - add a missing FSESSION in bpf_check_attach_target() in the 1st patch - v1: https://lore.kernel.org/all/20260408031416.266229-1-dongml2@chinatelecom.cn/ ==================== Link: https://patch.msgid.link/20260412060346.142007-1-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:42:39 -07:00
Menglong Dong	f0e16ac716	bpftool: add missing fsession to the usage and docs of bpftool Add the fsession attach type to the usage of bpftool in do_help(). Meanwhile, add it to the bash-completion and bpftool-prog.rst too. Acked-by: Leon Hwang <leon.hwang@linux.dev> Acked-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260412060346.142007-4-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:42:38 -07:00
Menglong Dong	46d9f15a55	docs/bpf: add missing fsession attach type to docs Add the fsession attach type to program_types.rst and drgn.rst. Acked-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260412060346.142007-3-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:42:38 -07:00
Menglong Dong	9fd19e3ed7	bpf: add missing fsession to the verifier log The fsession attach type is missing from the verifier log in check_get_func_ip(), bpf_check_attach_target() and check_attach_btf_id(). Add it so that the verifier log is accurate. Meanwhile, update the corresponding selftests. Acked-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260412060346.142007-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:42:38 -07:00
Alexei Starovoitov	46ffc1f782	Merge branch 'bpf-split-verifier-c' Alexei Starovoitov says: ==================== v3->v4: Restore few minor comments and undo few function moves v2->v3: Actually restore comments lost in patch 3 (instead of adding them to patch 4) v1->v2: Restore comments lost in patch 3 verifier.c is huge. Split it into logically independent pieces. No functional changes. The diff is impossible to review over email. 'git show' shows minimal actual changes. Only plenty of moved lines. Such split may cause backport headaches. We should have split it long ago. Even after split verifier.c is still 20k lines, but further split is harder. ==================== Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20260412152936.54262-1-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:37:21 -07:00
Alexei Starovoitov	99a832a2b5	bpf: Move BTF checking logic into check_btf.c BTF validation logic is independent from the main verifier. Move it into check_btf.c Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260412152936.54262-7-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:37:04 -07:00
Alexei Starovoitov	ed0b9710bd	bpf: Move backtracking logic to backtrack.c Move precision propagation and backtracking logic to backtrack.c to reduce verifier.c size. No functional changes. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260412152936.54262-6-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:36:58 -07:00
Alexei Starovoitov	c82834a5a1	bpf: Move state equivalence logic to states.c verifier.c is huge. Move is_state_visited() to states.c, so that all state equivalence logic is in one file. Mechanical move. No functional changes. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260412152936.54262-5-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:36:52 -07:00
Alexei Starovoitov	f8a8faceab	bpf: Move check_cfg() into cfg.c verifier.c is huge. Move check_cfg(), compute_postorder(), compute_scc() into cfg.c Mechanical move. No functional changes. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260412152936.54262-4-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:36:45 -07:00
Alexei Starovoitov	fc150cddee	bpf: Move compute_insn_live_regs() into liveness.c verifier.c is huge. Move compute_insn_live_regs() into liveness.c. Mechanical move. No functional changes. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260412152936.54262-3-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:36:38 -07:00
Alexei Starovoitov	449f08fa59	bpf: Move fixup/post-processing logic from verifier.c into fixups.c verifier.c is huge. Split fixup/post-processing logic that runs after the verifier accepted the program into fixups.c. Mechanical move. No functional changes. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260412152936.54262-2-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-12 12:35:54 -07:00
Gal Pressman	8632175ccb	gre: Count GRE packet drops GRE is silently dropping packets without updating statistics. In case of drop, increment rx_dropped counter to provide visibility into packet loss. For the case where no GRE protocol handler is registered, use rx_nohandler. Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20260409090945.1542440-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 12:33:33 -07:00
Jakub Kicinski	ba69b788ed	Merge branch 'bpf-fix-sock_ops_get_sk-same-register-oob-read-in-sock_ops-and-add-selftest' Jiayuan Chen says: ==================== bpf: Fix SOCK_OPS_GET_SK same-register OOB read in sock_ops and add selftest When a BPF sock_ops program accesses ctx fields with dst_reg == src_reg, the SOCK_OPS_GET_SK() and SOCK_OPS_GET_FIELD() macros fail to zero the destination register in the !fullsock / !locked_tcp_sock path, leading to OOB read (GET_SK) and kernel pointer leak (GET_FIELD). Patch 1: Fix both macros by adding BPF_MOV64_IMM(si->dst_reg, 0) in the !fullsock landing pad. Patch 2: Add selftests covering same-register and different-register cases for both GET_SK and GET_FIELD. [1] https://lore.kernel.org/bpf/6fe1243e-149b-4d3b-99c7-fcc9e2f75787@std.uestc.edu.cn/T/#u ==================== Link: https://patch.msgid.link/20260407022720.162151-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 12:28:08 -07:00
Jiayuan Chen	04013c3ca0	selftests/bpf: Add tests for sock_ops ctx access with same src/dst register Add selftests to verify SOCK_OPS_GET_SK() and SOCK_OPS_GET_FIELD() correctly return NULL/zero when dst_reg == src_reg and is_fullsock == 0. Three subtests are included: - get_sk: ctx->sk with same src/dst register (SOCK_OPS_GET_SK) - get_field: ctx->snd_cwnd with same src/dst register (SOCK_OPS_GET_FIELD) - get_sk_diff_reg: ctx->sk with different src/dst register (baseline) Each BPF program uses inline asm (__naked) to force specific register allocation, reads is_fullsock first, then loads the field using the same (or different) register. The test triggers TCP_NEW_SYN_RECV via a TCP handshake and checks that the result is NULL/zero when is_fullsock == 0. Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260407022720.162151-3-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 12:28:05 -07:00
Jiayuan Chen	10f86a2a5c	bpf: Fix same-register dst/src OOB read and pointer leak in sock_ops When a BPF sock_ops program accesses ctx fields with dst_reg == src_reg, the SOCK_OPS_GET_SK() and SOCK_OPS_GET_FIELD() macros fail to zero the destination register in the !fullsock / !locked_tcp_sock path. Both macros borrow a temporary register to check is_fullsock / is_locked_tcp_sock when dst_reg == src_reg, because dst_reg holds the ctx pointer. When the check is false (e.g., TCP_NEW_SYN_RECV state with a request_sock), dst_reg should be zeroed but is not, leaving the stale ctx pointer: - SOCK_OPS_GET_SK: dst_reg retains the ctx pointer, passes NULL checks as PTR_TO_SOCKET_OR_NULL, and can be used as a bogus socket pointer, leading to stack-out-of-bounds access in helpers like bpf_skc_to_tcp6_sock(). - SOCK_OPS_GET_FIELD: dst_reg retains the ctx pointer which the verifier believes is a SCALAR_VALUE, leaking a kernel pointer. Fix both macros by: - Changing JMP_A(1) to JMP_A(2) in the fullsock path to skip the added instruction. - Adding BPF_MOV64_IMM(si->dst_reg, 0) after the temp register restore in the !fullsock path, placed after the restore because dst_reg == src_reg means we need src_reg intact to read ctx->temp. 
Fixes: `fd09af0107` ("bpf: sock_ops ctx access may stomp registers in corner case") Fixes: `84f44df664` ("bpf: sock_ops sk access may stomp registers when dst_reg = src_reg") Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn> Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Reported-by: Dongliang Mu <dzm91@hust.edu.cn> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Closes: https://lore.kernel.org/bpf/6fe1243e-149b-4d3b-99c7-fcc9e2f75787@std.uestc.edu.cn/T/#u Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260407022720.162151-2-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 12:28:05 -07:00
Ian Rogers	fab205e492	perf sample: Fix documentation typo s/PEF/PERF/ Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2026-04-12 12:12:11 -07:00
Sukrut Heroorkar	40a3f6c5e2	Documentation: core-api: real-time: correct spelling Replace the typo "excpetion" with "exception". Signed-off-by: Sukrut Heroorkar <hsukrut3@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260411155120.233357-1-hsukrut3@gmail.com>	2026-04-12 13:11:50 -06:00
Frederic Weisbecker	f0efd29aa6	doc: Add CPU Isolation documentation nohz_full was introduced in v3.10 in 2013, which means this documentation is 13 years overdue. Fortunately, Paul wrote part of the needed documentation a while ago, especially concerning nohz_full in Documentation/timers/no_hz.rst and per-CPU kthreads in Documentation/admin-guide/kernel-per-CPU-kthreads.rst. Introduce a new page that gives an overview of CPU isolation in general. Acked-by: Waiman Long <longman@redhat.com> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260402094749.18879-1-frederic@kernel.org>	2026-04-12 13:08:28 -06:00
Linus Torvalds	10d97b74e2	Merge tag 'edac_urgent_for_7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fix from Borislav Petkov: - Fix the error path ordering when the driver-private descriptor allocation fails * tag 'edac_urgent_for_7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/mc: Fix error path ordering in edac_mc_alloc()	2026-04-12 11:56:07 -07:00
Greg Kroah-Hartman	46ce8be2ce	NFC: digital: Bounds check NFC-A cascade depth in SDD response handler The NFC-A anti-collision cascade in digital_in_recv_sdd_res() appends 3 or 4 bytes to target->nfcid1 on each round, but the number of cascade rounds is controlled entirely by the peer device. The peer sets the cascade tag in the SDD_RES (deciding 3 vs 4 bytes) and the cascade-incomplete bit in the SEL_RES (deciding whether another round follows). ISO 14443-3 limits NFC-A to three cascade levels and target->nfcid1 is sized accordingly (NFC_NFCID1_MAXSIZE = 10), but nothing in the driver actually enforces this. This means a malicious peer can keep the cascade running, writing past the heap-allocated nfc_target with each round. Fix this by rejecting the response when the accumulated UID would exceed the buffer. Commit `e329e71013` ("NFC: nci: Bounds check struct nfc_target arrays") fixed similar missing checks against the same field on the NCI path. Cc: Simon Horman <horms@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Thierry Escande <thierry.escande@linux.intel.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Fixes: `2c66daecc4` ("NFC Digital: Add NFC-A technology support") Cc: stable <stable@kernel.org> Assisted-by: gregkh_clanker_t1000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2026040913-figure-seducing-bd3f@gregkh Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:40:45 -07:00
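The kind of check the NFC commit above describes, rejecting a cascade round when the accumulated UID would exceed the buffer, can be sketched as follows. This is an illustrative user-space model, not the driver code: `struct uid_buf` and `uid_append()` are hypothetical names; only the 10-byte limit (NFC_NFCID1_MAXSIZE) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NFC_NFCID1_MAXSIZE 10   /* value quoted in the commit message */

/* Hypothetical stand-in for the nfcid1 fields of the target structure. */
struct uid_buf {
    unsigned char nfcid1[NFC_NFCID1_MAXSIZE];
    size_t len;                 /* bytes accumulated so far */
};

/*
 * Append n UID bytes from one cascade round.
 * Returns 0 on success, or -1 without modifying the buffer if the
 * accumulated UID would no longer fit.
 */
int uid_append(struct uid_buf *t, const unsigned char *src, size_t n)
{
    if (t->len > NFC_NFCID1_MAXSIZE || n > NFC_NFCID1_MAXSIZE - t->len)
        return -1;
    memcpy(t->nfcid1 + t->len, src, n);
    t->len += n;
    return 0;
}
```

Comparing `n` against the space that remains, rather than testing `t->len + n` directly, keeps the check free of overflow even for peer-controlled sizes.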
Fernando Fernandez Mancera	a6bd339dbb	net_sched: fix skb memory leak in deferred qdisc drops When the network stack cleans up the deferred list via qdisc_run_end(), it operates on the root qdisc. If the root qdisc does not implement the TCQ_F_DEQUEUE_DROPS flag, the packets queued to be freed are never freed and get stranded on the child's local to_free list. Fix this by making qdisc_dequeue_drop() aware of the root qdisc. It fetches the root qdisc and checks for the TCQ_F_DEQUEUE_DROPS flag. If the flag is present, the packet is appended directly to the root's to_free list. Otherwise, drop it directly as was done before the optimization was implemented. Fixes: `a6efc273ab` ("net_sched: use qdisc_dequeue_drop() in cake, codel, fq_codel") Reported-by: Damilola Bello <damilola@aterlo.com> Closes: https://lore.kernel.org/netdev/CAPgFtOLaedBMU0f_BxV2bXftTJSmJr018Q5uozOo5vVo6b9tjw@mail.gmail.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260408100044.4530-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:38:18 -07:00
Jakub Kicinski	4e17b9b433	Merge branch 'net-phy-add-support-for-disabling-autonomous-eee' Nicolai Buchwitz says: ==================== net: phy: add support for disabling autonomous EEE Some PHYs implement autonomous EEE where the PHY manages EEE independently, preventing the MAC from controlling LPI signaling. This conflicts with MACs that implement their own LPI control. This series adds a .disable_autonomous_eee callback to struct phy_driver and calls it from phy_support_eee(). When a MAC indicates it supports EEE, the PHY's autonomous EEE is automatically disabled. The setting is persisted across suspend/resume by re-applying it in phy_init_hw() after soft reset, following the same pattern suggested by Russell King for PHY tunables [1]. Patch 1 adds the phylib infrastructure. Patch 2 implements it for Broadcom BCM54xx (AutogrEEEn). Patch 3 converts the Realtek RTL8211F, which previously unconditionally disabled PHY-mode EEE in config_init. This came up while adding EEE support to the Cadence macb driver (used on Raspberry Pi 5 with a BCM54210PE PHY). The PHY's AutogrEEEn mode prevented the MAC from tracking LPI state. The Realtek RTL8211F has the same pattern, unconditionally disabling PHY-mode EEE with the comment "Disable PHY-mode EEE so LPI is passed to the MAC". Other BCM54xx PHYs likely have the same AutogrEEEn register layout, but I only have access to the BCM54210PE/BCM54213PE datasheets. It would be appreciated if Florian or others could confirm which other BCM54xx variants share this register so we can wire them up too. Tested on Raspberry Pi CM4 (bcmgenet + BCM54210PE), Raspberry Pi CM5 (Cadence GEM + BCM54210PE) and Raspberry Pi 5 (Cadence GEM + BCM54213PE). [1] https://lore.kernel.org/netdev/acuwvoydmJusuj9x@shell.armlinux.org.uk/ ==================== Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-0-b335e7143711@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:33:26 -07:00
Nicolai Buchwitz	bb14e3b63c	net: phy: realtek: convert RTL8211F to .disable_autonomous_eee The RTL8211F previously unconditionally disabled PHY-mode EEE in config_init. Convert this to use the new .disable_autonomous_eee callback so it is only disabled when the MAC indicates EEE support via phy_support_eee(). This preserves PHY-autonomous EEE for MACs that do not support EEE, while still disabling it when the MAC manages LPI. Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-3-b335e7143711@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:33:23 -07:00
Nicolai Buchwitz	bcb3e89fc0	net: phy: broadcom: implement .disable_autonomous_eee for BCM54xx Implement the .disable_autonomous_eee callback for the BCM54210E. In AutogrEEEn mode the PHY manages EEE autonomously. Clearing the AutogrEEEn enable bit in MII_BUF_CNTL_0 switches the PHY to Native EEE mode. Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-2-b335e7143711@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:33:23 -07:00
Nicolai Buchwitz	7ef629b458	net: phy: add support for disabling PHY-autonomous EEE Some PHYs (e.g. Broadcom BCM54xx, Realtek RTL8211F) implement autonomous EEE where the PHY manages LPI signaling without forwarding it to the MAC. This conflicts with MAC drivers that implement their own LPI control. Add a .disable_autonomous_eee callback to struct phy_driver and call it from phy_support_eee(). When a MAC driver indicates it supports EEE via phy_support_eee(), the PHY's autonomous EEE is automatically disabled so the MAC can manage LPI entry/exit. Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-1-b335e7143711@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:33:23 -07:00
Lorenzo Bianconi	656121b155	net: airoha: Add missing RX_CPU_IDX() configuration in airoha_qdma_cleanup_rx_queue() When the descriptor index written in REG_RX_CPU_IDX() is equal to the one stored in REG_RX_DMA_IDX(), the hw will stop since the QDMA RX ring is empty. Add missing REG_RX_CPU_IDX() configuration in airoha_qdma_cleanup_rx_queue routine during QDMA RX ring cleanup. Fixes: `514aac3599` ("net: airoha: Add missing cleanup bits in airoha_qdma_cleanup_rx_queue()") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260408-airoha-cpu-idx-airoha_qdma_cleanup_rx_queue-v1-1-8efa64844308@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:25:17 -07:00
Jakub Kicinski	200df94709	Merge branch 'ynl-ethtool-netlink-fix-nla_len-overflow-for-large-string-sets' Hangbin Liu says: ==================== ynl/ethtool/netlink: fix nla_len overflow for large string sets This series addresses a silent data corruption issue triggered when ynl retrieves string sets from NICs with a large number of statistics entries (e.g. mlx5_core with thousands of ETH_SS_STATS strings). The root cause is that struct nlattr.nla_len is a __u16 (max 65535 bytes). When a NIC exports enough statistics strings, the ETHTOOL_A_STRINGSET_STRINGS nest built by strset_fill_set() exceeds this limit. nla_nest_end() silently truncates the length on assignment, producing a corrupted netlink message. Patch 1 moves ethtool.py to selftest. Patch 2 improves the ethtool tool: rename the doit/dumpit helpers to do_set/do_get and convert do_get to use ynl.do() with an explicit device header instead of a full dump with client-side filtering. Patch 3 adds a --dbg-small-recv option to the YNL ethtool tool, matching the same option already present in cli.py, to help debug netlink message size issues. Patch 4 adds a new helper nla_nest_end_safe() to check whether nla_len would overflow and return -EMSGSIZE early if so. Patch 5 uses the new helper in ethtool to make sure ethtool doesn't reply with a corrupted netlink message. ==================== Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-0-7623a5e8f70b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:23:53 -07:00
Hangbin Liu	b2fb1a3363	ethtool: strset: check nla_len overflow The netlink attribute length field nla_len is a __u16, which can only represent values up to 65535 bytes. NICs with a large number of statistics strings (e.g. mlx5_core with thousands of ETH_SS_STATS entries) can produce a ETHTOOL_A_STRINGSET_STRINGS nest that exceeds this limit. When nla_nest_end() writes the actual nest size back to nla_len, the value is silently truncated. This results in a corrupted netlink message being sent to userspace: the parser reads a wrong (truncated) attribute length and misaligns all subsequent attribute boundaries, causing decode errors. Fix this by using the new helper nla_nest_end_safe and error out if the size exceeds U16_MAX. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-5-7623a5e8f70b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:23:50 -07:00
Hangbin Liu	1346586a9a	netlink: add a nla_nest_end_safe() helper The nla_len field in struct nlattr is a __u16, which can only hold values up to 65535. If a nested attribute grows beyond this limit, nla_nest_end() silently truncates the length, producing a corrupted netlink message with no indication of the problem. Since nla_nest_end() is used everywhere and this issue rarely happens, let's add a new helper to check the length. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-4-7623a5e8f70b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:23:50 -07:00
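The truncation that the two commits above describe can be reproduced in plain userspace C. The sketch below is an illustration only, not the kernel helper: `demo_nlattr`, `demo_nest_end()` and `demo_nest_end_safe()` are hypothetical names, and the real `nla_nest_end_safe()` operates on an skb rather than on a bare length. What it takes from the commit messages is the 16-bit `nla_len` field and the `-EMSGSIZE` return on overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct nlattr; nla_len is 16 bits. */
struct demo_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Unchecked variant: the length is narrowed to 16 bits on assignment. */
static inline void demo_nest_end(struct demo_nlattr *nest, size_t nest_len)
{
	nest->nla_len = (uint16_t)nest_len;
}

/* Checked variant: refuse lengths that nla_len cannot represent. */
static inline int demo_nest_end_safe(struct demo_nlattr *nest, size_t nest_len)
{
	if (nest_len > UINT16_MAX)
		return -EMSGSIZE;

	nest->nla_len = (uint16_t)nest_len;
	return 0;
}
```

A 70000-byte nest stored through the unchecked variant ends up as 4464 (70000 modulo 65536), which is the kind of wrong length that misaligns every following attribute for the parser. The checked variant reports the error instead.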
Hangbin Liu	594ba44771	tools: ynl: ethtool: add --dbg-small-recv option Add a --dbg-small-recv debug option to control the recv() buffer size used by YNL, matching the same option already present in cli.py. This is useful if the user needs to receive large netlink messages. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-3-7623a5e8f70b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:23:49 -07:00
Hangbin Liu	1c43d471a5	tools: ynl: ethtool: use doit instead of dumpit for per-device GET Rename the local helper doit() to do_set() and dumpit() to do_get() to better reflect their purpose. Convert do_get() to use ynl.do() with an explicit device header instead of ynl.dump() followed by client-side filtering. This is more efficient as the kernel only processes and returns data for the requested device, rather than dumping all devices across the netns. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-2-7623a5e8f70b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:23:49 -07:00
Hangbin Liu	22ef8a263c	tools: ynl: move ethtool.py to selftest We have converted all the samples to selftests. This script is the last piece of random "PoC" code we still have lying around. Let's move it to tests. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-1-7623a5e8f70b@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:23:49 -07:00
Jakub Kicinski	ed45d380c5	Merge branch 'net-mana-fix-debugfs-directory-naming-and-file-lifecycle' Erni Sri Satya Vennela says: ==================== net: mana: Fix debugfs directory naming and file lifecycle This series fixes two pre-existing debugfs issues in the MANA driver. Patch 1 fixes the per-device debugfs directory naming to use the unique PCI BDF address via pci_name(), avoiding a potential NULL pointer dereference when pdev->slot is NULL (e.g. VFIO passthrough, nested KVM) and preventing name collisions across multiple PFs or VFs. Patch 2 moves the current_speed debugfs file creation from mana_probe_port() to mana_init_port() so it survives detach/attach cycles triggered by MTU changes or XDP program changes. ==================== Link: https://patch.msgid.link/20260408081224.302308-1-ernis@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:22:56 -07:00
Erni Sri Satya Vennela	3b7c7fc97a	net: mana: Move current_speed debugfs file to mana_init_port() Move the current_speed debugfs file creation from mana_probe_port() to mana_init_port(). The file was previously created only during initial probe, but mana_cleanup_port_context() removes the entire vPort debugfs directory during detach/attach cycles. Since mana_init_port() recreates the directory on re-attach, moving current_speed here ensures it survives these cycles. Fixes: `75cabb4693` ("net: mana: Add support for net_shaper_ops") Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260408081224.302308-3-ernis@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:22:54 -07:00
Erni Sri Satya Vennela	c116f07ab9	net: mana: Use pci_name() for debugfs directory naming Use pci_name(pdev) for the per-device debugfs directory instead of hardcoded "0" for PFs and pci_slot_name(pdev->slot) for VFs. The previous approach had two issues: 1. pci_slot_name() dereferences pdev->slot, which can be NULL for VFs in environments like generic VFIO passthrough or nested KVM, causing a NULL pointer dereference. 2. Multiple PFs would all use "0", and VFs across different PCI domains or buses could share the same slot name, leading to -EEXIST errors from debugfs_create_dir(). pci_name(pdev) returns the unique BDF address, is always valid, and is unique across the system. Fixes: `6607c17c6c` ("net: mana: Enable debugfs files for MANA device") Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260408081224.302308-2-ernis@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-12 11:22:54 -07:00
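To show why a BDF-based name avoids the collisions listed in the commit above, here is a small userspace sketch that builds a domain:bus:device.function string. `demo_bdf_name()` is a hypothetical helper written for this example. It assumes the conventional `dddd:bb:dd.f` layout for PCI device names and does not call or reproduce `pci_name()` itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a PCI address as "dddd:bb:dd.f" (assumed layout for this sketch).
 * Returns 0 on success, -1 if the device or function number is out of
 * range or the buffer is too small.
 */
static inline int demo_bdf_name(char *buf, size_t size, unsigned int domain,
				unsigned int bus, unsigned int dev,
				unsigned int fn)
{
	int n;

	if (domain > 0xffff || bus > 0xff || dev > 0x1f || fn > 0x7)
		return -1;

	n = snprintf(buf, size, "%04x:%02x:%02x.%u", domain, bus, dev, fn);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

Two functions that share a slot number but sit on different buses or in different domains produce different strings, so a directory named this way does not collide.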

1447111 Commits