linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-30 18:13:41 +02:00

Author	SHA1	Message	Date
Weiming Shi	3bcf7aec6a	tap: free page on error paths in tap_get_user_xdp() tap_get_user_xdp() rejects a frame shorter than ETH_HLEN with -EINVAL, and returns -ENOMEM when build_skb() fails. Both paths jump to the err label without freeing the page that vhost_net_build_xdp() allocated for the frame. tap_sendmsg() discards the per-buffer return value and always returns 0, so vhost_tx_batch() takes the success path and never frees the page; each rejected frame in a batch leaks one page-frag chunk. Free the page on both error paths, before the skb is built. This is the tap counterpart of the same leak in tun_xdp_one(). Fixes: `0efac27791` ("tap: accept an array of XDP buffs through sendmsg()") Fixes: `ed7f2afdd0` ("tap: add missing verification for short frame") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260521163230.1478627-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-22 10:08:59 -07:00
Weiming Shi	f4feb1e200	tun: free page on short-frame rejection in tun_xdp_one() tun_xdp_one() returns -EINVAL on a frame shorter than ETH_HLEN without freeing the page that vhost_net_build_xdp() allocated for it. tun_sendmsg() discards that -EINVAL and still returns total_len, so vhost_tx_batch() takes the success path and never frees the page; each short frame in a batch leaks one page-frag chunk. A local process that can open /dev/net/tun and /dev/vhost-net can hit this path: it attaches a tun/tap device as the vhost-net backend and feeds TX descriptors whose length minus the virtio-net header is below ETH_HLEN. Each kick leaks the page-frag chunks for that batch, and a tight submission loop exhausts host memory and triggers an OOM panic. Free the page before returning -EINVAL, matching the XDP-program error path in the same function. Fixes: `049584807f` ("tun: add missing verification for short frame") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260520160020.375349-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-05-22 07:56:27 -07:00
Linus Torvalds	45255ea1ca	Power management fixes for 7.1-rc5 - Fix maximum frequency computation in the intel_pstate driver for Raptor Lake-E and Bartlett Lake that are SMP platforms derived from hybrid ones (Rafael Wysocki, Henry Tseng) - Fix the description of asymmetric packing with SMT in the intel_pstate driver documentation (Ricardo Neri) - Fix multiple amd-pstate driver issues related to dynamic EPP support added recently, including making it opt-in only (K Prateek Nayak, Mario Limonciello) -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmoQUcUSHHJqd0Byand5 c29ja2kubmV0AAoJEO5fvZ0v1OO11vsH/jvBd0iLTh4n5yqReewjoRMsEox2xM+j z+GjLcLINdb5Tfafbj9N19O7l1KrmlliFWv0aVEHkMhwByP/mrLHKqAwk11HpYlj 4qctvsBjaT0NMjF3+yQ3k8aL1eUL8HhHND9WHNJBiZgh1VT2hgJ/63MXFeL/dBH4 gK9UUdb0JPBso0XxUFtPKfbSYnzTZTX1WBWsNdkFFbb6gzaO68FQFocls2c/7Ykn ctrLEAo/cKbL817osKiwcTlmTwXC85L5eF1woWyLMdqHJ4MnmaVhQGZgirXdoDV+ 1gZPaO3M/bm+B1esI35B2Jhkml5t9ejooqAg9SO4JJ1LcoxvltwOBJw= =ufSV -----END PGP SIGNATURE----- Merge tag 'pm-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix maximum frequency computation in the intel_pstate driver for two processor models, update its documentation and fix issues related to the dynamic EPP support (added during the current development cycle) in the amd-pstate driver: - Fix maximum frequency computation in the intel_pstate driver for Raptor Lake-E and Bartlett Lake that are SMP platforms derived from hybrid ones (Rafael Wysocki, Henry Tseng) - Fix the description of asymmetric packing with SMT in the intel_pstate driver documentation (Ricardo Neri) - Fix multiple amd-pstate driver issues related to dynamic EPP support added recently, including making it opt-in only (K Prateek Nayak, Mario Limonciello)" * tag 'pm-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq/amd-pstate: Drop Kconfig option for dynamic EPP cpufreq: intel_pstate: Use HYBRID_SCALING_FACTOR_ADL for Bartlett Lake cpufreq: intel_pstate: Use correct scaling factor on Raptor Lake-E Documentation: intel_pstate: Fix description of asymmetric packing with SMT cpufreq/amd-pstate-ut: Drop policy reference before driver switch cpufreq/amd-pstate: Use "epp_default_dc" as default when dynamic_epp is disabled cpufreq/amd-pstate: Reorder notifier unregistration and floor perf reset cpufreq/amd-pstate: Allow writes to dynamic_epp when state isn't modified cpufreq/amd-pstate: Return -ENOMEM on failure to allocate profile_name cpufreq/amd-pstate: Grab "amd_pstate_driver_lock" when toggling dynamic_epp	2026-05-22 07:13:13 -07:00
Linus Torvalds	28222dcdad	ACPI support fix for 7.1-rc5 Unbreak system wakeup on critical battery status in the ACPI battery driver inadvertently broken during the 7.0 development cycle. -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmoQURsSHHJqd0Byand5 c29ja2kubmV0AAoJEO5fvZ0v1OO1bYQH/285ZHlySoQOrVyChdDpn18ZImO7LP/J BNt88rytoK5z3//fGS7QHLeOy9TnoRBeSytTkIgNN8D9zSdYTIorUJufZVlIj6Ul kscJMf/F4ihcklRtN30SAwC5AScGfPmkj8BKG+eB2aHzfSjgNFIiItDD9qbca4rJ imcTgDUS2nAuuPHCv/ARX+nZjIeLkXzOUCene20lxWOd2Cs37Cbwi3lVphIRepK4 +4b/Dgu334w9YG4nTCOtIsLfyCXm8lHNKjDUFikPuih+JiYwHJMqKvcmNZsF271m wcLc7V7DJiQxPU2X9doBspe2IKFZBAjzs26ghoKjrPEL+j1gijMJmq8= =4uNW -----END PGP SIGNATURE----- Merge tag 'acpi-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI support fix from Rafael Wysocki: "Unbreak system wakeup on critical battery status in the ACPI battery driver inadvertently broken during the 7.0 development cycle" * tag 'acpi-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: battery: Fix system wakeup on critical battery status	2026-05-22 07:06:21 -07:00
Damien Le Moal	f698276991	block: avoid use-after-free in disk_free_zone_resources() The function disk_update_zone_resources() may call disk_free_zone_resources() in case of error, and following this, blk_revalidate_disk_zones() will again calls disk_free_zone_resources() if disk_update_zone_resources() failed. If a zone worker thread is being used (which is the default for a rotational media zoned device), disk_free_zone_resources() will try to stop the zone worker thread twice because disk->zone_wplugs_worker is not reset to NULL when the worker thread is stopped the first time. In disk_free_zone_resources(), fix this by correctly clearing disk->zone_wplugs_worker to NULL when the worker thread is stopped. And while at it, since disk_free_zone_resources() is always called after a failed call to disk_update_zone_resources(), remove the unnecessary call to disk_free_zone_resources() in disk_update_zone_resources(). Fixes: `1365b6904f` ("block: allow submitting all zone writes from a single context") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260522115622.588535-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-05-22 08:01:52 -06:00
Linus Torvalds	ef7f594f5d	arm64 fixes: - Handle probe on hinted conditional branch instructions. BC.cond instructions can be simulated in the same way as B.cond instructions, so extend the decode mask for B.cond to cover BC.cond - Flush the walk cache when unsharing PMD tables. Recent changes to huge_pmd_unshare() introduced mmu_gather::unshared_tables but the arm64 code was still treating the TLB flushing as only targeting leaf entries (TLBI VALE1IS). Fix it by using non-leaf-only instructions (TLBI VAE1IS) when tlb->unshared_tables is set -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmoQOZMACgkQa9axLQDI XvGxPQ//euo/wUa6/Xf9ZbkVIzhKNijNXt2kO4fLFqx6PfwpdLLvcqWzxL45HznQ YyrAxdrGk8FptKEapUec+nzFV1seBQ2hjk+Lij2xPNMLWo2kQH5yZVTU9MsBM1Xw OFzq8zuJlnu+al83kpBdghwwYpXNsE10BBq7LwOC4F+5cx6F+4v6dt/g38H3+oQS hTAnq6vfFcvlHWxeCTF4PEG+eAoOmJxiIX27hRHqCCsVh19Am0KGZTE01wTyNV7W wx51F2t4CKqNSJx+nyoFahSKoB7Sw9stitfjyW1RLk88V/WbeYQPGyWTY3LVN2t6 xfI31LnlUVEyXwkL5fnjm+yfdLkxooQRZ9e5/IQ4Mv8vqiCGDpSfJdoSXacXTJIl xUjechaV9RSEaQHGnXpohemfE9rt5G/SjcnNNjK1GCWs0OkhwKhJVxsl16Eb9Nek RzMpsRcdDsUKA94E0PAxLp0RqzS/1TRk15JRMH5r+E7swbkWiwwCaAOVPYf4J05h /JXiSVSC90Cat7kehsoBFwns2I9RMioYVSTg3BxyIjA2fD5/BGnTlarM6AiVDOrR BdCI6bFh5YqDkCY7PjRg10opj4fEcc13hNhxEpAeRpd5PWNU67iTzOzhvmmcMpIB J7gKfxChvIpBA1Qqdecee1ZFDyg8wxHH2JjIY+U81lsuGVRlCFc= =koH2 -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Handle probe on hinted conditional branch instructions. BC.cond instructions can be simulated in the same way as B.cond instructions, so extend the decode mask for B.cond to cover BC.cond - Flush the walk cache when unsharing PMD tables. Recent changes to huge_pmd_unshare() introduced mmu_gather::unshared_tables but the arm64 code was still treating the TLB flushing as only targeting leaf entries (TLBI VALE1IS). Fix it by using non-leaf-only instructions (TLBI VAE1IS) when tlb->unshared_tables is set * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: tlb: Flush walk cache when unsharing PMD tables arm64: probes: Handle probes on hinted conditional branch instructions	2026-05-22 06:53:11 -07:00
Linus Torvalds	cbadb98b7c	s390 fixes for 7.1-rc5 - Fix PAI NNPA mismatch between counting and recording, where sampling reports twice the value - Fix loss of PAI counter increments during recording on systems with many CPUs under heavy load, while counting is not affected - On some supported machines, CHSC cannot access memory outside the DMA zone, causing CHSC command failures. Restore GFP_DMA flag when allocating memory for CHSC control blocks - Align the numbering scheme for higher-level topology structures like socket, book, drawer with other hardware identifiers e.g. in sysfs, procfs and tools like lscpu -----BEGIN PGP SIGNATURE----- iI0EABYKADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCag8nxxccYWdvcmRlZXZA bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8OulAQC/lQlKztFCEUad/yCXhfZnJbwz zqUrq/5+JyKm8w1kPgEAq1cYMpzoUdcQ1D8N889SLe6o5aUPHO1fBu3VC6V0cg0= =OS/x -----END PGP SIGNATURE----- Merge tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Alexander Gordeev: - Fix PAI NNPA mismatch between counting and recording, where sampling reports twice the value - Fix loss of PAI counter increments during recording on systems with many CPUs under heavy load, while counting is not affected - On some supported machines, CHSC cannot access memory outside the DMA zone, causing CHSC command failures. Restore GFP_DMA flag when allocating memory for CHSC control blocks - Align the numbering scheme for higher-level topology structures like socket, book, drawer with other hardware identifiers e.g. in sysfs, procfs and tools like lscpu * tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/topology: Use zero-based numbering for containing entities s390/cio: Restore GFP_DMA for CHSC allocation s390/pai: Fix missing PAI counter increments under heavy load s390/pai: Disable duplicate read of kernel PAI counter value	2026-05-22 06:40:31 -07:00
Linus Torvalds	46de4087d3	slab fixes for 7.1-rc4 -----BEGIN PGP SIGNATURE----- iQFPBAABCAA5FiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmoQJvUbFIAAAAAABAAO bWFudTIsMi41KzEuMTIsMiwyAAoJELvgsHXSRYiavTcH/iKK1G+ygWOCqKrlZxir /Ga7vPPQSXgiZEtEQ9ZIC/1UQeBp/eQc4uCzgy/aAMqksFjdwsFF/qs6mUr1ffHB ne9RBk40zoP6KQfKkP8GW22ERmDOLnJ1t+QiCibrre8roM7vguDx0Sr5VXGoDmU5 p9mtomLzxJkuDV1gY7LGsT26Es+3PIpma8DTJwlhsUw2igAsp9XnfXJEjBu4eEXP vfGkxK+go5+u4iH5/qKKw6MpfmV7z+O5WJsHaGaKyp8chc6lMSkqtFe1g15h6j6D bL4D6jvmUuwTGsDVaufUnnKdM64+BtcCYJAkgzfKHhge8KG3eBlmsiLkK4uvNRo4 8NI= =E0Wr -----END PGP SIGNATURE----- Merge tag 'slab-for-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: - Stable fix for a missing cpus_read_lock in one of the cpu sheaves flushing paths (Qing Wang) * tag 'slab-for-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slub: hold cpus_read_lock around flush_rcu_sheaves_on_cache()	2026-05-22 06:23:56 -07:00
Linus Torvalds	1c04dcd891	dma-mapping fixes for Linux 7.1 Two minor updates for the DMA-mapping code, mainly fixing some rare corner cases (Petr Tesarik, Jianpeng Chang). -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCag/+/AAKCRCJp1EFxbsS RC9gAP4qM5M9S2WrUJBCoeQrhUrQajNBXN1HV3N+hncHcgkCUwEA2nJq1oETLONH UI4HDrtEBIUEXQgPWEmCj7krN5IYOw0= =I4my -----END PGP SIGNATURE----- Merge tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: "Two minor updates for the DMA-mapping code, mainly fixing some rare corner cases (Petr Tesarik, Jianpeng Chang)" * tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: move dma_map_resource() sanity check into debug code dma-direct: fix use of max_pfn	2026-05-22 06:16:00 -07:00
Linus Torvalds	23884007af	tracing fixes for v7.1: - Avoid NULL return from hist_field_name() The function hist_field_name() is directly passed to a strcat() which does not handle "NULL" characters. Return a zero length string when size is greater than the limit. This is used only to output already created histograms and no field currently is greater than the limit. But it should still not return NULL. - Do not call map->ops->elt_free() on allocation failure When elt_alloc() fails, it should not call the map->ops->elt_free() function if it exists, as that function may not be able to handle the free on allocation failures. The ->elt_free() should only be called when elt_alloc() succeeds. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCag+BVhQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qor+AP94efFkLGAxuv7YIZsPrrkz+dh0XI/N 5asQe9sTnrfGiAD8DhE77S0DkZpMO+OE0J6mqTWmOVqds4RcuCWABxx12Ag= =F67c -----END PGP SIGNATURE----- Merge tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Avoid NULL return from hist_field_name() The function hist_field_name() is directly passed to a strcat() which does not handle "NULL" characters. Return a zero length string when size is greater than the limit. This is used only to output already created histograms and no field currently is greater than the limit. But it should still not return NULL. - Do not call map->ops->elt_free() on allocation failure When elt_alloc() fails, it should not call the map->ops->elt_free() function if it exists, as that function may not be able to handle the free on allocation failures. The ->elt_free() should only be called when elt_alloc() succeeds. * tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Do not call map->ops->elt_free() if elt_alloc() fails tracing: Avoid NULL return from hist_field_name() on truncation	2026-05-22 06:09:58 -07:00
Arnd Bergmann	654ddf855b	platform/x86: bitland-mifs-wmi: add CONFIG_LEDS_CLASS dependency The newly added driver requires the LED classdev support and causes a link failure when that is disabled: x86_64-linux-ld: vmlinux.o: in function `bitland_mifs_wmi_probe': bitland-mifs-wmi.c:(.text+0xede02a): undefined reference to `devm_led_classdev_register_ext' Fixes: `dc1ec4fa86` ("platform/x86: bitland-mifs-wmi: Add new Bitland MIFS WMI driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260519202804.1339581-1-arnd@kernel.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>	2026-05-22 15:48:41 +03:00
Fernando Fernandez Mancera	18014147d3	netfilter: nf_tables: fix dst corruption in same register operation For lshift and rshift, the shift operations are performed in a loop over 32-bit words. The loop calculates the shifted value and write it to dst, and then immediately reads from src to calculate the carry for the next iteration. Because src and dst could point to the same memory location, the carry is incorrectly calculated using the newly modified dst value instead of the original src value. Adding a temporary local variable to cache the original value before writing to dst and using it for the carry calculation solves the problem. In addition, partial overlap is rejected from control plane for all kind of operations including byteorder. This was tested with the following bytecode: table test_table ip flags 0 use 1 handle 1 ip test_table test_chain use 3 type filter hook input prio 0 policy accept packets 0 bytes 0 flags 1 ip test_table test_chain 2 [ immediate reg 1 0x44332211 0x88776655 ] [ bitwise reg 1 = ( reg 1 << 0x08000000 ) ] [ cmp eq reg 1 0x66443322 0x00887766 ] [ counter pkts 0 bytes 0 ] ip test_table test_chain 4 3 [ immediate reg 1 0x44332211 0x88776655 ] [ bitwise reg 1 = ( reg 1 << 0x08000000 ) ] [ cmp eq reg 1 0x55443322 0x00887766 ] [ counter pkts 21794 bytes 1917798 ] Fixes: `567d746b55` ("netfilter: bitwise: add support for shifts.") Acked-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-05-22 12:28:46 +02:00
Jiayuan Chen	a40aaaef2f	selftests: netfilter: add nft_fib_nexthop test Functional coverage of nft_fib6_eval()'s nexthop enumeration over three route shapes: 1) single external nexthop (nhid) 2) external nexthop group (nhid -> group) 3) old-style multipath (nexthop ... nexthop ...) Each scenario places one nexthop on the input device (veth0). For (2) and (3) the matching nexthop is the second member, so the walk has to traverse beyond the primary nh. Two nft counters on prerouting verify the data path: one increments only when fib reports veth0 as the oif, the other counts "missing" results and must stay at zero. ./nft_fib_nexthop.sh PASS: single external nexthop (nhid -> veth0) PASS: nexthop group (dummy0 + veth0) PASS: old-style multipath (sibling on veth0) Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-05-22 12:28:46 +02:00
Jiayuan Chen	f81b0c2d28	netfilter: nft_fib_ipv6: handle routes via external nexthop fib6_info has a union: union { struct list_head fib6_siblings; struct list_head nh_list; }; Old-style multipath (ip -6 route add ... nexthop ... nexthop ...) uses fib6_siblings. External nexthop (ip -6 route add ... nhid N) uses nh_list, linked into &nh->f6i_list. nft_fib6_info_nh_uses_dev() blindly walks &rt->fib6_siblings, causing an OOB read past the struct nexthop slab when rt->nh is set: ================================================================== BUG: KASAN: slab-out-of-bounds in nft_fib6_eval+0x1362/0x16c0 Read of size 8 at addr ffff888103a099d0 by task ping/386 CPU: 2 UID: 0 PID: 386 Comm: ping Not tainted 7.1.0-rc3+ #251 PREEMPT Call Trace: <IRQ> dump_stack_lvl+0x76/0xa0 print_report+0xd1/0x5f0 kasan_report+0xe7/0x130 __asan_report_load8_noabort+0x14/0x30 nft_fib6_eval+0x1362/0x16c0 nft_do_chain+0x279/0x18c0 nft_do_chain_ipv6+0x1a8/0x230 nf_hook_slow+0xad/0x200 ipv6_rcv+0x152/0x380 __netif_receive_skb_one_core+0x118/0x1c0 ================================================================== Branch by route shape: when rt->nh is set, walk via nexthop_for_each_fib6_nh() (also covers nh groups, which the original code missed); otherwise walk fib6_siblings, guarded by READ_ONCE() of rt->fib6_nsiblings as required by commit `31d7d67ba1` ("ipv6: annotate data-races around rt->fib6_nsiblings"). Fixes: `1c32b24c23` ("netfilter: nft_fib_ipv6: switch to fib6_lookup") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-05-22 12:28:46 +02:00
Jiayuan Chen	1d001b0a61	netfilter: nft_fib_ipv6: walk fib6_siblings under RCU nft_fib6_info_nh_uses_dev() runs from nft_fib6_eval() in softirq under rcu_read_lock(). fib6_siblings is modified by writers that hold tb6_lock but do not wait for RCU readers, so the sibling walk should use list_for_each_entry_rcu(): it adds READ_ONCE() on the ->next pointer and lets CONFIG_PROVE_RCU_LIST validate the locking. No functional change for non-debug builds. Fixes: `1c32b24c23` ("netfilter: nft_fib_ipv6: switch to fib6_lookup") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-05-22 12:28:46 +02:00
Florian Westphal	f438d1786d	netfilter: ebtables: fix OOB read in compat_mtw_from_user Luxiao Xu says: The function compat_mtw_from_user() converts ebtables extensions from 32-bit user structures to kernel native structures. However, it lacks proper validation of the user-supplied match_size/target_size. When certain extensions are processed, the kernel-side translation logic may perform memory accesses based on the extension's expected size. If the user provides a size smaller than what the extension requires, it results in an out-of-bounds read as reported by KASAN. This fix introduces a check to ensure match_size is at least as large as the extension's required compatsize. This covers matches, watchers, and targets, while maintaining compatibility with standard targets. AFAIU this is relevant for matches that need to go though match->compat_from_user() call. Those that use plain memcpy with the user-provided size are ok because the caller checks that size vs the start of the next rule entry offset (which itself is checked vs. total size copied from userspace). The ->compat_from_user() callbacks assume they can read compatsize bytes, so they need this extra check. Based on an earlier patch from Luxiao Xu. Fixes: `81e675c227` ("netfilter: ebtables: add CONFIG_COMPAT support") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Luxiao Xu <rakukuip@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-05-22 12:28:46 +02:00
Florian Westphal	968cc2c963	netfilter: disable payload mangling in userns Several parts of network stack rely on iph->ihl validation done by network stack before PRE_ROUTING. Disable this feature for user namespaces for now. tcp option handling is likely safe even for LOCAL_IN, so this this leaves tcp option mangling via nft_exthdr.c as-is. I don't think these are the only means to alter packets, but these appear to be relatively prominent. This could be relaxed later. Example: - allow userns for ingress hook. - allow userns if base is transport header. Also, we should revalidate or restrict generally: - Don't allow linklayer writes to spill into network header - restrict ipv4 and ipv6 to 'known safe' writes, e.g. saddr/daddr/check/tos Reported-by: Qi Tang <tpluszz77@gmail.com> Reported-by: Tong Liu <lyutoon@gmail.com> Tested-by: Qi Tang <tpluszz77@gmail.com> Link: https://lore.kernel.org/netfilter-devel/20260515100411.3141-1-fw@strlen.de/ Signed-off-by: Florian Westphal <fw@strlen.de>	2026-05-22 12:28:46 +02:00
Florian Westphal	c376f07e16	netfilter: xt_cpu: prefer raw_smp_processor_id With PREEMPT_RCU we get splat: BUG: using smp_processor_id() in preemptible [..] caller is cpu_mt+0x53/0xd0 net/netfilter/xt_cpu.c:37 CPU: 1 .. Comm: syz.3.1377 #0 PREEMPT(full) Call Trace: <TASK> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 check_preemption_disabled+0xd3/0xe0 lib/smp_processor_id.c:47 cpu_mt+0x53/0xd0 net/netfilter/xt_cpu.c:37 [..] Just use raw version instead. This is similar to `14d14a5d29` ("netfilter: nft_meta: use raw_smp_processor_id()"). Fixes: `0ca743a559` ("netfilter: nf_tables: add compatibility layer for x_tables") Reported-by: syzbot+690d3e3ffa7335ac10eb@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de>	2026-05-22 12:28:46 +02:00
Florian Westphal	47980b6dbf	netfilter: nf_conntrack_gre: fix gre keymap list corruption Quoting reporter: A race between GRE keymap insertion and destruction can corrupt the kernel list or use a freed object. `nf_ct_gre_keymap_add()` publishes a new keymap pointer before the embedded `list_head` is linked, while `nf_ct_gre_keymap_destroy()` can concurrently delete and free that same object. An unprivileged user can reach this through the PPTP conntrack helper by racing PPTP control messages or helper teardown, leading to KASAN-detectable list corruption/UAF in kernel context. ## Root Cause Analysis `exp_gre()` installs GRE expectations for a PPTP control flow and then adds two GRE keymap entries [..] The add path publishes `ct_pptp_info->keymap[dir]` before linking the embedded list node [..] Concurrent teardown deletes that partially initialized object. Make add/destroy symmetric: install both, destroy both while under lock. Furthermore, we should refuse to publish a new mapping in case ct is going away, else we may leak the allocation. The "retrans" detection is strange: existing mapping is checked for key equality with the new mapping, then for "is on the list" via list walk. But I can't see how an existing keymap entry can be NOT on list. Change this to only check if we're asked to map same tuple again -- if so, skip re-install, else signal failure. Last, add a bug trap for the keymap list; it has to be empty when namespace is going away. Reported-by: Leo Lin <leo@depthfirst.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-05-22 12:28:46 +02:00
Chris Mason	92170e6afe	netfilter: synproxy: refresh tcphdr after skb_ensure_writable synproxy_tstamp_adjust() rewrites the TCP timestamp option in place and then patches the TCP checksum via inet_proto_csum_replace4() on the caller-supplied tcphdr pointer. Both ipv4_synproxy_hook() and ipv6_synproxy_hook() obtain that pointer with skb_header_pointer() before calling in, so it may either alias skb->head directly or point at the caller's on-stack _tcph buffer. Between obtaining the pointer and using it, the function calls skb_ensure_writable(skb, optend), which on a cloned or non-linear skb invokes pskb_expand_head() and frees the old skb->head. After that point the cached th is stale: caller (ipv[46]_synproxy_hook) th = skb_header_pointer(skb, ..., &_tcph) synproxy_tstamp_adjust(skb, protoff, th, ...) skb_ensure_writable(skb, optend) pskb_expand_head() /* kfree(old skb->head) / ... inet_proto_csum_replace4(&th->check, ...) / writes into freed head, or into the caller's stack copy leaving the on-wire checksum stale */ The option bytes are written through skb->data and are fine; only the checksum update goes through th and so lands in the wrong place. The result is either a write into freed slab memory or a packet leaving with a checksum that does not match its payload. Fix by re-deriving th from skb->data + protoff immediately after skb_ensure_writable() succeeds, so the subsequent checksum update targets the linear, writable header. Fixes: `48b1de4c11` ("netfilter: add SYNPROXY core/target") Assisted-by: kres (claude-opus-4-7) Signed-off-by: Chris Mason <clm@meta.com> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-05-22 12:28:40 +02:00
Hamza Mahfooz	bed6e04be8	netfilter: conntrack: tcp: do not force CLOSE on invalid-seq RST without direction check An unintended behavior in the TCP conntrack state machine allows a connection to be forced into the CLOSE state using an RST packet with an invalid sequence number. Specifically, after a SYN packet is observed, an RST with an invalid SEQ can transition the conntrack entry to TCP_CONNTRACK_CLOSE, regardless of whether the RST corresponds to the expected reply direction. The relevant code path assumes the RST is a response to an outgoing SYN, but does not validate packet direction or ensure that a matching SYN was actually sent in the opposite direction. As a result, a crafted packet sequence consisting of a SYN followed by an invalid-sequence RST can prematurely terminate an active NAT entry. This makes connection teardown easier than intended. So, tighten the state transition logic to ensure that RST-triggered CLOSE transitions only occur when the RST is a valid response to a previously observed SYN in the correct direction. Cc: stable@vger.kernel.org Fixes: `9fb9cbb108` ("[NETFILTER]: Add nf_conntrack subsystem.") Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-05-22 12:27:55 +02:00
Bartosz Golaszewski	215c90ee65	device property: set fwnode->secondary to NULL in fwnode_init() If a firmware node is allocated on the stack (for instance: temporary software node whose life-time we control) or on the heap - but using a non-zeroing allocation function - and initialized using fwnode_init(), its secondary pointer will contain uninitalized memory which likely will be neither NULL nor IS_ERR() and so may end up being dereferenced (for example: in dev_to_swnode()). Set fwnode->secondary to NULL on initialization. Cc: stable <stable@kernel.org> Fixes: `01bb86b380` ("driver core: Add fwnode_init()") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://patch.msgid.link/20260506115701.23035-1-bartosz.golaszewski@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-05-22 12:24:08 +02:00
Zeng Heng	c2ff4764e0	arm64: tlb: Flush walk cache when unsharing PMD tables When huge_pmd_unshare() is called to unshare a PMD table, the tlb_unshare_pmd_ptdesc() function sets tlb->unshared_tables=true but the aarch64 tlb_flush() only checked tlb->freed_tables to determine whether to use TLBF_NONE (vae1is, invalidates walk cache) or TLBF_NOWALKCACHE (vale1is, leaf-only). This caused the stale PMD page table entry to remain in the walk cache after unshare, potentially leading to incorrect page table walks. Fix by including unshared_tables in the check, so that when unsharing tables, TLBF_NONE is used and the walk cache is properly invalidated. Here is the detailed distinction between vae1is and vale1is: \| Instruction Combination \| Actual Invalidation Scope \| \| ------------------------ \| --------------------------------------------------\| \| `VAE1IS` + TTL=`0` \| All entries at all levels (full invalidation) \| \| `VAE1IS` + TTL=`2` (L2) \| Non-leaf at Level 0/1 + leaf at Level 2 \| \| `VALE1IS` + TTL=`0` \| Leaf entries at all levels (non-leaf not cleared) \| \| `VALE1IS` + TTL=`2` (L2) \| Leaf entry at Level 2 only \| Signed-off-by: Zeng Heng <zengheng4@huawei.com> Fixes: `8ce720d5bd` ("mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather") Cc: <stable@vger.kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2026-05-22 11:13:51 +01:00
Claudio Imbrenda	9029496abf	KVM: s390: Properly reset zero bit in PGSTE In case of memory pressure, it's possible that a guest page gets freed and then almost immediately reused by the guest. If CMMA is enabled, _essa_clear_cbrl() will discard all pages that are either unused or zero. If a discarded page is reused before _essa_clear_cbrl() is called, and the pgste.zero bit is not cleared, the page will be discarded despite not being unused. When calling _gmap_ptep_xchg(), always clear the pgste.zero bit. This prevents the page from being accidentally discarded when not unused. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: `a2c17f9270` ("KVM: s390: New gmap code") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2026-05-22 11:25:11 +02:00
Claudio Imbrenda	a488e753de	KVM: s390: vsie: Fix redundant rmap entries The address passed to the gmap rmap was not being masked. As a consequence several different (but functionally equivalent) rmap entries were being created for each shadowed table. Fix this by properly masking the address depending on the table level. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: `a2c17f9270` ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2026-05-22 11:25:11 +02:00
Claudio Imbrenda	2d505c2906	KVM: s390: vsie: Fix unshadowing logic In some cases (i.e. under extreme memory pressure on the host), attempting to shadow memory will result in the same memory being unshadowed, causing a loop. Add a PGSTE bit to distinguish between shadowed memory and shadowed DAT tables, fix the unshadowing logic in _gmap_ptep_xchg() to prevent unnecessary unshadowing and perform better checks. Also fix the unshadowing logic in _gmap_crstep_xchg_atomic() which did not unshadow properly when the large page would become unprotected. Opportunistically add a check in gmap_protect_rmap() to make sure it won't be called with level == TABLE_TYPE_PAGE_TABLE. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: `a2c17f9270` ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2026-05-22 11:25:10 +02:00
Claudio Imbrenda	4df4b7cdf5	KVM: s390: Fix leaking kvm_s390_mmu_cache in case of errors Fix a memory leak that can happen if gmap_ucas_map_one() or kvm_s390_mmu_cache_topup() return error values. Also fix a similar issue in gmap_set_limit(). Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: `a2c17f9270` ("KVM: s390: New gmap code") Reported-by: Jiaxin Fan <jiaxin.fan@ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2026-05-22 11:25:10 +02:00
Claudio Imbrenda	d0f2eb4493	KVM: s390: vsie: Fix memory leak when unshadowing When performing a partial unshadowing, the rmap was being leaked. Add the missing kfree(). Fixes: `a2c17f9270` ("KVM: s390: New gmap code") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>	2026-05-22 11:25:10 +02:00
Jingguo Tan	dfa0d7b0ff	xfrm: esp: restore combined single-frag length gate The ESP out-of-place fast path appends the trailer in esp_output_head() before esp_output_tail() allocates the destination page frag. The head-side gate currently checks skb->data_len and tailen separately, but the tail code allocates a single destination frag from the combined post-trailer skb->data_len. Reject the page-frag fast path when the combined aligned length exceeds a page. Otherwise skb_page_frag_refill() may fall back to a single page while the destination sg still spans the combined skb->data_len. Restore this combined-length page gate for both IPv4 and IPv6. Fixes: `5bd8baab08` ("esp: limit skb_page_frag_refill use to a single page") Cc: stable@vger.kernel.org Signed-off-by: Lin Ma <malin89@huawei.com> Signed-off-by: Chenyuan Mi <michenyuan@huawei.com> Signed-off-by: Jingguo Tan <tanjingguo@huawei.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2026-05-22 09:20:26 +02:00
e521588	2982e599ff	esp: fix page frag reference leak on skb_to_sgvec failure In esp_output_tail(), when esp->inplace is false, the old skb page frags are replaced with a new page from the xfrm page_frag cache. The source scatterlist (sg) is built from the old frags before the replacement, and esp_ssg_unref() is responsible for releasing the old page references after the crypto operation completes. However, if the second skb_to_sgvec() call (which builds the destination scatterlist from the new page) fails, the code jumps to error_free which only calls kfree(tmp). The old page frag references captured in the source scatterlist are never released: 1. sg[] is built from old frags via skb_to_sgvec() (no extra get_page) 2. nr_frags is set to 1 and frag[0] is replaced with the new page 3. Second skb_to_sgvec() fails -> goto error_free 4. kfree(tmp) frees the sg[] memory but old frags are not unref'd 5. kfree_skb() only releases frag[0] (the new page), not the old ones Fix this by adding a bool parameter to esp_ssg_unref() that, when true, unconditionally unrefs the source scatterlist frags without checking req->src and req->dst, since those fields are not yet initialized by aead_request_set_crypt() at the point of the error. Existing callers pass false to preserve the original behavior. The same issue exists in both esp4 and esp6 as the code is identical. Fixes: `cac2661c53` ("esp4: Avoid skb_cow_data whenever possible") Fixes: `03e2a30f6a` ("esp6: Avoid skb_cow_data whenever possible") Signed-off-by: Alessandro Schino <7991aleschino@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2026-05-22 09:19:37 +02:00
Bibo Mao	4a09f4a23a	LoongArch: KVM: Move some variable declarations to paravirt.h Some variables relative with paravirt feature are declared in the header file asm/qspinlock.h, however this file can be included only when option CONFIG_SMP is on. There is compiling warnings if CONFIG_SMP is off since variables are not declared. Move these variable declarations to header file asm/paravirt.h to avoid compiling warnings. Fixes: `c43dce6f13` ("LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202605061313.O8Hswm2b-lkp@intel.com/ Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2026-05-22 15:05:12 +08:00
Tiezhu Yang	1c856e158f	LoongArch: kprobes: Fix handling of fatal unrecoverable recursions KPROBE_HIT_SS and KPROBE_REENTER are two types of fatal recursions that can not be safely recovered in kprobes. KPROBE_HIT_SS means that a kprobe is hit during single-stepping. At this point, the architecture-specific single-step context is already active. Nested single-stepping would corrupt the state, as the kprobe control block (kcb) and hardware registers cannot safely store multiple levels of stepping state. KPROBE_REENTER means that a third-level recursion occurs when a probe is hit while the system is already handling a nested probe (second- level). The kcb only provides a single slot (prev_kprobe) to backup the state. When a third probe is hit, there is no more space to save the state without corrupting the first-level backup. Kprobes work by replacing instructions with breakpoints. In order to execute the original instruction and continue, it must be moved to a temporary "single-step" slot. Since there is no backup space left to set up this slot safely, the CPU would be forced to return to the same original breakpoint address, triggering an endless loop. Currently, the code only prints a warning and returns. This leads to an infinite re-entry loop as the CPU repeatedly hits the same trap and a "stuck" CPU core because preemption was disabled at the start of the handler and never re-enabled in this early return path. Fix the logic by: 1. Merging KPROBE_HIT_SS and KPROBE_REENTER cases, as both represent fatal recursions that cannot be safely recovered. 2. Replacing WARN_ON_ONCE() with BUG() to terminate the system. This aligns LoongArch with other architectures (x86, arm64, riscv) and prevents stack overflow while providing diagnostic information. Fixes: `6d4cc40fb5` ("LoongArch: Add kprobes support") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2026-05-22 15:05:07 +08:00
Tiezhu Yang	e3ef9a28f5	LoongArch: kprobes: Use larch_insn_text_copy() to patch instructions On SMP systems, kprobe handlers would occasionally fail to execute on certain CPU cores. The issue is hard to reproduce and typically occurs randomly under high system load. The root cause is a software-side instruction hazard. According to the LoongArch Reference Manual, while the cache coherency is maintained by hardware, software must explicitly use the "IBAR" instruction to ensure the instruction fetch unit (IFU) observes the effects of recent stores. The current arch_arm_kprobe() and arch_disarm_kprobe() only execute the "IBAR" barrier (via flush_insn_slot -> local_flush_icache_range) on the local CPU. This leaves a vulnerable window where remote CPU cores may continue executing stale instructions from their pipelines or prefetch buffers, as they have not executed an "IBAR" since the code modification. Switch to larch_insn_text_copy() to fix this: 1. Synchronization: It uses stop_machine_cpuslocked() to synchronize all online CPUs, ensuring no CPU is executing the target code area during modification. 2. Visibility: By passing cpu_online_mask to stop_machine_cpuslocked(), the callback text_copy_cb() is executed on all online cores. Each CPU core invokes local_flush_icache_range() to execute "IBAR", clearing instruction hazards system-wide and ensuring the "break" instruction is visible to the fetch units of all cores. 3. Robustness: It properly manages memory write permissions (ROX/RW) for the kernel text segment during patching, ensuring compatibility with CONFIG_STRICT_KERNEL_RWX. Cc: <stable@vger.kernel.org> # 6.18+ Fixes: `6d4cc40fb5` ("LoongArch: Add kprobes support") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2026-05-22 15:05:07 +08:00
Takashi Iwai	2519003dd5	ASoC: Fixes for v7.1 A bigger batch of fixes than usual due to -next not happeing last week, this is mostly stuff for laptops - a lot of quirks and small fixes, mainly for x86 and SoundWire. Nothing too big or exciting individually, just two week's worth. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmoPk5oACgkQJNaLcl1U h9AW2Af+IFfNdP+xpv6d+aOjyvifBggBhCEUjbVJU/R5RVNd4Za3cHdSw1tueHqC /Bk9s+S9uoWMOvpsnYiqMG7ez1p3LAQbvV+ASSCgcsmZ7LohUxQY8nQAURGWq1mc 7zdDYeb/Lh+QikSaMQxxL0f5DLFctdGiHtlmJs34kDh8OTle0EDqG2r4rjNCFOqN fvRNjlArTRo1IHU8qryeyfm68C/80od36cuWsoGicVOuJoBvDTq6hVeVv+gL6jL1 QTKhDG6aOl0+zVYfy6fOy1LdA164O/NR5ptFnos7DtRf7qzqOuEWpuzm6Vzmqrwz bNuqL+6SuuaRdcD13LRnQaL8fdxcpQ== =ZHJg -----END PGP SIGNATURE----- Merge tag 'asoc-fix-v7.1-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v7.1 A bigger batch of fixes than usual due to -next not happeing last week, this is mostly stuff for laptops - a lot of quirks and small fixes, mainly for x86 and SoundWire. Nothing too big or exciting individually, just two week's worth.	2026-05-22 08:25:18 +02:00
Byungchul Park	54cf41c969	Revert "mm: introduce a new page type for page pool in page type" This reverts commit `db359fccf2` ("mm: introduce a new page type for page pool in page type") and a part of `735a309b4b` ("net: add net_iov_init() and use it to initialize ->page_type"). Netpp page_type'ed pages might be used in mapping so as to use @_mapcount. However, since @page_type and @_mapcount are union'ed in struct page, these two can't be used at the same time. Revert the commit introducing page_type for Netpp for now. The patch will be retried once @page_type and @_mapcount get allowed to be used at the same time. The revert also includes removal of @page_type initialization part introduced by commit `735a309b4b` ("net: add net_iov_init() and use it to initialize ->page_type"), which will be restored on the retry. Link: https://lore.kernel.org/20260515034701.17027-1-byungchul@sk.com Fixes: `db359fccf2` ("mm: introduce a new page type for page pool in page type") Signed-off-by: Byungchul Park <byungchul@sk.com> Reported-by: Dragos Tatulea <dtatulea@nvidia.com> Closes: https://lore.kernel.org/all/982b9bc1-0a0a-4fc5-8e3a-3672db2b29a1@nvidia.com Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Mark Bloch <mbloch@nvidia.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Simon Horman <horms@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Toke Hoiland-Jorgensen <toke@redhat.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:13 -07:00
Uladzislau Rezki (Sony)	04aa71da5f	mm/vmalloc: do not trigger BUG() on BH disabled context __get_vm_area_node() currently triggers a BUG() if in_interrupt() returns true. However, in_interrupt() also reports true when BH are disabled. The bridge code can call rhashtable_lookup_insert_fast() with bottom halves disabled: __vlan_add() -> br_fdb_add_local() spin_lock_bh(&br->hash_lock); <-- Disable BH -> fdb_add_local() -> fdb_create() -> rhashtable_lookup_insert_fast() -> kvmalloc() -> vmalloc() -> __get_vm_area_node() -> BUG_ON(in_interrupt()) spin_unlock_bh(&br->hash_lock) this triggers the BUG() despite the caller not being in NMI or hard IRQ context. Replace the in_interrupt() check with in_nmi() \|\| in_hardirq(). Link: https://lore.kernel.org/20260515153009.2296191-1-urezki@gmail.com Fixes: `c6307674ed` ("mm: kvmalloc: add non-blocking support for vmalloc") Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Ido Schimmel <idosch@nvidia.com> Reported-by: syzbot+8b12fc6e0fb139765b58@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69ff8c7c.050a0220.1036b8.000b.GAE@google.com/ Reviewed-by: Baoquan He <baoquan.he@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:13 -07:00
Eugen Hristev	f0af98ff6b	MAINTAINERS, mailmap: change email for Eugen Hristev Replace old bouncing emails with ehristev@kernel.org Link: https://lore.kernel.org/20260425-eh-mailmap-v1-1-58788d401eef@kernel.org Signed-off-by: Eugen Hristev <ehristev@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:13 -07:00
Sunny Patel	2c6f81d587	mm/migrate_device: fix pgtable leak in migrate_vma_insert_huge_pmd_page When migrate_vma_insert_huge_pmd_page() jumps to unlock_abort due to a PMD check failure, the pgtable allocated earlier via pte_alloc_one() is never freed, causing a memory leak. Added free_abort label to release the pgtable in error path. Link: https://lore.kernel.org/20260501115122.23288-1-nueralspacetech@gmail.com Fixes: `a30b48bf1b` ("mm/migrate_device: implement THP migration of zone device pages") Signed-off-by: Sunny Patel <nueralspacetech@gmail.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:13 -07:00
Deepanshu Kartikey	09e7827e78	kernel/fork: validate exit_signal in kernel_clone() When a child process exits, it sends exit_signal to its parent via do_notify_parent(). The clone() syscall constructs exit_signal as: (lower_32_bits(clone_flags) & CSIGNAL) CSIGNAL is 0xff, so values in the range 65-255 are possible. However, valid_signal() only accepts signals up to _NSIG (64 on x86_64). A non-zero non-valid exit_signal acts the same as exit_signal == 0: the parent process is not signaled when the child terminates. The syzkaller reproducer triggers this by calling clone() with flags=0x80, resulting in exit_signal = (0x80 & CSIGNAL) = 128, which exceeds _NSIG and is not a valid signal. The v1 of this patch added the check only in the clone() syscall handler, which is incomplete. kernel_clone() has other callers such as sys_ia32_clone() which would remain unprotected. Move the check to kernel_clone() to cover all callers. Since the valid_signal() check is now in kernel_clone() and covers all callers including clone3(), the same check in copy_clone_args_from_user() becomes redundant and is removed. The higher 32bits check for clone3() is kept as it is clone3() specific. Note that this is a user-visible change: previously, passing an invalid exit_signal to clone() was silently accepted. The man page for clone() does not document any defined behavior for invalid exit_signal values, so rejecting them with -EINVAL is the correct behavior. It is unlikely that any sane application relies on passing an invalid exit_signal. [oleg@redhat.com: the comment above kernel_clone() should be updated] Link: https://lore.kernel.org/abwvgU17W8wuW2-J@redhat.com Link: https://lore.kernel.org/20260316151956.563558-1-kartikey406@gmail.com Fixes: `3f2c788a13` ("fork: prevent accidental access to clone3 features") Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: syzbot+bbe6b99feefc3a0842de@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=bbe6b99feefc3a0842de Tested-by: syzbot+bbe6b99feefc3a0842de@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/20260307064202.353405-1-kartikey406@gmail.com/T/ [v1] Link: https://lore.kernel.org/all/20260316104536.558108-1-kartikey406@gmail.com/T/ [v2] Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ben Segall <bsegall@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:12 -07:00
Alexandre Ghiti	e16f17a9c5	mm: memcontrol: propagate NMI slab stats to memcg vmstats flush_nmi_stats() drains per-node NMI slab atomics into the per-node lruvec_stats, but does not propagate them to the memcg-level vmstats. For non NMI case, account_slab_nmi_safe() calls mod_memcg_lruvec_state() which updates both per-node lruvec_stats and memcg-level vmstats, so flush_nmi_stats() needs to flush to per-node lruvec_stats as well as memcg-level vmstats. So fix this by flushing to the memcg-level vmstats for NMI too. Link: https://lore.kernel.org/20260518082830.599102-1-alex@ghiti.fr Fixes: `940b01fc8d` ("memcg: nmi safe memcg stats for specific archs") Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:12 -07:00
SeongJae Park	441f92f7d3	mm/damon/sysfs-schemes: delete tried region in regions_rmdirs() DAMON sysfs maintains the DAMOS tried region directory objects via a linked list. When the user requests refresh of the directories, DAMON sysfs removes all the region directories first, and then generate updated regions directory on the empty space. The removal function (damon_sysfs_scheme_regions_rm_dirs()) only puts the kobj objects. Deletion of the container region object from the linked list is done inside the kobj release callback function. If somehow the callback invocation is delayed, the list will contain regions list that gonna be freed. If the updated region directories creation is started in this situation, the list can be corrupted and use-after-free can happen. Because the kobj objects are managed by only DAMON sysfs, the issue cannot happen in normal situation. But, such delays can be made on kernels that built with CONFIG_DEBUG_KOBJECT_RELEASE. On the kernel, the issue can indeed be reproduced like below. # damo start --damos_action stat # cd /sys/kernel/mm/damon/admin/kdamonds/0/ # for i in {1..10}; do echo update_schemes_tried_regions > state; done # dmesg \| grep underflow [ 89.296152] refcount_t: underflow; use-after-free. Fix the issue by removing the region object from the list when decrementing the reference count. Also update damos_sysfs_populate_region_dir() to add the region object to the list only after the kobject_init_and_add() is success, so that fail of kobject_init_and_add() is not leaving the deallocated object on the list. The issue was discovered [1] by Sashiko. Link: https://lore.kernel.org/20260518152559.93038-1-sj@kernel.org Link: https://lore.kernel.org/20260513011920.119183-1-sj@kernel.org [1] Fixes: `9277d0367b` ("mm/damon/sysfs-schemes: implement scheme region directory") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> # 6.2.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:12 -07:00
Dev Jain	3f8968e9cb	mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one Initialize nr_pages to 1 at the start of each loop iteration, like folio_referenced_one() does. Without this, nr_pages computed by a previous folio_unmap_pte_batch() call can be reused on a later iteration that does not run folio_unmap_pte_batch() again. mmap a 64K large folio with MAP_ANONYMOUS \| MAP_DROPPABLE, then call madvise(MADV_FREE), then make the last page device-exclusive via HMM_DMIRROR_EXCLUSIVE. Trigger node reclaim through sysfs. Now, in try_to_unmap_one(), we will first clear the first 15 out of 16 entries mapping the lazyfree folio. This will set nr_pages to 15. In the next pvmw walk, this nr_pages gets reused on a device-exclusive pte, thus potentially corrupting folio refcount/mapcount. At the moment, I have a userspace program which can make the kernel spit out a trace, but the blow up is in folio_referenced_one(), because there are existing bugs in the interaction between device-private and rmap (which too I am investigating). I did a one liner kernel change to avoid going into folio_referenced_one(), and the kernel blows up at folio_remove_rmap_ptes in try_to_unmap_one which is what I wanted. Note that the bug is there not since file folio batching but lazyfree folio batching, since device-exclusive only works for anonymous folios. Userspace visible effect is simply kernel crashing somewhere due to refcount/mapcount corruption. Link: https://lore.kernel.org/20260518063656.3721056-1-dev.jain@arm.com Fixes: `354dffd295` ("mm: support batched unmap for lazyfree large folios during reclamation") Signed-off-by: Dev Jain <dev.jain@arm.com> Acked-by: Barry Song <baohua@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Harry Yoo <harry@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:12 -07:00
Richard Chang	bf62f69574	zram: fix use-after-free in zram_writeback_endio A crash was observed in zram_writeback_endio due to a NULL pointer dereference in wake_up. The root cause is a race condition between the bio completion handler (zram_writeback_endio) and the writeback task. In zram_writeback_endio, wake_up() is called on &wb_ctl->done_wait after releasing wb_ctl->done_lock. This creates a race window where the writeback task can see num_inflight become 0, return, and free wb_ctl before zram_writeback_endio calls wake_up(). CPU 0 (zram_writeback_endio) CPU 1 (writeback_store) ============================ ============================ zram_writeback_slots zram_submit_wb_request zram_submit_wb_request wait_event(wb_ctl->done_wait) spin_lock(&wb_ctl->done_lock); list_add(&req->entry, &wb_ctl->done_reqs); spin_unlock(&wb_ctl->done_lock); wake_up(&wb_ctl->done_wait); zram_complete_done_reqs spin_lock(&wb_ctl->done_lock); list_add(&req->entry, &wb_ctl->done_reqs); spin_unlock(&wb_ctl->done_lock); while (num_inflight) > 0) spin_lock(&wb_ctl->done_lock); list_del(&req->entry); spin_unlock(&wb_ctl->done_lock); // num_inflight becomes 0 atomic_dec(num_inflight); // Leave zram_writeback_slots // Free wb_ctl release_wb_ctl(wb_ctl); // UAF crash! wake_up(&wb_ctl->done_wait); This patch fixes this race by using RCU. By protecting wb_ctl with rcu_read_lock() in zram_writeback_endio and using kfree_rcu() to free it, we ensure that wb_ctl remains valid during the execution of zram_writeback_endio. Link: https://lore.kernel.org/20260512074918.2606208-1-richardycc@google.com Fixes: `f405066a1f` ("zram: introduce writeback bio batching") Signed-off-by: Richard Chang <richardycc@google.com> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org> Suggested-by: Minchan Kim <minchan@kernel.org> Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Martin Liu <liumartin@google.com> Cc: wang wei <a929244872@163.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:11 -07:00
Pratyush Yadav (Google)	3b041514cb	memfd: deny writeable mappings when implying SEAL_WRITE When SEAL_EXEC is added, SEAL_WRITE is implied to make W^X. But the implied seal is set after the check that makes sure the memfd can not have any writable mappings. This means one can use SEAL_EXEC to apply SEAL_WRITE while having writeable mappings. This breaks the contract that SEAL_WRITE provides and can be used by an attacker to pass a memfd that appears to be write sealed but can still be modified arbitrarily. Fix this by adding the implied seals before the call for mapping_deny_writable() is done. Link: https://lore.kernel.org/20260505133922.797635-1-pratyush@kernel.org Fixes: `c4f75bc8bd` ("mm/memfd: add write seals when apply SEAL_EXEC to executable memfd") Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Jeff Xu <jeffxu@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kees Cook <kees@kernel.org> Cc: "David Hildenbrand (Arm)" <david@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:11 -07:00
Linpu Yu	fa0b9b2b7a	ipc: limit next_id allocation to the valid ID range The checkpoint/restore sysctl path can request the next SysV IPC id through ids->next_id. ipc_idr_alloc() currently forwards that request to idr_alloc() with an open-ended upper bound. If the valid tail of the SysV IPC id space is full, the allocation can spill beyond ipc_mni. The returned SysV IPC id still uses the normal index encoding, so later lookup and removal can target the wrong slot. This leaves the real IDR entry behind and breaks the IDR state for the object. The bug is in ipc_idr_alloc() in the checkpoint/restore path. 1. ids->next_id is passed to: idr_alloc(&ids->ipcs_idr, new, ipcid_to_idx(next_id), 0, ...) 2. The zero upper bound makes the allocation effectively open-ended. Once the valid SysV IPC tail is occupied, idr_alloc() can spill past ipc_mni and allocate an entry beyond the valid IPC id range. 3. The new object id is still encoded with the narrower SysV IPC index width: new->id = (new->seq << ipcmni_seq_shift()) + idx 4. Later removal goes through ipc_rmid(), which uses: ipcid_to_idx(ipcp->id) That truncates the real IDR index. An object actually stored at a high index can then be removed as if it lived at a low in-range index. 5. For shared memory, shm_destroy() frees the current object anyway, but the real high IDR slot is left behind as a dangling pointer. 6. A subsequent walk of /proc/sysvipc/shm reaches the stale IDR entry and dereferences freed memory. Prevent this by bounding the requested allocation to ipc_mni so the checkpoint/restore path fails once the valid range is exhausted. Link: https://lore.kernel.org/cover.1778336914.git.linpu5433@gmail.com Link: https://lore.kernel.org/2eebe949bfa7d1f6e13b5be6a92c64c850ce9d45.1778336914.git.linpu5433@gmail.com Fixes: `03f5956680` ("ipc: add sysctl to specify desired next object id") Signed-off-by: Linpu Yu <linpu5433@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Cc: Kees Cook <kees@kernel.org> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:11 -07:00
Lorenzo Stoakes	83f9efcce9	Revert "mm/hugetlbfs: update hugetlbfs to use mmap_prepare" This reverts commit `ea52cb24cd` ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare") with conflict resolution to account for changes in commit `ea52cb24cd` ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare"). The patch incorrectly handled hugetlb VMA lock allocation at the mmap_prepare stage, where a failed allocation occurring after mmap_prepare is called might result in the lock leaking. There is no risk of a merge causing a similar issues, as VMA_DONTEXPAND_BIT is set for hugetlb mappings. As a first step in addressing this issue, simply revert the change so we can rework how we do this having corrected the underlying issues. We maintain the VMA flags changes as best we can, accounting for the fact that we were working with a VMA descriptor previously and propagating like-for-like changes for this. Note that we invoke vma_set_flags() and do not call vma_start_write() as vm_flags_set() does. This is OK as it's being done in an .mmap hook where the VMA is not yet linked into the tree so nobody else can be accessing it. Link: https://lore.kernel.org/20260512160643.266960-1-ljs@kernel.org Fixes: `ea52cb24cd` ("mm/hugetlbfs: update hugetlbfs to use mmap_prepare") Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Reported-by: Mingyu Wang <25181214217@stu.xidian.edu.cn> Closes: https://lore.kernel.org/linux-mm/20260425070700.562229-1-25181214217@stu.xidian.edu.cn/ Acked-by: Muchun Song <muchun.song@linux.dev> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:11 -07:00
Ian Ray	83ec6eeb74	MAINTAINERS: .mailmap: update after GEHC spin-off Update my email address from @ge.com to @gehealthcare.com after GE HealthCare was spun-off from GE. Link: https://lore.kernel.org/20260506063335.3-1-ian.ray@gehealthcare.com Signed-off-by: Ian Ray <ian.ray@gehealthcare.com> Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Cc: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-21 19:06:10 -07:00
Marco Elver	e90ef85ada	nios2: Implement _THIS_IP_ using inline asm Both GCC [1] and Clang [2] consider the generic version of _THIS_IP_ to be broken: #define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; }) In particular, the address of a label is only expected to be used with a computed goto. While the generic version more or less works today, it is known to be brittle and may break with current and future optimizations. For example, Clang -O2 always returns 1 when this function is inlined: static inline unsigned long get_ip(void) { return ({ __label__ __here; __here: (unsigned long)&&__here; }); } Fix it by overriding _THIS_IP_ in <asm/linkage.h> (which is included by <linux/instruction_pointer.h>) using an architecture-specific inline asm version. Additionally, avoiding taking the address of a label prevents compilers from emitting spurious indirect branch targets (e.g. ENDBR or BTI) under control-flow integrity schemes. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120071 [1] Link: https://github.com/llvm/llvm-project/issues/138272 [2] Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>	2026-05-21 18:50:46 -05:00
Simon Schuster	7734b168ca	MAINTAINERS: arch/nios2: Add Simon Schuster as co-maintainer Add Simon Schuster as a co-maintainer for the nios2 architecture and mark it as supported. Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>	2026-05-21 18:48:24 -05:00
ChenXiaoSong	4ec9c8e023	smb/server: promote S_DEL_ON_CLS to S_DEL_PENDING when close Reproducer: 1. server: systemctl start ksmbd 2. client: mount -t cifs //${server_ip}/export /mnt 3. client: C program: openat(AT_FDCWD, "/mnt", O_RDWR \| O_TMPFILE, 0600) Do not treat `FILE_DELETE_ON_CLOSE_LE` as delete pending while files remain open. This patch fixes xfstests generic/004. Cc: stable@vger.kernel.org Link: https://chenxiaosong.com/en/smb-xfstests-generic-004.html Co-developed-by: Huiwen He <hehuiwen@kylinos.cn> Signed-off-by: Huiwen He <hehuiwen@kylinos.cn> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Tested-by: Steve French <stfrench@microsoft.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-05-21 18:20:24 -05:00

... 4 5 6 7 8 ...

1446513 Commits