linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-04 20:46:48 +02:00

Author	SHA1	Message	Date
Breno Leitao	a7488f089b	workqueue: Release PENDING in __queue_work() drain/destroy reject path The caller of __queue_work() owns WORK_STRUCT_PENDING, won via test_and_set_bit() in queue_work_on()/__queue_delayed_work(). The state machine documented above __queue_work() requires that owner to either hand the token to a pwq (insert_work() -> set_work_pwq()), hand it to a timer, or release it via set_work_pool_and_clear_pending(). try_to_grab_pending() relies on this: when it observes "PENDING && off-queue" it busy-loops, trusting the current owner to make progress. The (__WQ_DESTROYING \| __WQ_DRAINING) early-return path violates that contract. It WARN_ONCE()s and bare-returns, leaving work->data with PENDING set, WORK_STRUCT_PWQ clear, and work->entry empty. The path is reachable without explicit API abuse: queue_delayed_work() arms a timer with PENDING set; if drain_workqueue() runs while the timer is still pending, delayed_work_timer_fn() -> __queue_work() in softirq context hits the WARN, current is not a wq worker so is_chained_work() is false, and the work is silently dropped with PENDING leaked. Mirror what clear_pending_if_disabled() already does on its analogous reject path: unpack the off-queue data and call set_work_pool_and_clear_pending() to release the token before returning. I was able to reproduce this by queueing several slow works on a max_active=1 wq, arming a delayed_work whose timer fires while drain_workqueue() is blocked, then calling cancel_delayed_work_sync(). Without this patch the cancel livelocks at 100% CPU; with it the cancel returns immediately. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-08 07:59:27 -10:00
Linus Torvalds	81d6f78075	seven client fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmn+DfMACgkQiiy9cAdy T1FlwQv/bOScs7kYk5M5cCUf8kvA3kHBBmXcewSXYVzEaspJFd49IOrbejh07UXR KmfJ4zgX3usbFNzXkmm8AKrax9ZJd8vmdey7/+ELxuBoYiyyDTATZ/VG+yDae0Cu zU7pZNv99LppFkkxQM+7hpBtbazRUTZu3VYprFZ+UCWPupKZs/fQm9huBzJPf2bn dMkojp/AAOGmhuRok3DWA1fu/BvFgslXPk4QohIfWxd0zRGVXQLRkOXvVI34bhR2 IOLH1PohkFsajqWClEyikCaFjhW8ZpmmHVl2t+NZer/wYoq2Mp2Ad9NkILmfrWR1 w4NSxh73emsllZpDkXYULlM9voxnjIXpvg/wPP+DA4yhuThwluJyCgsEkoInMw6X mLM8JiD4EMQhxKiZwtrO4gd/TshSBhm01ly0a6VwvV2p1mvW2cJH2VAZyoC+xN8d CabEmVnJuiwh4SPwKwsJN3bePwvjp30j1oVRspQthTQRrunyY4hkXr3z2Hpo6TNb tMudF/Qh =A8aI -----END PGP SIGNATURE----- Merge tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - Fix for two ACL issues (security fix to validate dacloffset better and chmod fix) - Fix out of bounds reads (in check_wsl_eas and smb2_check_msg for symlinks) - Two Kerberos fixes including an important one when AES-256 encryption chosen - Fix open_cached_dir problem when directory leases disabled * tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: validate dacloffset before building DACL pointers smb/client: fix out-of-bounds read in smb2_compound_op() smb/client: fix out-of-bounds read in symlink_data() smb: client: Zero-pad short GSS session keys per MS-SMB2 smb: client: Use FullSessionKey for AES-256 encryption key derivation smb: client: use kzalloc to zero-initialize security descriptor buffer cifs: abort open_cached_dir if we don't request leases	2026-05-08 10:24:35 -07:00
Linus Torvalds	8bb44576c5	spi: Fixes for v7.1 There's two main serieses here, fixing issues that came up in the Microchip QSPI and Freescale i.MX drivers. Both of those could result in some quite noticable issues if they were encountered in production. We also have one minor documentation fix in the ch341 driver. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmn96gUACgkQJNaLcl1U h9Aivgf9HIKL9pbWczYHRKRMNsEZd2+0gVGoX/PS38dSmBxp6amtVZD4T1/z+uny ZcILZvRwpzzKyXXq4hnkDZ+WCFu+ho0iizf0C5dHQVLVmT/npAzmTSJ4tqc1NKif IOgecK1tC0/VT+bS+Q0u5bhGWn2xVQUstCQ0yyvngTiAfLj+kHl/QaYomEWrxoNz +mkYaHslHNyGdSXsFknftN2L9J8E5C9JYh2i/FByvcthfOL60xk+eCIRRY0414gc xXHhqBN5NIUpLKsm1viOytxMKFqhN5cFbY6ANWUhNVy1C51hzaovkNNxjk+M+zwP VnL9YVZ0M9u+DeeEaCsQrtkFOMi4zw== =HTaz -----END PGP SIGNATURE----- Merge tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "There's two main series here, fixing issues that came up in the Microchip QSPI and Freescale i.MX drivers. Both of those could result in some quite noticable issues if they were encountered in production. We also have one minor documentation fix in the ch341 driver" * tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: ch341: correct company name in MODULE_DESCRIPTION spi: microchip-core-qspi: remove some inline markings spi: microchip-core-qspi: don't attempt to transmit during emulated read-only dual/quad operations spi: microchip-core-qspi: control built-in cs manually spi: imx: Propagate prepare_transfer() error from spi_imx_setupxfer() spi: imx: Fix UAF on package-1 prepare failure in spi_imx_dma_data_prepare() spi: imx: Fix precedence bug in spi_imx_dma_max_wml_find()	2026-05-08 10:14:51 -07:00
Linus Torvalds	4bdbce450f	regulator: Fix for v7.1 A straightforward fix for an incorrect description of one of the regulators on the Qualcomm PMH0101. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmn96QMACgkQJNaLcl1U h9DnpQf8CJAq7WpzlNXX4AcsToidElq6PnbK34iTxsrMSnf5CvGxqaL72DIx6GTW AlL8T4HTOJUjVdP5W/atft1XQu7N5MWo3EoblaY7Soi5PPfTSTVlElE1bw1tupi/ GJxShzjUzPCKUL7vPRj6oEsz9iIoJL337uTd4R8ZVTTrra4LA/fQP46DXR5SnjZj PC512B60aNnLa+siz9bgliyVdToUyYyUxrgexDHBjd7unNGXt5z9NMOEph9fbqmQ 7BvEKbh8EDOBJLPBDNb6RTQjqpzvLTJ/5sWiEbqvfNfntDh80+F0TDsDzya6F/zL kF9kp1uDtUZzzm1ZvOBS0UPPz8szyQ== =nWMq -----END PGP SIGNATURE----- Merge tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "A straightforward fix for an incorrect description of one of the regulators on the Qualcomm PMH0101" * tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: qcom-rpmh: Fix index for pmh0101 ldo16	2026-05-08 10:07:59 -07:00
Kuniyuki Iwashima	481c226528	bpf: tcp: Fix type confusion in bpf_tcp_sock(). bpf_tcp_sock() only checks if sk->sk_protocol is IPPROTO_TCP, but RAW socket can bypass it: socket(AF_INET, SOCK_RAW, IPPROTO_TCP) Calling bpf_setsockopt() in SOCKOPT prog triggers out-of-bounds access to another slab object. [0] Let's use sk_is_tcp(). [0]: BUG: KASAN: slab-out-of-bounds in sol_tcp_sockopt (net/core/filter.c:5519) Read of size 8 at addr ffff88801083d760 by task test_progs/1259 CPU: 1 UID: 0 PID: 1259 Comm: test_progs Tainted: G OE 7.0.0-11175-gb5c111f4967b #1 PREEMPT(full) Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120) print_report (mm/kasan/report.c:378 mm/kasan/report.c:482) kasan_report (mm/kasan/report.c:595) sol_tcp_sockopt (net/core/filter.c:5519) __bpf_getsockopt (net/core/filter.c:5633) bpf_sk_getsockopt (net/core/filter.c:5654) bpf_prog_629ba00a1601e9f2__setsockopt+0x86/0x22c __cgroup_bpf_run_filter_setsockopt (./include/linux/bpf.h:1402 ./include/linux/filter.h:722 ./include/linux/filter.h:729 kernel/bpf/cgroup.c:81 kernel/bpf/cgroup.c:2026) do_sock_setsockopt (net/socket.c:2363) __x64_sys_setsockopt (net/socket.c:2406) do_syscall_64 (arch/x86/entry/syscall_64.c:63) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) RIP: 0033:0x7f85f82fe7de Code: 55 48 63 c9 48 63 ff 45 89 c9 48 89 e5 48 83 ec 08 6a 2c e8 34 69 f7 ff c9 c3 66 90 f3 0f 1e fa 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 e1 RSP: 002b:00007ffe59dcecd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f85f82fe7de RDX: 000000000000001c RSI: 0000000000000006 RDI: 000000000000000d RBP: 00007ffe59dcef20 R08: 000000000000003c R09: 0000000000000000 R10: 00007ffe59dcef00 R11: 0000000000000202 R12: 00007ffe59dcf268 R13: 
0000000000000003 R14: 00007f85f9da5000 R15: 000055b2f3201400 </TASK> The buggy address belongs to the object at ffff88801083d280 which belongs to the cache RAW of size 1792 The buggy address is located 1248 bytes inside of allocated 1792-byte region [ffff88801083d280, ffff88801083d980) Fixes: `655a51e536` ("bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock") Reported-by: Damiano Melotti <melotti@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20260504210610.180150-2-kuniyu@google.com	2026-05-08 09:55:32 -07:00
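The type confusion described in the commit above can be illustrated with a small userspace model. This is only a sketch: `struct model_sock`, the `MODEL_*` constants, and both check functions are hypothetical stand-ins and not kernel definitions, and it assumes that the stricter helper also compares the socket type in addition to the protocol.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's struct sock. */
enum model_type  { MODEL_SOCK_STREAM = 1, MODEL_SOCK_RAW = 3 };
enum model_proto { MODEL_IPPROTO_TCP = 6 };

struct model_sock {
	int type;      /* models sk->sk_type */
	int protocol;  /* models sk->sk_protocol */
};

/* Models the old check: only the protocol number is compared. */
static bool protocol_only_check(const struct model_sock *sk)
{
	return sk->protocol == MODEL_IPPROTO_TCP;
}

/* Models a stricter check (assumption): socket type and protocol must both match. */
static bool type_and_protocol_check(const struct model_sock *sk)
{
	return sk->type == MODEL_SOCK_STREAM &&
	       sk->protocol == MODEL_IPPROTO_TCP;
}
```

In this model a socket created as (SOCK_RAW, IPPROTO_TCP) passes the protocol-only check but is rejected by the stricter one, which is the case the commit message describes.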
Linus Torvalds	51d24842ac	drm fixes for 7.1-rc3 core: - fix race condition in handle change ioctl fb-helper: - fix clipping rust: - fix unsound initialization - fix GEM state cleanup - fix wrong ARef import ttm: - update GPU MM stats on pool shrinking i915: - Re-enable ccs modifiers on dg2 nova: - fix mailing list xe: - Add NULL check for media_gt in intel_hdcp_gsc_check_status - Fix EAGAIN sign in pf_migration_consume - Fix MMIO access using PF view instead of VF view during migration - Exclude indirect ring state page from ADS engine state size amdgpu: - GFX9 fixes - Hawaii SMU fixes - SDMA4 fix - GART fix - Userq fixes amdkfd: - GPUVM TLB flush fix - Hotplug fix radeon: - Hawaii SMU fixes bochs: - fix managed cleanup bridge: - tda998x: fix sparse warnings on type correctness etnaviv: - schedule armed jobs exynos: - managed bridge cleanup ivpu: - disallow reexport of GEM buffer objects noveau: - revert support for GA100 panel: - boe-tv101wum-nl16: use correct MIPI_DSI mode - feyjang-fy07024di26a30d: fix error reporting - himax-hx83102: use correct MIPI_DSI mode - himax-hx83121a: fix error checks - himax-hx83121a: select DRM_DISPLAY_DSC_HELPER qaic: - fix RAS message handling qxl: - clean up polling sti: - managed bridge cleanup -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmn9mggACgkQDHTzWXnE hr5UkQ//Yb4HiE1jN3Ji0ulR+VlYuyXQj3rrIbZMiGafL7xK32qM0hWSwXUGIZI2 Hbe6fgQGc9yDt9svd56xVjCWgU72gwxVcd2x9nAj3UbzMreBol3A0s19TLmKyuU2 7O5HSQ4dcLJc42rlJ18vqSnsErynhKTjnFqF0y3Mz12DOqakDSK0a0FxNit5qJlu 3t6O5bJohIRlwTVRkgM3YxtY0Mx8sg0XHz2JhlzerwfiRe3eXIOX5GmIWODjfGPy sjGICWTaO/zCTBbgVtGKfIKJkhbCvm1LkZJUXQypcT0EpGaHMFkOxj3cyG88QwnQ 8DQfQKuWO5H7jMle+K6nuJNiEYK7TJ7NUr7IzL5ksqGtN/Vo414BFnqbx0nY57Xb kaYUth/Zka4eHi9wRueKXb5i/Vh2NJUbj+zdEawW0gbgIexMl5CD14LrxaNRL0xP a/Tf4FbG81Y/0+3/ohjE/n149UgcWC2Td/+cLvHxAfp3yFyGMQAo0sKUxOC0E4wW IPRfVHZIkeZgdMBUI0KNg83aABLZTtXNDKzjNTa8pCrrHbVlpg7egQaLUGgEtlK2 fuy9gTMVISAAu309acrGcE26o6Dv9MpkLab74W98ynfNLB+WSifbA9xhD4MIeg2h 
k5OS2r7g0zoLSOhDhG0tzCvZT7Ky6bFkapWVD3ooCMgSQya13ZE= =t9m/ -----END PGP SIGNATURE----- Merge tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Weekly fixes, lots of them but all pretty small, amdgpu and xe are the usual but then a large amount of fixes all over. core: - fix race condition in handle change ioctl fb-helper: - fix clipping rust: - fix unsound initialization - fix GEM state cleanup - fix wrong ARef import ttm: - update GPU MM stats on pool shrinking i915: - Re-enable ccs modifiers on dg2 nova: - fix mailing list xe: - Add NULL check for media_gt in intel_hdcp_gsc_check_status - Fix EAGAIN sign in pf_migration_consume - Fix MMIO access using PF view instead of VF view during migration - Exclude indirect ring state page from ADS engine state size amdgpu: - GFX9 fixes - Hawaii SMU fixes - SDMA4 fix - GART fix - Userq fixes amdkfd: - GPUVM TLB flush fix - Hotplug fix radeon: - Hawaii SMU fixes bochs: - fix managed cleanup bridge: - tda998x: fix sparse warnings on type correctness etnaviv: - schedule armed jobs exynos: - managed bridge cleanup ivpu: - disallow reexport of GEM buffer objects noveau: - revert support for GA100 panel: - boe-tv101wum-nl16: use correct MIPI_DSI mode - feyjang-fy07024di26a30d: fix error reporting - himax-hx83102: use correct MIPI_DSI mode - himax-hx83121a: fix error checks - himax-hx83121a: select DRM_DISPLAY_DSC_HELPER qaic: - fix RAS message handling qxl: - clean up polling sti: - managed bridge cleanup * tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel: (37 commits) drm: Set old handle to NULL before prime swap in change_handle drm/bochs: Drop manual put on probe error path drm/xe/guc: Exclude indirect ring state page from ADS engine state size drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration drm/xe/pf: Fix EAGAIN sign in pf_migration_consume() drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status() 
drm/exynos: remove bridge when component_add fails drm/amdgpu: nuke amdgpu_userq_fence_slab v2 drm/amdgpu/userq: fix access to stale wptr mapping drm/amdkfd: Check if there are kfd porcesses using adev by kfd_processes_count drm/amdgpu: zero-initialize GART table on allocation drm/amdgpu/sdma4: replace BUG_ON with WARN_ON in fence emission drm/radeon: add missing revision check for CI drm/amdgpu/pm: align Hawaii mclk workaround with radeon drm/amdgpu/pm: add missing revision check for CI drm/amdgpu/gfx9: drop unnecessary 64-bit fence flag check in KIQ drm/amdkfd: Make all TLB-flushes heavy-weight drm/panel: himax-hx83102: restore MODE_LPM after sending disable cmds drm/panel: boe-tv101wum-nl6: restore MODE_LPM after sending disable cmds drm/panel: feiyang-fy07024di26a30d: return display-on error ...	2026-05-08 08:23:06 -07:00
Thomas Hellström	b2ed01e7ad	drm/ttm: Fix ttm_bo_swapout() infinite LRU walk on swapout failure When ttm_tt_swapout() fails, the current code calls ttm_resource_add_bulk_move() followed by ttm_resource_move_to_lru_tail() to restore the resource's bulk_move membership. However, ttm_resource_move_to_lru_tail() places the resource at the tail of the LRU list which, relative to the walk cursor's hitch node (placed immediately after the resource when it was yielded), puts the resource in front of the hitch. The next list_for_each_entry_continue() from the hitch finds the same resource again, causing an infinite loop. Fix by deferring del_bulk_move to the success path only. On the success path, TTM_TT_FLAG_SWAPPED has just been set by ttm_tt_swapout() but the resource is still tracked in the bulk_move range, so ttm_resource_del_bulk_move()'s !ttm_resource_unevictable() guard would incorrectly skip the removal. Introduce ttm_resource_del_bulk_move_unevictable() which bypasses that guard. Reported-by: Jatin Kataria <jkataria@netflix.com> Fixes: `fc5d96670e` ("drm/ttm: Move swapped objects off the manager's LRU list") Cc: Christian König <christian.koenig@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <dri-devel@lists.freedesktop.org> Cc: <stable@vger.kernel.org> # v6.13+ Assisted-by: GitHub_Copilot:claude-sonnet-4.6 Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Tested-by: Boqun Feng <boqun@kernel.org> Link: https://patch.msgid.link/20260428094442.16985-1-thomas.hellstrom@linux.intel.com	2026-05-08 17:19:44 +02:00
Greg Kroah-Hartman	4fd44d47e8	USB serial device ids for 7.1-rc3 Here are some new modem device ids. This one has been in linux-next with no reported issues. -----BEGIN PGP SIGNATURE----- iJEEABYKADkWIQQHbPq+cpGvN/peuzMLxc3C7H1lCAUCaf2E2xsUgAAAAAAEAA5t YW51MiwyLjUrMS4xMiwyLDIACgkQC8XNwux9ZQhIfwD/fGmlfl2wSwq0DzyBlxn4 /4Lj3XCC4ped7qO6i0bKzYUA/2Mv3PONpNxTJ0hmqreXLYCJsP1b9mLIgCTx0aJe nMsC =6gMv -----END PGP SIGNATURE----- Merge tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus Johan writes: USB serial device ids for 7.1-rc3 Here are some new modem device ids. This one has been in linux-next with no reported issues. * tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: option: add Telit Cinterion LE910Cx compositions	2026-05-08 17:18:43 +02:00
Linus Torvalds	fa7431eb99	IOMMU Fixes for Linux 7.1-rc2: Including: - Core: Cache-flushing fix for non-x86 platforms. - AMD-Vi: Security fix when SEV-SNP is enabled. - AMD-Vi: Operator precedence fix in DTE setting. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmn9hXwACgkQK/BELZcB GuPIFRAAvnJKSnMifUBsW8FaXJmoqN1gvjOXAMGqO5ZGmR+A7in0S6klnWzV7qIa CDOQqZ7n0hGEqnLAEwf2TSDv/t/qNRA4aIbJCrKYYYxVGaACRDgS56EAxvOP0aBS qAMK04zlf5vZiv3dJilqvssEw3Y5EyRoOQCIojTe6CiO+Wt8wAmzri3MFMM9yGts uJy4fpbCzU1M/glvR29I+I/3AQPHJBbZWswwbeEj6sJrGnDh5PeC1AVP7jbwXoNa 60hp4sby+8wWTupGzbwLI1zxH1hxpbidiDywmWHD2vIzA1A+ESzexLv93S9Llj+C qDxAEowk+jDxEJRMisyIiHLiRX+gKxVVaywEOOcQ1DsP97q2EfwcHdCImCjG+RET E/pLWt7eiaI85bo1T8eWlkMTisapVUxVchicFsBI1oAHVCdC7cBJEIFMIBxrLF4S 423lJBRwKQrY5urNTRWB8eJHo5vBuT3G0VsnQ6DfVunqT9u24KEEjSx77um6eGvd gK6Wp/ti7pimXGYKMVzPe6hnLzAXiiguLE1ejEPBqFBEu0hlbIB66SxKq+aI+e6O VE2NxtDuLtY8yI1SCEmv0SAN+/k0bdLIDfzM0H+Lr/f9plpPU9LDydYM4SFg6uIl s4zi3gCMQngyRmOdmDw9w5g3CCV7oHTRGCRH9y8qCwz8XNz/JjM= =nn8a -----END PGP SIGNATURE----- Merge tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: "Core: - Cache-flushing fix for non-x86 platforms AMD-Vi: - Security fix when SEV-SNP is enabled - Operator precedence fix in DTE setting" * tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/amd: Fix precedence order in set_dte_passthrough() iommu/pages: Fix iommu_pages_flush_incoherent() for non-x86 iommu/amd: Use maximum PPR log buffer size when SNP is enabled on Family 0x19 iommu/amd: Use maximum Event log buffer size when SNP is enabled on Family 0x19	2026-05-08 08:16:07 -07:00
Zqiang	ab28a0673d	sched_ext: Use IRQ_WORK_INIT_HARD() to initialize sch->disable_irq_work For kernels built with PREEMPT_RT, scx_disable_irq_workfn() is called from per-CPU irq_work kthread context. This means that when scx_disable_irq_workfn() calls scx_dump_state() to output current->comm/pid, it always outputs the comm/pid of the current irq_work kthread. Therefore, use IRQ_WORK_INIT_HARD() to initialize sch->disable_irq_work so that scx_disable_irq_workfn() is called from hardirq context. Fixes: `f4a6c506d1` ("sched_ext: Always bounce scx_disable() through irq_work") Signed-off-by: Zqiang <qiang.zhang@linux.dev> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-08 05:11:53 -10:00
Mark Rutland	411c1cf430	arm64/entry: Fix arm64-specific rseq brokenness Mathias Stearn reports that since v6.19, there are two big issues affecting rseq: (1) On arm64 specifically, rseq critical sections aren't aborted when they should be. (2) The 'cpu_id_start' field is no longer written by the kernel in all cases it used to be, including some cases where TCMalloc depends on the kernel clobbering the field. This patch fixes issue #1. This patch DOES NOT fix issue #2, which will need to be addressed by other patches. The arm64-specific brokenness is a result of commits: `2fc0e4b412` ("rseq: Record interrupt from user space") `39a167560a` ("rseq: Optimize event setting") The first commit failed to add a call to rseq_note_user_irq_entry() on arm64. Thus arm64 never sets rseq_event::user_irq to record that it may be necessary to abort an active rseq critical section upon return to userspace. On its own, this commit had no functional impact as the value of rseq_event::user_irq was not consumed. The second commit relied upon rseq_event::user_irq to determine whether or not to bother to perform rseq work when returning to userspace. As rseq_event::user_irq wasn't set on arm64, this work would be skipped, and consequently an active rseq critical section would not be aborted. Fix this by giving arm64 syscall-specific entry/exit paths, and performing the relevant logic in syscall and non-syscall paths, including calling rseq_note_user_irq_entry() for non-syscall entry. Currently arm64 cannot use syscall_enter_from_user_mode(), syscall_exit_to_user_mode(), and irqentry_exit_to_user_mode(), due to ordering constraints with exception masking, and risk of ABI breakage for syscall tracing/audit/etc. For the moment the entry/exit logic is left as arm64-specific, directly using enter_from_user_mode() and exit_to_user_mode(), but mirroring the generic code. 
I intend to follow up with refactoring/cleanup, as we did for kernel mode entry paths in commit: `041aa7a853` ("entry: Split preemption from irqentry_exit_to_kernel_mode()") ... which will allow arm64 to use the GENERIC_IRQ_ENTRY functions directly. Fixes: `39a167560a` ("rseq: Optimize event setting") Reported-by: Mathias Stearn <mathias@mongodb.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/regressions/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com/ Link: https://patch.msgid.link/20260508142023.3268622-1-mark.rutland@arm.com	2026-05-08 17:00:44 +02:00
David Woodhouse	786a45757d	x86/kexec: Push kjump return address even for non-kjump kexec The version of purgatory code shipped by kexec-tools attempts to look above the top of its stack to find a return address for a kjump, even in a non-kjump kexec. After the commit in Fixes: the word above the stack might not be there, leading to a fault (which is at least now caught by my exception-handling code in kexec). That commit fixed things for the actual kjump path, but no longer "gratuitously" pushes the unused return address to the stack in the non-kjump path. Put that back in the non-kjump path, to prevent purgatory from crashing when trying to access it. Fixes: `2cacf7f23a` ("x86/kexec: Fix stack and handling of re-entry point for ::preserve_context") Reported-by: Rohan Kakulawaram <rohanka@google.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Rohan Kakulawaram <rohanka@google.com> Cc: <stable@kernel.org> Link: https://patch.msgid.link/32d627134143ffd957891cb697138e839c623211.camel@infradead.org	2026-05-08 17:00:12 +02:00
DaeMyung Kang	3086c49a07	ntfs: avoid leaking uninitialised bytes in new security descriptors ntfs_sd_add_everyone() builds the on-disk security descriptor for a newly created file by kmalloc()'ing a buffer and then partially filling it in: sd = kmalloc(sd_len, GFP_NOFS); ... sd->revision = 1; sd->control = SE_DACL_PRESENT \| SE_SELF_RELATIVE; ... The buffer is then handed to ntfs_attr_add() and persisted as the SECURITY_DESCRIPTOR attribute of the new MFT record. The descriptor covers a relative security descriptor header, two SIDs (owner and group), an ACL header, and a single ACE, but several fields inside those structures are never written before the buffer is committed to disk: - struct security_descriptor_relative @alignment (1 byte) @sacl (4 bytes; SE_SACL_PRESENT is not set but the offset still reaches disk) - struct ntfs_sid (3 instances: owner, group, ACE.sid) identifier_authority.value[0..4] (5 bytes per SID, 15 total - only value[5] is set) - struct ntfs_acl @alignment1 (1 byte) @alignment2 (2 bytes) That is 23 bytes of uninitialised slab memory persisted to disk for every new file or directory the legacy ntfs driver creates. The "+ 4" trailing accounting in sd_len holds ace->sid.sub_authority[0], which the existing code does explicitly write to zero, so it is not part of the leak. Anything later able to read the SECURITY_DESCRIPTOR attribute - the same NTFS volume mounted on Windows or by another NTFS reader, an offline forensics tool, an unprivileged user that ends up with read access to the volume - can recover those bytes. The leak persists for the lifetime of the file on disk, not just the lifetime of the kernel that wrote it. Switch the allocation to kzalloc() so every byte the on-disk descriptor covers is zero before the explicit initialisations run. While there, replace the bare "return -1" allocation-failure path with a proper -ENOMEM so the error reaches userspace as a meaningful errno instead of an unrelated -EPERM. 
Found by inspection while auditing fs/ntfs new-inode paths. Fixes: `af0db57d42` ("ntfs: update inode operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-08 23:51:13 +09:00
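The zero-initialisation point in the commit above can be sketched in userspace, with `calloc()` standing in for `kzalloc()`. This is an illustration only: `struct model_sd`, `model_sd_new()`, and the `MODEL_SE_*` values are hypothetical and do not reproduce the driver's on-disk layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag values, used only for this illustration. */
#define MODEL_SE_DACL_PRESENT  0x0004u
#define MODEL_SE_SELF_RELATIVE 0x8000u

/* Simplified stand-in for a relative security descriptor header. */
struct model_sd {
	uint8_t  revision;
	uint8_t  alignment;  /* never written explicitly */
	uint16_t control;
	uint32_t owner;
	uint32_t group;
	uint32_t sacl;       /* never written explicitly */
	uint32_t dacl;
};

/*
 * Allocate a descriptor with every byte zeroed, then set only the
 * fields the caller cares about. Returns NULL on allocation failure,
 * which models the -ENOMEM path mentioned in the commit message.
 */
static struct model_sd *model_sd_new(void)
{
	struct model_sd *sd = calloc(1, sizeof(*sd));

	if (!sd)
		return NULL;
	sd->revision = 1;
	sd->control = MODEL_SE_DACL_PRESENT | MODEL_SE_SELF_RELATIVE;
	return sd;
}
```

Because the buffer starts out zeroed, fields such as `alignment` and `sacl` hold a defined value even though no code path assigns them. With a plain `malloc()` their contents would be indeterminate.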
DaeMyung Kang	79629b748a	ntfs: fix out-of-bounds write in ntfs_index_walk_down() ntfs_index_walk_down() used to update the index traversal depth directly before writing parent_pos[] and parent_vcn[]. A malformed directory index with too many child-node levels can therefore advance pindex past MAX_PARENT_VCN and write past the fixed arrays in struct ntfs_index_context, corrupting context state used by later index traversal. Use ntfs_icx_parent_inc() for walk-down transitions so the existing depth limit is enforced before the arrays are updated. Make the helper check the limit before incrementing pindex so failed callers do not leave the context at an out-of-range depth. This is reachable by iterating a crafted NTFS directory after the volume has been mounted, including read-only mounts. The reproducer uses getdents64() on an index root that points to an excessively deep chain of child index blocks. A crafted directory index with a chain of child-node entries reproduced UBSAN array-index-out-of-bounds reports in ntfs_index_walk_down() and subsequent KASAN reports in ntfs_index_walk_up(). With this change, the same image is rejected with "Index is over 32 level deep" and no KASAN or UBSAN report is emitted. Fixes: `0a8ac0c1fa` ("ntfs: update directory operations") Suggested-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-08 23:51:10 +09:00
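The "check the limit before incrementing" rule from the commit above can be sketched as follows. The names `struct model_icx`, `model_parent_inc()`, `model_walk_down()`, and `MODEL_MAX_DEPTH` are hypothetical. The limit of 32 is taken from the error message quoted in the commit, and the `-1` return value is a placeholder, not the driver's actual error code.

```c
#include <assert.h>

#define MODEL_MAX_DEPTH 32  /* assumed from "Index is over 32 level deep" */

/* Simplified stand-in for the index traversal context. */
struct model_icx {
	int pindex;                       /* current depth */
	int parent_pos[MODEL_MAX_DEPTH];  /* fixed-size per-level state */
};

/*
 * Check the limit first, then increment. On failure pindex is left
 * untouched, so the context never sits at an out-of-range depth.
 */
static int model_parent_inc(struct model_icx *icx)
{
	if (icx->pindex + 1 >= MODEL_MAX_DEPTH)
		return -1;
	icx->pindex++;
	return 0;
}

/* Walk one level down: the array write happens only after the check. */
static int model_walk_down(struct model_icx *icx, int pos)
{
	if (model_parent_inc(icx))
		return -1;
	icx->parent_pos[icx->pindex] = pos;
	return 0;
}
```

Starting from depth 0, repeated calls to `model_walk_down()` succeed until the last valid slot is reached and fail afterwards without modifying `pindex`.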
DaeMyung Kang	11816f7131	ntfs: fix out-of-bounds write in ntfs_rl_collapse_range() merge path ntfs_rl_collapse_range() merges the run on the left of the collapsed region with the run on its right when they are contiguous. The contiguous check chooses a clamped index when @new_1st_cnt is 0: i = new_1st_cnt == 0 ? 1 : new_1st_cnt; if (ntfs_rle_lcn_contiguous(&new_rl[i - 1], &new_rl[i])) { but the merge itself uses the unclamped value: s_rl = &new_rl[new_1st_cnt - 1]; s_rl->length += s_rl[1].length; When @new_1st_cnt is 0 this computes &new_rl[-1] and writes 8 bytes before the kvcalloc() runlist buffer. The path is reachable through fallocate(FALLOC_FL_COLLAPSE_RANGE) starting at vcn 0 against an attribute whose first run after the collapsed region and the following run are holes. In that case ntfs_rle_lcn_contiguous() returns true because both checked entries are LCN_HOLE, so the merge path is entered with @new_1st_cnt still 0. Such consecutive holes do not occur on a well-formed runlist (NTFS keeps runlists coalesced in memory), so this OOB path is only reachable from a crafted volume. A normal runlist has no element to the left of vcn 0, so the left/right merge is not valid when @new_1st_cnt is 0. Require @new_1st_cnt to be positive before checking or performing the merge. This skips the merge entirely in that case instead of clamping the merge target. The out-of-bounds write can corrupt an adjacent slab object. On a non-KASAN kernel, it is reachable after a crafted NTFS volume has been mounted read-write with the legacy fs/ntfs driver, by a local user that has write access to the crafted file. Fixes: `11ccc9107d` ("ntfs: update runlist handling and cluster allocator") Suggested-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-08 23:51:07 +09:00
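The guard described in the commit above (require @new_1st_cnt to be positive before checking or performing the merge) can be modelled in userspace. This is a sketch under assumptions: `struct model_run`, `model_contiguous()`, `model_merge_left()`, and `MODEL_LCN_HOLE` are hypothetical names, and the contiguity rule is simplified to "both holes, or physically adjacent".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LCN_HOLE (-1LL)  /* hypothetical marker for a sparse run */

struct model_run {
	long long vcn;
	long long lcn;
	long long length;
};

/* Simplified contiguity test: two holes, or two adjacent real runs. */
static bool model_contiguous(const struct model_run *a,
			     const struct model_run *b)
{
	if (a->lcn == MODEL_LCN_HOLE && b->lcn == MODEL_LCN_HOLE)
		return true;
	if (a->lcn < 0 || b->lcn < 0)
		return false;
	return a->lcn + a->length == b->lcn;
}

/*
 * Merge rl[new_1st_cnt] into the run on its left, if one exists.
 * With new_1st_cnt == 0 there is no left neighbour, so the merge is
 * skipped entirely and rl[-1] is never computed.
 */
static bool model_merge_left(struct model_run *rl, size_t new_1st_cnt)
{
	if (new_1st_cnt == 0)
		return false;
	if (!model_contiguous(&rl[new_1st_cnt - 1], &rl[new_1st_cnt]))
		return false;
	rl[new_1st_cnt - 1].length += rl[new_1st_cnt].length;
	return true;
}
```

With two consecutive hole runs, a call with `new_1st_cnt == 0` leaves the array untouched, while a call with `new_1st_cnt == 1` folds the second run's length into the first.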
Namjae Jeon	c37d9e68b6	ntfs: fix variable dereferenced before check ni in ntfs_attr_open() Smatch warning: ntfs_attr_open() warn: variable dereferenced before check 'ni'. Move the ntfs_debug() call after the NULL pointer checks to ensure safe access to the structure members. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-08 23:51:05 +09:00
DaeMyung Kang	11f7a6d9d7	ntfs: fix default_upcase refcount underflow and UAF on fs_context teardown ntfs_init_fs_context() allocates a fresh ntfs_volume with vol->upcase left as NULL. ntfs_free_fs_context() unconditionally calls ntfs_volume_free() during fs_context teardown, even when ntfs_fill_super() never ran or already cleaned up. ntfs_volume_free() then executes: mutex_lock(&ntfs_lock); if (vol->upcase == default_upcase) { ntfs_nr_upcase_users--; vol->upcase = NULL; } When the global default_upcase is also NULL (very first mount attempt, or all prior mounts have released the table), the comparison is NULL == NULL, and ntfs_nr_upcase_users is decremented even though this volume never claimed a reference. ntfs_nr_upcase_users is unsigned long, so the decrement wraps to ULONG_MAX. A subsequent successful mount can then free the shared table while the mounted volume still points at it: 1. ntfs_fill_super() does the temporary ntfs_nr_upcase_users++ at the "Generate the global default upcase table if necessary" block. With the prior wraparound this brings the counter back to 0. 2. If the volume's $UpCase matches the default, the match path does ntfs_nr_upcase_users++ and sets vol->upcase = default_upcase. The counter is now 1. 3. On the success path, !--ntfs_nr_upcase_users evaluates true and default_upcase is kvfree()'d while vol->upcase still points at it. Subsequent upcase comparisons through that mount touch freed memory. This was reproduced with KASAN by closing a fresh fsopen("ntfs") context, then mounting an NTFS image whose $UpCase table matches generate_default_upcase(), and finally doing a case-insensitive lookup. 
KASAN reports the dangling vol->upcase access: BUG: KASAN: use-after-free in ntfs_collate_names+0x3b4/0x420 Read of size 2 at addr ffff888008d40048 by task init/1 ntfs_collate_names+0x3b4/0x420 ntfs_lookup_inode_by_name+0x1921/0x3130 ntfs_lookup+0x193/0xc40 vfs_statx+0xc7/0x190 vfs_fstatat+0x4b/0xa0 __do_sys_newfstatat+0x92/0xf0 The same QEMU reproducer was rerun after this change with KASAN enabled. It reached "reproducer finished", and the log contained no KASAN, use-after-free, Oops, or panic signatures. Guard each comparison with an explicit vol->upcase non-NULL check so a volume that never took a reference cannot decrement the global users counter. Apply the same guard to the other default_upcase release sites so all cleanup paths follow the same ownership rule: only volumes that actually hold a default_upcase reference may drop one. Fixes: `1e9ea7e044` ("Revert "fs: Remove NTFS classic"") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-08 23:51:01 +09:00
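The NULL == NULL underflow and the ownership guard from the commit above can be reproduced in a small userspace model. The sketch below is illustrative only: `struct model_state`, `struct model_vol`, and both release functions are hypothetical, and the shared table and its user counter are passed in explicitly rather than being globals protected by a mutex.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Models the shared default table and its user counter. */
struct model_state {
	const unsigned short *default_upcase;
	unsigned long nr_users;
};

/* Models a volume that may or may not hold a reference. */
struct model_vol {
	const unsigned short *upcase;
};

/* Old behaviour: a NULL == NULL match still drops a reference. */
static void model_release_unguarded(struct model_state *s,
				    struct model_vol *v)
{
	if (v->upcase == s->default_upcase) {
		s->nr_users--;  /* wraps to ULONG_MAX when nr_users is 0 */
		v->upcase = NULL;
	}
}

/* Guarded behaviour: only a volume that actually holds the table drops it. */
static void model_release_guarded(struct model_state *s,
				  struct model_vol *v)
{
	if (v->upcase && v->upcase == s->default_upcase) {
		s->nr_users--;
		v->upcase = NULL;
	}
}
```

For a freshly created volume (`upcase == NULL`) and no default table yet, the unguarded variant wraps the counter to `ULONG_MAX`, whereas the guarded variant leaves it at 0.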
Hyunchul Lee	de08874bae	ntfs: match ntfs_resident_attr_min_value_length with $AttrDef Update ntfs_resident_attr_min_value_length() to align with $AttrDef. The $VOLUME_NAME is allowed to have the size of 0. The Windows 11 $AttrDef values are as follows: Attribute Name (ID) Size (Min-Max) Flags $STANDARD_INFORMATION (16) 48-72 Resident $ATTRIBUTE_LIST (32) No Limit Non-resident $FILE_NAME (48) 68-578 Resident, Index $OBJECT_ID (64) 0-256 Resident $SECURITY_DESCRIPTOR (80) No Limit Non-resident $VOLUME_NAME (96) 2-256 Resident $VOLUME_INFORMATION (112) 12-12 Resident $DATA (128) No Limit (None) $INDEX_ROOT (144) No Limit Resident $INDEX_ALLOCATION (160) No Limit Non-resident $BITMAP (176) No Limit Non-resident $REPARSE_POINT (192) 0-16384 Non-resident $EA_INFORMATION (208) 8-8 Resident $EA (224) 0-65536 (None) $LOGGED_UTILITY_STREAM (256) 0-65536 Non-resident Reported-by: woot000 <woot000@woot000.com> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-08 23:50:59 +09:00
DaeMyung Kang	6c30af0b20	ntfs: avoid use-after-free of index inode in ntfs_inode_sync_filename() ntfs_inode_sync_filename() walks every FILE_NAME attribute and, for each one that points at a different parent, opens the parent index inode with ntfs_iget() and locks index_ni->mrec_lock. All three error branches (NInoBeingDeleted, ntfs_index_ctx_get failure, ntfs_index_lookup failure) drop the parent reference before unlocking: iput(index_vi); mutex_unlock(&index_ni->mrec_lock); continue; index_ni is NTFS_I(index_vi), so the ntfs_inode (and its mrec_lock) is embedded in the inode allocation. If the parent directory is not held outside the icache - no open dentry, recently evicted from dcache, no other concurrent lookup - ntfs_iget() returns with i_count == 1 and our iput() drops the last reference. evict_inode() then runs and destroy_inode() schedules the slab object for RCU free, while mutex_unlock() on the next line is still touching index_ni->mrec_lock. Swap the order so the mutex is dropped while index_vi is still alive, matching the success path at the bottom of the loop which already unlocks before iput(). Reproduced under KASAN with a debug build that forces ntfs_index_ctx_get() to fail when the parent index inode has been opened with i_count == 1. 
KASAN reports a slab-use-after-free read on the parent's mrec_lock from mutex_unlock() on the writeback worker: BUG: KASAN: slab-use-after-free in __mutex_unlock_slowpath+0xb5/0x970 Read of size 8 at addr ffff8880014b7598 by task kworker/u8:0/12 Workqueue: writeback wb_workfn (flush-253:0) Call Trace: mutex_unlock ntfs_inode_sync_filename __ntfs_write_inode ntfs_write_inode __writeback_single_inode Allocated by task 103: ntfs_alloc_big_inode ntfs_iget ntfs_lookup __x64_sys_mkdir Freed by task 12: ntfs_free_big_inode i_callback rcu_do_batch Last potentially related work creation: call_rcu destroy_inode evict dispose_list evict_inodes ntfs_inode_sync_filename __ntfs_write_inode The buggy address belongs to the object at ffff8880014b7440 which belongs to the cache ntfs_big_inode_cache of size 1800 The freed object is the parent directory inode itself: allocated by mkdir(2) via ntfs_iget(), then released through call_rcu(i_callback) that destroy_inode() scheduled when evict_inodes() ran from inside ntfs_inode_sync_filename(). Re-running the same workload with mutex_unlock() moved before iput() runs cleanly under KASAN. Fixes: `af0db57d42` ("ntfs: update inode operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-08 23:50:57 +09:00
DaeMyung Kang	f3c8cd8a63	ntfs: fix copy length in ntfs_bdev_write() for non-page-aligned start This is not a normal data I/O hot path. The single in-tree caller is the $LogFile emptying path used during read-write mount/remount, and the bug only becomes visible on NTFS volumes whose cluster_size is strictly smaller than the kernel's PAGE_SIZE (typically 4 KiB on x86_64). Per Microsoft's format command documentation, NTFS supports allocation unit sizes starting at 512 bytes, so 512 B, 1 KiB and 2 KiB clusters are uncommon but valid on-disk configurations. When cluster_size >= PAGE_SIZE every "start" passed in is page-aligned and the buggy "from != 0" path is never taken. ntfs_bdev_write() splits the write across one or more block-device folios. Inside the loop, "to" is computed as the end byte offset within the current page (0..PAGE_SIZE), and "from" is the start byte offset within the page (reset to 0 from the second iteration onward). The copy length should therefore be "to - from", but the current code uses "to" directly: to = min_t(u32, end - offset, PAGE_SIZE); memcpy_to_folio(folio, from, buf + buf_off, to); buf_off += to; When "from != 0" (i.e. "start" is not page-aligned) memcpy_to_folio() copies "from" extra bytes: - it reads "from" bytes past the source buffer into kernel heap; - it writes "from" bytes past the requested range into the next part of the block-device page (or, if "from + to > PAGE_SIZE", past the folio boundary entirely, which trips the VM_BUG_ON inside memcpy_to_folio() on CONFIG_DEBUG_VM=y kernels). "buf_off" is then advanced by the wrong amount, so every subsequent iteration also reads the source buffer at the wrong offset and writes the wrong content to disk. ntfs_empty_logfile() calls ntfs_bdev_write(sb, empty_buf, NTFS_CLU_TO_B(vol, lcn), vol->cluster_size); with empty_buf sized to vol->cluster_size. 
On a sub-PAGE_SIZE-cluster volume, any $LogFile run whose LCN is not aligned to PAGE_SIZE / cluster_size reaches the non-page-aligned path. The over-copy can read beyond empty_buf and overwrite the sectors following the requested cluster in the block-device page with unrelated kernel heap contents while $LogFile is being emptied. A userspace reducer of the same arithmetic and copy loop confirms the bug under AddressSanitizer: ASan reports a heap-buffer-overflow read past the source buffer for the buggy length, and the fixed version is ASan-clean. Compute the copy length as "to - from" and advance buf_off by the same amount. Fixes: `5218cd102a` ("ntfs: update misc operations") Link: https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/format Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-08 23:50:53 +09:00
DaeMyung Kang	563d0d4c2c	ntfs: wait for sync mft writes to complete ntfs_sync_mft_mirror() and write_mft_record_nolock() with @sync set are both documented as synchronous, but neither actually waits for the bio they submit nor inspects bi_status. write_inode() can return success while dirty mft record bytes are still in flight, and bio errors are silently dropped: the volume is not marked with errors and the inode is not redirtied. This breaks fsync()/sync metadata durability. Switch ntfs_sync_mft_mirror() and the @sync path of write_mft_record_nolock() to submit_bio_wait() and propagate the returned error to the caller. Capture ntfs_sync_mft_mirror()'s return value at its call sites in write_mft_record_nolock() so a mirror write failure surfaces too. The @sync parameter only controls the main MFT bio. The !@sync main submission is therefore unchanged and still uses ntfs_bio_end_io() to drop the folio reference taken before submission. The mirror call has always been documented as performing synchronous I/O regardless of @sync, so making it actually block restores the originally intended contract for both @sync and !@sync callers. Note this only fixes the synchronous mirror/main paths reachable from write_mft_record_nolock(). The main MFT write submitted from ntfs_write_mft_block() (the .writepages path) still does not wait for completion or check bi_status; that requires a larger restructuring and is left to a follow-up patch. Fixes: `115380f9a2` ("ntfs: update mft operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-08 23:50:51 +09:00
DaeMyung Kang	618c991cdf	ntfs: capture mft mirror sync errors in ntfs_write_mft_block() After ntfs_sync_mft_mirror() became able to return real I/O errors, ntfs_write_mft_block() still discards its return value at the call site inside the per-record loop. A failed $MFTMirr write therefore leaves the volume looking clean from the writeback path even though the on-disk mirror is now stale. Capture the return value and feed it into the function's existing @err variable using the same "first error wins" pattern already used on other failure paths. The error is propagated to the caller and, via the existing tail of the function, sets NVolErrors so umount and chkdsk see the volume as inconsistent. Fixes: `115380f9a2` ("ntfs: update mft operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-08 23:50:49 +09:00
DaeMyung Kang	49c12bee2b	ntfs: redirty folio when ntfs_write_mft_block() runs out of memory ntfs_write_mft_block() is called by writeback_iter() with the folio locked. When the per-call allocations for @locked_nis or @ref_inos fail, the function returns -ENOMEM directly without unlocking the folio. Any later task that needs the folio's lock then stalls, and the folio's dirty state is silently lost from the writeback iterator's point of view. Use folio_redirty_for_writepage() so the folio remains dirty for a subsequent writeback pass, unlock it, and only then return -ENOMEM so the caller can propagate the error to fsync()/sync_filesystem(). Fixes: `f462fdf3d6` ("ntfs: reduce stack usage in ntfs_write_mft_block()") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-08 23:50:47 +09:00
DaeMyung Kang	47773fa85e	ntfs: use base mft_no when looking up base inode for extent record When the mft record is an extent record, ntfs_may_write_mft_record() looks up its base inode in the icache. The hash key passed to find_inode_nowait() must be the base inode's mft number (na.mft_no, set just above to MREF_LE(m->base_mft_record)), but the code passes @mft_no, the extent record's own number. find_inode_nowait() uses its second argument as the hashval, so the lookup lands in the wrong bucket and almost always returns NULL. ntfs_may_write_mft_record() then returns false and the writeback path (ntfs_write_mft_block()) skips that extent record, leaving the on-disk copy permanently out of sync with the in-memory one. The original ilookup5_nowait() call this conversion replaced used na.mft_no. Restore that. Fixes: `115380f9a2` ("ntfs: update mft operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-08 23:50:45 +09:00
Arnd Bergmann	1fcf414941	Merge tag 'riscv-dt-fixes-for-v7.1-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes RISC-V devicetrees fixes for v7.1-rc3 Microchip: Fix a pinctrl misconfiguration caused by an erratum fixed between engineering sample and production silicon, that causes settings for one to not apply to the other. Starfive: Remove nodes relating to the "camss" video device that has been deleted entirely from staging.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com> * tag 'riscv-dt-fixes-for-v7.1-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: riscv: dts: microchip: fix icicle i2c pinctrl configuration riscv: dts: starfive: jh7110: Drop CAMSS node Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2026-05-08 15:33:36 +02:00
Ming Lei	f7700a4415	ublk: fix use-after-free in ublk_cancel_cmd() When ublk_reset_ch_dev() clears io->cmd via ublk_queue_reinit() concurrently with ublk_cancel_cmd(), ublk_cancel_cmd() can read a stale pointer and pass it to io_uring_cmd_done(), causing a use-after-free. Fix by synchronizing the two paths with ubq->cancel_lock: - ublk_cancel_cmd(): read and clear io->cmd under cancel_lock, then call io_uring_cmd_done() on the saved local copy outside the lock. - ublk_reset_ch_dev(): hold cancel_lock across ublk_queue_reinit() so that io->cmd and io->flags are cleared atomically with respect to ublk_cancel_cmd(). Fixes: `216c8f5ef0` ("ublk: replace monitor with cancelable uring_cmd") Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260508123746.242018-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-05-08 06:44:42 -06:00
Sven Eckelmann	ba9d20ee90	batman-adv: bla: put backbone reference on failed claim hash insert When batadv_bla_add_claim() fails to insert a new claim into the hash, it leaks a reference to the backbone_gw for which the claim was intended. Call batadv_backbone_gw_put() on the error path to release the reference and avoid leaking the backbone_gw object. Cc: stable@kernel.org Fixes: `3db0decf11` ("batman-adv: Fix non-atomic bla_claim::backbone_gw access") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-08 14:29:02 +02:00
Sven Eckelmann	cf6b604011	batman-adv: bla: only purge non-released claims When batadv_bla_purge_claims() goes through the list of claims, it is only traversing the hash list with an rcu_read_lock(). Due to a potential parallel batadv_claim_put(), it can happen that it encounters a claim which was actually in the process of being released+freed by batadv_claim_release(). In this case, backbone_gw is set to NULL before the delayed RCU kfree is started. Calling batadv_bla_claim_get_backbone_gw() is then no longer allowed because it would cause a NULL pointer dereference. To avoid this, only claims with a valid reference counter must be purged. All others are already taken care of. Cc: stable@kernel.org Fixes: `23721387c4` ("batman-adv: add basic bridge loop avoidance code") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-08 14:28:56 +02:00
Sven Eckelmann	4ae1709a31	batman-adv: bla: prevent use-after-free when deleting claims When batadv_bla_del_backbone_claims() removes all claims for a backbone, it does this by dropping the link entry in the hash list. This list entry itself was one of the references which need to be dropped at the same time via batadv_claim_put(). But the batadv_claim_put() must not be done before the last access to the claim object in this function. Otherwise the claim might be freed already by the batadv_claim_release() function before the list entry was dropped. Cc: stable@kernel.org Fixes: `23721387c4` ("batman-adv: add basic bridge loop avoidance code") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-08 14:28:51 +02:00
Sven Eckelmann	ce425dd05d	batman-adv: tp_meter: fix tp_num leak on kmalloc failure When batadv_tp_start() or batadv_tp_init_recv() fail to allocate a new tp_vars object, the previously incremented bat_priv->tp_num counter is never decremented. This causes tp_num to drift upward on each allocation failure. Since only BATADV_TP_MAX_NUM sessions can be started and the count is never reduced for these failed allocations, this eventually exhausts the available throughput meter sessions. In the worst case, no new throughput meter session can be started until the mesh interface is removed. The error handling must decrement tp_num when releasing the lock and aborting the creation of a throughput meter session. Cc: stable@kernel.org Fixes: `33a3bb4a33` ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-08 14:28:44 +02:00
Jiexun Wang	f03e858353	batman-adv: stop caching unowned originator pointers in BAT IV BAT IV keeps the last-hop neighbor address in each neigh_node, but some paths also cache an originator pointer derived from a temporary lookup. That pointer is not owned by the neigh_node and may no longer refer to a live originator entry after purge handling runs. Stop storing the auxiliary originator pointer in the BAT IV neighbor state. When BAT IV needs the neighbor originator data, resolve it from the stored neighbor address and drop the reference again after use. Fixes: `c6c8fea297` ("net: Add batman-adv meshing protocol") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> [sven: avoid bonding logic for outgoing OGM] Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-08 14:28:40 +02:00
Francis, David	5e28b7b944	drm: Set old handle to NULL before prime swap in change_handle There was a potential race condition in change_handle. The ioctl briefly had a single object with two idr entries; a concurrent gem_close could delete the object and remove one of the handles while leaving the other one dangling, which could subsequently be dereferenced for a use-after-free. To fix this, do the same dance that gem_close itself does. (`f6cd7daecf` drm: Release driver references to handle before making it available again) First idr_replace the old handle to NULL. Later, if the prime operations are successful, actually close it. create_tail required a similar dance to avoid a similar problem. (`bd46cece51` drm/gem: Fix race in drm_gem_handle_create_tail()) It idr_allocs the new handle with NULL, then swaps in the correct object later to avoid races. We don't need to do that here, since the only operations that could race are drm_prime, and change_handle holds the prime lock for the entire duration. v2: cleanups of error paths Signed-off-by: David Francis <David.Francis@amd.com> Co-authored-by: Dave Airlie <airlied@gmail.com> Reported-by: Puttimet Thammasaeng <pwn8official@gmail.com> Tested-by: Vitaly Prosyak <Vitaly.Prosyak@amd.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: stable@vger.kernel.org Cc: Christian Koenig <Christian.Koenig@amd.com> Fixes: `53096728b8` ("drm: Add DRM prime interface to reassign GEM handle") Signed-off-by: Dave Airlie <airlied@redhat.com>	2026-05-08 17:53:59 +10:00
Dave Airlie	d8a70292c3	Merge tag 'amd-drm-fixes-7.1-2026-05-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-7.1-2026-05-06: amdgpu: - GFX9 fixes - Hawaii SMU fixes - SDMA4 fix - GART fix - Userq fixes amdkfd: - GPUVM TLB flush fix - Hotplug fix radeon: - Hawaii SMU fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260506154631.1733034-1-alexander.deucher@amd.com	2026-05-08 17:50:12 +10:00
John Walker	7666dbb1ba	wifi: cfg80211: advance loop vars in cfg80211_merge_profile() cfg80211_merge_profile() reassembles a Multi-BSSID non-transmitted BSS profile that has been split across multiple consecutive MBSSID elements. Its while-loop calls cfg80211_get_profile_continuation(ie, ielen, mbssid_elem, sub_elem) but never advances mbssid_elem or sub_elem inside the body. Each iteration therefore searches for a continuation that follows the same fixed pair; the helper returns the same next_mbssid; and the same next_sub bytes are memcpy()'d into merged_ie at a growing offset until the buffer fills. Advance both mbssid_elem and sub_elem to the just-consumed continuation so the next call to cfg80211_get_profile_continuation() searches for a further continuation beyond it (or returns NULL when none exists). A specially-crafted malicious beacon can take advantage of this bug to cause the kernel to spend an excessive amount of time in cfg80211_merge_profile (up to as much as 2ms per beacon received), which could theoretically be abused in some way. Cc: stable@vger.kernel.org Fixes: `fe806e4992` ("cfg80211: support profile split between elements") Signed-off-by: John Walker <johnwalker0@gmail.com> Link: https://patch.msgid.link/20260507230720.64783-1-johnwalker0@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-05-08 09:20:03 +02:00
K Prateek Nayak	f9f16835d4	cpufreq/amd-pstate-ut: Drop policy reference before driver switch Recent changes to the EPP unit test try to perform a driver switch with a cpufreq_policy reference held when the driver is loaded in anything but the active mode, which leads to a circular dependency and the unit test hanging indefinitely. Drop the reference before driver switch and grab it back once the driver mode is stabilized for the test. The EPP writes are only possible with CPUFREQ_POLICY_POWERSAVE policy. Temporarily switch the cpudata->policy (while holding the write end of the policy->rwsem) to CPUFREQ_POLICY_POWERSAVE and restore the original policy once tests are done. To ensure the final EPP is correct in case the driver started with CPUFREQ_POLICY_PERFORMANCE, EPP performance is tested last. The __free() based cleanup for cpufreq_policy is lost in the process. Reported-by: Kalpana Shetty <kalpana.shetty@amd.com> Fixes: `7e173bc310` ("cpufreq/amd-pstate-ut: Add a unit test for raw EPP") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20260508051748.10484-7-kprateek.nayak@amd.com Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2026-05-08 00:30:50 -05:00
K Prateek Nayak	caa822d312	cpufreq/amd-pstate: Use "epp_default_dc" as default when dynamic_epp is disabled If "dynamic_epp" is disabled, the driver initialization and the default EPP selection from sysfs currently set the EPP based on the power supply state of the system at that time, but there are no power supply callbacks registered to toggle it when the power supply state changes. This can lead to faster battery drain on platforms that start off plugged into the wall but later move to battery power, since the EPP stays at AMD_CPPC_EPP_PERFORMANCE. Use "epp_default_dc" as the default EPP selection when dynamic_epp is disabled, restoring older behavior. On servers, this defaults to AMD_CPPC_EPP_PERFORMANCE and on other platforms, it defaults to AMD_CPPC_EPP_BALANCE_PERFORMANCE. Fixes: `e30ca6dd53` ("cpufreq/amd-pstate: Add dynamic energy performance preference") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20260508051748.10484-6-kprateek.nayak@amd.com Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2026-05-08 00:30:50 -05:00
K Prateek Nayak	f3acf7ff11	cpufreq/amd-pstate: Reorder notifier unregistration and floor perf reset An active power supply notifier can race with amd_pstate_epp_cpu_exit() trying to reset the floor perf and can overwrite the floor perf set in MSR_AMD_CPPC_REQ. Unregister the notifier before setting the floor perf to prevent the rare race. Fixes: `e30ca6dd53` ("cpufreq/amd-pstate: Add dynamic energy performance preference") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20260508051748.10484-5-kprateek.nayak@amd.com Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2026-05-08 00:30:50 -05:00
K Prateek Nayak	c5eed6ddc7	cpufreq/amd-pstate: Allow writes to dynamic_epp when state isn't modified Writing the current "dynamic_epp" state to sysfs fails with -EINVAL even though the desired result was achieved. Allow writes to "dynamic_epp" that do not modify the state. Fixes: `e30ca6dd53` ("cpufreq/amd-pstate: Add dynamic energy performance preference") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20260508051748.10484-4-kprateek.nayak@amd.com Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2026-05-08 00:30:50 -05:00
K Prateek Nayak	87d2a8dec0	cpufreq/amd-pstate: Return -ENOMEM on failure to allocate profile_name Failure to allocate profile name will return -EINVAL from platform_profile_register() while in fact, it is a failure to allocate memory for the profile_name string. Return -ENOMEM when kasprintf() fails to allocate profile_name string. Fixes: `e30ca6dd53` ("cpufreq/amd-pstate: Add dynamic energy performance preference") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20260508051748.10484-3-kprateek.nayak@amd.com Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2026-05-08 00:30:50 -05:00
K Prateek Nayak	9228169d2a	cpufreq/amd-pstate: Grab "amd_pstate_driver_lock" when toggling dynamic_epp Concurrently changing driver mode and dynamic_epp with: echo passive > /sys/devices/system/cpu/amd_pstate/status& echo disable > /sys/devices/system/cpu/amd_pstate/dynamic_epp& hits the WARN_ON_ONCE() in static_key_disable_cpuslocked() and hangs the system since both sysfs writes are trying to do amd_pstate_change_driver_mode() without any synchronization. Grab the "amd_pstate_driver_lock" mutex when modifying "dynamic_epp" to prevent the two paths from racing with each other. Add a lockdep assertion for "amd_pstate_driver_lock" in amd_pstate_change_driver_mode() to formalize the dependency. Since "cppc_mode" is stable under "amd_pstate_driver_lock", only reload the driver when in "AMD_PSTATE_ACTIVE" mode and reject all writes when in passive or guided mode, or if the driver is not loaded, since only active mode operates on EPP. Fixes: `e30ca6dd53` ("cpufreq/amd-pstate: Add dynamic energy performance preference") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20260508051748.10484-2-kprateek.nayak@amd.com Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2026-05-08 00:30:50 -05:00
Dave Airlie	765e717dfb	Merge tag 'drm-misc-fixes-2026-05-07' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: bochs: - fix managed cleanup bridge: - tda998x: fix sparse warnings on type correctness etnaviv: - schedule armed jobs exynos: - managed bridge cleanup fb-helper: - fix clipping ivpu: - disallow reexport of GEM buffer objects nouveau: - revert support for GA100 panel: - boe-tv101wum-nl16: use correct MIPI_DSI mode - feyjang-fy07024di26a30d: fix error reporting - himax-hx83102: use correct MIPI_DSI mode - himax-hx83121a: fix error checks - himax-hx83121a: select DRM_DISPLAY_DSC_HELPER qaic: - fix RAS message handling qxl: - clean up polling sti: - managed bridge cleanup ttm: - update GPU MM stats on pool shrinking Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260507115213.GA206508@linux.fritz.box	2026-05-08 12:11:16 +10:00
Hui Wang	41337097f2	riscv: cpufeature: Use pre-defined ISA ext macros to index isa2hwcap We have pre-defined ISA extension macros; use those macros to replace a magic number in the isa2hwcap definition and some array indexing for isa2hwcap access. This doesn't change the original functionality; it just improves code maintainability and readability. Signed-off-by: Hui Wang <hui.wang@canonical.com> Link: https://patch.msgid.link/20260506132152.53239-1-hui.wang@canonical.com Signed-off-by: Paul Walmsley <pjw@kernel.org>	2026-05-07 19:38:38 -06:00
Martin Kaiser	ef5581bb30	test_kprobes: clear kprobes between test runs Running the kprobes sanity tests twice makes all tests fail and eventually crashes the kernel. [root@martin-riscv-1 ~]# echo 1 > /sys/kernel/debug/kunit/kprobes_test/run ... # Totals: pass:5 fail:0 skip:0 total:5 ok 1 kprobes_test [root@martin-riscv-1 ~]# echo 1 > /sys/kernel/debug/kunit/kprobes_test/run ... # test_kprobe: EXPECTATION FAILED at lib/tests/test_kprobes.c:64 Expected 0 == register_kprobe(&kp), but register_kprobe(&kp) == -22 (0xffffffffffffffea) ... Unable to handle kernel paging request ... The testsuite defines several kprobes and kretprobes as static variables that are preserved across test runs. After register_kprobe and unregister_kprobe, a kprobe contains some leftover data that must be cleared before the kprobe can be registered again. The tests are setting symbol_name to define the probe location. Address and flags must be cleared. The existing code clears some of the probes between subsequent tests, but not between two test runs. The leftover data from a previous test run makes the registrations fail in the next run. Move the cleanups for all kprobes into kprobes_test_init, this function is called before each single test (including the first test of a test run). Link: https://lore.kernel.org/all/20260507134615.1010905-1-martin@kaiser.cx/ Fixes: `e44e81c5b9` ("kprobes: convert tests to kunit") Signed-off-by: Martin Kaiser <martin@kaiser.cx> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-05-08 10:03:44 +09:00
Jianpeng Chang	307abfac04	kprobes: skip non-symbol addresses in kprobe_add_ksym_blacklist() When kprobe_add_area_blacklist() iterates through a section like .kprobes.text, the start address may not correspond to a named symbol. On ARM64 with CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS=y (introduced by commit `baaf553d3b` ("arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS")), the compiler flag -fpatchable-function-entry=4,2 inserts 2 NOPs before each function entry point for ftrace call_ops. These pre-function NOPs sit at the section base address, before the first named function symbol. The compiler emits a $x mapping symbol at offset 0x00 to mark the start of code, but find_kallsyms_symbol() ignores mapping symbols. Without CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS (e.g. defconfig), no pre-function NOPs are inserted, the first function starts at offset 0x00, and the bug does not trigger. This only affects modules that have a .kprobes.text section (i.e. those using the __kprobes annotation). Modules using NOKPROBE_SYMBOL() instead (like kretprobe_example.ko) blacklist exact function addresses via the _kprobe_blacklist section and are not affected. For kprobe_example.ko on ARM64 with -fpatchable-function-entry=4,2, the .kprobes.text section layout is: offset 0x00: $x + 2 NOPs (mapping symbol + ftrace preamble) offset 0x08: handler_post (64 bytes) offset 0x50: handler_pre (68 bytes) kprobe_add_area_blacklist() starts iterating from the section base address (offset 0x00), which only has the $x mapping symbol. kprobe_add_ksym_blacklist() then calls kallsyms_lookup_size_offset() for this address, which goes through: kallsyms_lookup_size_offset() -> module_address_lookup() -> find_kallsyms_symbol() find_kallsyms_symbol() scans all module symbols to find the closest preceding symbol. Since no named text symbol exists at offset 0x00, find_kallsyms_symbol() picks __UNIQUE_ID_vermagic (a .modinfo symbol whose address is in the temporary image) as the "best" match. 
The computed "size" = next_text_symbol - modinfo_symbol spans across these two unrelated memory regions, creating a blacklist entry with a bogus range of tens of terabytes. Whether this causes a visible failure depends on address randomization. Here is what happens on Raspberry Pi 4 and 5: - On RPi5, the bogus size was ~35 TB. start + size stayed within 64-bit range, so the blacklist entry covered the entire kernel text. register_kprobe() in the module's own init function failed with -EINVAL. - On RPi4, the bogus size was ~75 TB. start + size overflowed 64 bits and wrapped to a small address near zero. The range check (addr >= start && addr < end) then failed because end wrapped around, so the bogus entry was accidentally harmless and kprobes worked by luck. The same bug exists on both machines, but randomization determines whether the integer overflow masks it or not. Fix this by adding notrace to the __kprobes macro. Functions in .kprobes.text are kprobe infrastructure handlers that should never be traced by ftrace. With notrace, the compiler stops inserting the pre-function NOPs and the non-symbol gap at the section start disappears entirely. Link: https://lore.kernel.org/all/20260506012706.2785785-1-jianpeng.chang.cn@windriver.com/ Fixes: `baaf553d3b` ("arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS") Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-05-08 10:03:44 +09:00
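The RPi4/RPi5 difference described in the commit above comes down to unsigned 64-bit wraparound in start + size. The following is a minimal userspace sketch of that arithmetic, not the kernel's code: struct range, make_range() and in_range() are hypothetical names, and the addresses used with it are made-up illustrative values rather than the ones observed on either board.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a blacklist entry; not the kernel's struct. */
struct range {
	uint64_t start;
	uint64_t end;	/* start + size, computed modulo 2^64 */
};

/* Build a range the way the commit describes: end = start + size. */
static struct range make_range(uint64_t start, uint64_t size)
{
	struct range r = { .start = start, .end = start + size };
	return r;
}

/* The range check quoted in the commit: addr >= start && addr < end. */
static bool in_range(const struct range *r, uint64_t addr)
{
	return addr >= r->start && addr < r->end;
}
```

With an illustrative start of 0xffffd00000000000, a size of 35 TB (0x230000000000) keeps end above start, so a probe address in between matches the entry and would be rejected. A size of 75 TB (0x4b0000000000) wraps end around to 0x00001b0000000000, so no address can satisfy both halves of the check and the bogus entry never matches.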
Linus Torvalds	917719c412	selinux/stable-7.1 PR 20260507 Merge tag 'selinux-pr-20260507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fixes from Paul Moore: - Allow for multiple opens of /sys/fs/selinux/policy Prevent a single process from blocking others from reading the SELinux policy loaded in the kernel. This does have the side effect of potentially allowing userspace to trigger additional kernel memory allocations as part of the open/read operation, but this is mitigated by requiring the SELinux security/read_policy permission. - Reduce the critical sections where the SELinux policy mutex is held This includes the patch to the policy loader code where we move the permission checks and an allocation outside the mutex as well as the patch to checkreqprot which drops the code/lock entirely. While the checkreqprot code had effectively been dropped in an earlier release, portions of the code still remained that would have triggered the mutex to perform an IMA measurement. This finally drops all of that while preserving the user visible behavior. 
- Eliminate potential sources of log spamming There were a few areas where processes could flood the system logs and hide other, more critical events. The previously disabled checkreqprot and runtime disable knobs in selinuxfs were two such areas that have now been greatly simplified and a pr_err() replaced with a pr_err_once(). The third such place is the /sys/fs/selinux/user file, which hasn't been used by a userspace release since 2020 and was scheduled for removal after 2025; this effectively disables this functionality, but similar to checkreqprot, it is done in a way that should not break old userspace. * tag 'selinux-pr-20260507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: shrink critical section in sel_write_load() selinux: allow multiple opens of /sys/fs/selinux/policy selinux: prune /sys/fs/selinux/user selinux: prune /sys/fs/selinux/disable selinux: prune /sys/fs/selinux/checkreqprot	2026-05-07 17:26:43 -07:00
Tejun Heo	1f91d0d582	sched_ext: Fix !CONFIG_EXT_SUB_SCHED build warnings W=1 with CONFIG_EXT_SUB_SCHED=n flags 'err_msg' uninitialized and 'err_free_lb_resched' unused. Initialize err_msg and gate the label. Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-07 14:16:59 -10:00
Li Xiasong	19f94b6fee	netfilter: nft_ct: fix missing expect put in obj eval nft_ct_expect_obj_eval() allocates an expectation and may call nf_ct_expect_related(), but never drops its local reference. Add nf_ct_expect_put(exp) before return to balance allocation. Fixes: `857b46027d` ("netfilter: nft_ct: add ct expectations support") Cc: stable@vger.kernel.org Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-08 01:30:17 +02:00
Li Xiasong	eb6317739b	netfilter: nf_conntrack_sip: get helper before allocating expectation process_register_request() allocates an expectation and then checks whether a conntrack helper is available. If helper lookup fails, the function returns early and the allocated expectation is left behind. Reorder the code to fetch and validate helper before calling nf_ct_expect_alloc(). This keeps the logic simpler and removes the leak path while preserving existing behavior. Fixes: `e14575fa75` ("netfilter: nf_conntrack: use rcu accessors where needed") Cc: stable@vger.kernel.org Signed-off-by: Li Xiasong <lixiasong1@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-08 01:30:17 +02:00
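The reordering described in the commit above (validate the helper first, allocate afterwards) can be sketched in userspace C. This is only an illustration under stated assumptions: struct expectation, expect_alloc(), expect_put(), register_leaky() and register_fixed() are hypothetical names, and live_allocs is a counter added here purely to make the leak visible. None of it is the netfilter API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object standing in for an expectation. */
struct expectation {
	int unused;
};

/* Count of objects allocated but not yet released. */
static int live_allocs;

static struct expectation *expect_alloc(void)
{
	struct expectation *exp = malloc(sizeof(*exp));

	if (exp)
		live_allocs++;
	return exp;
}

static void expect_put(struct expectation *exp)
{
	free(exp);
	live_allocs--;
}

/* Old ordering: allocate, then validate. The early return leaks exp. */
static int register_leaky(const void *helper)
{
	struct expectation *exp = expect_alloc();

	if (!exp)
		return -1;
	if (!helper)
		return -1;	/* exp is never released on this path */
	expect_put(exp);
	return 0;
}

/* New ordering: validate first, so a failed check has nothing to undo. */
static int register_fixed(const void *helper)
{
	struct expectation *exp;

	if (!helper)
		return -1;
	exp = expect_alloc();
	if (!exp)
		return -1;
	expect_put(exp);
	return 0;
}
```

Calling register_fixed(NULL) leaves live_allocs at zero, while register_leaky(NULL) leaves one object outstanding. Checking before allocating removes the error path that needed cleanup instead of adding a release call to it.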
Pablo Neira Ayuso	d8ef54c83a	netfilter: ctnetlink: check tuple and mask in expectations created via nfqueue Ensure the expectation tuple and mask attributes are present in the netlink message; otherwise a null pointer dereference is possible. Fixes: `bd07793705` ("netfilter: nfnetlink_queue: allow to attach expectations to conntracks") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-08 01:30:17 +02:00
Pablo Neira Ayuso	dcb0f9aefd	netfilter: nf_conntrack_expect: restore helper propagation via expectation A recent series to fix expectations broke helper propagation via expectation, a mechanism used by the sip and h323 helpers. This also propagates the conntrack helper to expected connections. I changed the semantics of exp->helper, which now tells us the actual helper that created the expectation. Add an explicit assign_helper field to expectations for this purpose and update helpers to use it. Restore this feature for userspace conntrack helper via ctnetlink nfqueue integration so it is again possible to attach a helper to an expectation, where it makes sense. This is not restored via ctnetlink expectation creation as there is no client for such a feature. Use the expectation layer 4 protocol number for the helper lookup for consistency. Make sure expectations using this helper propagation mechanism also go away when the helper is unregistered. Fixes: `9c42bc9db9` ("netfilter: nf_conntrack_expect: honor expectation helper field") Fixes: `917b61fa20` ("netfilter: ctnetlink: ignore explicit helper on new expectations") Reported-by: Ilya Maximets <i.maximets@ovn.org> Tested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-05-08 01:30:17 +02:00


1446786 Commits