linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-07 22:14:04 +02:00

Author	SHA1	Message	Date
Tiezhu Yang	1829419bc3	LoongArch: Handle CONFIG_32BIT in syscall_get_arch() If CONFIG_32BIT is set, it should return AUDIT_ARCH_LOONGARCH32 instead of AUDIT_ARCH_LOONGARCH64 in syscall_get_arch(). Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2026-04-22 15:45:11 +08:00
Huacai Chen	8b81576c16	LoongArch: Add HIGHMEM (PKMAP and FIX_KMAP) support Add HIGHMEM (High Memory) support for LoongArch, mostly needed by 32BIT kernel because the size of kernel virtual memory space is only 512MB and the size of usable physical memory is only 256MB in this case. HIGHMEM adds permanent kernel mapping (PKMAP) and fixed kernel mapping (FIX_KMAP), which increase usable physical memory up to 2.25GB (2304MB). We can just use the generic copy_user_highpage(), so remove the custom version. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2026-04-22 15:44:54 +08:00
Huacai Chen	3d9aba6618	LoongArch: Adjust build infrastructure for 32BIT/64BIT Adjust build infrastructure (Kconfig, Makefile and ld scripts) to let us enable both 32BIT/64BIT kernel build. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>	2026-04-22 15:44:26 +08:00
Jork Loeser	5170a82e89	x86/hyperv: Skip LP/VP creation on kexec After a kexec the logical processors and virtual processors already exist in the hypervisor because they were created by the previous kernel. Attempting to add them again causes either a BUG_ON or corrupted VP state leading to MCEs in the new kernel. Add hv_lp_exists() to probe whether an LP is already present by calling HVCALL_GET_LOGICAL_PROCESSOR_RUN_TIME. When it succeeds the LP exists and we skip the add-LP and create-VP loops entirely. Also add hv_call_notify_all_processors_started() which informs the hypervisor that all processors are online. This is required after adding LPs (fresh boot) and is a no-op on kexec since we skip that path. Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Co-developed-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com> Signed-off-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com> Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2026-04-22 06:23:25 +00:00
Jork Loeser	f7ce370b52	x86/hyperv: move stimer cleanup to hv_machine_shutdown() Move hv_stimer_global_cleanup() from vmbus's hv_kexec_handler() to hv_machine_shutdown() in the platform code. This ensures stimer cleanup happens before the vmbus unload, which is required for root partition kexec to work correctly. Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2026-04-22 06:23:25 +00:00
Jork Loeser	3c42b33433	Drivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowing vmbus_alloc_synic_and_connect() declares a local 'int hyperv_cpuhp_online' that shadows the file-scope global of the same name. The cpuhp state returned by cpuhp_setup_state() is stored in the local, leaving the global at 0 (CPUHP_OFFLINE). When hv_kexec_handler() or hv_machine_shutdown() later call cpuhp_remove_state(hyperv_cpuhp_online) they pass 0, which hits the BUG_ON in __cpuhp_remove_state_cpuslocked(). Remove the local declaration so the cpuhp state is stored in the file-scope global where hv_kexec_handler() and hv_machine_shutdown() expect it. Fixes: `2647c96649` ("Drivers: hv: Support establishing the confidential VMBus connection") Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2026-04-22 06:23:25 +00:00
Stanislav Kinsburskii	cfc42685e5	mshv: Add tracepoint for GPA intercept handling Provide visibility into GPA intercept operations for debugging and performance analysis of Microsoft Hypervisor guest memory management. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>	2026-04-22 06:23:25 +00:00
Sangyun Kim	68637b68af	pwm: atmel-tcb: Cache clock rates and mark chip as atomic atmel_tcb_pwm_apply() holds tcbpwmc->lock as a spinlock via guard(spinlock)() and then calls atmel_tcb_pwm_config(), which calls clk_get_rate() twice. clk_get_rate() acquires clk_prepare_lock (a mutex), so this is a sleep-in-atomic-context violation. On CONFIG_DEBUG_ATOMIC_SLEEP kernels every pwm_apply_state() that enables or reconfigures the PWM triggers a "BUG: sleeping function called from invalid context" warning. Acquire exclusive control over the clock rates with clk_rate_exclusive_get() at probe time and cache the rates in struct atmel_tcb_pwm_chip, then read the cached rates from atmel_tcb_pwm_config(). This keeps the spinlock-based mutual exclusion introduced in commit `37f7707077` ("pwm: atmel-tcb: Fix race condition and convert to guards") and removes the sleeping calls from the atomic section. With no sleeping calls left in .apply() and the regmap-mmio bus already running with fast_io=true, also mark the chip as atomic so consumers can use pwm_apply_atomic() from atomic context. Fixes: `37f7707077` ("pwm: atmel-tcb: Fix race condition and convert to guards") Signed-off-by: Sangyun Kim <sangyun.kim@snu.ac.kr> Link: https://patch.msgid.link/20260419080838.3192357-1-sangyun.kim@snu.ac.kr [ukleinek: Ensure .clk is enabled before calling clk_get_rate on it.] Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>	2026-04-22 07:24:33 +02:00
Greg Kroah-Hartman	d0be8884f5	io_uring: take page references for NOMMU pbuf_ring mmaps Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel virtual address of the io_mapped_region's backing pages directly; the user's VMA aliases the kernel allocation. io_uring_mmap() then just returns 0 -- it takes no page references. The CONFIG_MMU path uses vm_insert_pages(), which takes a reference on each inserted page. Those references are released when the VMA is torn down (zap_pte_range -> put_page). io_free_region() -> release_pages() drops the io_uring-side references, but the pages survive until munmap drops the VMA-side references. Under NOMMU there are no VMA-side references. io_unregister_pbuf_ring -> io_put_bl -> io_free_region -> release_pages drops the only references and the pages return to the buddy allocator while the user's VMA still has vm_start pointing into them. The user can then write into whatever the allocator hands out next. Mirror the MMU lifetime: take get_page references in io_uring_mmap() and release them via vm_ops->close. NOMMU's delete_vma() calls vma_close() which runs ->close on munmap. This also incidentally addresses the duplicate-vm_start case: two mmaps of SQ_RING and CQ_RING resolve to the same ctx->ring_region pointer. With page refs taken per mmap, the second mmap takes its own refs and the pages survive until both mmaps are closed. The nommu rb-tree BUG_ON on duplicate vm_start is a separate mm/nommu.c concern (it should share the existing region rather than BUG), but the page lifetime is now correct. Cc: Jens Axboe <axboe@kernel.dk> Reported-by: Anthropic Assisted-by: gkh_clanker_t1000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2026042115-body-attention-d15b@gregkh [axboe: get rid of region lookup, just iterate pages in vma] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-04-21 20:14:39 -06:00
Linus Torvalds	beaba8bfbb	Probes fixes for v7.1 - fprobe: Fixed several bugs, which includes followings: . Prevention of re-registration: Added an earlier check to reject re-registering an already active fprobe before its state is modified during the initialization phase. . Robustness in failure paths: - Ensured fprobes are correctly removed from all internal tables and properly RCU-freed during registration failure. - Modified unregister_fprobe() to proceed with unregistration even if temporary memory allocation fails. . RCU safety in module unloading: Avoided a potential "sleep in RCU" warning by removing a kcalloc() call in the module notifier path. This also tries to remove fprobe_hash_node even if memory allocation fails. . Type-aware unregistration: Fixed a bug where unregistering an fprobe did not account for different types (entry-only vs. entry-exit) at the same address, which previously left "junk" entries in the underlying ftrace/fgraph ops. . Unregistration of empty ftrace_ops: Avoided unneeded performance overhead because of making registered ftrace_ops empty (means trace all functions.) This counts remaining entries and unregister ftrace_ops when it becomes empty. - ftracetest: In addition to the fix, two new selftests to check above fixes, are included: . Module Unloading Test: Specifically verifies that fprobe events on a module are correctly cleaned up and do not trigger "trace-all" behavior when the module is removed. . Multiple Fprobe Events Test: Ensures that having multiple fprobes on the same function correctly manages the ftrace hash map during removal. -----BEGIN PGP SIGNATURE----- iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmnoGFUbHG1hc2FtaS5o aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bvR8H/3f+lFlflidmqgjW+dy5 cB73jVVznarkvVr2AdR1KpNv3V9qjLCWAiUM6sDI79mbMNbOKh44EkEX9P5actNq 3e4NofYr8TyfnJfD6tnrevjhZspCbvRWKVc2+gY9EIi9XUxSRjGuv/5VkIa5HMAN 3Zg++U0mYBTEAn59nGR++GakMuF/UJChOC5eVKpVYjtRLQ+TzWe6nwBhXj6PrhVT OjZN4LHLH/Eze+Suk2duBDt0RQhfPqiz8yW7eP6TzxkS/DN+hmgoQ/zqFjndPsmc 1TnJ1iJLJ5vo0SqtuCKDv9OTc1oclMCYYdzCxEOnrfFXX4BXxGjzcEd7sjINHMKd kBY= =9oav -----END PGP SIGNATURE----- Merge tag 'probes-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: "fprobe bug fixes: - Prevent re-registration Add an earlier check to reject re-registering an already active fprobe before its state is modified during the initialization phase - Robustness in failure paths: - Ensure fprobes are correctly removed from all internal tables and properly RCU-freed during registration failure - Make unregister_fprobe() proceed with unregistration even if temporary memory allocation fails - RCU safety in module unloading Avoid a potential "sleep in RCU" warning by removing a kcalloc() call in the module notifier path. This also tries to remove fprobe_hash_node even if memory allocation fails. - Type-aware unregistration Fix a bug where unregistering an fprobe did not account for different types (entry-only vs entry-exit) at the same address, which previously left "junk" entries in the underlying ftrace/fgraph ops - Unregistration of empty ftrace_ops Avoid unneeded performance overhead due to making registered ftrace_ops empty - which means 'trace all functions'. This counts remaining entries and unregister ftrace_ops when it becomes empty. Two new selftests to check above fixes: - Module Unloading Test: Specifically verifies that fprobe events on a module are correctly cleaned up and do not trigger 'trace-all' behavior when the module is removed. - Multiple Fprobe Events Test: Ensure that having multiple fprobes on the same function correctly manages the ftrace hash map during removal" * tag 'probes-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: selftests/ftrace: Add a testcase for multiple fprobe events selftests/ftrace: Add a testcase for fprobe events on module tracing/fprobe: Fix to unregister ftrace_ops if it is empty on module unloading tracing/fprobe: Check the same type fprobe on table as the unregistered one tracing/fprobe: Avoid kcalloc() in rcu_read_lock section tracing/fprobe: Remove fprobe from hash in failure path tracing/fprobe: Unregister fprobe even if memory allocation fails tracing/fprobe: Reject registration of a registered fprobe before init	2026-04-21 19:05:09 -07:00
Jens Axboe	1967f0b1ca	io_uring/poll: ensure EPOLL_ONESHOT is propagated for EPOLL_URING_WAKE Commit: `aacf2f9f38` ("io_uring: fix req->apoll_events") fixed an issue where poll->events and req->apoll_events weren't synchronized, but then when the commit referenced in Fixes got added, it didn't ensure the same thing. If we mask in EPOLLONESHOT in the regular EPOLL_URING_WAKE path, then ensure it's done for both. Including a link to the original report below, even though it's mostly nonsense. But it includes a reproducer that does show that IORING_CQE_F_MORE is set in the previous CQE, while no more CQEs will be generated for this request. Just ignore anything that pretends this is security related in any way, it's just the typical AI nonsense. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/io-uring/CAM0zi7yQzF3eKncgHo4iVM5yFLAjsiob_ucqyWKs=hyd_GqiMg@mail.gmail.com/ Reported-by: Azizcan Daştan <azizcan.d@mileniumsec.com> Fixes: `4464853277` ("io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-04-21 19:18:34 -06:00
Dave Airlie	0389aa7009	amd-drm-next-7.1-2026-04-17: amdgpu: - SMU 14 fixes - Partition fixes - SMUIO 15.x fix - SR-IOV fixes - JPEG fix - PSP 15.x fix - NBIF fix - Devcoredump fixes - DPC fix - RAS fixes - Aldebaran smu fix - IP discovery fix - SDMA 7.1 fix - Runtime pm fix - MES 12.1 fix - DML2 fixes - DCN 4.2 fixes - YCbCr fixes - Freesync fixes - ISM fixes - Overlay cursor fix - DC FP fixes - UserQ locking fixes amdkfd: - Fix memory clear handling -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCaeK4eQAKCRC93/aFa7yZ 2Co4AQCujOgX1TufJnBGDCZFrWApIfmFSi1VKAel5aRbm+LkhgD/UrUCiS+tGuoe 52rM30nMWgY7fBlaoAapW8rEixI4gw8= =cRHL -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-7.1-2026-04-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-7.1-2026-04-17: amdgpu: - SMU 14 fixes - Partition fixes - SMUIO 15.x fix - SR-IOV fixes - JPEG fix - PSP 15.x fix - NBIF fix - Devcoredump fixes - DPC fix - RAS fixes - Aldebaran smu fix - IP discovery fix - SDMA 7.1 fix - Runtime pm fix - MES 12.1 fix - DML2 fixes - DCN 4.2 fixes - YCbCr fixes - Freesync fixes - ISM fixes - Overlay cursor fix - DC FP fixes - UserQ locking fixes amdkfd: - Fix memory clear handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260417225351.8714-1-alexander.deucher@amd.com	2026-04-22 11:15:07 +10:00
Carlos Bilbao	2f3835771d	scsi: target: iscsi: reject invalid size Extended CDB AHS If ecdb_ahdr->ahslength is zero, two bugs follow: kmalloc(be16_to_cpu(ecdb_ahdr->ahslength) + 15, ...) allocates 15 bytes, but the immediately following memcpy writes ISCSI_CDB_SIZE (16) bytes into it, a one-byte heap overflow. Also: memcpy(cdb + ISCSI_CDB_SIZE, ecdb_ahdr->ecdb, be16_to_cpu(ecdb_ahdr->ahslength) - 1); (u16)0 - 1 promotes to (int)-1 which converts to SIZE_MAX as size_t, causing a massive out-of-bounds write. Reject ahslength == 0 with ISCSI_REASON_PROTOCOL_ERROR before the kmalloc. Also reject ahslength values that exceed the actual AHS buffer advertised. Fixes: `8f1f7d297b` ("scsi: target: iscsi: Add support for extended CDB AHS") Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Link: https://patch.msgid.link/20260415040728.187680-1-carlos.bilbao@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2026-04-21 21:08:25 -04:00
Wang Shuaiwei	b06cf63d83	scsi: ufs: core: Fix bRefClkFreq write failure in HS-LSS mode According to the UFS spec, the bRefClkFreq attribute can only be written when both sub-links are in LS-MODE. However, in HS LSS mode with resetmode = HS_MODE, if the UFS device's default bRefClkFreq value differs from the host controller's dev_ref_clk_freq setting, the write operation will fail. To fix this issue, introduce ufshcd_get_op_mode() function to detect the current link operational mode. Call ufshcd_set_dev_ref_clk() only when both sub-links are in LS-MODE to ensure the attribute can be written successfully. Signed-off-by: Wang Shuaiwei <wangshuaiwei1@xiaomi.com> Link: https://patch.msgid.link/20260414033718.1459540-1-wangshuaiwei1@xiaomi.com Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2026-04-21 20:58:06 -04:00
Linus Torvalds	6596a02b20	drm for 7.1-rc1 (part 2) drm: - Add support for AMD VSDB parsing to drm_edid dma-buf: - fix documentation formatting i915: - add support for reordered pipes to support joined pipes better - Fix VESA backlight possible check condition - Verify the correct plane DDB entry amdgpu: - Audio regression fix - Use drm edid parser for AMD VSDB - Misc cleanups - VCE cs parse fixes - VCN cs parse fixes - RAS fixes - Clean up and unify vram reservation handling - GPU Partition updates - system_wq cleanups - Add CONFIG_GCOV_PROFILE_AMDGPU kconfig option - SMU vram copy updates - SMU 13/14/15 fixes - UserQ fixes - Replace pasid idr with an xarray - Dither handling fix - Enable amdgpu by default for CIK APUs - Add IBs to devcoredump amdkfd: - system_wq cleanups radeon: - system_wq cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmnn8KcACgkQDHTzWXnE hr6pOQ/+O2ELjqV53pgkdL+PVj3s0Rxe3i+dizO3j9eTtYfYsZjSWF1yasis1LpL sglFJtep2K18uI6nnPrti61tPFZviq3rYCBqKRi6A6AT518ZgelBn2v9qHwodiRa 8JSygyFymnfjEXf0JVM076e2l7ZC0/LEdaYb0iKC8htp5D+yQcasRB2algKAiztE RNIOUk4CciNCNHPZwzc6epFGBn5a+kxYZi3KVdjmBsovw0zOIbfe7PJe/VSmXocR RfuDW2ZyWWH8MwlOkrTA68VBLK9lz/sXrWyF5TEBfu73oxKCeuoW/ZQGEQ4tjjT5 yad9MtfQZPaPtDxMBBhlKOd2eGs3xpeoiWKSc/C1Xup0c6IZSpU4e5tB98w3cw96 WEfGKcPwkl27uzGwUaG/xTUTjZtqnmv1tYdtq3on6pDSTp/K9SVX9VtEkywvVDya dAznCqY22CtsBWyb7CJ7Qhj+plu5tJ6ajFCqxNTbH7/mxil8dUobMPyzt3Nu3e8L cNDDnqC3l20N7SSi06m1abyGbvfhnS+2tDJ1xpK6maMCU8tLfebWapns0RfJtTa6 ZytUOED2lkBXhx7jwP0Qyx12RV9Wr3XHmtBLqM1Es97qeRS7HsHUWci9KzxT2Tb2 YUr/y3wGG3C28FPWh1MVnnsSUzl7hTbHtZZE+FXHe80JpkxvYK8= =NmlO -----END PGP SIGNATURE----- Merge tag 'drm-next-2026-04-22' of https://gitlab.freedesktop.org/drm/kernel Pull more drm updates from Dave Airlie: "This is a followup which is mostly next material with some fixes. Alex pointed out I missed one of his AMD MRs from last week, so I added that, then Jani sent the pipe reordering stuff, otherwise it's just some minor i915 fixes and a dma-buf fix. drm: - Add support for AMD VSDB parsing to drm_edid dma-buf: - fix documentation formatting i915: - add support for reordered pipes to support joined pipes better - Fix VESA backlight possible check condition - Verify the correct plane DDB entry amdgpu: - Audio regression fix - Use drm edid parser for AMD VSDB - Misc cleanups - VCE cs parse fixes - VCN cs parse fixes - RAS fixes - Clean up and unify vram reservation handling - GPU Partition updates - system_wq cleanups - Add CONFIG_GCOV_PROFILE_AMDGPU kconfig option - SMU vram copy updates - SMU 13/14/15 fixes - UserQ fixes - Replace pasid idr with an xarray - Dither handling fix - Enable amdgpu by default for CIK APUs - Add IBs to devcoredump amdkfd: - system_wq cleanups radeon: - system_wq cleanups" * tag 'drm-next-2026-04-22' of https://gitlab.freedesktop.org/drm/kernel: (62 commits) drm/i915/display: change pipe allocation order for discrete platforms drm/i915/wm: Verify the correct plane DDB entry drm/i915/backlight: Fix VESA backlight possible check condition drm/i915: Walk crtcs in pipe order drm/i915/joiner: Make joiner "nomodeset" state copy independent of pipe order dma-buf: fix htmldocs error for dma_buf_attach_revocable drm/amdgpu: dump job ibs in the devcoredump drm/amdgpu: store ib info for devcoredump drm/amdgpu: extract amdgpu_vm_lock_by_pasid from amdgpu_vm_handle_fault drm/amdgpu: Use amdgpu by default for CIK APUs too drm/amd/display: Remove unused NUM_ELEMENTS macros drm/amd/display: Replace inline NUM_ELEMENTS macro with ARRAY_SIZE drm/amdgpu: save ring content before resetting the device drm/amdgpu: make userq fence_drv drop explicit in queue destroy drm/amdgpu: rework userq fence driver alloc/destroy drm/amdgpu/userq: use dma_fence_wait_timeout without test for signalled drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled drm/amdgpu/userq: add the return code too in error condition drm/amdgpu/userq: fence wait for max time in amdgpu_userq_wait_for_signal drm/amd/display: Change dither policy for 10 bpc output back to dithering ...	2026-04-21 17:39:21 -07:00
Masami Hiramatsu (Google)	453553e1ed	selftests/ftrace: Add a testcase for multiple fprobe events Add a testcase for multiple fprobe events on the same function so that it clears ftrace hash map correctly when removing the events. Link: https://lore.kernel.org/all/177669370353.132053.16801520791509406141.stgit@mhiramat.tok.corp.google.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-04-22 09:26:46 +09:00
Masami Hiramatsu (Google)	132001e9f9	selftests/ftrace: Add a testcase for fprobe events on module Add a testcase for fprobe events on module, which unloads a kernel module on which fprobe events are probing and ensure the ftrace hash map is cleared correctly. Link: https://lore.kernel.org/all/177669369564.132053.623527664540176496.stgit@mhiramat.tok.corp.google.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-04-22 09:26:37 +09:00
Masami Hiramatsu (Google)	476c5bbae6	tracing/fprobe: Fix to unregister ftrace_ops if it is empty on module unloading Fix fprobe to unregister ftrace_ops if corresponding type of fprobe does not exist on the fprobe_ip_table and it is expected to be empty when unloading modules. Since ftrace thinks that the empty hash means everything to be traced, if we set fprobes only on the unloaded module, all functions are traced unexpectedly after unloading module. e.g. # modprobe xt_LOG.ko # echo 'f:test log_tg*' > dynamic_events # echo 1 > events/fprobes/test/enable # cat enabled_functions log_tg [xt_LOG] (1) tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490 log_tg_check [xt_LOG] (1) tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490 log_tg_destroy [xt_LOG] (1) tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490 # rmmod xt_LOG # wc -l enabled_functions 34085 enabled_functions Link: https://lore.kernel.org/all/177669368776.132053.10042301916765771279.stgit@mhiramat.tok.corp.google.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2026-04-22 09:24:13 +09:00
Alex Markuze	b1137e0b3d	ceph: add subvolume metrics collection and reporting Add complete infrastructure for per-subvolume I/O metrics collection and reporting to the MDS. This enables administrators to monitor I/O patterns at the subvolume granularity, which is useful for multi-tenant CephFS deployments. This patch adds: - CEPHFS_FEATURE_SUBVOLUME_METRICS feature flag for MDS negotiation - CEPH_SUBVOLUME_ID_NONE constant (0) for unknown/unset state - Red-black tree based metrics tracker for efficient per-subvolume aggregation with kmem_cache for entry allocations - Wire format encoding matching the MDS C++ AggregatedIOMetrics struct - Integration with the existing CLIENT_METRICS message - Recording of I/O operations from file read/write and writeback paths - Debugfs interfaces for monitoring (metrics/subvolumes, metrics/metric_features) Metrics tracked per subvolume include: - Read/write operation counts - Read/write byte counts - Read/write latency sums (for average calculation) The metrics are periodically sent to the MDS as part of the existing metrics reporting infrastructure when the MDS advertises support for the SUBVOLUME_METRICS feature. CEPH_SUBVOLUME_ID_NONE enforces subvolume_id immutability. Following the FUSE client convention, 0 means unknown/unset. Once an inode has a valid (non-zero) subvolume_id, it should not change during the inode's lifetime. Signed-off-by: Alex Markuze <amarkuze@redhat.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-04-22 01:40:23 +02:00
Alex Markuze	4a1c543479	ceph: parse subvolume_id from InodeStat v9 and store in inode Add support for parsing the subvolume_id field from InodeStat v9 and storing it in the inode for later use by subvolume metrics tracking. The subvolume_id identifies which CephFS subvolume an inode belongs to, enabling per-subvolume I/O metrics collection and reporting. This patch: - Adds subvolume_id field to struct ceph_mds_reply_info_in - Adds i_subvolume_id field to struct ceph_inode_info - Parses subvolume_id from v9 InodeStat in parse_reply_info_in() - Adds ceph_inode_set_subvolume() helper to propagate the ID to inodes - Initializes i_subvolume_id in inode allocation and clears on destroy Signed-off-by: Alex Markuze <amarkuze@redhat.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-04-22 01:40:23 +02:00
Alex Markuze	e58103caff	ceph: handle InodeStat v8 versioned field in reply parsing Add forward-compatible handling for the new versioned field introduced in InodeStat v8. This patch only skips the field without using it, preparing for future protocol extensions. The v8 encoding adds a versioned sub-structure that needs to be properly decoded and skipped to maintain compatibility with newer MDS versions. Signed-off-by: Alex Markuze <amarkuze@redhat.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-04-22 01:40:23 +02:00
Raphael Zimmer	1c439de70b	libceph: Fix slab-out-of-bounds access in auth message processing If a (potentially corrupted) message of type CEPH_MSG_AUTH_REPLY contains a positive value in its result field, it is treated as an error code by ceph_handle_auth_reply() and returned to handle_auth_reply(). Thereafter, an attempt is made to send the preallocated message of type CEPH_MSG_AUTH, where the returned value is interpreted as the size of the front segment to send. If the result value in the message is greater than the size of the memory buffer allocated for the front segment, an out-of-bounds access occurs, and the content of the memory region beyond this buffer is sent out. This patch fixes the issue by treating only negative values in the result field as errors. Positive values are therefore treated as success in the same way as a zero value. Additionally, a BUG_ON is added to __send_prepared_auth_request() comparing the len parameter to front_alloc_len to prevent sending the message if it exceeds the bounds of the allocation and to make it easier to catch any logic flaws leading to this. Cc: stable@vger.kernel.org Signed-off-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-04-22 01:40:23 +02:00
Dawei Feng	d1fef92e41	rbd: fix null-ptr-deref when device_add_disk() fails do_rbd_add() publishes the device with device_add() before calling device_add_disk(). If device_add_disk() fails after device_add() succeeds, the error path calls rbd_free_disk() directly and then later falls through to rbd_dev_device_release(), which calls rbd_free_disk() again. This double teardown can leave blk-mq cleanup operating on invalid state and trigger a null-ptr-deref in __blk_mq_free_map_and_rqs(), reached from blk_mq_free_tag_set(). Fix this by following the normal remove ordering: call device_del() before rbd_dev_device_release() when device_add_disk() fails after device_add(). That keeps the teardown sequence consistent and avoids re-entering disk cleanup through the wrong path. The bug was first flagged by an experimental analysis tool we are developing for kernel memory-management bugs while analyzing v6.13-rc1. The tool is still under development and is not yet publicly available. We reproduced the bug on v7.0 with a real Ceph backend and a QEMU x86_64 guest booted with KASAN and CONFIG_FAILSLAB enabled. The reproducer confines failslab injections to the __add_disk() range and injects fail-nth while mapping an RBD image through /sys/bus/rbd/add_single_major. On the unpatched kernel, fail-nth=4 reliably triggered the fault: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 UID: 0 PID: 273 Comm: bash Not tainted 7.0.0-01247-gd60bc1401583 #6 PREEMPT(lazy) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:__blk_mq_free_map_and_rqs+0x8c/0x240 Code: 00 00 48 8b 6b 60 41 89 f4 49 c1 e4 03 4c 01 e5 45 85 ed 0f 85 0a 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 e9 48 c1 e9 03 <80> 3c 01 00 0f 85 31 01 00 00 4c 8b 6d 00 4d 85 ed 0f 84 e2 00 00 RSP: 0018:ff1100000ab0fac8 EFLAGS: 00000246 RAX: dffffc0000000000 RBX: ff1100000c4806a0 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000000 RDI: ff1100000c4806f4 RBP: 0000000000000000 R08: 0000000000000001 R09: ffe21c000189001b R10: ff1100000c4800df R11: ff1100006cf37be0 R12: 0000000000000000 R13: 0000000000000000 R14: ff1100000c480700 R15: ff1100000c480004 FS: 00007f0fbe8fe740(0000) GS:ff110000e5851000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe53473b2e0 CR3: 0000000012eef000 CR4: 00000000007516f0 PKRU: 55555554 Call Trace: <TASK> blk_mq_free_tag_set+0x77/0x460 do_rbd_add+0x1446/0x2b80 ? __pfx_do_rbd_add+0x10/0x10 ? lock_acquire+0x18c/0x300 ? find_held_lock+0x2b/0x80 ? sysfs_file_kobj+0xb6/0x1b0 ? __pfx_sysfs_kf_write+0x10/0x10 kernfs_fop_write_iter+0x2f4/0x4a0 vfs_write+0x98e/0x1000 ? expand_files+0x51f/0x850 ? __pfx_vfs_write+0x10/0x10 ksys_write+0xf2/0x1d0 ? __pfx_ksys_write+0x10/0x10 do_syscall_64+0x115/0x690 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f0fbea15907 Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007ffe22346ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000058 RCX: 00007f0fbea15907 RDX: 0000000000000058 RSI: 0000563ace6c0ef0 RDI: 0000000000000001 RBP: 0000563ace6c0ef0 R08: 0000563ace6c0ef0 R09: 6b6435726d694141 R10: 5250337279762f78 R11: 0000000000000246 R12: 0000000000000058 R13: 00007f0fbeb1c780 R14: ff1100000c480700 R15: ff1100000c480004 </TASK> With this fix applied, rerunning the reproducer over fail-nth=1..256 yields no KASAN reports. [ idryomov: rename err_out_device_del -> err_out_device ] Cc: stable@vger.kernel.org Fixes: `27c97abc30` ("rbd: add add_disk() error handling") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-04-22 01:40:23 +02:00
Viacheslav Dubeyko	3a2e519cd4	crush: cleanup in crush_do_rule() method Commit `41ebcc0907` ("crush: remove forcefeed functionality") from May 7, 2012 (linux-next), leads to the following Smatch static checker warning: net/ceph/crush/mapper.c:1015 crush_do_rule() warn: iterator 'j' not incremented Before commit `41ebcc0907` ("crush: remove forcefeed functionality"), we had this logic: j = 0; if (osize == 0 && force_pos >= 0) { o[osize] = force_context[force_pos]; if (recurse_to_leaf) c[osize] = force_context[0]; j++; /* <-- this was the only increment, now gone / force_pos--; } / then crush_choose_(..., o+osize, j, ...) / Now, the variable j is dead code — a variable that is set and never meaningfully varied. This patch simply removes the dead code. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-04-22 01:40:23 +02:00
Max Kellermann	cc56430954	ceph: clear s_cap_reconnect when ceph_pagelist_encode_32() fails This MDS reconnect error path leaves s_cap_reconnect set. send_mds_reconnect() sets the bit at the beginning of the reconnect, but the first failing operation after that, ceph_pagelist_encode_32(), can jump to `fail:` without clearing it. __ceph_remove_cap() consults that flag to decide whether cap releases should be queued. A reconnect-preparation failure therefore leaves the session in reconnect mode from the cap-release path's point of view and can strand release work until some later state transition repairs it. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-04-22 01:40:23 +02:00
Max Kellermann	803447f93d	ceph: only d_add() negative dentries when they are unhashed Ceph can call d_add(dentry, NULL) on a negative dentry that is already present in the primary dcache hash. In the current VFS that is not safe. d_add() goes through __d_add() to __d_rehash(), which unconditionally reinserts dentry->d_hash into the hlist_bl bucket. If the dentry is already hashed, reinserting the same node can corrupt the bucket, including creating a self-loop. Once that happens, __d_lookup() can spin forever in the hlist_bl walk, typically looping only on the d_name.hash mismatch check and eventually triggering RCU stall reports like this one: rcu: INFO: rcu_sched self-detected stall on CPU rcu: 87-....: (2100 ticks this GP) idle=3a4c/1/0x4000000000000000 softirq=25003319/25003319 fqs=829 rcu: (t=2101 jiffies g=79058445 q=698988 ncpus=192) CPU: 87 UID: 2952868916 PID: 3933303 Comm: php-cgi8.3 Not tainted 6.18.17-i1-amd #950 NONE Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.6 09/22/2023 RIP: 0010:__d_lookup+0x46/0xb0 Code: c1 e8 07 48 8d 04 c2 48 8b 00 49 89 fc 49 89 f5 48 89 c3 48 83 e3 fe 48 83 f8 01 77 0f eb 2d 0f 1f 44 00 00 48 8b 1b 48 85 db <74> 20 39 6b 18 75 f3 48 8d 7b 78 e8 ba 85 d0 00 4c 39 63 10 74 1f RSP: 0018:ff745a70c8253898 EFLAGS: 00000282 RAX: ff26e470054cb208 RBX: ff26e470054cb208 RCX: 000000006e958966 RDX: ff26e48267340000 RSI: ff745a70c82539b0 RDI: ff26e458f74655c0 RBP: 000000006e958966 R08: 0000000000000180 R09: 9cd08d909b919a89 R10: ff26e458f74655c0 R11: 0000000000000000 R12: ff26e458f74655c0 R13: ff745a70c82539b0 R14: d0d0d0d0d0d0d0d0 R15: 2f2f2f2f2f2f2f2f FS: 00007f5770896980(0000) GS:ff26e482c5d88000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f5764de50c0 CR3: 000000a72abb5001 CR4: 0000000000771ef0 PKRU: 55555554 Call Trace: <TASK> lookup_fast+0x9f/0x100 walk_component+0x1f/0x150 link_path_walk+0x20e/0x3d0 path_lookupat+0x68/0x180 filename_lookup+0xdc/0x1e0 vfs_statx+0x6c/0x140 vfs_fstatat+0x67/0xa0 __do_sys_newfstatat+0x24/0x60 do_syscall_64+0x6a/0x230 entry_SYSCALL_64_after_hwframe+0x76/0x7e This is reachable with reused cached negative dentries. A Ceph lookup or atomic_open can be handed a negative dentry that is already hashed, and fs/ceph/dir.c then hits one of two paths that incorrectly assume "negative" also means "unhashed": - ceph_finish_lookup(): MDS reply is -ENOENT with no trace -> d_add(dentry, NULL) - ceph_lookup(): local ENOENT fast path for a complete directory with shared caps -> d_add(dentry, NULL) Both paths can therefore re-add an already-hashed negative dentry. Ceph already uses the correct pattern elsewhere: ceph_fill_trace() only calls d_add(dn, NULL) for a negative null-dentry reply when d_unhashed(dn) is true. Fix both fs/ceph/dir.c sites the same way: only call d_add() for a negative dentry when it is actually unhashed. If the negative dentry is already hashed, leave it in place and reuse it as-is. This preserves the existing behavior for unhashed dentries while avoiding d_hash list corruption for reused hashed negatives. Cc: stable@vger.kernel.org Fixes: `2817b000b0` ("ceph: directory operations") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-04-22 01:40:23 +02:00
kexinsun	eff0e55f90	libceph: update outdated comment in ceph_sock_write_space() The function try_write() was renamed to ceph_con_v1_try_write() in commit `566050e17e` ("libceph: separate msgr1 protocol implementation") and subsequently moved to net/ceph/messenger_v1.c in commit `2f713615dd` ("libceph: move msgr1 protocol implementation to its own file"). Update the comment in ceph_sock_write_space() accordingly. [ idryomov: account for msgr2 in the updated comment as well ] Signed-off-by: kexinsun <kexinsun@smail.nju.edu.cn> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-04-22 01:40:22 +02:00
Eric Biggers	c7aac00c2c	libceph: Remove obsolete session key alignment logic Since the call to crypto_shash_setkey() was replaced with hmac_sha256_preparekey() which doesn't allocate memory regardless of the alignment of the input key, remove the session key alignment logic from process_auth_done(). Also remove the inclusion of crypto/hash.h, which is no longer needed since crypto_shash is no longer used. [ idryomov: rewrap comment ] Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-04-22 01:40:22 +02:00
Sam Edwards	a0d9555bf9	ceph: fix num_ops off-by-one when crypto allocation fails move_dirty_folio_in_page_array() may fail if the file is encrypted, the dirty folio is not the first in the batch, and it fails to allocate a bounce buffer to hold the ciphertext. When that happens, ceph_process_folio_batch() simply redirties the folio and flushes the current batch -- it can retry that folio in a future batch. However, if this failed folio is not contiguous with the last folio that did make it into the batch, then ceph_process_folio_batch() has already incremented `ceph_wbc->num_ops`; because it doesn't follow through and add the discontiguous folio to the array, ceph_submit_write() -- which expects that `ceph_wbc->num_ops` accurately reflects the number of contiguous ranges (and therefore the required number of "write extent" ops) in the writeback -- will panic the kernel: BUG_ON(ceph_wbc->op_idx + 1 != req->r_num_ops); This issue can be reproduced on affected kernels by writing to fscrypt-enabled CephFS file(s) with a 4KiB-written/4KiB-skipped/repeat pattern (total filesize should not matter) and gradually increasing the system's memory pressure until a bounce buffer allocation fails. Fix this crash by decrementing `ceph_wbc->num_ops` back to the correct value when move_dirty_folio_in_page_array() fails, but the folio already started counting a new (i.e. still-empty) extent. The defect corrected by this patch has existed since 2022 (see first `Fixes:`), but another bug blocked multi-folio encrypted writeback until recently (see second `Fixes:`). The second commit made it into 6.18.16, 6.19.6, and 7.0-rc1, unmasking the panic in those versions. This patch therefore fixes a regression (panic) introduced by `cac190c767`. Cc: stable@vger.kernel.org Fixes: `d55207717d` ("ceph: add encryption support to writepage and writepages") Fixes: `cac190c767` ("ceph: fix write storm on fscrypted files") Signed-off-by: Sam Edwards <CFSworks@gmail.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-04-22 01:40:22 +02:00
Raphael Zimmer	5199c125d2	libceph: Prevent potential null-ptr-deref in ceph_handle_auth_reply() If a message of type CEPH_MSG_AUTH_REPLY contains a zero value for both protocol and result, this is currently not treated as an error. In case of ac->negotiating == true and ac->protocol > 0, this leads to setting ac->protocol = 0 and ac->ops = NULL. Thereafter, the check for ac->protocol != protocol returns false, and init_protocol() is not called. Subsequently, ac->ops->handle_reply() is called, which leads to a null pointer dereference, because ac->ops is still NULL. This patch changes the check for ac->protocol != protocol to !ac->protocol, as this also includes the case when the protocol was set to zero in the message. This causes the message to be treated as containing a bad auth protocol. Cc: stable@vger.kernel.org Signed-off-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-04-22 01:40:22 +02:00
Dave Hansen	932d922285	x86/cpu: Disable FRED when PTI is forced on FRED and PTI were never intended to work together. No FRED hardware is vulnerable to Meltdown and all of it should have LASS anyway. Nevertheless, if you boot a system with pti=on and fred=on, the kernel tries to do what is asked of it and dies a horrible death on the first attempt to run userspace (since it never switches to the user page tables). Disable FRED when PTI is forced on, and print a warning about it. A quick brain dump about what a FRED+PTI implementation would look like is below. I'm not sure it would make any sense to do it, but never say never. All I know is that it's way too complicated to be worth it today. <brain dump> The SWITCH_TO_USER/KERNEL_CR3 bits are simple to fix (or at least we have the assembly tools to do it already), as is sticking the FRED entry text in .entry.text (it's not in there today). The nasty part is the stacks. Today, the CPU pops into the kernel on MSR_IA32_FRED_RSP0 which is normal old kernel memory and not mapped to userspace. The hardware pushes gunk on to MSR_IA32_FRED_RSP0, which is currently the task stacks. MSR_IA32_FRED_RSP0 would need to point elsewhere, probably cpu_entry_stack(). Then, start playing games with stacks on entry/exit, including copying gunk to and from the task stack. While I'd like to have PTI everywhere, I'm not sure it's worth mucking up the FRED code with PTI kludges. If a user wants fast entry/exit, they use FRED. If you want PTI (and sekuritay), you certainly don't care about fast entry and FRED isn't going to help you all that much, so you can just stay with the IDT. Plus, FRED hardware should have LASS which gives you a similar security profile to PTI without the CR3 munging. </brain dump> Reported-by: Gayatri Kammela <Gayatri.Kammela@amd.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Cc:stable@vger.kernel.org Link: https://patch.msgid.link/20260421163136.E7C6788A@davehans-spike.ostc.intel.com	2026-04-21 15:11:40 -07:00
Linus Torvalds	d46dd0d883	f2fs-for-7.1-rc1 In this round, the changes primarily focus on resolving race conditions, memory safety issues (UAF), and improving the robustness of garbage collection (GC), and folio management. Enhancement: - add page-order information for large folio reads in iostat - add defrag_blocks sysfs node Bug fix: - fix uninitialized kobject put in f2fs_init_sysfs() - disallow setting an extension to both cold and hot - fix node_cnt race between extent node destroy and writeback - fix to preserve previous reserve_{blocks,node} value when remount - fix to freeze GC and discard threads quickly - fix false alarm of lockdep on cp_global_sem lock - fix data loss caused by incorrect use of nat_entry flag - fix to skip empty sections in f2fs_get_victim - fix inline data not being written to disk in writeback path - fix fsck inconsistency caused by FGGC of node block - fix fsck inconsistency caused by incorrect nat_entry flag usage - call f2fs_handle_critical_error() to set cp_error flag - fix fiemap boundary handling when read extent cache is incomplete - fix use-after-free of sbi in f2fs_compress_write_end_io() - fix UAF caused by decrementing sbi->nr_pages[] in f2fs_write_end_io() - fix incorrect file address mapping when inline inode is unwritten - fix incomplete search range in f2fs_get_victim when f2fs_need_rand_seg is enabled - fix to avoid memory leak in f2fs_rename() -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmnn7+kACgkQQBSofoJI UNIknA//ScYLuOhOmJJNBfmkEoUe5es04YRRq1OOBAvOCGw+Z/qg9unel9Qpneqg 0xQ35rLKL6q7Y592ZOgWyipFTGhDBEbdJNP6eI9avBURoj9sFjDhFlmkVuUhjsns IgOSVgWSWqijWZOcBQbJGEm+N/W81Ktee1RUIDkcti66/uYIS+roTLDLbIyEhvkT DhsmUnYwoMy9cB5ag9rZuSWvEa8TI7UbelH78Oi/TqRYJu6ax+D99s6PzOFBH1EE FwNGoEMn3r1+2gqPVzDmtrz7A/cYtHVigaUT9d8/n2yygZhGaQ8whd0QoIlikgcW 9n7Ymo3sns/yLEJURFqkB6Q5yFcZ30jRJZJb5CMNeqtuHQFoLjtcpEWqiQKGzzKY uUATMoG7F3QSn8AOVt6GaxnpvNb/NiVZ1Fsvt1Cgq8hUjxf1v2AhHZnvcK0EDAqa PvEYSriB56Qtnt1UfbNqydxSiviDDjtaHDprFIvAyEavDCs2F7gzrHEW7IHzG2XR Io9hnaBNUJs065zU8qWHyetIZCjPySnPOkZ42eaMEsDMhDtlC3WDOB3ZkmFnh9u2 2K/SaIpQInGyP2LGLzNB/khWhDcZ4aGciCd7b5Ul9WkrfZTzrN9XI/F2w7dr0R6q tE6xJThraGk7NjO67xUq/M2KnVAHN5gTPRY9OmEboEdTO+6pC5w= =0oeQ -----END PGP SIGNATURE----- Merge tag 'f2fs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, the changes primarily focus on resolving race conditions, memory safety issues (UAF), and improving the robustness of garbage collection (GC), and folio management. Enhancements: - add page-order information for large folio reads in iostat - add defrag_blocks sysfs node Bug fixes: - fix uninitialized kobject put in f2fs_init_sysfs() - disallow setting an extension to both cold and hot - fix node_cnt race between extent node destroy and writeback - preserve previous reserve_{blocks,node} value when remount - freeze GC and discard threads quickly - fix false alarm of lockdep on cp_global_sem lock - fix data loss caused by incorrect use of nat_entry flag - skip empty sections in f2fs_get_victim - fix inline data not being written to disk in writeback path - fix fsck inconsistency caused by FGGC of node block - fix fsck inconsistency caused by incorrect nat_entry flag usage - call f2fs_handle_critical_error() to set cp_error flag - fix fiemap boundary handling when read extent cache is incomplete - fix use-after-free of sbi in f2fs_compress_write_end_io() - fix UAF caused by decrementing sbi->nr_pages[] in f2fs_write_end_io() - fix incorrect file address mapping when inline inode is unwritten - fix incomplete search range in f2fs_get_victim when f2fs_need_rand_seg is enabled - avoid memory leak in f2fs_rename()" * tag 'f2fs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (35 commits) f2fs: add page-order information for large folio reads in iostat f2fs: do not support mmap write for large folio f2fs: fix uninitialized kobject put in f2fs_init_sysfs() f2fs: protect extension_list reading with sb_lock in f2fs_sbi_show() f2fs: disallow setting an extension to both cold and hot f2fs: fix node_cnt race between extent node destroy and writeback f2fs: allow empty mount string for Opt_usr\|grp\|projjquota f2fs: fix to preserve previous reserve_{blocks,node} value when remount f2fs: invalidate block device page cache on umount f2fs: fix to freeze GC and discard threads quickly f2fs: fix to avoid uninit-value access in f2fs_sanity_check_node_footer f2fs: fix false alarm of lockdep on cp_global_sem lock f2fs: fix data loss caused by incorrect use of nat_entry flag f2fs: fix to skip empty sections in f2fs_get_victim f2fs: fix inline data not being written to disk in writeback path f2fs: fix fsck inconsistency caused by FGGC of node block f2fs: fix fsck inconsistency caused by incorrect nat_entry flag usage f2fs: fix to do sanity check on dcc->discard_cmd_cnt conditionally f2fs: refactor node footer flag setting related code f2fs: refactor f2fs_move_node_folio function ...	2026-04-21 14:50:04 -07:00
Len Brown	3ae6bafa10	tools/power turbostat: Fix AMD RAPL regression on big systems turbostat.c:8688: rapl_perf_init: Assertion `next_domain < num_domains' failed. The initial fix for this regression was incomplete, as it did not handle multi-package systems with sparse core ids. Fixes: `ef0e60083f` ("tools/power turbostat: Fix AMD RAPL regression") Signed-off-by: Len Brown <len.brown@intel.com>	2026-04-21 17:35:40 -04:00
Linus Torvalds	bb0bc49a1c	dax changes for 7.1 The new FUSE file system requires some DAX changes. * dax/fsdev: fix uninitialized kaddr in fsdev_dax_zero_page_range() * dax: export dax_dev_get() * dax: Add fs_dax_get() func to prepare dax for fs-dax usage * dax: Add dax_set_ops() for setting dax_operations at bind time * dax: Add dax_operations for use by fs-dax on fsdev dax * dax: Save the kva from memremap * dax: add fsdev.c driver for fs-dax on character dax * dax: Factor out dax_folio_reset_order() helper * dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQSgX9xt+GwmrJEQ+euebuN7TNx1MQUCaeeclRQcaXJhLndlaW55 QGludGVsLmNvbQAKCRCebuN7TNx1MZvgAQCTVqx7CbsR4qWpdXCreaetqJJUjwI5 iCvJJvdF1zLlngD9FAvnFv1/o/KktCnfZw1CAWadFrdOtyYDASYWS0mgGA8= =8YBs -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax updates from Ira Weiny: "The series adds DAX support required for the upcoming fuse/famfs file system.[1] The support here is required because famfs is backed by devdax rather than pmem. This all lays the groundwork for using shared memory as a file system" Link: https://lore.kernel.org/all/0100019d43e5f632-f5862a3e-361c-4b54-a9a6-96c242a8f17a-000000@email.amazonses.com/ [1] * tag 'libnvdimm-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax/fsdev: fix uninitialized kaddr in fsdev_dax_zero_page_range() dax: export dax_dev_get() dax: Add fs_dax_get() func to prepare dax for fs-dax usage dax: Add dax_set_ops() for setting dax_operations at bind time dax: Add dax_operations for use by fs-dax on fsdev dax dax: Save the kva from memremap dax: add fsdev.c driver for fs-dax on character dax dax: Factor out dax_folio_reset_order() helper dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c	2026-04-21 14:12:01 -07:00
Timur Kristóf	11b31549b6	drm/amd/display: Disable 10-bit truncation and dithering on DCE 6.x DCE 6.x doesn't support 10-bit truncation and 10-bit dithering because the following fields are 1-bit only: FMT_TEMPORAL_DITHER_DEPTH FMT_SPATIAL_DITHER_DEPTH FMT_TRUNCATE_DEPTH Programming these fields to "2" will program them as if the dithering option was 6-bit, resulting in sub-par picture quality and an ugly "color banding" effect. Note that a recent commit changed the default 10-bit dithering option to DITHER_OPTION_SPATIAL10 which improves the picture quality because it happens to look better, but is still not actually supported by DCE 6.x versions. When the color depth is 10-bit or more, just disable any kind of dithering options on DCE 6.x. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5151 Fixes: `529cad0f94` ("drm/amd/display: Add function to set dither option") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6be8ced880dfe29ce38c2d5e74489822da5c250e)	2026-04-21 17:03:33 -04:00
Siwei He	778bf584f2	drm/amdgpu: OR init_pte_flags into invalid leaf PTE updates Invalid leaf clears that only set AMDGPU_PTE_EXECUTABLE match the old GMC9 fault-priority workaround but omit adev->gmc.init_pte_flags. On GFX12 that includes AMDGPU_PTE_IS_PTE; without it, some cleared PTEs can fault as no-retry and bypass the SVM/XNACK handler when a VA is reused after a BO unmap. Apply init_pte_flags in amdgpu_vm_pte_update_flags() alongside EXECUTABLE so range-driven clears (e.g. amdgpu_vm_clear_freed) match amdgpu_vm_pt_clear() for leaf templates. Signed-off-by: Siwei He <siwei.he@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9d47b2c36b9a6c6b844c33cab407a5d7ad102234)	2026-04-21 17:03:25 -04:00
Linus Torvalds	c94faa7cc4	coda cleanups and fixes Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaefXWAAKCRBZ7Krx/gZQ 6xunAP4oQllf7EXF3pGmLcVSqG8O08FXvvgaTrIG5s89cpXL5QEAz8//D6i7lsj9 MyfQ6RiL4v2AOQm90SxCtJPHZ1LJYAM= =ZVnL -----END PGP SIGNATURE----- Merge tag 'pull-coda' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull coda dcache updates from Al Viro: "Coda dcache-related cleanups and fixes" * tag 'pull-coda' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coda_flag_children(): fix a UAF sanitize coda_dentry_delete() coda: is_bad_inode() is always false there	2026-04-21 14:03:10 -07:00
Mario Limonciello	0e48f27d13	drm/amd: Adjust ASPM support quirk to cover more Intel hosts Some of the same issues identified in commit `c770ef1967` ("drm/amd/amdgpu: disable ASPM in some situations") also affect Tiger Lake systems with GFX11 connected over USB4. Widen the net to also match these hosts. Fixes: `d9b3a066df` ("drm/amd: Exclude dGPUs in eGPU enclosures from DPM quirks") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5145 Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0a214d888485b9f35fe03882a92962e6d5697849)	2026-04-21 17:03:01 -04:00
Leo Li	49ed6f4540	drm/amd/display: Undo accidental fix revert in amdgpu_dm_ism.c [Why] Pausing DPM power profiles during static screen caused a bunch of audio/performance/clock issues that were addressed in this fix: 'commit `1412482b71` ("Revert "drm/amd/display: pause the workload setting in dm"")' This logic in function amdgpu_dm_crtc_vblank_control_worker() was moved to amdgpu_dm_ism.c, but the fix was lost in the process. [How] Reapply the fix to amdgpu_dm_ism.c Fixes: `754003486c` ("drm/amd/display: Add Idle state manager(ISM)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bc621e91d6fc004cfae9148c5a91acad19ada3e4)	2026-04-21 17:02:45 -04:00
Linus Torvalds	e2683c8868	Crypto library fix and documentation update for 7.1 - Fix an integer underflow in the mpi library - Improve the crypto library documentation -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCaefDABQcZWJpZ2dlcnNA a2VybmVsLm9yZwAKCRDzXCl4vpKOKy6mAQDjR6/V4IukSFyMaWYJBGGOZNRUwLda DwHkFLpfAKJP4AD6AovNzxLksG2oms6pkcvPiDVtT2maRr+MiVlc0CpKJQg= =DkWb -----END PGP SIGNATURE----- Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull more crypto library updates from Eric Biggers: "Crypto library fix and documentation update: - Fix an integer underflow in the mpi library - Improve the crypto library documentation" * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crypto: docs: Add rst documentation to Documentation/crypto/ docs: kdoc: Expand 'at_least' when creating parameter list lib/crypto: mpi: Fix integer underflow in mpi_read_raw_from_sgl()	2026-04-21 11:46:22 -07:00
Pavel Begunkov	770594e78c	io_uring/zcrx: warn on freelist violations The freelist is appropriately sized to always be able to take a free niov, but let's be more defensive and check the invariant with a warning. That should help to catch any double-free issues. Suggested-by: Kai Aizen <kai@snailsploit.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/2f3cea363b04649755e3b6bb9ab66485a95936d5.1776760901.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-04-21 12:19:11 -06:00
Pavel Begunkov	4f02cc4071	io_uring/zcrx: clear RQ headers on init It might be unexpected to users if the RQ head/tail after a ring creation are not zeroed, fix that. Cc: stable@vger.kernel.org Fixes: `6f377873cb` ("io_uring/zcrx: add interface queue and refill queue") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/331f94663c3e8f021ffa3cb770ca2844a07d4855.1776760911.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-04-21 12:19:11 -06:00
Pavel Begunkov	0fcccfd871	io_uring/zcrx: fix user_struct uaf io_free_rbuf_ring() usees a struct user_struct, which io_zcrx_ifq_free() puts it down before destroying the ring. Cc: stable@vger.kernel.org Fixes: `5c686456a4` ("io_uring/zcrx: add user_struct and mm_struct to io_zcrx_ifq") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/e560ae00960d27a810522a7efc0e201c82dff351.1776760917.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-04-21 12:19:11 -06:00
Jens Axboe	45cd95763e	io_uring/register: fix ring resizing with mixed/large SQEs/CQEs The ring resizing only properly handles "normal" sized SQEs or CQEs, if there are pending entries around a resize. This normally should not be the case, but the code is supposed to handle this regardless. For the mixed SQE/CQE cases, the current copying works fine as they are indexed in the same way. Each half is just copied separately. But for fixed large SQEs and CQEs, the iteration and copy need to take that into account. Cc: stable@kernel.org Fixes: `79cfe9e59c` ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS") Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-04-21 12:19:08 -06:00
Jens Axboe	7faaa6812a	io_uring/futex: ensure partial wakes are appropriately dequeued If a FUTEX_WAITV vectored operation is only partially woken, we should call __futex_wake_mark() on the queue to account for that. If not, then a later wakeup will wake the same entry, rather than the next one in line. Fixes: `8f350194d5` ("io_uring: add support for vectored futex waits") Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-04-21 12:19:06 -06:00
Jens Axboe	7996883455	io_uring/rw: add defensive hardening for negative kbuf lengths No real bug here, just being a bit defensive in ensuring that whatever gets passed into io_put_kbuf() is always >= 0 and not some random error value. Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-04-21 12:19:03 -06:00
Jens Axboe	02b8d41c17	io_uring/rsrc: use kvfree() for the imu cache Currently anything that requires kvmalloc_flex() for allocations will not get re-cached, and hence the cache freeing path is correct in that it always uses kfree() to free the allocated memory. But this seems a bit fragile as it's something that could get mix should that situation change, so switch io_free_imu() and io_alloc_cache_free() to use kvfree as the desctructor. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-04-21 12:19:01 -06:00
Jens Axboe	53262c91f7	io_uring/rsrc: unify nospec indexing for direct descriptors For file updates, the node reset isn't capping the value via array_index_nospec() like the other paths do. Ensure it's all sane and have the update path do the proper capping as well. Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-04-21 12:18:54 -06:00
Jens Axboe	8e1f412b5b	io_uring: fix spurious fput in registered ring path Fix an issue with io_uring_ctx_get_file() not gating fput() on whether or not the file descriptor is a registered/direct one or not. Fixes: `c5e9f6a96b` ("io_uring: unify getting ctx from passed in file descriptor") Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-04-21 12:18:44 -06:00
Linus Torvalds	6fdca3c5ab	Changes since the last update: - Fix dirent nameoff handling to avoid out-of-bound reads out of crafted images - Fix two type truncation issues on 32-bit platforms -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmnnueoRHHhpYW5nQGtl cm5lbC5vcmcACgkQUXZn5Zlu5qq96Q/9E0jTTtPQiuMe9vL0+vDftVMDSQXprjaW J6Ax14gt1VAFfvTT04tf9BYXtdilJP5S8uP/mwi9Z6mHmE50cRbGzmRmvBhL1F9b cgJvYQYABGpQ8mSYyUQ02lT/ZpjuUVorgwYzcYvYL6ZDPdKq/ANgBC7ZfJHZG0Kh uEZyCbujmn6wFSmsPkpMWBoG2uwuD/zkZIr/+/V8CMlp7BhpcBWYnIi0cziF4QnY IGhzLj75e5jsyOFOxmF26ubDh1vCSkkXkE7h2UP+7H11RK4kh4tZ01Fgc6A8sK8z umXHKFb2N9naAFIkJuOJsYjqIQgVWSdEbu0IGarfxH+rd9D22SS0U0mt++CGlcPE a1lhmCNh77jdcgqzny+KUhga/FmGNpPw9S/2MvRc3ybUCOCzpcT7YwKlLyYF9u8F 0eXrLLtrveibmIriVyYHzt3xJnRM+MaJqULTiCs+AsgM6HY2ZjU1hbpfjBeiAj24 PqmSXAxRleACQJv/qwyJ8h7/Fm53ikjOlRnhKjQxreIDW7F0/3FYTwE9nbfkPSaR fXPUR8YXzIO9rpBXOplX+HHkQMHE0SAH64nU9bMOXzo+FV4kh9RfmZtudQCilegW /k5IGNmnXybRVkgafZdrz5iBTCw+k0OM9gE145bJfevNwQNS+SsJD/HO38H1asy7 Jr1NL9qQ8Io= =sbRm -----END PGP SIGNATURE----- Merge tag 'erofs-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Fix dirent nameoff handling to avoid out-of-bound reads out of crafted images - Fix two type truncation issues on 32-bit platforms * tag 'erofs-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: unify lcn as u64 for 32-bit platforms erofs: fix offset truncation when shifting pgoff on 32-bit platforms erofs: fix the out-of-bounds nameoff handling for trailing dirents	2026-04-21 11:16:04 -07:00

... 63 64 65 66 67 ...

1447039 Commits