linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-04 20:46:48 +02:00

Author	SHA1	Message	Date
Christian König	44e5bc73bd	drm/amdgpu: rework amdgpu_userq_signal_ioctl v3 This one was fortunately not looking so bad as the wait ioctl path, but there were still a few things which could be fixed/improved: 1. Allocating with GFP_ATOMIC was quite unnecessary, we can do that before taking the userq_lock. 2. Use a new mutex as protection for the fence_drv_xa so that we can do memory allocations while holding it. 3. Starting the reset timer is unnecessary when the fence is already signaled when we create it. 4. Cleanup error handling, avoid trying to free the queue when we don't even got one. v2: fix incorrect usage of xa_find, destroy the new mutex on error v3: cleanup ref ordering Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1609eb0f81a609d350169839128cecf298c84e7a)	2026-05-11 17:46:43 -04:00
Christian König	d5971c5c34	drm/amdgpu: remove deadlocks from amdgpu_userq_pre_reset The purpose of a GPU reset is to make sure that fence can be signaled again and the signal and resume workers can make progress again. So waiting for the resume worker or any fence in the GPU reset path is just utterly nonsense. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fcd5f065eab46993af43442fd77ee8d9eb9c5bdf)	2026-05-11 17:46:34 -04:00
Matthew Auld	155a372a1c	drm/xe/dma-buf: fix UAF with retry loop Retry doesn't work here, since bo will be freed on error, leading to UAF. However, now that we do the alloc & init before the attach, we can now combine this as one unit and have the init do the alloc for us. This should make the retry safe. Reported by Sashiko. v2: Fix up the error unwind (CI) Closes: https://sashiko.dev/#/patchset/20260506184332.86743-2-matthew.auld%40intel.com Fixes: `eb289a5f6c` ("drm/xe: Convert xe_dma_buf.c for exhaustive eviction") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.18+ Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260508102635.149172-4-matthew.auld@intel.com (cherry picked from commit 479669418253e0f27f8cf5db01a731352ea592e7) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-05-11 16:46:18 -04:00
Matthew Auld	981bedbbe6	drm/xe/dma-buf: handle empty bo and UAF races There look to be some nasty races here when triggering the invalidate_mappings hook: 1) We do xe_bo_alloc() followed by the attach, before the actual full bo init step in xe_dma_buf_init_obj(). However the bo is visible on the attachments list after the attach. This is bad since exporter driver, say amdgpu, can at any time call back into our invalidate_mappings hook, with an empty/bogus bo, leading to potential bugs/crashes. 2) Similar to 1) but here we get a UAF, when the invalidate_mappings hook is triggered. For example, we get as far as xe_bo_init_locked() but this fails in some way. But here the bo will be freed on error, but we still have it attached from dma-buf pov, so if the invalidate_mappings is now triggered then the bo we access is gone and we trigger UAF and more bugs/crashes. To fix this, move the attach step until after we actually have a fully set up buffer object. Note that the bo is not published to userspace until later, so not sure what the comment "Don't publish the bo until we have a valid attachment", is referring to. We have at least two different customers reporting hitting a NULL ptr deref in evict_flags when importing something from amdgpu, followed by triggering the evict flow. Hit rate is also pretty low, which would hint at some kind of race, so something like 1) or 2) might explain this. v2: - Shuffle the order of the ops slightly (no functional change) - Improve the comment to better explain the ordering (Matt B) Assisted-by: Gemini:gemini-3 #debug Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7903 Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/4055 Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260508102635.149172-3-matthew.auld@intel.com (cherry picked from commit af1f2ad0c59fe4e2f924c526f66e968289d77971) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-05-11 16:46:12 -04:00
Matt Roper	4568dfbb83	drm/xe: Make decision to use Xe2-style blitter instructions a feature flag The blitter engines' MEM_COPY and MEM_SET instructions were added as part of the same hardware change that introduced service copy engines (i.e., BCS1-BCS8) which is why the driver checks for service copy engine presence when deciding whether to use these instructions or the older XY_* instructions. However when making this decision the driver should consider which engines are part of the hardware architecture, not which engines are present/usable on the current device. For graphics IP versions that architecturally include service copy engines (i.e., everything Xe2 and later, plus PVC's Xe_HPC) we should use MEM_SET and MEM_COPY even in if all of the service copy engines wind up getting fused off. I.e., we need to decide based on whether the platform's graphics descriptor contains these engines, rather than whether the usable engine mask contains them. This logic got broken when gt->info.__engine_mask was removed, although in practice that mistake has been harmless so far because there haven't been any hardware SKUs that fuse off all of the service copy engines yet. Replace the incorrect has_service_copy_support() function with a GT feature flag that tracks more accurately whether the new blitter instructions are usable. In addition to fixing incorrect logic if all service copies are fused off, the flag also makes it more obvious what the calling code is trying to do; previously it wasn't terribly obvious why "has service copy engines" was being used as the condition for using different instructions on all copy engine types. The new feature flag is named 'has_xe2_blt_instructions' because we expect this flag to be set for all Xe2 and later platforms (i.e., everything officially supported by the Xe driver). Technically there's also one Xe1-era platform (PVC) that supports these engines/instructions and will set this flag, but this still seems to be the most clear and understandable name for the flag. Fixes: `61549a2ee5` ("drm/xe: Drop __engine_mask") Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patch.msgid.link/20260507-xe2_copy-v1-1-26506381b821@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 09b399842907565a64e351fb22da790b4c673ffb) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-05-11 16:46:06 -04:00
Arvind Yadav	92f3403a7c	drm/xe/madvise: Track purgeability with BO-local counters xe_bo_recompute_purgeable_state() walks all VMAs of a BO to determine whether the BO can be made purgeable. This makes VMA create/destroy and madvise updates O(n) in the number of mappings. Replace the walk with BO-local counters protected by the BO dma-resv lock: - vma_count tracks the number of VMAs mapping the BO. - willneed_count tracks active WILLNEED holders, including WILLNEED VMAs and active dma-buf exports for non-imported BOs. A DONTNEED BO is promoted back to WILLNEED on a 0->1 transition of willneed_count. A BO is demoted to DONTNEED on a 1->0 transition only when it still has VMAs, preserving the previous behaviour where a BO with no mappings keeps its current madvise state. PURGED remains terminal, preserving the existing "once purged, always purged" rule. Fixes: `4f44961eab` ("drm/xe/vm: Prevent binding of purged buffer objects") v2: - Use early return for imported BOs in all four helpers to avoid nesting (Matt B). - Group purgeability state into a purgeable sub-struct on struct xe_bo (Matt B). - Reword xe_bo_willneed_put_locked() kernel-doc to explain that a 1->0 transition means all remaining active VMAs are DONTNEED (Matt B). v3: - Move DONTNEED/PURGED reject from vma_lock_and_validate() into xe_vma_create(), gated on attr->purgeable_state == WILLNEED. Fixes vm_bind bypass and partial-unbind rejection on DONTNEED BOs (Matt B). - Drop .check_purged from MAP and REMAP; keep it for PREFETCH and add a comment why (Matt B). - Skip BO validation in vma_lock_and_validate() for non-WILLNEED VMA remnants so cleanup/remap paths do not repopulate DONTNEED/PURGED BOs. Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260506132027.2556046-1-arvind.yadav@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> (cherry picked from commit 23fb2ea56cb4fa2587bc072b04e4e698687a48e4) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-05-11 16:46:00 -04:00
Guopeng Zhang	5dd74441cb	cgroup/cpuset: Reserve DL bandwidth only for root-domain moves cpuset_can_attach() currently adds the bandwidth of all migrating SCHED_DEADLINE tasks to sum_migrate_dl_bw. If the source and destination cpuset effective CPU masks do not overlap, the whole sum is then reserved in the destination root domain. set_cpus_allowed_dl(), however, subtracts bandwidth from the source root domain only when the affinity change really moves the task between root domains. A DL task can move between cpusets that are still in the same root domain, so including that task in sum_migrate_dl_bw can reserve destination bandwidth without a matching source-side subtraction. Share the root-domain move test with set_cpus_allowed_dl(). Keep nr_migrate_dl_tasks counting all migrating deadline tasks for cpuset DL task accounting, but add to sum_migrate_dl_bw only for tasks that need a root-domain bandwidth move. Keep using the destination cpuset effective CPU mask and leave the broader can_attach()/attach() transaction model unchanged. Fixes: `2ef269ef1a` ("cgroup/cpuset: Free DL BW in case can_attach() fails") Cc: stable@vger.kernel.org # v6.10+ Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn> Reviewed-by: Waiman Long <longman@redhat.com> Acked-by: Juri Lelli <juri.lelli@redhat.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2026-05-11 10:27:14 -10:00
Frank Li	63d2059cd6	pinctrl: imx1: Allow parsing DT without function nodes The old format to define pinctrl settings for imx in DT has two hierarchy levels. The first level are function device nodes. The second level are pingroups which contain a property fsl,pins. The original ntention was to define all pin functions in a single dtsi file and just reference the correct ones in the board files. The commit ("5fcdf6a7ed95e pinctrl: imx: Allow parsing DT without function nodes") already make moden i.MX chip support flatten layout. Make legacy chipes (more than 15 years) support this flatten layout also. Fixes: `e948cbdc41` ("ARM: dts: imx: remove redundant intermediate node in pinmux hierarchy") Tested-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Linus Walleij <linusw@kernel.org>	2026-05-11 21:55:23 +02:00
Raphael Zimmer	35d0ed82d0	libceph: Fix potential out-of-bounds access in osdmap_decode() When decoding osd_state and osd_weight from an incoming osdmap in osdmap_decode(), both are decoded for each osd, i.e., map->max_osd times. The ceph_decode_need() check only accounts for sizeof(map->osd_weight) once. This can potentially result in an out-of-bounds memory access if the incoming message is corrupted such that the max_osd value exceeds the actual content of the osdmap message. This patch fixes the issue by changing the corresponding part in the ceph_decode_need() check to account for map->max_osdsizeof(*map->osd_weight). Cc: stable@vger.kernel.org Fixes: `dcbc919a5d` ("libceph: switch osdmap decoding to use ceph_decode_entity_addr") Signed-off-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-05-11 20:53:53 +02:00
Sven Eckelmann	77098e4bea	batman-adv: tp_meter: fix tp_vars reference leak in receiver shutdown The receiver shutdown timer handler, batadv_tp_receiver_shutdown(), is responsible for releasing the tp_vars reference it holds. However, the existing logic for coordinating this release with batadv_tp_stop_all() was flawed. timer_shutdown_sync() guarantees the timer will not fire again after it returns, but it returns non-zero only when the timer was pending at the time of the call. If the timer had already expired (and batadv_tp_stop_all() would unsucessfully try to rearm itself), batadv_tp_stop_all() skips its batadv_tp_vars_put(), and batadv_tp_receiver_shutdown() fails to put its own reference as well. Fix this by introducing a new atomic variable receiving that is set to 1 when the receiver is initialized and cleared atomically with atomic_xchg() by whichever side claims it first. Only the side that observes the transition from 1 to 0 is responsible for releasing the tp_vars timer reference, eliminating the uncertainty. Cc: stable@kernel.org Fixes: `3d3cf6a731` ("batman-adv: stop tp_meter sessions during mesh teardown") Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-11 19:54:49 +02:00
Luxiao Xu	94f3b13316	batman-adv: fix tp_meter counter underflow during shutdown batadv_tp_sender_shutdown() unconditionally decrements the "sending" atomic counter. If multiple paths (e.g. timeout, user cancel, and normal finish) call this function, the counter can underflow to -1. Since the sender logic treats any non-zero value as "still sending", a negative value causes the sender kthread to loop indefinitely. This leads to a use-after-free when the interface is removed while the zombie thread is still active. Fix this by using atomic_xchg() to ensure the counter only transitions from 1 to 0 once. Fixes: `33a3bb4a33` ("batman-adv: throughput meter implementation") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Luxiao Xu <rakukuip@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> [sven: added missing change in batadv_tp_send] Signed-off-by: Sven Eckelmann <sven@narfation.org>	2026-05-11 19:52:40 +02:00
Jens Axboe	a65855ec34	io_uring: hold uring_lock across io_kill_timeouts() in cancel path io_uring_try_cancel_requests() dropped ctx->uring_lock before calling io_kill_timeouts(), which walks each timeout's link chain via io_match_task() to test REQ_F_INFLIGHT. With chain mutation now serialized by ctx->uring_lock, that walk needs the lock too. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-05-11 11:14:38 -06:00
Jens Axboe	49ae66eb8c	io_uring: defer linked-timeout chain splice out of hrtimer context io_link_timeout_fn() is the hrtimer callback that fires when a linked timeout expires. It currently calls io_remove_next_linked(prev) under ctx->timeout_lock to splice the timeout request out of the link chain. This is the only chain-mutation site that runs without ctx->uring_lock, because hrtimer callbacks cannot take a mutex. Defer the splicing until the task_work callback. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-05-11 11:14:34 -06:00
Jens Axboe	20c39819a2	io_uring: hold uring_lock when walking link chain in io_wq_free_work() io_wq_free_work() calls io_req_find_next() from io-wq worker context, which reads and clears req->link without holding any lock. This can potentially race with other paths that mutate the same chain under ctx->uring_lock. Take ctx->uring_lock around the io_req_find_next() call. Only requests with IO_REQ_LINK_FLAGS reach this path, which is not the hot path. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-05-11 11:14:29 -06:00
Maurizio Lombardi	37953cec77	nvme: fix race condition between connected uevent and STARTED_ONCE flag When a controller connects, nvme_start_ctrl() emits the "NVME_EVENT=connected" uevent and sets the NVME_CTRL_STARTED_ONCE flag. Currently, the uevent is emitted before the flag is set. This creates a race condition for userspace tools (like udev rules) that might rely on the "connected" event to configure other attributes. Swap the order of operations in nvme_start_ctrl() so that the NVME_CTRL_STARTED_ONCE flag is set before the uevent is sent. This guarantees that the admin_timeout can already be changed when userspace is notified. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-05-11 10:07:20 -07:00
Rafael J. Wysocki	e4865a56d0	ACPI: driver: Check ACPI_COMPANION() against NULL during probe Since every platform driver can be forced to match a device that doesn't match its list of device IDs because of device_match_driver_override(), platform drivers that rely on the existence of a device's ACPI companion object should verify its presence. Accordingly, add requisite ACPI_COMPANION() or ACPI_HANDLE() checks against NULL to 13 platform drivers handling core ACPI devices. Also change the value returned by the ACPI thermal zone driver when the device's ACPI companion is not present to -ENODEV for consistency with the other drivers. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/4516068.ejJDZkT8p0@rafael.j.wysocki Cc: 7.0+ <stable@vger.kernel.org> # 7.0+	2026-05-11 18:50:06 +02:00
Jann Horn	c1fa0bb633	exit: prevent preemption of oopsing TASK_DEAD task When an already-exiting task oopses, make_task_dead() currently calls do_task_dead() with preemption enabled. That is forbidden: do_task_dead() calls __schedule(), which has a comment saying "WARNING: must be called with preemption disabled!". If an oopsing task is preempted in do_task_dead(), between becoming TASK_DEAD and entering the scheduler explicitly, bad things happen: finish_task_switch() assumes that once the scheduler has switched away from a TASK_DEAD task, the task can never run again and its stack is no longer needed; but that assumption apparently doesn't hold if the dead task was preempted (the SM_PREEMPT case). This means that the scheduler ends up repeatedly dropping references on the dead task's stack, which can lead to use-after-free or double-free of the entire task stack; in other words, two tasks can end up running on the same stack, resulting in various kinds of memory corruption. (This does not just affect "recursively oopsing" tasks; it is enough to oops once during task exit, for example in a file_operations::release handler) Fixes: `7f80a2fd7d` ("exit: Stop poorly open coding do_task_dead in make_task_dead") Cc: stable@kernel.org Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-05-11 08:55:11 -07:00
Alexei Starovoitov	d25e65a8b8	Merge branch 'bpf-fix-call-offset-truncation-and-oob-read-in-bpf_patch_call_args' Yazhou Tang says: ==================== bpf: Fix call offset truncation and OOB read in bpf_patch_call_args() From: Yazhou Tang <tangyazhou518@outlook.com> This patchset addresses a silent truncation bug in the BPF verifier that occurs when a bpf-to-bpf call involves a massive relative jump offset. Additionally, it fixes a pre-existing out-of-bounds (OOB) read issue in the interpreter fallback path. Because the BPF instruction set utilizes a 32-bit imm field for bpf-to-bpf calls, implicitly downcasting it to the 16-bit insn->off in bpf_patch_call_args() causes incorrect call targets or subprog ID resolution for large BPF programs. While fixing this by swapping the imm and off fields, it was discovered that the original code also had a load-time OOB read vulnerability when the stack depth exceeds MAX_BPF_STACK during JIT fallback. Patch 1/3 fixes the pre-existing OOB read in bpf_patch_call_args(). It changes the function to return an int and explicitly rejects the JIT fallback if the stack depth exceeds MAX_BPF_STACK, preventing a potential stack buffer overflow. Patch 2/3 fixes the s16 truncation bug. 1. Keep the original imm field unchanged and use the off field to store the interpreter function index. 2. Adjust the JMP_CALL_ARGS case in ___bpf_prog_run() accordingly. 3. Restore the legacy xlated dump layout in bpf_insn_prepare_dump(). Patch 3/3 introduces a selftest for this fix. --- Change log: v10: 1. Make the error log in patch 1/3 more clear. (Kuohai) 2. Drop bpftool and disasm_helpers.c changes, and instead restore the legacy xlated dump layout in bpf_insn_prepare_dump(). This avoids requiring bpftool compatibility handling. (Quentin and Alexei) v9: https://lore.kernel.org/bpf/20260429171904.107244-1-tangyazhou@zju.edu.cn/ 1. Modify the selftest in patch 3/3: use __clobber_all in inline asm. (Sashiko AI reviewer) v8: https://lore.kernel.org/bpf/20260429105608.92741-1-tangyazhou@zju.edu.cn/ 1. Update cfg_partition_funcs() in bpftool to use insn->imm for call target calculation. (Sashiko AI reviewer) 2. Modify the selftest in patch 3/3: add a large padding before the call instruction, preventing the kernel panic on kernel without the fix. (Sashiko AI reviewer) 3. Modify the selftest in patch 3/3: make it more clear. v7: https://lore.kernel.org/bpf/20260421144504.823756-1-tangyazhou@zju.edu.cn/ 1. Rebase the patchset to the bpf-next tree to resolve the apply conflict. (Alexei) 2. Add Patch 1/3 to properly fix a pre-existing OOB read in bpf_patch_call_args(). (Sashiko AI reviewer) v6: https://lore.kernel.org/bpf/20260412170334.716778-1-tangyazhou@zju.edu.cn/ 1. Use a different but clearer approach to resolve this issue: keeping the original imm field unchanged and using the off field to store the interpreter function index. (Kuohai) 2. Update the related dumper code and remove a previous workaround in the selftests disasm helpers, which is no longer needed after this fix. v5: https://lore.kernel.org/bpf/20260326090133.221957-1-tangyazhou@zju.edu.cn/ 1. Some minor changes in commit messages. (AI Reviewer) v4: https://lore.kernel.org/bpf/20260326063329.10031-1-tangyazhou@zju.edu.cn/ 1. Remove some redundant commit messages of patch 2/3. (Emil) 2. Change the number of instructions in padding_subprog() from 200,000 to 32,765, which is the minimum number of instructions required to trigger the verifier failure. (Emil) v3: https://lore.kernel.org/bpf/20260323122254.98540-1-tangyazhou@zju.edu.cn/ 1. Resend to fix a typo in v2 and add "Fixes" tag. The rest of the changes are identical to v2. v2 (incorrect): https://lore.kernel.org/bpf/20260323081748.106603-1-tangyazhou@zju.edu.cn/ 1. Move the s16 boundary check from fixup_call_args() to bpf_patch_call_args(), and change the return type of bpf_patch_call_args() to int. (Emil) 2. Add Patch 3/3 to fix the incorrect subprog ID in dumped bpf_pseudo_call instructions, which is caused by the same truncation issue. (Puranjay) 3. Refine the new selftest for clarity and add detailed comments explaining the test design. (Emil) v1: https://lore.kernel.org/bpf/20260316190220.113417-1-tangyazhou@zju.edu.cn/ ==================== Link: https://patch.msgid.link/20260506094714.419842-1-tangyazhou@zju.edu.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-05-11 08:27:02 -07:00
Yazhou Tang	344a00712c	selftests/bpf: Add test for large offset bpf-to-bpf call Add a selftest to verify the verifier and JIT behavior when handling bpf-to-bpf calls with relative jump offsets exceeding the s16 boundary. The test utilizes an inline assembly block with ".rept 32765" to generate a massive dummy subprogram. By placing this padding between the main program and the target subprogram, it forces the verifier to process a bpf-to-bpf call where the imm field exceeds the s16 range. - When JIT is enabled, it asserts that the program is successfully loaded and executes correctly to return the expected value. Since the fix does not change the JIT behavior, the test passes whether the fix is applied or not. - When JIT is disabled, it also asserts that the program is successfully loaded and executes correctly to return the expected value 3. - Before the fix, the verifier rewrites the call instruction with a truncated offset (here 32768 -> -32768) and lets it pass. When the program is executed, the call instruction will go to a wrong target (the landing pad) instead of the intended subprogram, then return -1 and fail. - After the fix, the verifier correctly handles the large offset and allows it to pass. The program then executes correctly to return the expected value 3. Co-developed-by: Tianci Cao <ziye@zju.edu.cn> Signed-off-by: Tianci Cao <ziye@zju.edu.cn> Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com> Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com> Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com> Acked-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20260506094714.419842-4-tangyazhou@zju.edu.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-05-11 08:27:02 -07:00
Yazhou Tang	58a8f3e250	bpf: Fix s16 truncation for large bpf-to-bpf call offsets Currently, the BPF instruction set allows bpf-to-bpf calls (or internal calls, pseudo calls) to use a 32-bit imm field to represent the relative jump offset. However, when JIT is disabled or falls back to the interpreter, the verifier invokes bpf_patch_call_args() to rewrite the call instruction. In this function, the 32-bit imm is downcast to s16 and stored in the off field. void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth) { stack_depth = max_t(u32, stack_depth, 1); insn->off = (s16) insn->imm; insn->imm = interpreters_args[(round_up(stack_depth, 32) / 32) - 1] - __bpf_call_base_args; insn->code = BPF_JMP \| BPF_CALL_ARGS; } If the original imm exceeds the s16 range (i.e., a jump offset greater than 32767 instructions), this downcast silently truncates the offset, resulting in an incorrect call target. Fix this by: 1. In bpf_patch_call_args(), keeping the imm field unchanged and using the off field to store the index of the interpreter function. 2. In ___bpf_prog_run() for the JMP_CALL_ARGS case, retrieving the interpreter function pointer from the interpreters_args array using the off field as the index, and passing the original imm to calculate the last argument of the interpreter function. After these changes, the truncation issue is resolved, and __bpf_call_base_args is also no longer needed and can be removed, which makes the code cleaner. Performance: In ___bpf_prog_run() for the JMP_CALL_ARGS case, changing the retrieval of the interpreter function pointer from pointer addition to direct array indexing improves performance. The possible reason is that the latter has better instruction-level parallelism. See the v5 discussion [1] for more details. [1] https://lore.kernel.org/bpf/f120c3c4-6999-414a-b514-518bb64b4758@zju.edu.cn/ To avoid requiring bpftool changes, keep the new imm/off encoding internal and restore the legacy xlated dump layout in bpf_insn_prepare_dump(). For bpf-to-bpf call offsets that do not fit in s16, export off as 0 instead of a truncated and misleading value. Fixes: `1ea47e01ad` ("bpf: add support for bpf_call to interpreter") Fixes: `7105e828c0` ("bpf: allow for correlation of maps and helpers in dump") Suggested-by: Xu Kuohai <xukuohai@huaweicloud.com> Suggested-by: Puranjay Mohan <puranjay@kernel.org> Co-developed-by: Tianci Cao <ziye@zju.edu.cn> Signed-off-by: Tianci Cao <ziye@zju.edu.cn> Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com> Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com> Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com> Link: https://lore.kernel.org/r/20260506094714.419842-3-tangyazhou@zju.edu.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-05-11 08:27:02 -07:00
Yazhou Tang	4314a44564	bpf: Fix out-of-bounds read in bpf_patch_call_args() The interpreters_args array only accommodates stack depths up to MAX_BPF_STACK (512 bytes). However, do_misc_fixups() may allow a larger stack depth if JIT is requested. If JIT compilation later fails and falls back to the interpreter, the verifier invokes bpf_patch_call_args() with this oversized stack depth. This causes a load-time out-of-bounds (OOB) read when calculating the interpreter function pointer index. Fix this by changing bpf_patch_call_args() to return an int and explicitly rejecting the JIT fallback (returning -EINVAL) if the stack depth exceeds MAX_BPF_STACK. Fixes: `1ea47e01ad` ("bpf: add support for bpf_call to interpreter") Co-developed-by: Tianci Cao <ziye@zju.edu.cn> Signed-off-by: Tianci Cao <ziye@zju.edu.cn> Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com> Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com> Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com> Acked-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20260506094714.419842-2-tangyazhou@zju.edu.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-05-11 08:27:01 -07:00
Viken Dadhaniya	452d6fa37a	serial: qcom_geni: fix kfifo underflow when flush precedes DMA completion IRQ When uart_flush_buffer() runs before the DMA completion IRQ is delivered, the following race can occur (all steps serialized by uart_port_lock): 1. DMA starts: tx_remaining = N, kfifo contains N bytes 2. DMA completes in hardware; IRQ is pending but not yet delivered 3. uart_flush_buffer() acquires the port lock and calls kfifo_reset(), making kfifo_len() = 0 while tx_remaining remains N 4. uart_flush_buffer() releases the port lock 5. DMA IRQ fires; handle_tx_dma() acquires the port lock and calls uart_xmit_advance(uport, tx_remaining) on an empty kfifo uart_xmit_advance() increments kfifo->out by tx_remaining. Since kfifo_reset() already set both in and out to 0, out wraps past in, causing kfifo_len() to return UART_XMIT_SIZE - tx_remaining. The next start_tx_dma() call then submits a DMA transfer of stale buffer data. Fix this by snapshotting kfifo_len() at the start of handle_tx_dma() and skipping uart_xmit_advance() when fifo_len < tx_remaining, which indicates the kfifo was reset by a preceding flush. Fixes: `2aaa43c707` ("tty: serial: qcom-geni-serial: add support for serial engine DMA") Cc: stable <stable@kernel.org> Signed-off-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Link: https://patch.msgid.link/20260506-serial-dma-stale-tx-buf-v1-1-e3ccb360d719@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-05-11 17:19:00 +02:00
Shitalkumar Gandhi	9a9254c4a2	serial: fsl_lpuart: fix rx buffer and DMA map leaks in start_rx_dma lpuart_start_rx_dma() allocates sport->rx_ring.buf with kzalloc() and then maps a scatterlist via dma_map_sg(). On three subsequent error paths the function returns directly without releasing those resources: - when dma_map_sg() returns 0 (-EINVAL): ring->buf is leaked. - when dmaengine_slave_config() fails: ring->buf and the DMA mapping are leaked. - when dmaengine_prep_dma_cyclic() returns NULL: ring->buf and the DMA mapping are leaked. The sole cleanup path, lpuart_dma_rx_free(), is only reached when lpuart_dma_rx_use is set, and the caller lpuart_rx_dma_startup() clears that flag on failure of lpuart_start_rx_dma(). So these resources are permanently leaked on every failure in this function. Repeated port open/close or termios changes under error conditions will slowly consume memory and leave stale streaming DMA mappings behind. Fix it by introducing two error labels that unmap the scatterlist and free the ring buffer as appropriate. While here, replace the misleading -EFAULT (bad userspace pointer) returned when dmaengine_prep_dma_cyclic() fails with the more accurate -ENOMEM, matching how other dmaengine users in the tree treat this failure. No functional change on the success path. Fixes: `5887ad43ee` ("tty: serial: fsl_lpuart: Use cyclic DMA for Rx") Cc: stable <stable@kernel.org> Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260420135903.2062024-1-shitalkumar.gandhi@cambiumnetworks.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-05-11 17:18:43 +02:00
AlanCui4080	40df217285	Revert "nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808" This reverts commit `7f991e3f9b` ("nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808") The incorrect implementations of SUBNQN is a known issue in a massive number of NVMe units. However, the warning "nvme nvmex: missing or invalid SUBNQN field." is usually appropriate and will not affect performance or behavior etc. That is because the support for SUBNQN is mandatory if the controller supports NVMe revision 1.2.1 or greater, and it reported itself without a SUBNQN field which breaks compliance with the specification. It should be not quirked by the Linux Kernel. Signed-off-by: Alan Cui <me@alancui.cc> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-05-11 08:07:40 -07:00
Sagi Grimberg	dbbd07d0a7	nvmet-tcp: Fix potential UAF when ddgst mismatch Shivam Kumar found via vulnerability testing: When data digest is enabled on an NVMe/TCP connection and a digest mismatch occurs on a non-final H2C_DATA PDU during an R2T-based data transfer, the digest error handler in nvmet_tcp_try_recv_ddgst() calls nvmet_req_uninit() — which performs percpu_ref_put() on the submission queue — but does NOT mark the command as completed. It does not set cqe->status, does not modify rbytes_done, and does not clear any flag. When the subsequent fatal error triggers queue teardown, nvmet_tcp_uninit_data_in_cmds() iterates all commands, checks nvmet_tcp_need_data_in() for each one, and finds that the already-uninited command still appears to need data (because rbytes_done < transfer_len and cqe->status == 0). It therefore calls nvmet_req_uninit() a second time on the same command — a double percpu_ref_put against a single percpu_ref_get. Reported-by: Shivam Kumar <kumar.shivam43666@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-05-11 08:07:40 -07:00
Chia-Lin Kao (AceLan)	b35a130367	nvme-pci: fix use-after-free in nvme_free_host_mem() nvme_free_host_mem() frees dev->hmb_sgt via dma_free_noncontiguous() but never clears the pointer afterward. This leads to a use-after-free if nvme_free_host_mem() is called twice in the same error path. This can happen during nvme_probe() when nvme_setup_host_mem() succeeds in allocating the HMB (setting dev->hmb_sgt) but nvme_set_host_mem() fails with an I/O error: nvme_setup_host_mem() nvme_alloc_host_mem_single() -> sets dev->hmb_sgt nvme_set_host_mem() -> fails with -EIO nvme_free_host_mem() -> frees hmb_sgt, but does NOT NULL it return error nvme_probe() error path: nvme_free_host_mem() -> dev->hmb_sgt is stale, use-after-free The second call dereferences the freed sgt, causing a NULL pointer dereference in iommu_dma_free_noncontiguous() when it accesses sgt->sgl->dma_address (the backing memory has been freed and zeroed). This is reproducible on Thunderbolt-attached NVMe devices (e.g., OWC Envoy Express behind a Dell WD22TB4 dock) where the device intermittently returns I/O errors during HMB setup due to PCIe link instability. BUG: kernel NULL pointer dereference, address: 0000000000000010 RIP: 0010:iommu_dma_free_noncontiguous+0x22/0x80 Call Trace: <TASK> dma_free_noncontiguous+0x3b/0x130 nvme_free_host_mem+0x30/0xf0 [nvme] nvme_probe.cold+0xcc/0x275 [nvme] local_pci_probe+0x43/0xa0 pci_device_probe+0xeea/0x290 really_probe+0xf9/0x3b0 __driver_probe_device+0x8b/0x170 driver_probe_device+0x24/0xd0 __driver_attach_async_helper+0x6b/0x110 async_run_entry_fn+0x37/0x170 process_one_work+0x1ac/0x3d0 worker_thread+0x1b8/0x360 kthread+0xf7/0x130 ret_from_fork+0x2d8/0x3a0 ret_from_fork_asm+0x1a/0x30 </TASK> Fix this by setting dev->hmb_sgt to NULL after freeing it, so the second call takes the multi-descriptor path which safely handles the already-cleaned-up state. Fixes: `63a5c7a4b4` ("nvme-pci: use dma_alloc_noncontigous if possible") Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-05-11 08:07:40 -07:00
Hannes Reinecke	a891962ae5	nvmet-auth: Do not print DH-HMAC-CHAP secrets From a security standpoint we should not allow to print out the DH-HMAC-CHAP secrets, but at the same time having them is useful for debugging authentication failures. So add a Kconfig option NVME_TARGET_AUTH_DEBUG to only enable debugging if explictly requested at build time. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-05-11 08:07:39 -07:00
Keith Busch	2279cd9c61	nvme: fix bio leak on mapping failure The local bio is always NULL, so we'd leak the bio if the integrity mapping failed. Just get it directly from the request. Fixes: `d0d1d52231` ("blk-map: provide the bdev to bio if one exists") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-05-11 08:07:39 -07:00
Keith Busch	ce28b772c3	nvme: make prp passthrough usage less scary The warning is a bit alarming, and it only prints for the very first non-sgl capable device that receives a passthrough command. Just log an informational message on initial discovery for every device. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>	2026-05-11 08:07:39 -07:00
Johan Hovold	1e5b50c78d	tty: add missing tty_driver include to tty_port.h Include the definition of struct tty_driver in tty_port.h to keep the header self-contained and avoid build breakage in case anyone includes it before tty_driver.h. Fixes: `eb3b0d92c9` ("tty: tty_port: add workqueue to flip TTY buffer") Cc: Xin Zhao <jackzxcui1989@163.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20260506124323.186703-1-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-05-11 17:03:00 +02:00
Prasanna S	ca2584d841	serial: qcom-geni: fix UART_RX_PAR_EN bit position UART_RX_PAR_EN is incorrectly defined as bit 3, which triggers false framing errors (S_GP_IRQ_1_EN) and causes received data to be dropped when parity is enabled and the parity bit is 0. Define UART_RX_PAR_EN as bit 4 of the SE_UART_RX_TRANS_CFG register, as specified in the reference manual. Fixes: `c4f528795d` ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP") Cc: stable <stable@kernel.org> Signed-off-by: Prasanna S <prasanna.s@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://patch.msgid.link/20260428-serial-bit-correct-v1-1-9131ad5b97d8@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-05-11 17:02:45 +02:00
Hongling Zeng	92b1ea2245	serial: sh-sci: fix memory region release in error path The sci_request_port() function uses request_mem_region() to reserve I/O memory, but in the error path when sci_remap_port() fails, it incorrectly calls release_resource() instead of release_mem_region(). This mismatch can cause resource accounting issues. Fix it by using the correct release function, consistent with sci_release_port(). Fixes: `e265164708` ("serial: sh-sci: Handle port memory region reservations.") Cc: stable <stable@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202604032356.SzEjYkBC-lkp@intel.com/ Signed-off-by: Hongling Zeng <zenghongling@kylinos.cn> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20260421065737.724187-1-zenghongling@kylinos.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-05-11 17:02:32 +02:00
Zhaoyang Yu	6fe472c1bb	tty: serial: pch_uart: add check for dma_alloc_coherent() Add a check for dma_alloc_coherent() failure to prevent a potential NULL pointer dereference in dma_handle_rx(). Properly release DMA channels and the PCI device reference using a goto ladder if the allocation fails. Fixes: `3c6a483275` ("Serial: EG20T: add PCH_UART driver") Cc: stable <stable@kernel.org> Signed-off-by: Zhaoyang Yu <2426767509@qq.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/tencent_E328416B7CFD436F6029F2DF02AD7ED89C08@qq.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-05-11 17:02:07 +02:00
Maciej W. Rozycki	d15cd40cb1	serial: zs: Fix swapped RI/DSR modem line transition counting Fix a thinko in the status interrupt handler that has caused counters for the RI and DSR modem line transitions to be used for the other line each. Fixes: `8b4a40809e` ("zs: move to the serial subsystem") Cc: stable <stable@kernel.org> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Link: https://patch.msgid.link/alpine.DEB.2.21.2604101747110.29980@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-05-11 17:01:55 +02:00
DaeMyung Kang	2beaa98b46	ntfs: restore $MFT mirror contents check check_mft_mirror() still computes the number of bytes to validate in each mirrored MFT record, but the actual comparison against $MFTMirr was dropped when the superblock code was updated. As a result, mount misses a stale or inconsistent $MFTMirr as long as both records pass the structural baad-record checks. Restore the comparison and log an error when the primary $MFT record differs from its mirror copy. Returning false lets the existing mount error handling mark the volume as having NTFS errors and, with on_errors=remount-ro, continue read-only. The default on_errors=continue mount policy still allows the mount to proceed. Fixes: `6251f0b0de` ("ntfs: update super block operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2026-05-11 23:30:48 +09:00
Jiayuan Chen	91840be8f7	irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT On PREEMPT_RT, non-HARD irq_work runs in per-CPU kthreads via run_irq_workd(), so irq_work_sync() uses rcuwait() to wait for BUSY==0. After irq_work_single() clears BUSY via atomic_cmpxchg(), it still dereferences @work for irq_work_is_hard() and rcuwait_wake_up(). An irq_work_sync() caller on another CPU that enters after BUSY is cleared can observe BUSY==0 immediately, return, and free the work before those accesses complete — causing a use-after-free. Fix this by wrapping run_irq_workd() in guard(rcu)() so that the entire irq_work_single() execution is within an RCU read-side critical section. Then add synchronize_rcu() in irq_work_sync() after rcuwait_wait_event() to ensure the caller waits for the RCU grace period before returning, preventing premature frees. Fixes: `810979682c` ("irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.") Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20260330073234.303732-1-jiayuan.chen@linux.dev	2026-05-11 16:28:04 +02:00
Peter Oberparleiter	ea34567db0	s390/cio: Restore GFP_DMA for CHSC allocation Re-add GFP_DMA when allocating memory for CHSC control blocks. On some supported machines, CHSC cannot access memory outside the DMA zone, causing CHSC command failures. Cc: stable@vger.kernel.org Fixes: `a3a64a4def` ("s390/cio: remove unneeded DMA zone allocation") Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2026-05-11 16:27:25 +02:00
Paolo Pisati	2997606dd1	platform/x86: asus-nb-wmi: add DMI quirk for ASUS Zenbook Duo UX8407AA Use the existing zenbook duo keyboard quirk for the UX8407AA model too. Signed-off-by: Paolo Pisati <p.pisati@gmail.com> Reviewed-by: Denis Benato <denis.benato@linux.dev> Link: https://patch.msgid.link/20260508070956.62201-1-p.pisati@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>	2026-05-11 17:15:16 +03:00
Harshal Dev	0d5dc58181	soc: qcom: ice: Allow explicit votes on 'iface' clock for ICE Since Qualcomm inline-crypto engine (ICE) is now a dedicated driver de-coupled from the QCOM UFS driver, it explicitly votes for its required clocks during probe. For scenarios where the 'clk_ignore_unused' flag is not passed on the kernel command line, to avoid potential unclocked ICE hardware register access during probe the ICE driver should additionally vote on the 'iface' clock. Also update the suspend and resume callbacks to handle un-voting and voting on the 'iface' clock. Fixes: `2afbf43a4a` ("soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver") Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Kuldeep Singh <kuldeep.singh@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Harshal Dev <harshal.dev@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260416-qcom_ice_power_and_clk_vote-v5-2-5ccf5d7e2846@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>	2026-05-11 09:06:29 -05:00
Bjorn Andersson	e13617b594	Merge branch '20260416-qcom_ice_power_and_clk_vote-v5-1-5ccf5d7e2846@oss.qualcomm.com' into drivers-fixes-for-7.1 Merge the qcom,ice DeviceTree binding update through a topic branch to allow sharing it with the DeviceTree branch.	2026-05-11 09:05:50 -05:00
Harshal Dev	e27264daac	dt-bindings: crypto: qcom,ice: Fix missing power-domain and iface clk The DT bindings for inline-crypto engine do not specify the UFS_PHY_GDSC power-domain and iface clock. Without enabling the iface clock and the associated power-domain the ICE hardware cannot function correctly and leads to unclocked hardware accesses being observed during probe. Fix the DT bindings for inline-crypto engine to require the UFS_PHY_GDSC power-domain and iface clock for new devices (Eliza and Milos) introduced in the current release (7.1) with yet-to-stabilize ABI, while preserving backward compatibility for older devices. Fixes: `618195a7ac` ("dt-bindings: crypto: qcom,inline-crypto-engine: Document the Eliza ICE") Fixes: `85faec1e85` ("dt-bindings: crypto: qcom,inline-crypto-engine: document the Milos ICE") Reviewed-by: Kuldeep Singh <kuldeep.singh@oss.qualcomm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Harshal Dev <harshal.dev@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260416-qcom_ice_power_and_clk_vote-v5-1-5ccf5d7e2846@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>	2026-05-11 09:05:19 -05:00
Thomas Richter	99269799bf	s390/pai: Fix missing PAI counter increments under heavy load Machines with a larger number of CPUs and under heavy load sometimes loose PAI counter increments during recording using events -e CRYPTO_ÂLL or -e NNPA_ALL. Counting is not affected. This happens when several PAI crypto counters are incremented during the same cryptographic operation. During schedule out the functions paiXXX_sched_task() (with XXX either crypt or ext) +--> pai_have_samples() +--> pai_have_sample() +--> pai_copy() +--> pai_push_sample() are called to read out PAI counter values. In pai_copy() the current values of PAI counters are read from the PMU memory mapped page and compared to the values read during last schedule out operation, which have been saved in a backup page named PAI_SAVE_AREA(event). For each PAI counter a delta is calculated and when the delta is positive, that PAI counter was incremented by hardware. This positve delta is reported as raw data record attached to a sample. After all deltas have been calculated, the new PAI counter values are saved in the backup page PAI_SAVE_AREA(event). However this is done in pai_push_sample(), leaving a small window for missing hardware triggered updates. Here is one scenario: PAI counter idx: 0 1 2 3 4 5 6 7 .... N +---+---+---+---+---+---+---+---+ +---+ PAI counter page:\| \| \| X \| \| \| \| \| \|....\| Y \| +---+---+---+---+---+---+---+---+ +---+ In pai_copy() each PAI counter value is read and compared to its old value. This is done in a loop. When PAI counter indexed N is read, the hardware might increment PAI counter indexed 2 again, updating its value from X to X+1. Later pai_push_sample() simply mem-copies the complete PAI counter page to a backup page and the increment of X+1 is lost, because the backup page now contains the new value. Read each PAI counter and save this value in the backup page when there is a positive delta. This omits any time window between read and store. This also reduced the work load as only modified PAI counters are saved. Cc: stable@vger.kernel.org Fixes: `fe861b0c8d` ("s390/pai: save PAI counter value page in event structure") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2026-05-11 16:00:04 +02:00
Zhihao Cheng	725ecd8068	nsfs: fix wrong error code returned for pidns ioctls When executing NS_GET_PID_FROM_PIDNS (or similar pidns ioctls), if the target task cannot be found in the corresponding pid_ns, the error code should be ESRCH instead of ENOTTY. This bug was introduced when the extensible ioctl handling was added. Without proper return, ret would be overwritten by the default case in the extensible ioctl switch statement. Fixes: `a1d220d9da` ("nsfs: iterate through mount namespaces") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://patch.msgid.link/20260507112301.1042757-1-chengzhihao1@huawei.com Reviewed-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-11 15:59:14 +02:00
Ming Lei	1860c2f859	ublk: reject max_sectors smaller than PAGE_SECTORS in parameter validation blk_validate_limits() requires max_hw_sectors >= PAGE_SECTORS and fires a WARN_ON_ONCE if this invariant is violated. ublk_validate_params() only checked the upper bound of max_sectors against max_io_buf_bytes, allowing userspace to pass small values (including zero) that trigger the warning when blk_mq_alloc_disk() is called from ublk_ctrl_start_dev(). Before `494ea040bc`, ublk used blk_queue_max_hw_sectors() which silently clamped small values up to PAGE_SECTORS. The conversion to passing queue_limits directly to blk_mq_alloc_disk() lost that clamping and now hits blk_validate_limits()'s WARN_ON_ONCE instead. Validate that max_sectors is at least PAGE_SECTORS in ublk_validate_params() so invalid values are rejected early with -EINVAL instead of reaching the block layer. Fixes: `494ea040bc` ("ublk: pass queue_limits to blk_mq_alloc_disk") Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260510144843.769031-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-05-11 07:44:20 -06:00
Maoyi Xie	3799c25709	io_uring/fdinfo: translate SqThread PID through caller's pid_ns SQPOLL stores current->pid (init_pid_ns view) in sqd->task_pid at thread creation. fdinfo prints it raw via seq_printf("SqThread:\t%d\n", sq_pid). A reader inside a non-initial pid_ns sees the host PID, not the kthread's PID in the reader's own pid_ns. The SQPOLL kthread is created with CLONE_THREAD and no CLONE_NEW*, so it lives in the submitter's pid_ns. An unprivileged user_ns + pid_ns submitter can read fdinfo and learn the host PID of a kthread whose in-namespace PID is different. Reproducer (mainline 7.0, KASAN): unshare CLONE_NEWUSER \| CLONE_NEWPID \| CLONE_NEWNS, mount a private /proc, then have a grandchild that is pid 1 in the new pid_ns open an io_uring ring with IORING_SETUP_SQPOLL. /proc/self/task lists {1, 2}; the SQPOLL kthread is pid 2. Before: fdinfo prints SqThread = <host pid>. After: SqThread = 2. Use task_pid_nr_ns() against the proc inode's pid_ns to compute sq_pid, instead of reading the stored sq->task_pid (which holds the init_pid_ns view). pidfd_show_fdinfo() in kernel/pid.c follows the same pattern. Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg> Link: https://patch.msgid.link/20260510084119.457578-1-maoyi.xie@ntu.edu.sg Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-05-11 07:44:09 -06:00
Pankaj Raghav	834e98acb7	fs: fix forced iversion increment on lazytime timestamp updates When updating timestamps with lazytime enabled, if only I_DIRTY_TIME is set (pure lazytime update), inode_maybe_inc_iversion() should not be forced to increment i_version. The force parameter should only be true when actual data or metadata changes require an iversion bump. The current code uses "!!dirty" which evaluates to true whenever dirty has any bits set, including the I_DIRTY_TIME bit alone. This forces an iversion increment on every lazytime timestamp update, which then sets I_DIRTY_SYNC, triggering expensive log flushes on subsequent fdatasync calls. Andres reported this issue when he noticed a perf regression[1]. Fix this by using "dirty != I_DIRTY_TIME" as the force parameter. This passes false for pure lazytime updates (allowing the I_VERSION_QUERIED optimization to work), while still forcing the increment when dirty contains other flags indicating real changes that require iversion updates. [1] https://lore.kernel.org/linux-xfs/7ys6erh3nnyeerv2nybyfvp7dmaknuxrlxv74wx56ocdothkc6@ekfiadtkfn2r/ Fixes: `85c871a02b` ("fs: add support for non-blocking timestamp updates") Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://patch.msgid.link/20260511111918.1793689-1-p.raghav@samsung.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-11 15:32:12 +02:00
Yong-Xuan Wang	cefafbd561	irqchip/riscv-imsic: Clear interrupt move state during CPU offlining Affinity changes of IMSIC interrupts have to be careful to not lose an interrupt in the process. Each vector keeps track of an affinity change in progress with two pointers in struct imsic_vector. imsic_vector::move_prev points to the previous CPU target data and imsic_vector::move_next to the designated new CPU target data. imsic_vector::move_prev on the new CPU can only be cleared after the previous CPU has cleared imsic_vector::move_next, which ususally happens in __imsic_remote_sync(). In case of CPU hot-unplug __imsic_remote_sync() is not invoked because the CPU is already marked offline. That means imsic_vector::move_prev becomes stale until the CPU is onlined again. The stale pointer prevents further affinity changes for the affected interrupts. Solve this by clearing the imsic_vector::move_prev pointers in the CPU hotplug offline path. [ tglx: Replace word salad in change log ] Fixes: `0f67911e82` ("irqchip/riscv-imsic: Separate next and previous pointers in IMSIC vector") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260508-imsic-v2-1-e9f08dd46cf5@sifive.com	2026-05-11 15:23:11 +02:00
Xianwei Zhao	5363b67ac8	irqchip/meson-gpio: Use the correct register in meson_s4_gpio_irq_set_type() meson_s4_gpio_irq_set_type() uses the both-edge trigger register for configuring level type and single edge mode interrupts, which is not correct. Use REG_EDGE_POL instead. Fixes: `bbd6fcc76b` ("irqchip: Add support for Amlogic A4 and A5 SoCs") Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260508-a9-gpio-irqchip-v1-1-9dc5f3e022e0@amlogic.com	2026-05-11 15:22:48 +02:00
Rosen Penev	0fa10fb770	irqchip/ath79-cpu: Remove unused function ath79_cpu_irq_init() was part of the legacy pre-OF code that got removed a while back. Remove it to get rid of a missing prototype warning, reported by the kernel test robot. [ tglx: Fix the subject prefix. Sigh ... ] Fixes: `51fa4f8912` ("MIPS: ath79: drop legacy IRQ code") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260506085522.1210143-1-rosenp@gmail.com Closes: https://lore.kernel.org/oe-kbuild-all/202412011509.kGQkDr1y-lkp@intel.com/	2026-05-11 15:06:47 +02:00
Mark Rutland	512718bbc5	genirq/chip: Don't call add_interrupt_randomness() for NMIs Recently handle_percpu_devid_irq() was changed to call add_interrupt_randomness(). This introduced a potential deadlock when handle_percpu_devid_irq() is used to handle an NMI, which can be detected with lockdep, e.g. ================================ WARNING: inconsistent lock state 7.1.0-rc2-pnmi #465 Not tainted -------------------------------- inconsistent {INITIAL USE} -> {IN-NMI} usage. perf/695 [HC1[1]:SC0[0]:HE0:SE1] takes: ffff00837dfd3a18 (&base->lock){-.-.}-{2:2}, at: lock_timer_base+0x6c/0xac {INITIAL USE} state was registered at: _raw_spin_lock_irqsave+0x68/0xb0 lock_timer_base+0x6c/0xac __mod_timer+0x100/0x32c add_timer_global+0x2c/0x40 __queue_delayed_work+0xf0/0x140 queue_delayed_work_on+0x134/0x138 mem_cgroup_css_online+0x30c/0x310 online_css+0x34/0x10c cgroup_init_subsys+0x158/0x1c8 cgroup_init+0x440/0x524 start_kernel+0x888/0x998 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&base->lock); <Interrupt> lock(&base->lock); * DEADLOCK * Call trace: _raw_spin_lock_irqsave+0x68/0xb0 lock_timer_base+0x6c/0xac add_timer_on+0x78/0x16c add_interrupt_randomness+0x124/0x134 handle_percpu_devid_irq+0xd4/0x16c handle_irq_desc+0x40/0x58 generic_handle_domain_nmi+0x28/0x50 __gic_handle_nmi.isra.0+0x4c/0xa0 gic_handle_irq+0x38/0x2bc call_on_irq_stack+0x30/0x48 do_interrupt_handler+0x80/0x98 el1_interrupt+0x90/0xac el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x80/0x84 [...] During review, Thomas pointed out it wouldn't be safe for handle_percpu_devid_irq() to call add_interrupt_randomness() if it was used to handle NMIs: https://lore.kernel.org/lkml/87bjgik042.ffs@tglx/ ... but evidently people missed that handle_percpu_devid_irq() is used for NMIs. While it might seem that NMIs should be handled with a separate handle_percpu_devid_nmi() function, for various structural reasons this was impractical, and handle_percpu_devid_irq() has been expected to be used for NMIs since commits: `21bbbc50f3` ("irqchip/gic-v3: Switch high priority PPIs over to handle_percpu_devid_irq()") `5ff78c8de9` ("genirq: Kill handle_percpu_devid_fasteoi_nmi()") Taking the above into account, avoid the deadlock by not calling add_interrupt_randomness() when handle_percpu_devid_irq() is called in an NMI context. This is consistent with other NNI handling flows, which do not call add_interrupt_randomness(). At the same time, update the kernel-doc comment to make it clear that handle_percpu_devid_irq() can be called in NMI context. The rest of handle_percpu_devid_irq() is currently NMI safe and doesn't need to change. Fixes: `fd7400cfcb` ("genirq/chip: Invoke add_interrupt_randomness() in handle_percpu_devid_irq()") Reported-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://patch.msgid.link/20260507110518.3128248-1-mark.rutland@arm.com	2026-05-11 14:56:04 +02:00

... 26 27 28 29 30 ...

1446786 Commits