Commit Graph

2017 Commits

Author SHA1 Message Date
Yifan Zhang
353f7430d1 drm/amdgpu: unmap all user mappings of framebuffer and doorbell before mode1 reset
During Mode 1 reset, the ASIC undergoes a reset cycle and becomes temporarily
inaccessible via PCIe. Any attempt to access framebuffer or MMIO registers during
this window can result in uncompleted PCIe transactions, leading to NMI panics or
system hangs.

To prevent this, Unmap all of the applications mappings of the framebuffer
and doorbell BARs before mode1 reset. Also prevent new mappings from coming in
during the reset process.

v2: remove inode in kfd_dev (Christian)
v3: correct unmap offset (Felix), remove prevent new mappings part
to avoid deadlock (Christian)

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 70cadefcc6160c575b04f763ada34c20e868d577)
2026-05-19 12:14:55 -04:00
David Francis
6dc2c49a70 drm/amdkfd: Check bounds for allocate_sdma_queue restore_sdma_id
allocate_sdma_queue has an option where the sdma queue id can be
specified (used by CRIU). We weren't bounds-checking that
value.

Confirm it's less than the maximum number of queues.

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bfe9a7545b2a7be1c543f1741e16f2d5ec4116ae)
2026-05-19 12:11:43 -04:00
David Francis
a1d4b228e3 drm/amdkfd: Check bounds on allocate_doorbell
allocated_doorbell has an option to set the doorbell id
to a specific value (used by CRIU). This value was not
bounds checked.

Check to confirm it's less than KFD_MAX_NUM_OF_QUEUES_PER_PROCESS.

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1f087bb8cf9e8797633da35c85435e557ef74d06)
2026-05-19 12:11:26 -04:00
Sunday Clement
48b13bfbdf drm/amdkfd: Fix OOB memory exposure in get_wave_state()
The get_wave_state() function for v9 trusts cp_hqd_cntl_stack_size and
cp_hqd_cntl_stack_offset values read directly from the MQD, which are
written by GPU microcode and fully attacker-controlled on the
CRIU-restore path (via AMDKFD_IOC_RESTORE_PROCESS with H3).

this leads to an unbounded copy_to_user() that can leak adjacent
GTT/kernel memory. If offset > size, integer underflow produces a ~4 GiB
read length, if size is set to 1 MiB against a 4 KiB allocation, we leak
1 MiB of adjacent kernel memory (other queues' MQDs, ring buffers, KASLR
pointers).

Fix by clamping both cp_hqd_cntl_stack_size to the actual allocated
buffer size (q->ctl_stack_size) and cp_hqd_cntl_stack_offset to the
clamped size before performing arithmetic and copy_to_user().

This ensures we never read beyond the allocated kernel BO regardless of
attacker-supplied MQD field values.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7ef144458f48d5589e36f1b3d83e83db2e5c5ba5)
2026-05-19 12:10:04 -04:00
Xiaogang Chen
81665e35f1 drm/amdkfd: Check if there are kfd porcesses using adev by kfd_processes_count
During gpu hot-unplug need check if there are kfd porcesses still using the
being removed gpu before clean resources of the device. Current driver checks
if kfd_processes_table is empty. kfd processes are not terminated after
removed from kfd_processes_table immediately. They are still alive and may
access the device until kfd_process_wq work queue got ran.

Check kfd->kfd_processes_count value that is updated after kfd process got
uninitialized when its ref becomes zero.

Fixes: 6cca686dfc ("drm/amdkfd: kfd driver supports hot unplug/replug amdgpu devices")
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d12d05c4bc4c15585130af43e897923ff292df7b)
2026-05-05 10:18:23 -04:00
Felix Kuehling
9b4e3495d1 drm/amdkfd: Make all TLB-flushes heavy-weight
With only one sequence number we cannot track the need for legacy vs
heavy-weight flushes reliably. Always use heavy-weight.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c1a3ff1d327820cd9a52bc1056b98681fc088949)
Cc: stable@vger.kernel.org
2026-05-05 10:13:54 -04:00
YuanShang
d0f5711fa1 drm/amdkfd: check if vm ready in svm map and unmap to gpu
Don't map or unmap svm range to gpu if vm is not ready for updates.

Why: DRM entity may already be killed when the svm worker try to
update gpu vm.

Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 55f8e366c326980174a4f2b9501b524d8eb25135)
2026-04-24 11:12:07 -04:00
Alysa Liu
045e0ff208 drm/amdkfd: validate SVM ioctl nattr against buffer size
Validate nattr field against the buffer size, preventing
out-of-bounds buffer access via user-controlled attribute count.

Reviewed-by: Amir Shetaia <Amir.Shetaia@amd.com>
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5eca8bfdfa456c3304ca77523718fe24254c172f)
Cc: stable@vger.kernel.org
2026-04-24 11:11:29 -04:00
Alysa Liu
74b73fa56a drm/amdkfd: Add upper bound check for num_of_nodes
drm/amdkfd: Add upper bound check for num_of_nodes
in kfd_ioctl_get_process_apertures_new.

Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 98ff46a5ea090c14d2cdb4f5b993b05d74f3949f)
Cc: stable@vger.kernel.org
2026-04-23 12:54:45 -04:00
Marco Crivellari
505b1c7342 amd/amdkfd: add WQ_UNBOUND to alloc_workqueue users
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

   commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

The refactoring is going to alter the default behavior of
alloc_workqueue() to be unbound by default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU. For more details see the Link tag below.

This specific workload has no benefit being per-cpu, so its behavior has
been changed using explicitly WQ_UNBOUND.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-04-03 13:52:33 -04:00
Donet Tom
a3e1443630 drm/amdkfd: Fix queue preemption/eviction failures by aligning control stack size to GPU page size
The control stack size is calculated based on the number of CUs and
waves, and is then aligned to PAGE_SIZE. When the resulting control
stack size is aligned to 64 KB, GPU hangs and queue preemption
failures are observed while running RCCL unit tests on systems with
more than two GPUs.

amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with
doorbell_id: 80030008
amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
amdgpu 0048:0f:00.0: amdgpu: GPU reset begin!. Source: 4
amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with
doorbell_id: 80030008
amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
amdgpu 0048:0f:00.0: amdgpu: Failed to restore process queues

This issue is observed on both 4 KB and 64 KB system page-size
configurations.

This patch fixes the issue by aligning the control stack size to
AMDGPU_GPU_PAGE_SIZE instead of PAGE_SIZE, so the control stack size
will not be 64 KB on systems with a 64 KB page size and queue
preemption works correctly.

Additionally, In the current code, wg_data_size is aligned to PAGE_SIZE,
which can waste memory if the system page size is large. In this patch,
wg_data_size is aligned to AMDGPU_GPU_PAGE_SIZE. The cwsr_size, calculated
from wg_data_size and the control stack size, is aligned to PAGE_SIZE.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 15:16:39 -04:00
Pierre-Eric Pelloux-Prayer
6ab4054fda drm/amdgpu: use TTM_NUM_MOVE_FENCES when reserving fences
Use TTM_NUM_MOVE_FENCES as an upperbound of how many fences
ttm might need to deal with moves/evictions.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 15:16:33 -04:00
Pierre-Eric Pelloux-Prayer
ab5dd4dcc5 drm/amdgpu: allocate move entities dynamically
No functional change for now, as we always allocate a single entity.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 15:16:15 -04:00
Eric Huang
4ea64d482f drm/amdkfd: fix kernel crash on releasing NULL sysfs entry
there is an abnormal case that When a process re-opens kfd
with different mm_struct(execve() called by user), the
allocated p->kobj will be freed, but missed setting it to NULL,
that will cause sysfs/kernel crash with NULL pointers in p->kobj
on kfd_process_remove_sysfs() when releasing process, and the
similar error on kfd_procfs_del_queue() as well.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 15:14:51 -04:00
Lang Yu
054695c0e2 drm/amdkfd: Switch to dev_* printk stuff in kfd_int_process_v12_1.c
dev_* printk stuff is multi-GPU friendly.

Use dev_warn_ratelimited() for print_sq_intr_info_error() which is
consistent with previous IPs.

Use dev_dbg_ratelimited() for irrelevant node interrupt print to
avoid too much noise.

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 15:13:14 -04:00
Donet Tom
31b8de5e55 drm/amdgpu: Change AMDGPU_VA_RESERVED_TRAP_SIZE to 64KB
Currently, AMDGPU_VA_RESERVED_TRAP_SIZE is hardcoded to 8KB, while
KFD_CWSR_TBA_TMA_SIZE is defined as 2 * PAGE_SIZE. On systems with
4K pages, both values match (8KB), so allocation and reserved space
are consistent.

However, on 64K page-size systems, KFD_CWSR_TBA_TMA_SIZE becomes 128KB,
while the reserved trap area remains 8KB. This mismatch causes the
kernel to crash when running rocminfo or rccl unit tests.

Kernel attempted to read user page (2) - exploit attempt? (uid: 1001)
BUG: Kernel NULL pointer dereference on read at 0x00000002
Faulting instruction address: 0xc0000000002c8a64
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
CPU: 34 UID: 1001 PID: 9379 Comm: rocminfo Tainted: G E
6.19.0-rc4-amdgpu-00320-gf23176405700 #56 VOLUNTARY
Tainted: [E]=UNSIGNED_MODULE
Hardware name: IBM,9105-42A POWER10 (architected) 0x800200 0xf000006
of:IBM,FW1060.30 (ML1060_896) hv:phyp pSeries
NIP:  c0000000002c8a64 LR: c00000000125dbc8 CTR: c00000000125e730
REGS: c0000001e0957580 TRAP: 0300 Tainted: G E
MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24008268
XER: 00000036
CFAR: c00000000125dbc4 DAR: 0000000000000002 DSISR: 40000000
IRQMASK: 1
GPR00: c00000000125d908 c0000001e0957820 c0000000016e8100
c00000013d814540
GPR04: 0000000000000002 c00000013d814550 0000000000000045
0000000000000000
GPR08: c00000013444d000 c00000013d814538 c00000013d814538
0000000084002268
GPR12: c00000000125e730 c000007e2ffd5f00 ffffffffffffffff
0000000000020000
GPR16: 0000000000000000 0000000000000002 c00000015f653000
0000000000000000
GPR20: c000000138662400 c00000013d814540 0000000000000000
c00000013d814500
GPR24: 0000000000000000 0000000000000002 c0000001e0957888
c0000001e0957878
GPR28: c00000013d814548 0000000000000000 c00000013d814540
c0000001e0957888
NIP [c0000000002c8a64] __mutex_add_waiter+0x24/0xc0
LR [c00000000125dbc8] __mutex_lock.constprop.0+0x318/0xd00
Call Trace:
0xc0000001e0957890 (unreliable)
__mutex_lock.constprop.0+0x58/0xd00
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x6fc/0xb60 [amdgpu]
kfd_process_alloc_gpuvm+0x54/0x1f0 [amdgpu]
kfd_process_device_init_cwsr_dgpu+0xa4/0x1a0 [amdgpu]
kfd_process_device_init_vm+0xd8/0x2e0 [amdgpu]
kfd_ioctl_acquire_vm+0xd0/0x130 [amdgpu]
kfd_ioctl+0x514/0x670 [amdgpu]
sys_ioctl+0x134/0x180
system_call_exception+0x114/0x300
system_call_vectored_common+0x15c/0x2ec

This patch changes AMDGPU_VA_RESERVED_TRAP_SIZE to 64 KB and
KFD_CWSR_TBA_TMA_SIZE to the AMD GPU page size. This means we reserve
64 KB for the trap in the address space, but only allocate 8 KB within
it. With this approach, the allocation size never exceeds the reserved
area.

Fixes: 34a1de0f79 ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole")
Reviewed-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 14:38:11 -04:00
Donet Tom
998d678141 drm/amd: Fix MQD and control stack alignment for non-4K
For gfxV9, due to a hardware bug ("based on the comments in the code
here [1]"), the control stack of a user-mode compute queue must be
allocated immediately after the page boundary of its regular MQD buffer.
To handle this, we allocate an enlarged MQD buffer where the first page
is used as the MQD and the remaining pages store the control stack.
Although these regions share the same BO, they require different memory
types: the MQD must be UC (uncached), while the control stack must be
NC (non-coherent), matching the behavior when the control stack is
allocated in user space.

This logic works correctly on systems where the CPU page size matches
the GPU page size (4K). However, the current implementation aligns both
the MQD and the control stack to the CPU PAGE_SIZE. On systems with a
larger CPU page size, the entire first CPU page is marked UC—even though
that page may contain multiple GPU pages. The GPU treats the second 4K
GPU page inside that CPU page as part of the control stack, but it is
incorrectly mapped as UC.

This patch fixes the issue by aligning both the MQD and control stack
sizes to the GPU page size (4K). The first 4K page is correctly marked
as UC for the MQD, and the remaining GPU pages are marked NC for the
control stack. This ensures proper memory type assignment on systems
with larger CPU page sizes.

[1]: https://elixir.bootlin.com/linux/v6.18/source/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c#L118

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 14:36:04 -04:00
Donet Tom
b01cd158a2 drm/amdkfd: Align expected_queue_size to PAGE_SIZE
The AQL queue size can be 4K, but the minimum buffer object (BO)
allocation size is PAGE_SIZE. On systems with a page size larger
than 4K, the expected queue size does not match the allocated BO
size, causing queue creation to fail.

Align the expected queue size to PAGE_SIZE so that it matches the
allocated BO size and allows queue creation to succeed.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-30 14:35:57 -04:00
Srinivasan Shanmugam
19d4149b22 drm/amdkfd: Fix NULL pointer check order in kfd_ioctl_create_process
In kfd_ioctl_create_process(), the pointer 'p' is used before checking
if it is NULL.

The code accesses p->context_id before validating 'p'. This can lead
to a possible NULL pointer dereference.

Move the NULL check before using 'p' so that the pointer is validated
before access.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_chardev.c:3177 kfd_ioctl_create_process() warn: variable dereferenced before check 'p' (see line 3174)

Fixes: cc6b66d661 ("amdkfd: introduce new ioctl AMDKFD_IOC_CREATE_PROCESS")
Cc: Zhu Lingshan <lingshan.zhu@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-24 13:29:40 -04:00
Jesse.Zhang
15e19d832b drm/amd/amdgpu: Fix build errors due to declarations after labels
In C90 (which the kernel uses with -std=gnu89), declarations must
appear at the beginning of a block and cannot follow a label. The
switch cases in amdgpu_discovery.c and gmc_v12_1.c contained variable
declarations immediately after case labels, causing the compiler to
error:

drivers/gpu/drm/amd/amdgpu/gmc_v12_1.c:533:3: error: a label can only be
part of a statement and a declaration is not a statement

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:42:47 -04:00
YiPeng Chai
df1f11fe14 drm/amdgpu: Add poison consumption handling for gfx v12_1
Add poison consumption handling for gfx v12_1.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:33:11 -04:00
Harish Kasiviswanathan
5abc46d134 drm/amdgpu: Support forcing MTYPE_RW
Set default value of module parameter amdgpu_mtype_local to -1. This
allows to force MTYPE_RW on ASICs where MTYPE_RW is not default.

v2: Fix SDMA get_vm_pte_pde MTYPE

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:31:52 -04:00
Harish Kasiviswanathan
f6e9582a7f drm/amdgpu: Update MTYPE for GFX12.1
Update MTYPE for GFX12.1 for AID A0 and A1

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Philip.Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:31:49 -04:00
Harish Kasiviswanathan
e040301393 drm/amdkfd: Don't expect signal mailbox update
GFX12.1 CP to improve performance has removed updating event_id into
signal mailbox. In future, this optimization can be extended to older
ASICs. Update driver code to handle this case.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:31:39 -04:00
Philip Yang
189208d3d5 drm/amdkfd: Update queue properties for metadata ring
Metadata ring and queue ring is allocated as one buffer and map
to GPU, so update queue peoperties should add the queue metadata
size and ring size as buffer size to validate queue ring buffer.

Fixes: c51bb53d5c ("drm/amdkfd: Add metadata ring buffer for compute")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-17 10:30:22 -04:00
Dave Airlie
02e778f123 amd-drm-next-7.1-2026-03-12:
amdgpu:
 - SMU13 fix
 - SMU14 fix
 - Fixes for bring up hw testing
 - Kerneldoc fix
 - GC12 idle power fix for compute workloads
 - DCCG fixes
 - UserQ fixes
 - Move test for fbdev object to a generic helper
 - GC 12.1 updates
 - Use struct drm_edid in non-DC code
 - Include IP discovery data in devcoredump
 - SMU 13.x updates
 - Misc cleanups
 - DML 2.1 fixes
 - Enable NV12/P010 support on primary planes
 - Enable color encoding and color range on overlay planes
 - DC underflow fixes
 - HWSS fast path fixes
 - Replay fixes
 - DCN 4.2 updates
 - Support newer IP discovery tables
 - LSDMA 7.1 support
 - IH 7.1 fixes
 - SoC v1 updates
 - GC12.1 updates
 - PSP 15 updates
 - XGMI fixes
 - GPUVM locking fix
 
 amdkfd:
 - Fix missing BO unreserve in an error path
 
 radeon:
 - Move test for fbdev object to a generic helper
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCabMItwAKCRC93/aFa7yZ
 2BiwAQD3yQsA6xuspaMXduQwFE0tu6rL6bB+NSB7bfM2YYdE1AEAqi2GQPOlfohH
 DK/zb58ZPoeJfCdShPPQ85cSkPSu2wY=
 =OtzL
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-7.1-2026-03-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-7.1-2026-03-12:

amdgpu:
- SMU13 fix
- SMU14 fix
- Fixes for bring up hw testing
- Kerneldoc fix
- GC12 idle power fix for compute workloads
- DCCG fixes
- UserQ fixes
- Move test for fbdev object to a generic helper
- GC 12.1 updates
- Use struct drm_edid in non-DC code
- Include IP discovery data in devcoredump
- SMU 13.x updates
- Misc cleanups
- DML 2.1 fixes
- Enable NV12/P010 support on primary planes
- Enable color encoding and color range on overlay planes
- DC underflow fixes
- HWSS fast path fixes
- Replay fixes
- DCN 4.2 updates
- Support newer IP discovery tables
- LSDMA 7.1 support
- IH 7.1 fixes
- SoC v1 updates
- GC12.1 updates
- PSP 15 updates
- XGMI fixes
- GPUVM locking fix

amdkfd:
- Fix missing BO unreserve in an error path

radeon:
- Move test for fbdev object to a generic helper

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260312184425.3875669-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2026-03-16 16:50:53 +10:00
Philip Yang
c24afed7de drm/amdkfd: Unreserve bo if queue update failed
Error handling path should unreserve bo then return failed.

Fixes: 305cd109b7 ("drm/amdkfd: Validate user queue update")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-11 13:58:07 -04:00
Dave Airlie
057ad0ef4d amd-drm-next-7.1-2026-03-04:
amdgpu:
 - FAMS2 updates
 - Refactor DC I2C
 - Rework ttm handling to allow for multiple engines
 - UserQ updates
 - Ring reset improvements
 - DC DCE 6.x cleanups
 - DC support for NUTMEG and TRAVIS DP bridges
 - Enable DC by default on CIK APUs
 - Add DCN 4.2 support
 - IPS fixes
 - Overlay fixes for DCN4
 - SDMA Limit updates
 - Misc fixes
 - RAS updates
 - Register access callback rework
 - GC 12.1 updates
 
 amdkfd:
 - Misc cleanups
 
 UAPI:
 - UserQ fence IOCTL parameter size fixes.  The change is backwards compatible on LE, but not BE.
   UserQs are still not considered stable and are disabled by default.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCaaikJwAKCRC93/aFa7yZ
 2CklAQDWTLGzAuei2qob6wV4dcbxi+8eIB7uy4wzppMYen7WkgEA9oChRFo8eOAo
 SAlAx59UnuGYDEdQTlQUfP/1BSLZ6AM=
 =Cvp8
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-7.1-2026-03-04' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-7.1-2026-03-04:

amdgpu:
- FAMS2 updates
- Refactor DC I2C
- Rework ttm handling to allow for multiple engines
- UserQ updates
- Ring reset improvements
- DC DCE 6.x cleanups
- DC support for NUTMEG and TRAVIS DP bridges
- Enable DC by default on CIK APUs
- Add DCN 4.2 support
- IPS fixes
- Overlay fixes for DCN4
- SDMA Limit updates
- Misc fixes
- RAS updates
- Register access callback rework
- GC 12.1 updates

amdkfd:
- Misc cleanups

UAPI:
- UserQ fence IOCTL parameter size fixes.  The change is backwards compatible on LE, but not BE.
  UserQs are still not considered stable and are disabled by default.

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260304213233.1938311-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2026-03-09 06:04:21 +10:00
David Francis
421c0f1904 drm/amdgpu: Check for multiplication overflow in checkpoint stack size
get_checkpoint_info() in kfd_mqd_manager_v9.c finds 32-bit value
ctl_stack_size by multiplying two 32-bit values. This can overflow to a
lower value, which could result in copying outside the bounds of
a buffer in checkpoint_mqd() in the same file.

Put in a check for the overflow, and fail with -EINVAL if detected.

v2: use check_mul_overflow()

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-06 16:33:59 -05:00
Sunil Khatri
8c78845bf9 drm/amdkfd: fix the warning for potential insecure string
Below is the warning thrown by the clang compiler:
linux/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:588:9: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
                                           stats_dir_filename);
                                           ^~~~~~~~~~~~~~~~~~
linux/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:588:9: note: treat the string as an argument to avoid this
                                           stats_dir_filename);
                                           ^
                                           "%s",
linux/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:635:18: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
                                           p->kobj, counters_dir_filename);
                                                    ^~~~~~~~~~~~~~~~~~~~~
linux/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:635:18: note: treat the string as an argument to avoid this
                                           p->kobj, counters_dir_filename);
                                                    ^
                                                    "%s",

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
CC: Philip Yang <philip.yang@amd.com>
CC: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-06 16:33:55 -05:00
Philip Yang
b2d13a41da drm/amdgpu: GFX12.1 scratch memory limit up to 57-bit
The scratch aperture or gmc private aperture in flat memory contains
57 bits of data on gfx v12.1.0 compared to the 32 bits from previous.

Add new helper kfd_init_apertures_v12 for gfx version >= v12.1.0 which
supports 57-bit VA space.

v2:
  - update pdd->scratch_limit (Yu, Lang)
  - update fixes tag (Felix Kuehling)
  - add helper kfd_init_apertures_v12

Fixes: db1882b3ff ("drm/amdkfd: Update LDS, Scratch base for 57bit address")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-03-04 11:42:26 -05:00
Alex Deucher
911e2c0525 drm/amdkfd: fix CWSR trap handler
Fix up what looks like a bad merge.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-26 11:20:10 -05:00
Andrew Martin
bfe60e539c drm/amdkfd: Removed commented line for MQD queue priority
Missed deleting the commented line in the original patch.

Fixes: 73463e26f7 ("drm/amdkfd: Disable MQD queue priority")
Signed-off-by: Andrew Martin <andrew.martin@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-25 16:28:10 -05:00
Pierre-Eric Pelloux-Prayer
2c37255725 drm/amdgpu: statically assign gart windows to ttm entities
If multiple entities share the same window we must make sure
that jobs using them are executed sequentially.

This commit gives separate windows to each entity, so jobs
from multiple entities could execute in parallel if needed.
(for now they all use the first sdma engine, so it makes no
difference yet).
The entity stores the gart window offsets to centralize the
"window id" to "window offset" in a single place.

default_entity doesn't get any windows reserved since there is
no use for them.

---
v3:
- renamed gart_window_lock -> lock (Christian)
- added amdgpu_ttm_buffer_entity_init (Christian)
- fixed gart_addr in svm_migrate_gart_map (Felix)
- renamed gart_window_idX -> gart_window_offs[]
- added amdgpu_compute_gart_address
v4:
- u32 -> u64
- added kerneldoc
v5:
- removed gtt_window_lock
- simplified gart window creation and use: entities using a
  single window now uses window #0 instead of #1
- fix dst_addr calculation in kfd_migrate.c
---

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-23 14:16:29 -05:00
Maxime Ripard
8b85987d3c
Merge drm/drm-next into drm-misc-next
Let's merge 7.0-rc1 to start the new drm-misc-next window

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2026-02-23 11:48:20 +01:00
Kees Cook
189f164e57 Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses
Conversion performed via this Coccinelle script:

  // SPDX-License-Identifier: GPL-2.0-only
  // Options: --include-headers-for-types --all-includes --include-headers --keep-comments
  virtual patch

  @gfp depends on patch && !(file in "tools") && !(file in "samples")@
  identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
 		    kzalloc_obj,kzalloc_objs,kzalloc_flex,
		    kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
		    kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
  @@

  	ALLOC(...
  -		, GFP_KERNEL
  	)

  $ make coccicheck MODE=patch COCCI=gfp.cocci

Build and boot tested x86_64 with Fedora 42's GCC and Clang:

Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-22 08:26:33 -08:00
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
d4a292c5f8 drm next fixes for 7.0-rc1
pagemap:
 - drm/pagemap: pass pagemap_addr by reference
 
 amdgpu:
 - DML 2.1 fixes
 - Panel replay fixes
 - Display writeback fixes
 - MES 11 old firmware compat fix
 - DC CRC improvements
 - DPIA fixes
 - XGMI fixes
 - ASPM fix
 - SMU feature bit handling fixes
 - DC LUT fixes
 - RAS fixes
 - Misc memory leak in error path fixes
 - SDMA queue reset fixes
 - PG handling fixes
 - 5 level GPUVM page table fix
 - SR-IOV fix
 - Queue reset fix
 - SMU 13.x fixes
 - DC resume lag fix
 - MPO fixes
 - DCN 3.6 fix
 - VSDB fixes
 - HWSS clean up
 - Replay fixes
 - DCE cursor fixes
 - DCN 3.5 SR DDR5 latency fixes
 - HPD fixes
 - Error path unwind fixes
 - SMU13/14 mode1 reset fixes
 - PSP 15 updates
 - SMU 15 updates
 - Sync fix in amdgpu_dma_buf_move_notify()
 - HAINAN fix
 - PSP 13.x fix
 - GPUVM locking fix
 - Fixes for DC analog support
 - DC FAMS fixes
 - DML 2.1 fixes
 - eDP fixes
 - Misc DC fixes
 - Fastboot fix
 - 3DLUT fixes
 - GPUVM fixes
 - 64bpp format fix
 - Fix for MacBooks with switchable gfx
 
 amdkfd:
 - Fix possible double deletion of validate list
 - Event setup fix
 - Device disconnect regression fix
 - APU GTT as VRAM fix
 - Fix piority inversion with MQDs
 - NULL check fix
 
 radeon:
 - HAINAN fix
 
 i915/xe display:
 - Regresion fix for HDR 4k displays (#15503)
 - Fixup for Dell XPS 13 7390 eDP rate limit
 - Memory leak fix on ACPI _DSM handling
 - Add missing slice count check during DP mode validation
 
 xe:
 - drm/xe: Prevent VFs from exposing the CCS mode sysfs file
 - SRIOV related fixes
 - PAT cache fix
 - MMIO read fix
 - W/a fixes
 - Adjust type of xe_modparam.force_vram_bar_size
 - Wedge mode fix
 - HWMon fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmmYyNEACgkQDHTzWXnE
 hr7n0Q/+L4KscEx0ByCOBNfWJollaJKZEpdeWLQ2J9RSa/hPVOTE+k2z/TokCsYw
 jYIHCv8UBU/VvffWWHj0CHCRSRFcfwWo9bZm0J0C6vRSL77iXJPUDZOZ/EhP7Cy3
 oMZyexBcVTwNqzI2cg8PYsGkTMPpdBUxrSNxa9d15kNJbMSm5RdAMJ4pt/9Id9V4
 iCGQccsNZN8d9JdPzYU/Tay2RQOD/75l9xtywq4vb8Ktszc7gWFca+tRleYZxbtG
 WuXvMStqmZAUZJaTbSASbSQFQMEdyJXSsFq/T0cY1Mm5DSUqnJ27YqOzL5sqzQTE
 +Aeh5w5xEgq/GaLzdx9tOSxBr6mK7p251RApSunPn4nwb50iT8dBAqaYh8Zy2e0o
 vQtFgp3PLlnqwvdvEtqPoq6+oG/FtIsULLPLuMqlZtOe3EG7BzQsWeWASj793EtN
 KrTu4a9HufzZrf+t8a3ZLp2CMQJI5sCCBZBJiZ6/ImiixBEH5bJzSLPhdx7VeoMe
 //kVBbhYXD8oi8QMcJHdZg9ERp80D1yYJhhGWes330s1tdl87Gizy4Mu4pSH9Mds
 bw7u6PTS7iI7hoXvz/ITrrHkFTkxLsxVhPI2OhLAfRpIEoH9XOFgi+BCmZOEb9lT
 V5RhIDxrKpjQcgJ9d9dAM2arO11kTPdVTEH4tYpzQ5cHd17mls4=
 =SQjl
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2026-02-21' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "This is the fixes and cleanups for the end of the merge window, it's
  nearly all amdgpu, with some amdkfd, then a pagemap core fix, i915/xe
  display fixes, and some xe driver fixes.

  Nothing seems out of the ordinary, except amdgpu is a little more
  volume than usual.

  pagemap:
   - drm/pagemap: pass pagemap_addr by reference

  amdgpu:
   - DML 2.1 fixes
   - Panel replay fixes
   - Display writeback fixes
   - MES 11 old firmware compat fix
   - DC CRC improvements
   - DPIA fixes
   - XGMI fixes
   - ASPM fix
   - SMU feature bit handling fixes
   - DC LUT fixes
   - RAS fixes
   - Misc memory leak in error path fixes
   - SDMA queue reset fixes
   - PG handling fixes
   - 5 level GPUVM page table fix
   - SR-IOV fix
   - Queue reset fix
   - SMU 13.x fixes
   - DC resume lag fix
   - MPO fixes
   - DCN 3.6 fix
   - VSDB fixes
   - HWSS clean up
   - Replay fixes
   - DCE cursor fixes
   - DCN 3.5 SR DDR5 latency fixes
   - HPD fixes
   - Error path unwind fixes
   - SMU13/14 mode1 reset fixes
   - PSP 15 updates
   - SMU 15 updates
   - Sync fix in amdgpu_dma_buf_move_notify()
   - HAINAN fix
   - PSP 13.x fix
   - GPUVM locking fix
   - Fixes for DC analog support
   - DC FAMS fixes
   - DML 2.1 fixes
   - eDP fixes
   - Misc DC fixes
   - Fastboot fix
   - 3DLUT fixes
   - GPUVM fixes
   - 64bpp format fix
   - Fix for MacBooks with switchable gfx

  amdkfd:
   - Fix possible double deletion of validate list
   - Event setup fix
   - Device disconnect regression fix
   - APU GTT as VRAM fix
   - Fix piority inversion with MQDs
   - NULL check fix

  radeon:
   - HAINAN fix

  i915/xe display:
   - Regresion fix for HDR 4k displays (#15503)
   - Fixup for Dell XPS 13 7390 eDP rate limit
   - Memory leak fix on ACPI _DSM handling
   - Add missing slice count check during DP mode validation

  xe:
   - drm/xe: Prevent VFs from exposing the CCS mode sysfs file
   - SRIOV related fixes
   - PAT cache fix
   - MMIO read fix
   - W/a fixes
   - Adjust type of xe_modparam.force_vram_bar_size
   - Wedge mode fix
   - HWMon fix

* tag 'drm-next-2026-02-21' of https://gitlab.freedesktop.org/drm/kernel: (143 commits)
  drm/amd/display: Remove unneeded DAC link encoder register
  drm/amd/display: Enable DAC in DCE link encoder
  drm/amd/display: Set CRTC source for DAC using registers
  drm/amd/display: Initialize DAC in DCE link encoder using VBIOS
  drm/amd/display: Turn off DAC in DCE link encoder using VBIOS
  drm/amd/display: Don't call find_analog_engine() twice
  drm/amdgpu: fix 4-level paging if GMC supports 57-bit VA v2
  drm/amdgpu: keep vga memory on MacBooks with switchable graphics
  drm/amdgpu: Set atomics to true for xgmi
  drm/amdkfd: Check for NULL return values
  drm/amd/display: Use same max plane scaling limits for all 64 bpp formats
  drm/amdgpu: Set vmid0 PAGE_TABLE_DEPTH for GFX12.1
  drm/amdkfd: Disable MQD queue priority
  drm/amd/display: Remove conditional for shaper 3DLUT power-on
  drm/amd/display: Check return of shaper curve to HW format
  drm/amd/display: Correct logic check error for fastboot
  drm/amd/display: Skip eDP detection when no sink
  Revert "drm/amd/display: Add Gfx Base Case For Linear Tiling Handling"
  Revert "drm/amd/display: Correct hubp GfxVersion verification"
  Revert "drm/amd/display: Add Handling for gfxversion DcGfxBase"
  ...
2026-02-20 15:36:38 -08:00
Andrew Martin
09da66f139 drm/amdkfd: Check for NULL return values
This patch fixes issues when the code moves forward with a potential
NULL pointer, without checking.
Removed one redundant NULL check for a function parameter. This check
is already done in the only caller.

Signed-off-by: Andrew Martin <andrew.martin@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-19 12:16:11 -05:00
Andrew Martin
73463e26f7 drm/amdkfd: Disable MQD queue priority
This solves a priority inversion issue, caused by the language
runtime making high-priority queues wait for activity on
lower-priority queues.

Signed-off-by: Andrew Martin <andrew.martin@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-19 12:16:11 -05:00
Siwei He
8446c74737 drm/amdkfd: Fix APU to use GTT, not VRAM for MQD
Add a check in mqd_on_vram. If the device prefers GTT, it returns false

Fixes: d4a814f400 ("drm/amdkfd: Move gfx9.4.3 and gfx 9.5 MQD to HBM")
Signed-off-by: Siwei He <siwei.he@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-12 15:23:45 -05:00
Srinivasan Shanmugam
5a19302cab drm/amdkfd: Fix watch_id bounds checking in debug address watch v2
The address watch clear code receives watch_id as an unsigned value
(u32), but some helper functions were using a signed int and checked
bits by shifting with watch_id.

If a very large watch_id is passed from userspace, it can be converted
to a negative value.  This can cause invalid shifts and may access
memory outside the watch_points array.

drm/amdkfd: Fix watch_id bounds checking in debug address watch v2

Fix this by checking that watch_id is within MAX_WATCH_ADDRESSES before
using it.  Also use BIT(watch_id) to test and clear bits safely.

This keeps the behavior unchanged for valid watch IDs and avoids
undefined behavior for invalid ones.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:448
kfd_dbg_trap_clear_dev_address_watch() error: buffer overflow
'pdd->watch_points' 4 <= u32max user_rl='0-3,2147483648-u32max' uncapped

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c
    433 int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd,
    434                                         uint32_t watch_id)
    435 {
    436         int r;
    437
    438         if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id))

kfd_dbg_owns_dev_watch_id() doesn't check for negative values so if
watch_id is larger than INT_MAX it leads to a buffer overflow.
(Negative shifts are undefined).

    439                 return -EINVAL;
    440
    441         if (!pdd->dev->kfd->shared_resources.enable_mes) {
    442                 r = debug_lock_and_unmap(pdd->dev->dqm);
    443                 if (r)
    444                         return r;
    445         }
    446
    447         amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
--> 448         pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch(
    449                                                         pdd->dev->adev,
    450                                                         watch_id);

v2: (as per, Jonathan Kim)
 - Add early watch_id >= MAX_WATCH_ADDRESSES validation in the set path to
   match the clear path.
 - Drop the redundant bounds check in kfd_dbg_owns_dev_watch_id().

Fixes: e0f85f4690 ("drm/amdkfd: add debug set and clear address watch points operation")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Jonathan Kim <jonathan.kim@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-12 15:18:08 -05:00
Linus Torvalds
136114e0ab mm.git review status for linus..mm-nonmm-stable
Total patches:       107
 Reviews/patch:       1.07
 Reviewed rate:       67%
 
 - The 2 patch series "ocfs2: give ocfs2 the ability to reclaim
   suballocator free bg" from Heming Zhao saves disk space by teaching
   ocfs2 to reclaim suballocator block group space.
 
 - The 4 patch series "Add ARRAY_END(), and use it to fix off-by-one
   bugs" from Alejandro Colomar adds the ARRAY_END() macro and uses it in
   various places.
 
 - The 2 patch series "vmcoreinfo: support VMCOREINFO_BYTES larger than
   PAGE_SIZE" from Pnina Feder makes the vmcore code future-safe, if
   VMCOREINFO_BYTES ever exceeds the page size.
 
 - The 7 patch series "kallsyms: Prevent invalid access when showing
   module buildid" from Petr Mladek cleans up kallsyms code related to
   module buildid and fixes an invalid access crash when printing
   backtraces.
 
 - The 3 patch series "Address page fault in
   ima_restore_measurement_list()" from Harshit Mogalapalli fixes a
   kexec-related crash that can occur when booting the second-stage kernel
   on x86.
 
 - The 6 patch series "kho: ABI headers and Documentation updates" from
   Mike Rapoport updates the kexec handover ABI documentation.
 
 - The 4 patch series "Align atomic storage" from Finn Thain adds the
   __aligned attribute to atomic_t and atomic64_t definitions to get
   natural alignment of both types on csky, m68k, microblaze, nios2,
   openrisc and sh.
 
 - The 2 patch series "kho: clean up page initialization logic" from
   Pratyush Yadav simplifies the page initialization logic in
   kho_restore_page().
 
 - The 6 patch series "Unload linux/kernel.h" from Yury Norov moves
   several things out of kernel.h and into more appropriate places.
 
 - The 7 patch series "don't abuse task_struct.group_leader" from Oleg
   Nesterov removes the usage of ->group_leader when it is "obviously
   unnecessary".
 
 - The 5 patch series "list private v2 & luo flb" from Pasha Tatashin
   adds some infrastructure improvements to the live update orchestrator.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY4giAAKCRDdBJ7gKXxA
 jgusAQDnKkP8UWTqXPC1jI+OrDJGU5ciAx8lzLeBVqMKzoYk9AD/TlhT2Nlx+Ef6
 0HCUHUD0FMvAw/7/Dfc6ZKxwBEIxyww=
 =mmsH
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
   disk space by teaching ocfs2 to reclaim suballocator block group
   space (Heming Zhao)

 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
   ARRAY_END() macro and uses it in various places (Alejandro Colomar)

 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
   the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
   page size (Pnina Feder)

 - "kallsyms: Prevent invalid access when showing module buildid" cleans
   up kallsyms code related to module buildid and fixes an invalid
   access crash when printing backtraces (Petr Mladek)

 - "Address page fault in ima_restore_measurement_list()" fixes a
   kexec-related crash that can occur when booting the second-stage
   kernel on x86 (Harshit Mogalapalli)

 - "kho: ABI headers and Documentation updates" updates the kexec
   handover ABI documentation (Mike Rapoport)

 - "Align atomic storage" adds the __aligned attribute to atomic_t and
   atomic64_t definitions to get natural alignment of both types on
   csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)

 - "kho: clean up page initialization logic" simplifies the page
   initialization logic in kho_restore_page() (Pratyush Yadav)

 - "Unload linux/kernel.h" moves several things out of kernel.h and into
   more appropriate places (Yury Norov)

 - "don't abuse task_struct.group_leader" removes the usage of
   ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)

 - "list private v2 & luo flb" adds some infrastructure improvements to
   the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-12 12:13:01 -08:00
Linus Torvalds
939faf71cf drm for 7.0-rc1
core:
 - drop kgdb support
 - replace system workqueue with percpu
 - account for property blobs in memcg
 - MAINTAINERS updates for xe + buddy
 
 rust:
 - Fix documentation for Registration constructors.
 - Use pin_init::zeroed() for fops initialization.
 - Annotate DRM helpers with __rust_helper.
 - Improve safety documentation for gem::Object::new().
 - Update AlwaysRefCounted imports.
 - mm: Prevent integer overflow in page_align().
 
 atomic:
 - add drm_device pointer to drm_private_obj
 - introduce gamma/degamma LUT size check
 
 buddy:
 - fix free_trees memory leak
 - prevent BUG_ON
 
 bridge:
 - introduce drm_bridge_unplug/enter/exit
 - add connector argument to .hpd_notify
 - lots of recounting conversions
 - convert rockchip inno hdmi to bridge
 - lontium-lt9611uxc: switch to HDMI audio helpers
 - dw-hdmi-qp: add support for HPD-less setups
 - Algoltek AG6311 support
 
 panels:
 - edp: CSW MNE007QB3-1, AUO B140HAN06.4, AUO B140QAX01.H
 - st75751: add SPI support
 - Sitronix ST7920, Samsung LTL106HL02
 - LG LH546WF1-ED01, HannStar HSD156J
 - BOE NV130WUM-T08
 - Innolux G150XGE-L05
 - Anbernic RG-DS
 
 dma-buf:
 - improve sg_table debugging
 - add tracepoints
 - call clear_page instead of memset
 - start to introduce cgroup memory accounting in heaps
 - remove sysfs stats
 
 dma-fence:
 - add new helpers
 
 dp:
 - mst: avoid oob access with vcpi=0
 
 hdmi:
 - limit infoframes exposure to userspace
 
 gem:
 - reduce page table overhead with THP
 - fix leak in drm_gem_get_unmapped_area
 
 gpuvm:
 - API sanitation for rust bindings
 
 sched:
 - introduce new helpers
 
 panic:
 - report invalid panic modes
 - add kunit tests
 
 i915/xe display:
 - Expose sharpness only if num_scalers is >= 2
 - Add initial Xe3P_LPD for NVL
 - BMG FBC support
 - Add MTL+ platforms to support dpll framework
 _ fix DIMM_S DRM decoding on ICL
 - Return to using AUX interrupts
 - PSR/Panel replay refactoring
 - use consolidation HDMI tables
 - Xe3_LPD CD2X dividier changes
 
 xe:
 - vfio: add vfio_pci for intel GPU
 - multi queue support
 - dynamic pagemaps and multi-device SVM
 - expose temp attribs in hwmon
 - NO_COMPRESSION bo flag
 - expose MERT OA unit
 - sysfs survivability refactor
 - SRIOV PF: add MERT support
 - enable SR-IOV VF migration
 - Enable I2C/NVM on Crescent Island
 - Xe3p page reclaimation support
 - introduce SRIOV scheduler groups
 - add SoC remappt support in system controller
 - insert compiler barriers in GuC code
 - define NVL GuC firmware
 - handle GT resume failure
 - fix drm scheduler layering violations
 - enable GSC loading and PXP for PTL
 - disable GuC Power DCC strategy on PTL
 - unregister drm device on probe error
 
 i915:
 - move to kernel standard fault injection
 - bump recommended GuC version for DG2 and MTL
 
 amdgpu:
 - SMUIO 15.x, PSP 15.x support
 - IH 6.1.1/7.1 support
 - MMHUB 3.4/4.2 support
 - GC 11.5.4/12.1 support
 - SDMA 6.1.4/7.1/7.11.4 support
 - JPEG 5.3 support
 - UserQ updates
 - GC 9 gfx queue reset support
 - TTM memory ops parallelization
 - convert legacy logging to new helpers
 - DC analog fixes
 
 amdkfd:
 - GC 11.5.4/12.1 suppport
 - SDMA 6.1.4/7.1 support
 - per context support
 - increase kfd process hash table
 - Reserved SDMA rework
 
 radeon:
 - convert legacy logging to new helpers
 - use devm for i2c adapters
 
 msm:
 - GPU
   - Document a612/RGMU dt bindings
   - UBWC 6.0 support (for A840 / Kaanapali)
   - a225 support
 - DPU:
   - Switched to use virtual planes by default
   - Fixed DSI CMD panels on DPU 3.x
   - Rewrote format handling to remove intermediate representation
   - Fixed watchdog on DPU 8.x+
   - Fixed TE / Vsync source setting on DPU 8.x+
   - Added 3D_Mux on SC7280
   - Kaanapali platform support
   - Fixed UBWC register programming
   - Made RM reserve DSPP-enabled mixers for CRTCs with LMs.
   - Gamma correction support
 - DP:
   - Enabled support for eDP 1.4+ link rate tables
   - Fixed MDSS1 DP indices on SA8775P, making them to work
   - Fixed msm_dp_ctrl_config_msa() to work with LLVM 20
 - DSI:
   - Documented QCS8300 as compatible with SA8775P
   - Kaanapali platform support
 - DSI PHY:
   - switched to divider_determine_rate()
 - MDP5:
   - Dropped support for MSM8998, SDM660 and SDM630 (switched over
     to DPU)
 -  MDSS:
   - Kaanapali platform support
   - Fixed UBWC register programming
 
 nova-core:
 - Prepare for Turing support. This includes parsing and handling
   Turing-specific firmware headers and sections as well as a Turing
   Falcon HAL implementation.
 - Get rid of the Result<impl PinInit<T, E>> anti-pattern.
 - Relocate initializer-specific code into the appropriate initializer.
 - Use CStr::from_bytes_until_nul() to remove custom helpers.
 - Improve handling of unexpected firmware values.
 - Clean up redundant debug prints.
 - Replace c_str!() with native Rust C-string literals.
 - Update nova-core task list.
 
 nova:
 - Align GEM object size to system page size.
 
 tyr:
 - Use generated uAPI bindings for GpuInfo.
 - Replace manual sleeps with read_poll_timeout().
 - Replace c_str!() with native Rust C-string literals.
 - Suppress warnings for unread fields.
 - Fix incorrect register name in print statement.
 
 nouveau:
 - fix big page table support races in PTE management
 - improve reclocking on tegra 186+
 
 amdxdna:
 - fix suspend race conditions
 - improve handling of zero tail pointers
 - fix cu_idx overwritten during command setup
 - enable hardware context priority
 - remove NPU2 support
 - update message buffer allocation requirements
 - update firmware version check
 
 ast:
 - support imported cursor buffers
 - big endian fixes
 
 etnaviv:
 - add PPU flop reset support
 
 imagination:
 - add AM62P support
 - introduce hw version checks
 
 ivpu:
 - implement warm boot flow
 
 panfrost:
 - add bo sync ioctl
 - add GPU_PM_RT support for RZ/G3E SoC
 
 panthor:
 - add bo sync ioctl
 - enable timestamp propagation
 - scheduler robustness improvements
 - VM termination fixes
 - huge page support
 
 rockchip:
 - RK3368 HDMI Support
 - get rid of atomic_check fixups
 - RK3506 support
 - RK3576/RK3588 improved HPD handling
 
 rz-du:
 - RZ/V2H(P) MIPI-DSI Support
 
 v3d:
 - fix DMA segment size
 - convert to new logging helpers
 
 mediatek:
 - move DP training to hotplug thread
 - convert logging to new helpers
 - add support for HS speed DSI
 - Genio 510/700/1200-EVK, Radxa NIO-12L HDMI support
 
 atmel-hlcdc:
 - switch to drmm resource
 - support nomodeset
 - use newer helpers
 
 hisilicon:
 - fix various DP bugs
 
 renesas:
 - fix kernel panic on reboot
 
 exynos:
 - fix vidi_connection_ioctl using wrong device
 - fix vidi_connection deref user ptr
 - fix concurrency regression with vidi_context
 
 vkms:
 - add configfs support for display configuration
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmmMLAcACgkQDHTzWXnE
 hr7aAw//bQ2WLhXeMqWyqrPPe51l2DmWvbdwP6TKUjzMwd9+xvs6wQcSg80Mn230
 0vqSpqKq2aMB6GMmz7wdHG8JgZOvO7qDf2TZodXe5lvBiAAPjzX+UE/0bIQKuhym
 Ufb7tqCIPsj6TpcD3ef/173x3BnVPA6Y7lS11KaaG5l01vUAVlTD1vfWGDQp/L6P
 7g94cC+0+3eYZyKxE1+Rn7FDXdw08u+vtLchIoowcAHobgucZ8K/XtZZoqFFy3sj
 ZZN580AhyZoGcgmn2KhNvU4B+3tBFFMSVZkJm7skOO0IB2AMQGdEr0uVUDzLGc7K
 DrLaxYwM6HfxM4o0r0Ai0WCuoysCAJ95M2Cp58uDuNcew4lRTtIUqz32Sm2OJ8bD
 Z91Rvh/kOcA0Ru11Sb/kQvy9/OJ54CqojKVaUlkFo9VhHyPCPo9hjnPvaDvCt34N
 FmnhuVpZMWqcjjq5yO/192qpDJnm470eQExvkZ4YpgmWkekND0zwaT4PG4763dZJ
 juPlBQ5WtUlIzlUpRxdHE7C7ht1rWRS+HdzSYPM5aHTXDvktJvcA+1b/Jyicc+x4
 QZiZ/1AC0KKlLrZxpVpEcjkPdQj2CiCXHQ+0YjDfO3cHo/55EfKj4iiARzhDzokf
 h7FgKwvVhc9DycSq8KPGAf09AswceGAtvB1rKk+Jh9D/GqbgGtM=
 =RFJ2
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2026-02-11' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "Highlights:
   - amdgpu support for lots of new IP blocks which means newer GPUs
   - xe has a lot of SR-IOV and SVM improvements
   - lots of intel display refactoring across i915/xe
   - msm has more support for gen8 platforms
   - Given up on kgdb/kms integration, it's too hard on modern hw

  core:
   - drop kgdb support
   - replace system workqueue with percpu
   - account for property blobs in memcg
   - MAINTAINERS updates for xe + buddy

  rust:
   - Fix documentation for Registration constructors
   - Use pin_init::zeroed() for fops initialization
   - Annotate DRM helpers with __rust_helper
   - Improve safety documentation for gem::Object::new()
   - Update AlwaysRefCounted imports
   - mm: Prevent integer overflow in page_align()

  atomic:
   - add drm_device pointer to drm_private_obj
   - introduce gamma/degamma LUT size check

  buddy:
   - fix free_trees memory leak
   - prevent BUG_ON

  bridge:
   - introduce drm_bridge_unplug/enter/exit
   - add connector argument to .hpd_notify
   - lots of recounting conversions
   - convert rockchip inno hdmi to bridge
   - lontium-lt9611uxc: switch to HDMI audio helpers
   - dw-hdmi-qp: add support for HPD-less setups
   - Algoltek AG6311 support

  panels:
   - edp: CSW MNE007QB3-1, AUO B140HAN06.4, AUO B140QAX01.H
   - st75751: add SPI support
   - Sitronix ST7920, Samsung LTL106HL02
   - LG LH546WF1-ED01, HannStar HSD156J
   - BOE NV130WUM-T08
   - Innolux G150XGE-L05
   - Anbernic RG-DS

  dma-buf:
   - improve sg_table debugging
   - add tracepoints
   - call clear_page instead of memset
   - start to introduce cgroup memory accounting in heaps
   - remove sysfs stats

  dma-fence:
   - add new helpers

  dp:
   - mst: avoid oob access with vcpi=0

  hdmi:
   - limit infoframes exposure to userspace

  gem:
   - reduce page table overhead with THP
   - fix leak in drm_gem_get_unmapped_area

  gpuvm:
   - API sanitation for rust bindings

  sched:
   - introduce new helpers

  panic:
   - report invalid panic modes
   - add kunit tests

  i915/xe display:
   - Expose sharpness only if num_scalers is >= 2
   - Add initial Xe3P_LPD for NVL
   - BMG FBC support
   - Add MTL+ platforms to support dpll framework
   _ fix DIMM_S DRM decoding on ICL
   - Return to using AUX interrupts
   - PSR/Panel replay refactoring
   - use consolidation HDMI tables
   - Xe3_LPD CD2X dividier changes

  xe:
   - vfio: add vfio_pci for intel GPU
   - multi queue support
   - dynamic pagemaps and multi-device SVM
   - expose temp attribs in hwmon
   - NO_COMPRESSION bo flag
   - expose MERT OA unit
   - sysfs survivability refactor
   - SRIOV PF: add MERT support
   - enable SR-IOV VF migration
   - Enable I2C/NVM on Crescent Island
   - Xe3p page reclaimation support
   - introduce SRIOV scheduler groups
   - add SoC remappt support in system controller
   - insert compiler barriers in GuC code
   - define NVL GuC firmware
   - handle GT resume failure
   - fix drm scheduler layering violations
   - enable GSC loading and PXP for PTL
   - disable GuC Power DCC strategy on PTL
   - unregister drm device on probe error

  i915:
   - move to kernel standard fault injection
   - bump recommended GuC version for DG2 and MTL

  amdgpu:
   - SMUIO 15.x, PSP 15.x support
   - IH 6.1.1/7.1 support
   - MMHUB 3.4/4.2 support
   - GC 11.5.4/12.1 support
   - SDMA 6.1.4/7.1/7.11.4 support
   - JPEG 5.3 support
   - UserQ updates
   - GC 9 gfx queue reset support
   - TTM memory ops parallelization
   - convert legacy logging to new helpers
   - DC analog fixes

  amdkfd:
   - GC 11.5.4/12.1 suppport
   - SDMA 6.1.4/7.1 support
   - per context support
   - increase kfd process hash table
   - Reserved SDMA rework

  radeon:
   - convert legacy logging to new helpers
   - use devm for i2c adapters

  msm:
   - GPU
      - Document a612/RGMU dt bindings
      - UBWC 6.0 support (for A840 / Kaanapali)
      - a225 support
   - DPU:
      - Switch to use virtual planes by default
      - Fix DSI CMD panels on DPU 3.x
      - Rewrite format handling to remove intermediate representation
      - Fix watchdog on DPU 8.x+
      - Fix TE / Vsync source setting on DPU 8.x+
      - Add 3D_Mux on SC7280
      - Kaanapali platform support
      - Fix UBWC register programming
      - Make RM reserve DSPP-enabled mixers for CRTCs with LMs
      - Gamma correction support
   - DP:
      - Enable support for eDP 1.4+ link rate tables
      - Fix MDSS1 DP indices on SA8775P, making them to work
      - Fix msm_dp_ctrl_config_msa() to work with LLVM 20
   - DSI:
      - Document QCS8300 as compatible with SA8775P
      - Kaanapali platform support
   - DSI PHY:
      - switch to divider_determine_rate()
   - MDP5:
      - Drop support for MSM8998, SDM660 and SDM630 (switch over to DPU)
   -  MDSS:
      - Kaanapali platform support
      - Fixed UBWC register programming

  nova-core:
   - Prepare for Turing support. This includes parsing and handling
     Turing-specific firmware headers and sections as well as a Turing
     Falcon HAL implementation
   - Get rid of the Result<impl PinInit<T, E>> anti-pattern
   - Relocate initializer-specific code into the appropriate initializer
   - Use CStr::from_bytes_until_nul() to remove custom helpers
   - Improve handling of unexpected firmware values
   - Clean up redundant debug prints
   - Replace c_str!() with native Rust C-string literals
   - Update nova-core task list

  nova:
   - Align GEM object size to system page size

  tyr:
   - Use generated uAPI bindings for GpuInfo
   - Replace manual sleeps with read_poll_timeout()
   - Replace c_str!() with native Rust C-string literals
   - Suppress warnings for unread fields
   - Fix incorrect register name in print statement

  nouveau:
   - fix big page table support races in PTE management
   - improve reclocking on tegra 186+

  amdxdna:
   - fix suspend race conditions
   - improve handling of zero tail pointers
   - fix cu_idx overwritten during command setup
   - enable hardware context priority
   - remove NPU2 support
   - update message buffer allocation requirements
   - update firmware version check

  ast:
   - support imported cursor buffers
   - big endian fixes

  etnaviv:
   - add PPU flop reset support

  imagination:
   - add AM62P support
   - introduce hw version checks

  ivpu:
   - implement warm boot flow

  panfrost:
   - add bo sync ioctl
   - add GPU_PM_RT support for RZ/G3E SoC

  panthor:
   - add bo sync ioctl
   - enable timestamp propagation
   - scheduler robustness improvements
   - VM termination fixes
   - huge page support

  rockchip:
   - RK3368 HDMI Support
   - get rid of atomic_check fixups
   - RK3506 support
   - RK3576/RK3588 improved HPD handling

  rz-du:
   - RZ/V2H(P) MIPI-DSI Support

  v3d:
   - fix DMA segment size
   - convert to new logging helpers

  mediatek:
   - move DP training to hotplug thread
   - convert logging to new helpers
   - add support for HS speed DSI
   - Genio 510/700/1200-EVK, Radxa NIO-12L HDMI support

  atmel-hlcdc:
   - switch to drmm resource
   - support nomodeset
   - use newer helpers

  hisilicon:
   - fix various DP bugs

  renesas:
   - fix kernel panic on reboot

  exynos:
   - fix vidi_connection_ioctl using wrong device
   - fix vidi_connection deref user ptr
   - fix concurrency regression with vidi_context

  vkms:
   - add configfs support for display configuration

* tag 'drm-next-2026-02-11' of https://gitlab.freedesktop.org/drm/kernel: (1610 commits)
  drm/xe/pm: Disable D3Cold for BMG only on specific platforms
  drm/xe: Fix kerneldoc for xe_tlb_inval_job_alloc_dep
  drm/xe: Fix kerneldoc for xe_gt_tlb_inval_init_early
  drm/xe: Fix kerneldoc for xe_migrate_exec_queue
  drm/xe/query: Fix topology query pointer advance
  drm/xe/guc: Fix kernel-doc warning in GuC scheduler ABI header
  drm/xe/guc: Fix CFI violation in debugfs access.
  accel/amdxdna: Move RPM resume into job run function
  accel/amdxdna: Fix incorrect DPM level after suspend/resume
  nouveau/vmm: start tracking if the LPT PTE is valid. (v6)
  nouveau/vmm: increase size of vmm pte tracker struct to u32 (v2)
  nouveau/vmm: rewrite pte tracker using a struct and bitfields.
  accel/amdxdna: Fix incorrect error code returned for failed chain command
  accel/amdxdna: Remove hardware context status
  drm/bridge: imx8qxp-pixel-combiner: Fix bailout for imx8qxp_pc_bridge_probe()
  drm/panel: ilitek-ili9882t: Remove duplicate initializers in tianma_il79900a_dsc
  drm/i915/display: fix the pixel normalization handling for xe3p_lpd
  drm/exynos: vidi: use ctx->lock to protect struct vidi_context member variables related to memory alloc/free
  drm/exynos: vidi: fix to avoid directly dereferencing user pointer
  drm/exynos: vidi: use priv->vidi_dev for ctx lookup in vidi_connection_ioctl()
  ...
2026-02-11 12:55:44 -08:00
Sunday Clement
8a70a26c9f drm/amdkfd: Fix out-of-bounds write in kfd_event_page_set()
The kfd_event_page_set() function writes KFD_SIGNAL_EVENT_LIMIT * 8
bytes via memset without checking the buffer size parameter. This allows
unprivileged userspace to trigger an out-of bounds kernel memory write
by passing a small buffer, leading to  potential privilege
escalation.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2026-02-05 17:20:07 -05:00
Thomas Zimmermann
2bebc88d5e Merge drm/drm-next into drm-misc-next
Backmerging to get bug fixes from v6.19-rc7.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2026-02-05 10:33:06 +01:00
Andrew Martin
08b1eb621c drm/amdgpu: Ignored various return code
The return code of a non void function should not be ignored. In cases
where we do not care, the code needs to suppress it.

Signed-off-by: Andrew Martin <andrew.martin@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-03 16:46:31 -05:00
Oleg Nesterov
a87da7a9fa drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
Nowadays task->group_leader->mm != task->mm is only possible if a) task is
not a group leader and b) task->group_leader->mm == NULL because
task->group_leader has already exited using sys_exit().

I don't think that drm/amd tries to detect/nack this case.

Link: https://lkml.kernel.org/r/aXY_yLVHd63UlWtm@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Christan König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-03 08:21:25 -08:00