linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-27 00:22:00 +02:00

Author	SHA1	Message	Date
Matt Roper	4568dfbb83	drm/xe: Make decision to use Xe2-style blitter instructions a feature flag The blitter engines' MEM_COPY and MEM_SET instructions were added as part of the same hardware change that introduced service copy engines (i.e., BCS1-BCS8) which is why the driver checks for service copy engine presence when deciding whether to use these instructions or the older XY_* instructions. However when making this decision the driver should consider which engines are part of the hardware architecture, not which engines are present/usable on the current device. For graphics IP versions that architecturally include service copy engines (i.e., everything Xe2 and later, plus PVC's Xe_HPC) we should use MEM_SET and MEM_COPY even in if all of the service copy engines wind up getting fused off. I.e., we need to decide based on whether the platform's graphics descriptor contains these engines, rather than whether the usable engine mask contains them. This logic got broken when gt->info.__engine_mask was removed, although in practice that mistake has been harmless so far because there haven't been any hardware SKUs that fuse off all of the service copy engines yet. Replace the incorrect has_service_copy_support() function with a GT feature flag that tracks more accurately whether the new blitter instructions are usable. In addition to fixing incorrect logic if all service copies are fused off, the flag also makes it more obvious what the calling code is trying to do; previously it wasn't terribly obvious why "has service copy engines" was being used as the condition for using different instructions on all copy engine types. The new feature flag is named 'has_xe2_blt_instructions' because we expect this flag to be set for all Xe2 and later platforms (i.e., everything officially supported by the Xe driver). Technically there's also one Xe1-era platform (PVC) that supports these engines/instructions and will set this flag, but this still seems to be the most clear and understandable name for the flag. Fixes: `61549a2ee5` ("drm/xe: Drop __engine_mask") Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patch.msgid.link/20260507-xe2_copy-v1-1-26506381b821@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 09b399842907565a64e351fb22da790b4c673ffb) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-05-11 16:46:06 -04:00
Satyanarayana K V P	1460eae74f	drm/xe/vf: Use drm mm instead of drm sa for CCS read/write The suballocator algorithm tracks a hole cursor at the last allocation and tries to allocate after it. This is optimized for fence-ordered progress, where older allocations are expected to become reusable first. In fence-enabled mode, that ordering assumption holds. In fence-disabled mode, allocations may be freed in arbitrary order, so limiting allocation to the current hole window can miss valid free space and fail allocations despite sufficient total space. Use DRM memory manager instead of sub-allocator to get rid of this issue as CCS read/write operations do not use fences. Fixes: `864690cf4d` ("drm/xe/vf: Attach and detach CCS copy commands with BO") Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Maarten Lankhorst <dev@lankhorst.se> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260408110145.1639937-6-satyanarayana.k.v.p@intel.com (cherry picked from commit 6c84b493012aeb05dec29c709377bf0e17ac6815) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-04-29 12:51:19 -04:00
Matthew Brost	42d3b66d4c	Merge drm/drm-next into drm-xe-next Backmerging to bring in 7.00-rc3. Important ahead GPU SVM merging THP support. Signed-off-by: Matthew Brost <matthew.brost@intel.com>	2026-03-12 07:23:23 -07:00
Nitin Gote	be97fd0645	drm/xe: add xe_migrate_resolve wrapper and is_vram_resolve support Introduce an internal __xe_migrate_copy(..., is_vram_resolve) path and expose a small wrapper xe_migrate_resolve() that calls it with is_vram_resolve=true. For resolve/decompression operations we must ensure the copy code uses the compression PAT index when appropriate; this change centralizes that behavior and allows callers to schedule a resolve (decompress) operation via the migrate API. v3: Fix kernel-doc warnings v2: (Matt) - Simplify xe_migrate_resolve(), use single BO/resource; remove copy_only_ccs argument as it's always false. Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260304123758.3050386-7-nitin.r.gote@intel.com	2026-03-12 09:37:40 +00:00
Raag Jadav	e8a6e92285	drm/xe/migrate: Refactor xe_migrate_prepare_vm() Currently xe_migrate_prepare_vm() does three things. 1. Allocates pt_bo for migrate context. 2. Initializes pt_bo with actual pte details. 3. Initializes sa_manager for migrate context. Split these implementations in their own functions for better maintainability. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260303101913.3576481-1-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-03-06 15:13:38 -08:00
Dave Airlie	17b95278ae	Merge tag 'drm-xe-next-2026-03-02' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next UAPI Changes: - restrict multi-lrc to VCS/VECS engines (Xin Wang) - Introduce a flag to disallow vm overcommit in fault mode (Thomas) - update used tracking kernel-doc (Auld, Fixes) - Some bind queue fixes (Auld, Fixes) Cross-subsystem Changes: - Split drm_suballoc_new() into SA alloc and init helpers (Satya, Fixes) - pass pagemap_addr by reference (Arnd, Fixes) - Revert "drm/pagemap: Disable device-to-device migration" (Thomas) - Fix unbalanced unlock in drm_gpusvm_scan_mm (Maciej, Fixes) - Small GPUSVM fixes (Brost, Fixes) - Fix xe SVM configs (Thomas, Fixes) Core Changes: - Fix a hmm_range_fault() livelock / starvation problem (Thomas, Fixes) Driver Changes: - Fix leak on xa_store failure (Shuicheng, Fixes) - Correct implementation of Wa_16025250150 (Roper, Fixes) - Refactor context init into xe_lrc_ctx_init (Raag) - Fix GSC proxy cleanup on early initialization failure (Zhanjun) - Fix exec queue creation during post-migration recovery (Tomasz, Fixes) - Apply windower hardware filtering setting on Xe3 and Xe3p (Roper) - Free ctx_restore_mid_bb in release (Shuicheng, Fixes) - Drop stale MCR steering TODO comment (Roper) - dGPU memory optimizations (Brost) - Do not preempt fence signaling CS instructions (Brost, Fixes) - Revert "drm/xe/compat: Remove unused i915_reg.h from compat header" (Uma) - Don't expose display modparam if no display support (Wajdeczko) - Some VRAM flag improvements (Wajdeczko) - Misc fix for xe_guc_ct.c (Shuicheng, Fixes) - Remove unused i915_reg.h from compat header (Uma) - Workaround cleanup & simplification (Roper) - Add prefetch pagefault support for Xe3p (Varun) - Fix fs_reclaim deadlock caused by CCS save/restore (Satya, Fixes) - Cleanup partially initialized sync on parse failure (Shuicheng, Fixes) - Allow to change VFs VRAM quota using sysfs (Michal) - Increase GuC log sizes in debug builds (Tomasz) - Wa_18041344222 changes (Harish) - Add Wa_14026781792 (Niton) - Add debugfs facility to catch RTP mistakes (Roper) - Convert GT stats to per-cpu counters (Brost) - Prevent unintended VRAM channel creation (Karthik) - Privatize struct xe_ggtt (Maarten) - remove unnecessary struct dram_info forward declaration (Jani) - pagefault refactors (Brost) - Apply Wa_14024997852 (Arvind) - Redirect faults to dummy page for wedged device (Raag, Fixes) - Force EXEC_QUEUE_FLAG_KERNEL for kernel internal VMs (Piotr) - Stop applying Wa_16018737384 from Xe3 onward (Roper) - Add new XeCore fuse registers to VF runtime regs (Roper) - Update xe_device_declare_wedged() error log (Raag) - Make xe_modparam.force_vram_bar_size signed (Shuicheng, Fixes) - Avoid reading media version when media GT is disabled (Piotr, Fixes) - Fix handling of Wa_14019988906 & Wa_14019877138 (Roper, Fixes) - Basic enabling patches for Xe3p_LPG and NVL-P (Gustavo, Roper, Shekhar) - Avoid double-adjust in 64-bit reads (Shuicheng, Fixes) - Allow VF to initialize MCR tables (Wajdeczko) - Add Wa_14025883347 for GuC DMA failure on reset (Anirban) - Add bounds check on pat_index to prevent OOB kernel read in madvise (Jia, Fixes) - Fix the address range assert in ggtt_get_pte helper (Winiarski) - XeCore fuse register changes (Roper) - Add more info to powergate_info debugfs (Vinay) - Separate out GuC RC code (Vinay) - Fix g2g_test_array indexing (Pallavi) - Mutual exclusivity between CCS-mode and PF (Nareshkumar, Fixes) - Some more _types.h cleanups (Wajdeczko) - Fix sysfs initialization (Wajdeczko, Fixes) - Drop unnecessary goto in xe_device_create (Roper) - Disable D3Cold for BMG only on specific platforms (Karthik, Fixes) - Add sriov.admin_only_pf attribute (Wajdeczko) - replace old wq(s), add WQ_PERCPU to alloc_workqueue (Marco) - Make MMIO communication more robust (Wajdeczko) - Fix warning of kerneldoc (Shuicheng, Fixes) - Fix topology query pointer advance (Shuicheng, Fixes) - use entry_dump callbacks for xe2+ PAT dumps (Xin Wang) - Fix kernel-doc warning in GuC scheduler ABI header (Chaitanya, Fixes) - Fix CFI violation in debugfs access (Daniele, Fixes) - Apply WA_16028005424 to Media (Balasubramani) - Fix typo in function kernel-doc (Wajdeczko) - Protect priority against concurrent access (Niranjana) - Fix nvm aux resource cleanup (Shuicheng, Fixes) - Fix is_bound() pci_dev lifetime (Shuicheng, Fixes) - Use CLASS() for forcewake in xe_gt_enable_comp_1wcoh (Shuicheng) - Reset VF GuC state on fini (Wajdeczko) - Move _THIS_IP_ usage from xe_vm_create() to dedicated function (Nathan Chancellor, Fixes) - Unregister drm device on probe error (Shuicheng, Fixes) - Disable DCC on PTL (Vinay, Fixes) - Fix Wa_18022495364 (Tvrtko, Fixes) - Skip address copy for sync-only execs (Shuicheng, Fixes) - derive mem copy capability from graphics version (Nitin, Fixes) - Use DRM_BUDDY_CONTIGUOUS_ALLOCATION for contiguous allocations (Sanjay) - Context based TLB invalidations (Brost) - Enable multi_queue on xe3p_xpc (Brost, Niranjana) - Remove check for gt in xe_query (Nakshtra) - Reduce LRC timestamp stuck message on VFs to notice (Brost, Fixes) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/aaYR5G2MHjOEMXPW@lstrano-desk.jf.intel.com	2026-03-03 10:37:29 +10:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/\(alloc_objs(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Satyanarayana K V P	bcd768d787	drm/xe/vf: Fix fs_reclaim warning with CCS save/restore BB allocation CCS save/restore batch buffers are attached during BO allocation and detached during BO teardown. The shrinker triggers xe_bo_move(), which is used for both allocation and deletion paths. When BO allocation and shrinking occur concurrently, a circular locking dependency involving fs_reclaim and swap_guard can occur, leading to a deadlock such as: =============================================================== * WARNING: possible circular locking dependency detected * --------------------------------------------------------------- * * * CPU0 CPU1 * * ---- ---- * * lock(fs_reclaim); * * lock(&sa_manager->swap_guard); * * lock(fs_reclaim); * * lock(&sa_manager->swap_guard); * * * * * DEADLOCK * * =============================================================== To avoid this, the BB pointer and SA are allocated using xe_bb_alloc() before taking lock and SA is initialized using xe_bb_init() preventing reclaim from being invoked in this context. Fixes: `864690cf4d` ("drm/xe/vf: Attach and detach CCS copy commands with BO") Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260220055519.2485681-7-satyanarayana.k.v.p@intel.com	2026-02-20 10:54:03 -08:00
Shuicheng Lin	5d5ef69549	drm/xe: Fix kerneldoc for xe_migrate_exec_queue Correct the function name in the kerneldoc. It is for below warning: "Warning: drivers/gpu/drm/xe/xe_migrate.c:1262 expecting prototype for xe_get_migrate_exec_queue(). Prototype was for xe_migrate_exec_queue() instead" Fixes: `916ee4704a` ("drm/xe/vf: Register CCS read/write contexts with Guc") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129233834.419977-6-shuicheng.lin@intel.com (cherry picked from commit `9fd8da7179`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-05 08:03:41 -05:00
Shuicheng Lin	9fd8da7179	drm/xe: Fix kerneldoc for xe_migrate_exec_queue Correct the function name in the kerneldoc. It is for below warning: "Warning: drivers/gpu/drm/xe/xe_migrate.c:1262 expecting prototype for xe_get_migrate_exec_queue(). Prototype was for xe_migrate_exec_queue() instead" Fixes: `916ee4704a` ("drm/xe/vf: Register CCS read/write contexts with Guc") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129233834.419977-6-shuicheng.lin@intel.com	2026-02-02 22:04:55 +01:00
Dave Airlie	6704d98a4f	Linux 6.19-rc7 -----BEGIN PGP SIGNATURE----- iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAml2lQweHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG0dcH/2yLU3IKlHSSgEDL Qq3oBuRK/zoVOdy+CM+TmTdl2d1LnBd8J547xFStB7kVGf5mEkdFZdHLBSHRnKDf ia1SGec06kyLpRX6x5T6FsfwOhkBmVsp59X0coM57QWxxenybugtzPvDO2TQ8/G4 buixJI0jJVgwRwXNzWB4n2W6FxNGui2A7gEN2mjtvkM2t/aDkiDjEqB8ve0pZJX9 4EWhxOgRFzwWgkd/bY+4wgXVXEt3GtI+3VvNncRqLIO00A/AnZOYmH4S2RQUDszD IbyDscYYxloZcZMDXc3PN2WgD9DCGKuP3GpJGsOHbl0DN6JkqI9nwGsOFZKGVOeF vbajwPE= =iAOa -----END PGP SIGNATURE----- BackMerge tag 'v6.19-rc7' into drm-next Linux 6.19-rc7 This is needed for msm and rust trees. Signed-off-by: Dave Airlie <airlied@redhat.com>	2026-01-28 12:44:28 +10:00
Matthew Auld	772157f626	drm/xe/migrate: fix job lock assert We are meant to be checking the user vm for the bind queue, but actually we are checking the migrate vm. For various reasons this is not currently firing but this will likely change in the future. Now that we have the user_vm attached to the bind queue, we can fix this by directly checking that here. Fixes: `dba89840a9` ("drm/xe: Add GT TLB invalidation jobs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Arvind Yadav <arvind.yadav@intel.com> Link: https://patch.msgid.link/20260120110609.77958-4-matthew.auld@intel.com (cherry picked from commit `9dd1048bca`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-01-21 15:24:22 +01:00
Matthew Auld	9dd1048bca	drm/xe/migrate: fix job lock assert We are meant to be checking the user vm for the bind queue, but actually we are checking the migrate vm. For various reasons this is not currently firing but this will likely change in the future. Now that we have the user_vm attached to the bind queue, we can fix this by directly checking that here. Fixes: `dba89840a9` ("drm/xe: Add GT TLB invalidation jobs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Arvind Yadav <arvind.yadav@intel.com> Link: https://patch.msgid.link/20260120110609.77958-4-matthew.auld@intel.com	2026-01-21 10:07:21 +00:00
Francois Dugast	15e096960a	drm/xe/migrate: Configure migration queue as low latency Commit `5488bec96b` ("drm/xe/uapi: Use hint for guc to set GT frequency") introduced low latency hint for use by user space when creating an exec queue. This instructs SLPC to ramp the GT frequency aggressively. SVM relies on an internal exec queue to migrate memory upon page faults. This change creates this exec queue with the low latency hint to speed up migration. This should not impact systems where GT frequency is set over sysfs, or with long running workloads which give enough time for the frequency to ramp up. An example of memory access pattern that shows an improvement of SVM performance is running hundreds of times IGT eu-fault-2m-once-device in xe_exec_system_allocator. The copy duration provided by GT stats in svm_2M_device_copy_us shows per GPU page fault: ~ 165 μs without low latency hint ~ 130 μs with low latency hint Suggested-by: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Link: https://patch.msgid.link/20251223115327.49555-1-francois.dugast@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-12-23 10:03:48 -05:00
Thomas Hellström	754c232384	drm/pagemap, drm/xe: Ensure that the devmem allocation is idle before use In situations where no system memory is migrated to devmem, and in upcoming patches where another GPU is performing the migration to the newly allocated devmem buffer, there is nothing to ensure any ongoing clear to the devmem allocation or async eviction from the devmem allocation is complete. Address that by passing a struct dma_fence down to the copy functions, and ensure it is waited for before migration is marked complete. v3: - New patch. v4: - Update the logic used for determining when to wait for the pre_migrate_fence. - Update the logic used for determining when to warn for the pre_migrate_fence since the scheduler fences apparently can signal out-of-order. v5: - Fix a UAF (CI) - Remove references to source P2P migration (Himal) - Put the pre_migrate_fence after migration. v6: - Pipeline the pre_migrate_fence dependency (Matt Brost) Fixes: `c5b3eb5a90` ("drm/xe: Add GPUSVM device memory copy vfunc functions") Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.15+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-4-thomas.hellstrom@linux.intel.com (cherry picked from commit `16b5ad3195`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-12-23 10:11:59 +01:00
Thomas Hellström	75af93b3f5	drm/pagemap, drm/xe: Support destination migration over interconnect Support destination migration over interconnect when migrating from device-private pages with the same dev_pagemap owner. Since we now also collect device-private pages to migrate, also abort migration if the range to migrate is already fully populated with pages from the desired pagemap. Finally return -EBUSY from drm_pagemap_populate_mm() if the migration can't be completed without first migrating all pages in the range to system. It is expected that the caller will perform that before retrying the call to drm_pagemap_populate_mm(). v3: - Fix a bug where the p2p dma-address was never used. - Postpone enabling destination interconnect migration, since xe devices require source interconnect migration to ensure the source L2 cache is flushed at migration time. - Update the drm_pagemap_migrate_to_devmem() interface to pass migration details. v4: - Define XE_INTERCONNECT_P2P unconditionally (CI) - Include a missing header (CI) v5: - Use page order increments where possible (Matt Brost). - Fix a negated value of can_migrate_same_pagemap. - Move removal of some dead code to a separate patch (Matt Brost). - Remove an unnecessary zdd get() and put() (Matt Brost). Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-23-thomas.hellstrom@linux.intel.com	2025-12-23 10:00:49 +01:00
Thomas Hellström	16b5ad3195	drm/pagemap, drm/xe: Ensure that the devmem allocation is idle before use In situations where no system memory is migrated to devmem, and in upcoming patches where another GPU is performing the migration to the newly allocated devmem buffer, there is nothing to ensure any ongoing clear to the devmem allocation or async eviction from the devmem allocation is complete. Address that by passing a struct dma_fence down to the copy functions, and ensure it is waited for before migration is marked complete. v3: - New patch. v4: - Update the logic used for determining when to wait for the pre_migrate_fence. - Update the logic used for determining when to warn for the pre_migrate_fence since the scheduler fences apparently can signal out-of-order. v5: - Fix a UAF (CI) - Remove references to source P2P migration (Himal) - Put the pre_migrate_fence after migration. v6: - Pipeline the pre_migrate_fence dependency (Matt Brost) Fixes: `c5b3eb5a90` ("drm/xe: Add GPUSVM device memory copy vfunc functions") Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.15+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-4-thomas.hellstrom@linux.intel.com	2025-12-23 09:35:11 +01:00
Satyanarayana K V P	fa18290bf0	drm/xe/vf: Shadow buffer management for CCS read/write operations CCS copy command consist of 5-dword sequence. If vCPU halts during save/restore operations while these sequences are being programmed, incomplete writes can cause page faults during IGPU CCS metadata saving. Use shadow buffer management to prevent partial write issues during CCS operations. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Suggested-by: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251118120745.3460172-3-satyanarayana.k.v.p@intel.com	2025-11-18 21:45:08 -08:00
Lukasz Laguna	57a5f45b3b	drm/xe/migrate: Add function to copy of VRAM data in chunks Introduce a new function to copy data between VRAM and sysmem objects. The existing xe_migrate_copy() is tailored for eviction and restore operations, which involves additional logic and operates on entire objects. The xe_migrate_vram_copy_chunk() allows copying chunks of data to or from a dedicated buffer object, which is essential in case of VF migration. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251112132220.516975-22-michal.winiarski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-11-13 11:48:20 +01:00
Matthew Brost	b2d7ec41f2	drm/xe: Attach last fence to TLB invalidation job queues Add support for attaching the last fence to TLB invalidation job queues to address serialization issues during bursts of unbind jobs. Ensure that user fence signaling for a bind job reflects both the bind job itself and the last fences of all related TLB invalidations. Maintain submission order based solely on the state of the bind and TLB invalidation queues. Introduce support functions for last fence attachment to TLB invalidation queues. v3: - Fix assert in xe_exec_queue_tlb_inval_last_fence_set (CI) - Ensure migrate lock held for migrate queues (Testing) v5: - Style nits (Thomas) - Rewrite commit message (Thomas) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20251031234050.3043507-3-matthew.brost@intel.com	2025-11-04 08:20:57 -08:00
Sanjay Yadav	dd5d11b657	drm/xe: Fix spelling and typos across Xe driver files Corrected various spelling mistakes and typos in multiple files under the Xe directory. These fixes improve clarity and maintain consistency in documentation. v2 - Replaced all instances of "XE" with "Xe" where it referred to the driver name - of -> for - Typical -> Typically v3 - Revert "Xe" to "XE" for macro prefix reference Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20251023121453.1182035-2-sanjay.kumar.yadav@intel.com	2025-10-27 13:00:11 +00:00
Matthew Auld	f558630a7d	drm/xe/migrate: skip bounce buffer path on xe2 Now that we support MEM_COPY we should be able to use the PAGE_COPY mode, otherwise falling back to BYTE_COPY mode when we have odd sizing/alignment. v2: - Use info.has_mem_copy_instr - Rebase on latest changes. v3 (Matt Brost): - Allow various pitches including 1byte pitch for MEM_COPY Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251022163836.191405-8-matthew.auld@intel.com	2025-10-23 10:48:41 +01:00
Matthew Auld	1e12dbae9d	drm/xe/migrate: support MEM_COPY instruction Make this the default on xe2+ when doing a copy. This has a few advantages over the exiting copy instruction: 1) It has a special PAGE_COPY mode that claims to be optimised for page-in/page-out, which is the vast majority of current users. 2) It also has a simple BYTE_COPY mode that supports byte granularity copying without any restrictions. With 2) we can now easily skip the bounce buffer flow when copying buffers with strange sizing/alignment, like for memory_access. But that is left for the next patch. v2 (Matt Brost): - Use device info to check whether device should use the MEM_COPY path. This should fit better with making this a configfs tunable. - And with that also keep old path still functional on xe2 for possible experimentation. - Add a define for PAGE_COPY page-size. v3 (Matt Brost): - Fallback to an actual linear copy for pitch=1. - Also update NVL. BSpec: 57561 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251022163836.191405-7-matthew.auld@intel.com	2025-10-23 10:48:39 +01:00
Matthew Auld	0171dcce33	drm/xe/migrate: trim batch buffer sizing We have an extra two dwords, but it looks like we should only need one for the extra bb_end. Likely this is just leftover from back when the arb handling was moved into the ring programming. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251022163836.191405-6-matthew.auld@intel.com	2025-10-23 10:48:38 +01:00
Matthew Auld	1413329456	drm/xe/migrate: fix batch buffer sizing In xe_migrate_vram() the copy can straddle page boundaries, so the len might look like a single page, but actually accounting for the offset within the page we will need to emit more than one PTE. Otherwise in some cases the batch buffer will be undersized leading to warnings later. We already have npages so use that instead. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251022163836.191405-5-matthew.auld@intel.com	2025-10-23 10:48:37 +01:00
Matthew Auld	fb188d8b00	drm/xe/migrate: fix chunk handling for 2M page emit On systems with PAGE_SIZE > 4K the chunk will likely be rounded down to zero, if say we have single 2M page, so one huge pte, since we also try to align the chunk to PAGE_SIZE / XE_PAGE_SIZE, which will be 16 on 64K systems. Make the ALIGN_DOWN conditional for 4K PTEs where we can encounter gpu_page_size < PAGE_SIZE. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251022163836.191405-4-matthew.auld@intel.com	2025-10-23 10:48:36 +01:00
Matthew Auld	aaeef7a9c8	drm/xe/migrate: rework size restrictions for sram pte emit We allow the input size to not be aligned to PAGE_SIZE, which leads to various bugs in build_pt_update_batch_sram() for PAGE_SIZE > 4K systems. For example if ptes is exactly one gpu_page_size then the chunk size is rounded down to zero. The simplest fix looks to be forcing PAGE_SIZE aligned inputs. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251022163836.191405-3-matthew.auld@intel.com	2025-10-23 10:48:34 +01:00
Matthew Auld	3c767f762b	drm/xe/migrate: fix offset and len check Restriction here is pitch of 4bytes to match pixel width (32b), and hw restriction where src and dst must be aligned to 64bytes. If any of that is not possible then we need a bounce buffer. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251022163836.191405-2-matthew.auld@intel.com	2025-10-23 10:48:33 +01:00
Matthew Auld	641bcf8731	drm/xe/migrate: don't misalign current bytes If current bytes exceeds the max copy size, ensure the clamped size still accounts for the XE_CACHELINE_BYTES alignment, otherwise we trigger the assert in xe_migrate_vram with the size now being out of alignment. Fixes: `8c2d61e0e9` ("drm/xe/migrate: don't overflow max copy size") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6212 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251010162020.190962-2-matthew.auld@intel.com	2025-10-15 15:26:51 +01:00
Matthew Brost	dd83b101a4	drm/xe: Enable 2M pages in xe_migrate_vram Using 2M pages in xe_migrate_vram has two benefits: we issue fewer instructions per 2M copy (1 vs. 512), and the cache hit rate should be higher. This results in increased copy engine bandwidth, as shown by benchmark IGTs. Enable 2M pages by reserving PDEs in the migrate VM and using 2M pages in xe_migrate_vram if the DMA address order matches 2M. v2: - Reuse build_pt_update_batch_sram (Stuart) - Fix build_pt_update_batch_sram for PAGE_SIZE > 4K v3: - More fixes for PAGE_SIZE > 4K, align chunk, decrement chunk as needed - Use stack incr var in xe_migrate_vram_use_pde (Stuart) v4: - Split PAGE_SIZE > 4K fix out in different patch (Stuart) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20251013034555.4121168-3-matthew.brost@intel.com	2025-10-13 12:31:22 -07:00
Matthew Brost	55991d854f	drm/xe: Fix build_pt_update_batch_sram for non-4K PAGE_SIZE The build_pt_update_batch_sram function in the Xe migrate layer assumes PAGE_SIZE == XE_PAGE_SIZE (4K), which is not a valid assumption on non-x86 platforms. This patch updates build_pt_update_batch_sram to correctly handle PAGE_SIZE > 4K by programming multiple 4K GPU pages per CPU page. v5: - Mask off non-address bits during compare Signed-off-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Simon Richter <Simon.Richter@hogyros.de> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20251013034555.4121168-2-matthew.brost@intel.com	2025-10-13 12:31:21 -07:00
Thomas Hellström	381f1ed151	drm/xe/migrate: Fix an error path The exhaustive eviction accidently changed an error path goto to a return. Fix this. Fixes: `59eabff2a3` ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250910160939.103473-1-thomas.hellstrom@linux.intel.com	2025-10-10 11:35:11 +02:00
Satyanarayana K V P	673167d9f0	drm/xe: Use PPGTT addresses for TLB invalidation to avoid GGTT fixups The migrate VM builds the CCS metadata save/restore batch buffer (BB) in advance and retains it so the GuC can submit it directly when saving a VM’s state. When a VM migrates between VFs, the GGTT base can change. Any GGTT-based addresses embedded in the BB would then have to be parsed and patched. Use PPGTT addresses in the BB (including for TLB invalidation) so the BB remains GGTT-agnostic and requires no address fixups during migration. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-31-matthew.brost@intel.com	2025-10-09 03:24:07 -07:00
Thomas Hellström	187e16f69d	drm/xe: Work around clang multiple goto-label error When using drm_exec_retry_on_contention(), clang may consider all labels for which we take addresses in a function as potential retry goto targets, although strictly only one is possible. It will then in some situations generate false positive errors. In this case, the compiler, for some architectures, consider the might_lock(&m->job_mutex); as a potential goto target from drm_exec_retry_on_contention(), and errors. Work around that by moving the xe_validate / drm_exec transaction to a separate function. v2: - New commit message based on analysis of Nathan Chancellor Fixes: `59eabff2a3` ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509101853.nDmyxTEM-lkp@intel.com/ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> # build Link: https://lore.kernel.org/r/20250911080324.180307-1-thomas.hellstrom@linux.intel.com	2025-09-18 08:27:00 +02:00
Thomas Hellström	59eabff2a3	drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction Introduce an xe_bo_create_pin_map_novm() function that does not take the drm_exec paramenter to simplify the conversion of many callsites. For the rest, ensure that the same drm_exec context that was used for locking the vm is passed down to validation. Use xe_validation_guard() where appropriate. v2: - Avoid gotos from within xe_validation_guard(). (Matt Brost) - Break out the change to pf_provision_vf_lmem8 to a separate patch. - Adapt to signature change of xe_validation_guard(). Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-12-thomas.hellstrom@linux.intel.com	2025-09-10 09:16:06 +02:00
Sanjay Yadav	c4dfa0bea2	drm/xe/migrate: Remove unneeded emit_pte() when copying CCS only In xe_migrate_copy(), when copy_only_ccs is true, we only need two emit_pte() calls one for the BO and one for the raw CCS storage. However, the current implementation issues three emit_pte() calls, resulting in an unnecessary PTE programming job. This fix removes the redundant emit_pte() call to avoid programming the same PTEs twice and reducing overhead during CCS-only migration. v2: Preserve correct behavior on DG2, which requires both CCS and page copies. Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250904161423.2448727-1-sanjay.kumar.yadav@intel.com	2025-09-05 13:29:20 +01:00
Matthew Auld	edb1745fc6	drm/xe: improve dma-resv handling for backup object Since the dma-resv is shared we don't need to reserve and add a fence slot fence twice, plus no need to loop through the dependencies. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250829164715.720735-2-matthew.auld@intel.com	2025-09-05 11:53:00 +01:00
Matthew Auld	81a45cb7ea	drm/xe/migrate: make MI_TLB_INVALIDATE conditional When clearing VRAM we should be able to skip invalidating the TLBs if we are only using the identity map to access VRAM (which is the common case), since no modifications are made to PTEs on the fly. Also since we use huge 1G entries within the identity map, there should be a pretty decent chance that the next packet(s) (if also clears) can avoid a tree walk if we don't shoot down the TLBs, like if we have to process a long stream of clears. For normal moves/copies, we usually always end up with the src or dst being system memory, meaning we can't only rely on the identity map and will also need to emit PTEs and so will always require a TLB flush. v2: - Update commit to explain the situation for normal copies (Matt B) - Rebase on latest changes Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250808110452.467513-2-matthew.auld@intel.com	2025-08-28 09:58:19 +01:00
Simon Richter	b85bb2d677	drm/xe: Make page size consistent in loop If PAGE_SIZE != XE_PAGE_SIZE (which is currently locked behind CONFIG_BROKEN), this would generate the wrong number of PDEs. Since these PDEs are consumed by the GPU, the GPU page size needs to be used. Signed-off-by: Simon Richter <Simon.Richter@hogyros.de> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250818064806.2835-1-Simon.Richter@hogyros.de	2025-08-18 10:52:31 -07:00
Piotr Piórkowski	9337166fa1	drm/xe: Assign ioctl xe file handler to vm in xe_vm_create In several code paths, such as xe_pt_create(), the vm->xef field is used to determine whether a VM originates from userspace or the kernel. Previously, this handler was only assigned in xe_vm_create_ioctl(), after the VM was created by xe_vm_create(). However, xe_vm_create() triggers page table creation, and that function assumes vm->xef should be already set. This could lead to incorrect origin detection. To fix this problem and ensure consistency in the initialization of the VM object, let's move the assignment of this handler to xe_vm_create. v2: - take reference to the xe file object only when xef is not NULL - release the reference to the xe file object on the error path (Matthew) Fixes: `7f387e6012` ("drm/xe: add XE_BO_FLAG_PINNED_LATE_RESTORE") Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250811104358.2064150-2-piotr.piorkowski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-08-12 13:03:36 +02:00
Matthew Auld	17593a69b7	drm/xe: rework PDE PAT index selection For non-leaf paging structures we end up selecting a random index between [0, 3], depending on the first user if the page-table is shared, since non-leaf structures only have two bits in the HW for encoding the PAT index, and here we are just passing along the full user provided index, which can be an index as large as ~31 on xe2+. The user provided index is meant for the leaf node, which maps the actual BO pages where we have more PAT bits, and not the non-leaf nodes which are only mapping other paging structures, and so only needs a minimal PAT index range. Also the chosen index might need to consider how the driver mapped the paging structures on the host side, like wc vs wb, which is separate from the user provided index. With that move the PDE PAT index selection under driver control. For now just use a coherent index on platforms with page-tables that are cached on host side, and incoherent otherwise. Using a coherent index could potentially be expensive, and would be overkill if we know the page-table is always uncached on host side. v2 (Stuart): - Add some documentation and split into separate helper. BSpec: 59510 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250808103455.462424-2-matthew.auld@intel.com	2025-08-11 17:41:02 +01:00
Satyanarayana K V P	9f8aa0bcd1	drm/xe/vf: Refactor CCS save/restore to use default migration context Previously, CCS save/restore operations created separate migration contexts with new VM memory allocations, resulting in significant overhead. This commit eliminates redundant context creation reusing the default migration context by registering new execution queues for CCS save and restore on the existing migrate VM. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Suggested-by: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250808073628.32745-2-satyanarayana.k.v.p@intel.com	2025-08-08 10:29:37 -07:00
Matthew Auld	9b7ca35ed2	drm/xe/migrate: prevent potential UAF If we hit the error path, the previous fence (if there is one) has already been put() prior to this, so doing a fence_wait could lead to UAF. Tweak the flow to do to the put() until after we do the wait. Fixes: `270172f64b` ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250731093807.207572-8-matthew.auld@intel.com	2025-08-07 16:59:22 +01:00
Matthew Auld	8c2d61e0e9	drm/xe/migrate: don't overflow max copy size With non-page aligned copy, we need to use 4 byte aligned pitch, however the size itself might still be close to our maximum of ~8M, and so the dimensions of the copy can easily exceed the S16_MAX limit of the copy command leading to the following assert: xe 0000:03:00.0: [drm] Assertion `size / pitch <= ((s16)(((u16)~0U) >> 1))` failed! platform: BATTLEMAGE subplatform: 1 graphics: Xe2_HPG 20.01 step A0 media: Xe2_HPM 13.01 step A1 tile: 0 VRAM 10.0 GiB GT: 0 type 1 WARNING: CPU: 23 PID: 10605 at drivers/gpu/drm/xe/xe_migrate.c:673 emit_copy+0x4b5/0x4e0 [xe] To fix this account for the pitch when calculating the number of current bytes to copy. Fixes: `270172f64b` ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250731093807.207572-7-matthew.auld@intel.com	2025-08-07 16:59:22 +01:00
Matthew Auld	38b34e928a	drm/xe/migrate: prevent infinite recursion If the buf + offset is not aligned to XE_CAHELINE_BYTES we fallback to using a bounce buffer. However the bounce buffer here is allocated on the stack, and the only alignment requirement here is that it's naturally aligned to u8, and not XE_CACHELINE_BYTES. If the bounce buffer is also misaligned we then recurse back into the function again, however the new bounce buffer might also not be aligned, and might never be until we eventually blow through the stack, as we keep recursing. Instead of using the stack use kmalloc, which should respect the power-of-two alignment request here. Fixes a kernel panic when triggering this path through eudebug. v2 (Stuart): - Add build bug check for power-of-two restriction - s/EINVAL/ENOMEM/ Fixes: `270172f64b` ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250731093807.207572-6-matthew.auld@intel.com	2025-08-07 16:59:18 +01:00
Francois Dugast	321d420325	drm/xe/migrate: Populate struct drm_pagemap_addr array Workaround to ensure all addresses are populated in the array as this is expected when creating the copy batch. This is required because the migrate layer does not support 2MB GPU pages yet. A proper fix will come in a follow-up. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250805140028.599361-6-francois.dugast@intel.com Signed-off-by: Francois Dugast <francois.dugast@intel.com>	2025-08-06 13:35:05 +02:00
Francois Dugast	f35a6cdf8a	drm/pagemap: Use struct drm_pagemap_addr in mapping and copy functions This struct embeds more information than just the DMA address. This will help later to support folio orders greater than zero. At this point, there is no functional change as the only struct member used is addr. In Xe, adapt to the new drm_gpusvm_devmem_ops type signatures using struct drm_pagemap_addr, as well as the internal xe SVM functions implementing those operations. The use of this struct is propagated to xe_migrate as it makes indexed accesses to the next DMA address but they are no longer contiguous. v2: - Rename drm_pagemap_device_addr to drm_pagemap_addr (Matthew Brost) - Squash with patch for Xe (Matthew Brost) - Set proto and dir for completeness (Matthew Brost) - Assess DMA map protocol (Matthew Brost) Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/20250805140028.599361-3-francois.dugast@intel.com Signed-off-by: Francois Dugast <francois.dugast@intel.com>	2025-08-06 13:34:42 +02:00
Satyanarayana K V P	a843b98947	drm/xe/vf: Fix VM crash during VF driver release The VF CCS save/restore series (patchwork #149108) has a dependency on the migration framework. A recent migration update in commit `d65ff1ec85` ("drm/xe: Split xe_migrate allocation from initialization") caused a VM crash during XE driver release for iGPU devices. Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b83: 0000 [#1] SMP NOPTI RIP: 0010:xe_lrc_ring_head+0x12/0xb0 [xe] Call Trace: xe_sriov_vf_ccs_fini+0x1e/0x40 [xe] devm_action_release+0x12/0x30 release_nodes+0x3a/0x120 devres_release_all+0x96/0xd0 device_unbind_cleanup+0x12/0x80 device_release_driver_internal+0x23a/0x280 device_release_driver+0x12/0x20 pci_stop_bus_device+0x69/0x90 pci_stop_and_remove_bus_device+0x12/0x30 pci_iov_remove_virtfn+0xbd/0x130 sriov_disable+0x42/0x100 pci_disable_sriov+0x34/0x50 xe_pci_sriov_configure+0xf71/0x1020 [xe] Update the VF CCS migration initialization sequence to align with the new migration framework changes, resolving the release-time crash. Fixes: `f3009272ff` ("drm/xe/vf: Create contexts for CCS read write") Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250729120720.13990-1-satyanarayana.k.v.p@intel.com	2025-07-29 22:05:14 -07:00
Matthew Brost	dba89840a9	drm/xe: Add GT TLB invalidation jobs Add GT TLB invalidation jobs which issue GT TLB invalidations. Built on top of Xe generic dependency scheduler. v2: - Fix checkpatch v3: - Fix kernel doc in xe_gt_tlb_inval_job_alloc_dep, xe_gt_tlb_inval_job_push - Use IS_ERR_OR_NULL in xe_gt_tlb_inval_job_put - Squash migrate lock / unlock helpers into this patch (Stuart) Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-6-matthew.brost@intel.com	2025-07-24 18:27:22 -07:00

1 2 3 4

170 Commits