linux

mirror of https://github.com/torvalds/linux.git synced 2026-06-02 19:43:40 +02:00

Author	SHA1	Message	Date
Tejas Upadhyay	4f39a194d4	drm/xe/xe3p_lpg: Restrict UAPI to enable L2 flush optimization When set, starting xe3p_lpg, the L2 flush optimization feature will control whether L2 is in Persistent or Transient mode through monitoring of media activity. To enable L2 flush optimization include new feature flag GUC_CTL_ENABLE_L2FLUSH_OPT for Novalake platforms when media type is detected. Tighten UAPI validation to restrict userptr, svm and dmabuf mappings to be either 2WAY or XA+1WAY V5(Thomas): logic correction V4(MattA): Modify uapi doc and commit V3(MattA): check valid op and pat_index value V2(MattA): validate dma-buf bos and madvise pat-index Acked-by: José Roberto de Souza <jose.souza@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Acked-by: Carl Zhang <carl.zhang@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260305121902.1892593-9-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>	2026-03-23 15:24:14 +05:30
Matthew Brost	a6ab444a11	drm/xe: Forcefully tear down exec queues in GuC submit fini In GuC submit fini, forcefully tear down any exec queues by disabling CTs, stopping the scheduler (which cleans up lost G2H), killing all remaining queues, and resuming scheduling to allow any remaining cleanup actions to complete and signal any remaining fences. Split guc_submit_fini into device related and software only part. Using device-managed and drm-managed action guarantees the correct ordering of cleanup. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-3-zhanjun.dong@intel.com	2026-03-13 18:04:02 -07:00
Daniele Ceraolo Spurio	c85ec5c575	drm/xe/guc: Fail immediately on GuC load error By using the same variable for both the return of poll_timeout_us and the return of the polled function guc_wait_ucode, the return value of the latter is overwritten and lost after exiting the polling loop. Since guc_wait_ucode returns -1 on GuC load failure, we lose that information and always continue as if the GuC had been loaded correctly. This is fixed by simply using 2 separate variables. Fixes: `a4916b4da4` ("drm/xe/guc: Refactor GuC load to use poll_timeout_us()") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20260303001732.2540493-2-daniele.ceraolospurio@intel.com	2026-03-13 11:42:35 -07:00
Vinay Belgaumkar	b458597544	drm/xe: Don't disable GuCRC in suspend path GuCRC should not be disabled in xe_guc_stop_prepare() as C6 is a prerequisite for s0ix and s2idle. This is a regression caused by the patch below. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7510 Fixes: `40a684f91d` ("drm/xe: Decouple GuC RC code from xe_guc_pc") Cc: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://patch.msgid.link/20260303181416.3880937-1-vinay.belgaumkar@intel.com	2026-03-06 10:59:43 -08:00
Matt Roper	4405938293	drm/xe/pvc: Drop pre-prod workarounds Production PVC hardware had a graphics stepping of C0. Xe1 platforms already aren't officially supported by the Xe driver, but pre-production steppings are especially out of scope (and 'has_pre_prod_wa' is not set in the device descriptor). Drop the workarounds that aren't relevant to production hardware. v2: - Drop the stream->override_gucrc which is no longer set anywhere after the removal of Wa_1509372804. (Bala) - Drop xe_guc_rc_set_mode / xe_guc_rc_unset_mode which are no longer used after the removal of Wa_1509372804. Bspec: 44484 Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patch.msgid.link/20260220-forupstream-wa_cleanup-v2-2-b12005a05af6@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-23 15:43:35 -08:00
Sk Anirban	c57db41b8d	drm/xe/guc: Add Wa_14025883347 for GuC DMA failure on reset Prevent GuC firmware DMA failures during GuC-only reset by disabling idle flow and verifying SRAM handling completion. Without this, reset can be issued while SRAM handler is copying WOPCM to SRAM, causing GuC HW to get stuck. v2: Modify error message (Badal) Rename reg bit name (Daniele) Update WA skip condition (Daniele) Update SRAM handling logic (Daniele) v3: Reorder WA call (Badal) Wait for GuC ready status (Daniele) v4: Update reg name (Badal) Add comment (Daniele) Add extended graphics version (Daniele) Modify rules Signed-off-by: Sk Anirban <sk.anirban@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Acked-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260202105313.3338094-4-sk.anirban@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-02-09 12:00:16 -08:00
Vinay Belgaumkar	40a684f91d	drm/xe: Decouple GuC RC code from xe_guc_pc Move enable/disable GuC RC logic into the new file. This will allow us to independently enable/disable GuC RC and not rely on SLPC related functions. GuC already provides separate H2G interfaces to setup GuC RC and SLPC. Cc: Riana Tauro <riana.tauro@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20260204014234.2867763-2-vinay.belgaumkar@intel.com	2026-02-05 14:17:30 -08:00
Michal Wajdeczko	65b9886062	drm/xe/guc: Allow second H2G retry on FLR During VF FLR the scratch registers could be cleared both by the GuC and by the PF driver. Allow to retry more times once we find out that the HXG header was cleared and wait at least 256ms before resending the same message again to the GuC. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260127193727.601-7-michal.wajdeczko@intel.com	2026-02-02 22:35:46 +01:00
Michal Wajdeczko	e116fd5c60	drm/xe/guc: Wait before retrying sending H2G We shall resend H2G message after receiving NO_RESPONSE_RETRY reply, but since GuC dropped that H2G due to some interim state, we should give it a little time to stabilize. Wait before sending the same H2G again, start with 1ms delay, then increase exponentially to 256ms. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260127193727.601-6-michal.wajdeczko@intel.com	2026-02-02 22:35:45 +01:00
Michal Wajdeczko	09b45fd9d3	drm/xe/guc: Drop redundant register read The xe_mmio_wait32() already returns the last value of the register for which we were waiting, there is no need read it again. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260127193727.601-5-michal.wajdeczko@intel.com	2026-02-02 22:35:44 +01:00
Daniele Ceraolo Spurio	dd8ea2f2ab	drm/xe/guc: Fix CFI violation in debugfs access. xe_guc_print_info is void-returning, but the function pointer it is assigned to expects an int-returning function, leading to the following CFI error: [ 206.873690] CFI failure at guc_debugfs_show+0xa1/0xf0 [xe] (target: xe_guc_print_info+0x0/0x370 [xe]; expected type: 0xbe3bc66a) Fix this by updating xe_guc_print_info to return an integer. Fixes: `e15826bb3c` ("drm/xe/guc: Refactor GuC debugfs initialization") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: George D Sworo <george.d.sworo@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129182547.32899-2-daniele.ceraolospurio@intel.com	2026-02-02 09:12:56 -08:00
Michal Wajdeczko	d043b95983	drm/xe/vf: Reset VF GuC state on fini Unlike native/PF driver, which was explicitly triggering full GuC reset during driver unwind, the VF driver was not notifying GuC that it is about to unwind, and this could lead GuC to access stale data, which in turn could be interpreted as VF's malicious activity. Add managed action to send to GuC VF_RESET message during GT unwind. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260122151924.3726-1-michal.wajdeczko@intel.com	2026-01-27 21:26:38 +01:00
Daniele Ceraolo Spurio	8d87fa1916	drm/xe/gt: Add engine masks for each class Follow up patches will need the engine masks for VCS and VECS engines. Since we already have a macro for the CCS engines, just extend the same approach to all classes. To avoid confusion with the XE_HW_ENGINE_*_MASK masks, the new macros use the _INSTANCES suffix instead. For consistency, rename CCS_MASK to CCS_INSTANCES as well. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251218223846.1146344-15-daniele.ceraolospurio@intel.com	2025-12-22 10:21:56 -08:00
Michal Wajdeczko	47f5cee419	drm/xe/guc: Fix version check for page-reclaim feature Page reclamation interfaces were introduced in GuC firmware version 70.31.0 (which corresponds to GuC ABI version 1.14.0), but since this feature is also available for the VFs and VFs don't know the firmware version, use GuC compatibility version check instead. Fixes: `77ebc7c10d` ("drm/xe/guc: Add page reclamation interface to GuC") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Brian Nguyen <brian3.nguyen@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251215170433.196398-1-michal.wajdeczko@intel.com	2025-12-15 13:33:55 -08:00
Brian Nguyen	77ebc7c10d	drm/xe/guc: Add page reclamation interface to GuC Add page reclamation related changes to GuC interface, handlers, and senders to support page reclamation. Currently TLB invalidations will perform an entire PPC flush in order to prevent stale memory access for noncoherent system memory. Page reclamation is an extension of the typical TLB invalidation workflow, allowing disabling of full PPC flush and enable selective PPC flushing. Selective flushing will be decided by a list of pages whom's address is passed to GuC at time of action. Page reclamation interfaces require at least GuC FW ver 70.31.0. v2: - Moved send_page_reclaim to first patch usage. - Add comments explaining shared done handler. (Matthew B) - Add FW version fallback to disable page reclaim on older versions. (Matthew B, Shuicheng) Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251212213225.3564537-16-brian3.nguyen@intel.com	2025-12-12 16:59:09 -08:00
Satyanarayana K V P	75e7d26281	drm/xe/vf: Requeue recovery on GuC MIGRATION error during VF post-migration Handle GuC response `XE_GUC_RESPONSE_VF_MIGRATED` as a special case in the VF post-migration recovery flow. When this error occurs, it indicates that a new migration was detected while the resource fixup process was still in progress. Instead of failing immediately, requeue the VF into the recovery path to allow proper handling of the new migration event. This improves robustness of VF recovery in SR-IOV environments where migrations can overlap with resource fixup steps. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251201095011.21453-9-satyanarayana.k.v.p@intel.com	2025-12-02 16:17:25 +01:00
Raag Jadav	43fb9e113b	drm/xe/gt: Introduce runtime suspend/resume If power state is retained between suspend/resume cycle, we don't need to perform full GT re-initialization. Introduce runtime helpers for GT which greatly reduce suspend/resume delay. v2: Drop redundant xe_gt_sanitize() and xe_guc_ct_stop() (Daniele) Use runtime naming for guc helpers (Daniele) v3: Drop redundant logging, add kernel doc (Michal) Use runtime naming for ct helpers (Michal) v4: Fix tags (Rodrigo) v5: Include host_l2_vram workaround (Daniele) Reuse xe_guc_submit_enable/disable() helpers (Daniele) Co-developed-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Raag Jadav <raag.jadav@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/20251030122357.128825-5-raag.jadav@intel.com	2025-11-27 09:05:28 -08:00
Zhanjun Dong	f4c8298cf5	drm/xe/guc: Cleanup GuC log buffer macros and helpers Cleanup GuC log buffer macros and helpers, add Xe style macro prefix. Update buffer type values to align with the GuC specification Update buffer offset calculation. Remove helper functions, replaced with macros. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20251105233143.1168759-1-zhanjun.dong@intel.com	2025-11-24 10:50:07 -08:00
Matt Roper	3947e482b5	drm/xe/guc: Use scope-based cleanup Use scope-based cleanup for forcewake and runtime PM. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patch.msgid.link/20251118164338.3572146-34-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-11-19 11:58:57 -08:00
Michał Winiarski	d608fbf400	drm/xe/pf: Increase PF GuC Buffer Cache size and use it for VF migration Contiguous PF GGTT VMAs can be scarce after creating VFs. Increase the GuC buffer cache size to 8M for PF so that we can fit GuC migration data (which currently maxes out at just over 4M) and use the cache instead of allocating fresh BOs. Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251112132220.516975-13-michal.winiarski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-11-13 11:48:19 +01:00
Balasubramani Vivekanandan	e681ddca30	drm/xe/xe3p_lpm: Add special check in Media GT for Main GAMCTRL For Xe3p arch some subunits of an IP may be different. The GMD_ID register returns the Xe3p arch and dedicates the reserved field to mark possible subunit differences. Generally this is an under-the-hood implementation detail that drivers don't need to worry about, but the new Main_GAMCTRL may be enabled or not depending on those. Those reserved bits are described for Xe3p as: "If Zero, No special case to be handled. If Non-Zero, special case to be handled by Software agent.". That special case is defined per Arch. So if media version is 35, also check the additional reserved bits. To avoid confusion with the usual meaning of "reserved", define them as GMD_ID_SUBIP_FLAG_MASK. Bspec: 74201 Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20251019-xe3p-gamctrl-v1-2-ad66d3c1908f@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-20 17:21:11 -07:00
Brian Welty	94edd65186	drm/xe/xe3p_lpm: Configure MAIN_GAMCTRL_QUEUE_SELECT Starting from Xe3p, there are two different copies of some of the GAM registers: the traditional MCR variant at their old locations, and a new unicast copy known as "main_gamctrl." The Xe driver doesn't use these registers directly, but we need to instruct the GuC on which set it should use. Since the new, unicast registers are preferred (since they avoid the need for unnecessary MCR synchronization), set a new GuC feature flag, GUC_CTL_MAIN_GAMCTRL_QUEUES to convey this decision. A new helper function, xe_guc_using_main_gamctrl_queues(), is added for use in the 3 independent places that need to handle configuration of the new reporting queues. The mmio write to enable the main gamctl is only done during the general GuC upload. The gamctrl registers are not accessed by the GuC during hwconfig load. Last, the ADS blob for communicating the queue addresses contains both a DPA and GGTT offset. The GuC documentation states that DPA is now MBZ when using the MAIN_GAMCTRL queues. Bspec: 76445, 73540 Signed-off-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20251019-xe3p-gamctrl-v1-1-ad66d3c1908f@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-20 17:21:11 -07:00
Satyanarayana K V P	60e2667557	drm/xe/guc: Increase wait timeout to 2sec after BUSY reply from GuC Some VF2GUC actions may take longer to process. Increase default timeout after received BUSY indication to 2sec to cover all worst case scenarios. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251008214532.3442967-35-matthew.brost@intel.com	2025-10-09 03:24:28 -07:00
Lucas De Marchi	a4916b4da4	drm/xe/guc: Refactor GuC load to use poll_timeout_us() Currently there are 2 wait loops for loading GuC: one in xe_mmio_wait32_not() and one guc_wait_ucode(). Now that there's a generic poll_timeout_us(), refactor the code to use that to be more readable. Main change in behavior is that there's no exponential wait anymore: that is now replaced by a 10msec retry. Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250922-xe-iopoll-v4-5-06438311a63f@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-24 21:23:19 -07:00
Lucas De Marchi	abde96d844	drm/xe/guc: Extract function to print load error Move the error parsing and print out of guc_wait_ucode() into a helper to clean up the wait function. Since now the `load_done != 1` condition has a return statement, also simplify the if/else chain. Reviewed-by: John Harrison <John.C.Harrison@Intel.com> # v2 Link: https://lore.kernel.org/r/20250922-xe-iopoll-v4-4-06438311a63f@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-24 21:23:19 -07:00
Lucas De Marchi	2a16f47dcc	drm/xe/guc: Drop helper to read freq As the forcewake is already held during GuC load, there's no need to use a helper function to call xe_guc_pc_get_cur_freq(). Just call xe_guc_pc_get_cur_freq_fw() directly. Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://lore.kernel.org/r/20250922-xe-iopoll-v4-3-06438311a63f@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-24 21:23:19 -07:00
John Harrison	3b09b11805	drm/xe/guc: Return an error code if the GuC load fails Due to multiple explosion issues in the early days of the Xe driver, the GuC load was hacked to never return a failure. That prevented kernel panics and such initially, but now all it achieves is creating more confusing errors when the driver tries to submit commands to a GuC it already knows is not there. So fix that up. As a stop-gap and to help with debug of load failures due to invalid GuC init params, a wedge call had been added to the inner GuC load function. The reason being that it leaves the GuC log accessible via debugfs. However, for an end user, simply aborting the module load is much cleaner than wedging and trying to continue. The wedge blocks user submissions but it seems that various bits of the driver itself still try to submit to a dead GuC and lots of subsequent errors occur. And with regards to developers debugging why their particular code change is being rejected by the GuC, it is trivial to either add the wedge back in and hack the return code to zero again or to just do a GuC log dump to dmesg. v2: Add support for error injection testing and drop the now redundant wedge call. CC: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250909224132.536320-1-John.C.Harrison@Intel.com	2025-09-16 12:11:08 -07:00
John Harrison	456b32c9c1	drm/xe/guc: Add test for G2G communications Add a test for sending messages from every GuC to every other GuC to test G2G communications. Note that, being a debug only feature, the test interface only exists in pre-production builds of the GuC firmware. v2: Fix 'default' case to actually use the driver's registration code as well as allocation. Add comments explaining the different test types. Fix (C) date and an assert. Review feedback from Daniele. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-5-John.C.Harrison@Intel.com	2025-09-15 09:53:26 -07:00
Daniele Ceraolo Spurio	8843444843	drm/xe/guc: Set RCS/CCS yield policy All recent platforms (including all the ones officially supported by the Xe driver) do not allow concurrent execution of RCS and CCS workloads from different address spaces, with the HW blocking the context switch when it detects such a scenario. The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a context it knows will not be able to execute. This, however, causes a new problem: if RCS and CCS queues have pending workloads from different address spaces, the GuC needs to choose from which of the 2 queues to pick the next workload to execute. By default, the GuC prioritizes RCS submissions over CCS ones, which can lead to CCS workloads being significantly (or completely) starved of execution time. The driver can tune this by setting a dedicated scheduling policy KLV; this KLV allows the driver to specify a quantum (in ms) and a ratio (percentage value between 0 and 100), and the GuC will prioritize the CCS for that percentage of each quantum. Given that we want to guarantee enough RCS throughput to avoid missing frames, we set the yield policy to 20% of each 80ms interval. v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable in gt_sanitize Fixes: `d9a1ae0d17` ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com	2025-09-11 09:45:35 -07:00
John Harrison	0b05857dc1	drm/xe/guc: Clean up of GuC 'CTL' defines All the field generation for the CTL defines (used for GuC init data) were hand-rolled rather than using FIELD_PREP/REG_GENMASK/BIT macros. Also, there were a bunch of macros defined for verbosity settings that were never used. So fix that all up. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250904195752.3846138-2-John.C.Harrison@Intel.com	2025-09-05 13:31:32 -07:00
Satyanarayana K V P	ee4b32220a	drm/xe/guc: Add devm release action to safely tear down CT When a buffer object (BO) is allocated with the XE_BO_FLAG_GGTT_INVALIDATE flag, the driver initiates TLB invalidation requests via the CTB mechanism while releasing the BO. However a premature release of the CTB BO can lead to system crashes, as observed in: Oops: Oops: 0000 [#1] SMP NOPTI RIP: 0010:h2g_write+0x2f3/0x7c0 [xe] Call Trace: guc_ct_send_locked+0x8b/0x670 [xe] xe_guc_ct_send_locked+0x19/0x60 [xe] send_tlb_invalidation+0xb4/0x460 [xe] xe_gt_tlb_invalidation_ggtt+0x15e/0x2e0 [xe] ggtt_invalidate_gt_tlb.part.0+0x16/0x90 [xe] ggtt_node_remove+0x110/0x140 [xe] xe_ggtt_node_remove+0x40/0xa0 [xe] xe_ggtt_remove_bo+0x87/0x250 [xe] Introduce a devm-managed release action during xe_guc_ct_init() and xe_guc_ct_init_post_hwconfig() to ensure proper CTB disablement before resource deallocation, preventing the use-after-free scenario. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Summers Stuart <stuart.summers@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250901072541.31461-1-satyanarayana.k.v.p@intel.com	2025-09-02 08:21:58 +02:00
Vinay Belgaumkar	95b3899b4d	drm/xe/psmi: Add Wa_16023683509 This WA ensures GuC will restore the media MCFG registers at C6 exit. Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-5-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:44 -07:00
Lucas De Marchi	efeb036ffd	drm/xe/psmi: Add GuC flag to enable PSMI PSMI allows to capture data from the GPU useful for early validation. From the kernel side there isn't much to be done, just a few things: 1) Toggle the feature support in GuC 2) Enable some additional WAs 3) Allocate buffers Here is the first step, with the next ones to follow. For now everything is disabled through a check in configfs that is currently hardcoded to disabled. Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-1-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:43 -07:00
Matt Atwood	4d5c98eb77	drm/xe: rename XE_WA to XE_GT_WA Now that there are two types of wa tables and infrastructure, be more concise in the naming of GT wa macros. v2: update the documentation Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250807214224.32728-1-matthew.s.atwood@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-08 10:50:45 -04:00
John Harrison	45fbb51050	drm/xe/guc: Add more GuC load error status codes The GuC load process will abort if certain status codes (which are indicative of a fatal error) are reported. Otherwise, it keeps waiting until the 'success' code is returned. New error codes have been added in recent GuC releases, so add support for aborting on those as well. v2: Shuffle HWCONFIG_START to the front of the switch to keep the ordering as per the enum define for clarity (review feedback by Jonathan). Also add a description for the basic 'invalid init data' code which was missing. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250726024337.4056272-1-John.C.Harrison@Intel.com	2025-07-28 14:38:44 -07:00
Zhanjun Dong	b2c4ac219f	drm/xe/uc: Disable GuC communication on hardware initialization error Disable GuC communication on Xe micro controller hardware initialization error. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4917 Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250707231108.3217573-1-zhanjun.dong@intel.com	2025-07-08 14:56:49 -07:00
Matthew Brost	ec9223b49a	drm/xe: Drop bo->size bo->size is redundant because the base GEM object already has a size field with the same value. Drop bo->size and use the base GEM object’s size instead. While at it, introduce xe_bo_size() to abstract the BO size. v2: - Fix typo in kernel doc (Ashutosh) - Fix kunit (CI) - Fix line wrap (Checkpatch) v3: - Fix sriov build (CI) v4: - Fix display build (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/20250625144128.2827577-1-matthew.brost@intel.com	2025-06-27 14:52:31 -07:00
Daniele Ceraolo Spurio	9c7d93a8f1	drm/xe/guc: Enable the Dynamic Inhibit Context Switch optimization The Dynamic Inhibit Context Switch is an optimization aimed at reducing the amount of time the HW is stuck waiting on an unsatisfied semaphore. When this optimization is enabled, the GuC will dynamically modify the CTX_CTRL_INHIBIT_SYN_CTX_SWITCH in the CTX_CONTEXT_CONTROL register of LRCs to enable immediate switching out on an unsatisfied semaphore wait when multiple contexts are competing for time on the same engine. This feature is available on recent HW from GuC 70.40.1 onwards and it is enabled via a per-VF feature opt-in. v2: rebase v3: switch to using guc_buf_cache instead of dedicated alloc v4: add helper to check for feature availability (Michal), don't enable if multi-lrc is possible. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250625205405.1653212-4-daniele.ceraolospurio@intel.com	2025-06-27 11:08:50 -07:00
Daniele Ceraolo Spurio	a7ffcea863	drm/xe/guc: Enable extended CAT error reporting On newer HW (Xe2 onwards + PVC) it is possible to get extra information when a CAT error occurs, specifically a dword reporting the error type. To enable this extra reporting, we need to opt-in with the GuC, which is done via a specific per-VF feature opt-in H2G. On platforms where the HW does not support the extra reporting, the GuC will set the type to 0xdeadbeef, so we can keep the code simple and opt-in to the feature on every platform and then just discard the data if it is invalid. Note that on native/PF we're guaranteed that the opt in is available because we don't support any GuC old enough to not have it, but if we're a VF we might be running on a non-XE PF with an older GuC, so we need to handle that case. We can re-use the invalid type above to handle this scenario the same way as if the feature was not supported in HW. Given that this patch is the first user of the guc_buf_cache on native and VF, it also extends that feature to non-PF use-cases. v2: simpler print for the error type (John), rebase v3: use guc_buf_cache instead of new alloc, simpler doc (Michal) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> #v1 Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250625205405.1653212-3-daniele.ceraolospurio@intel.com	2025-06-27 11:08:40 -07:00
Maarten Lankhorst	a42939ee86	drm/xe: Remove xe_uc_fini_hw xe_uc_init_hw() is called multiple times from xe_gt.c, and that makes the name xe_uc_fini_hw(), called for a different reason in xe_guc.c confusing. Remove it and inline the xe_uc_sanitize_reset into xe_guc.c directly. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-23-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:11:35 +02:00
Maarten Lankhorst	396044c9d8	drm/xe: Simplify GuC early initialization Add a 2-stage GuC init. An early one for everything that is needed for VF, and a full one that loads GuC and is allowed to do allocations. Link: https://lore.kernel.org/r/20250619104858.418440-16-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-06-26 22:10:30 +02:00
Maarten Lankhorst	b3412d7233	drm/xe/sriov: Move VF bootstrap and query_config to vf_guc_init We want to split up GUC init to an alloc and noalloc part to keep the init path the same for VF and !VF as much as possible. Everything in vf_guc_init should be done as early as possible, otherwise VRAM probing becomes impossible. Also move xe_gt_mmio_init to the end of xe_gt_init_early(), cleaning up the init in xe_device slightly. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250619104858.418440-15-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-06-26 22:07:44 +02:00
Daniele Ceraolo Spurio	90f4d3f756	drm/xe/vf: Boostrap all GTs immediately after MMIO init Currently we perform the bootstrap for the primary GT early on during device init, while the media GT bootstrap happens when we try and fetch the hwconfig table. For consistency, move the bootstrap of the media GT happen at the same time as the primary GT, so that all the subsequent code can rely on both GTs being in the same state. v2: Also drop config query from min_guc_load since we now do it early (Michal) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250603235432.720833-8-daniele.ceraolospurio@intel.com	2025-06-06 08:33:18 -07:00
Michal Wajdeczko	d65650a9d1	drm/xe/guc: Resend potentially lost H2G MMIO request There could be a scenario where the VF driver is resuming faster than the driver PF is able to complete the VF FLR sequence which includes reset of the VF scratch registers. This may result in deletion of the ongoing HXG message (it could be either a host request or a GuC response). When we detect that HXG message was likey lost (scratch register with HXG header was zeroed) try to send this request once more before giving up. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250528090021.329-1-michal.wajdeczko@intel.com	2025-06-02 19:22:03 +02:00
Michal Wajdeczko	b86babc9d9	drm/xe/guc: Unblock GuC buffer cache for all modes Today we were using GuC buffer cache only in the PF mode, but shortly we will want to use it also in native and VF mode. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250512220018.172-2-michal.wajdeczko@intel.com	2025-05-15 12:29:54 +02:00
Daniele Ceraolo Spurio	dba7d17d50	drm/xe/vf: Fix guc_info debugfs for VFs The guc_info debugfs attempts to read a bunch of registers that the VFs doesn't have access to, so fix it by skipping the reads. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4775 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://lore.kernel.org/r/20250423173908.1571412-1-daniele.ceraolospurio@intel.com	2025-04-29 13:20:57 -07:00
Satyanarayana K V P	e9dea328e8	drm/xe: Introduce fault injection for guc mmio send/recv. Fault can be injected with below steps. FAILTYPE=fail_function FAILFUNC=xe_guc_mmio_send_recv echo > /sys/kernel/debug/$FAILTYPE/inject echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject printf %#x -5 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval echo N > /sys/kernel/debug/$FAILTYPE/task-filter echo 10 > /sys/kernel/debug/$FAILTYPE/probability echo 0 > /sys/kernel/debug/$FAILTYPE/interval echo -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 1 > /sys/kernel/debug/$FAILTYPE/verbose Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michał Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250403120641.7258-2-satyanarayana.k.v.p@intel.com	2025-04-17 22:11:16 +02:00
Matthew Brost	045448da87	drm/xe: Add XE_BO_FLAG_PINNED_NORESTORE Not all BOs need to be restored on resume / d3cold exit, add XE_BO_FLAG_PINNED_NO_RESTORE which skips restoring of BOs rather just allocates VRAM for the BO. This should slightly speedup resume / d3cold exit flows. Marking GuC ADS, GuC CT, GuC log, GuC PC, and SA as NORESTORE. v2: - s/WONTNEED/NORESTORE (Vivi) - Rebase on newly added g2g and backup object flow Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250403102440.266113-11-matthew.auld@intel.com	2025-04-04 11:41:01 +01:00
Rodrigo Vivi	fc858ddf9c	drm/xe/guc_pc: Remove duplicated pc_start call xe_guc_pc_start() was getting called from both xe_uc_init_hw() and from xe_guc_start(). But both are called from do_gt_restart() and only xe_uc_init_hw() is called at initialization. So, let's remove the duplication in the regular gt_restart path. The only place where xe_guc_pc_start() won't get called now is on the gt_reset failure path. However, if gt_reset has failed, it is really unlikely that the PC start will work or is desired. Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306220643.1014049-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-03-07 11:10:05 -05:00
Riana Tauro	6978c5f5a6	drm/xe/xe_pmu: Add PMU support for engine activity PMU provides two counters (engine-active-ticks, engine-total-ticks) to calculate engine activity. When querying engine activity, user must group these 2 counters using the perf_event group mechanism to ensure both counters are sampled together. To list the events ./perf list xe_0000_03_00.0/engine-active-ticks/ [Kernel PMU event] xe_0000_03_00.0/engine-total-ticks/ [Kernel PMU event] The formats to be used with the above are engine_instance - config:12-19 engine_class - config:20-27 gt - config:60-63 The events can then be read using perf tool ./perf stat -e xe_0000_03_00.0/engine-active-ticks,gt=0, engine_class=0,engine_instance=0/, xe_0000_03_00.0/engine-total-ticks,gt=0, engine_class=0,engine_instance=0/ -I 1000 Engine activity can then be calculated as below engine activity % = (engine active ticks/engine total ticks) * 100 v2: validate gt rename total-ticks to engine-total-ticks add helper to get hwe (Umesh) v3: fix checkpatch warning add details to documentation (Umesh) remove ascii formats from documentation (Lucas) v4: remove unnecessary warn within raw_spinlock (Lucas) Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-5-riana.tauro@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 13:23:57 -08:00

1 2 3 4

161 Commits