drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset

Xe can skip the reset if TDR has fired before the free job worker and can
also re-arm the timeout timer in some scenarios. Instead of manipulating
the scheduler's internals, use the new status code
DRM_GPU_SCHED_STAT_NO_HANG to inform the scheduler that the job did not
actually time out and that no reset was performed.

Note that, in the first case, there is no need to restart submission if it
hasn't been stopped.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-7-5c5ba4f55039@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Maíra Canal 2025-07-14 19:07:08 -03:00
parent 8902c2b17a
commit 53dcd0eaa2

@@ -1092,12 +1092,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	 * list so job can be freed and kick scheduler ensuring free job is not
 	 * lost.
 	 */
-	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags)) {
-		xe_sched_add_pending_job(sched, job);
-		xe_sched_submission_start(sched);
-
-		return DRM_GPU_SCHED_STAT_RESET;
-	}
+	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags))
+		return DRM_GPU_SCHED_STAT_NO_HANG;
 
 	/* Kill the run_job entry point */
 	xe_sched_submission_stop(sched);
@@ -1275,10 +1271,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	 * but there is not currently an easy way to do in DRM scheduler. With
 	 * some thought, do this in a follow up.
 	 */
-	xe_sched_add_pending_job(sched, job);
 	xe_sched_submission_start(sched);
-
-	return DRM_GPU_SCHED_STAT_RESET;
+	return DRM_GPU_SCHED_STAT_NO_HANG;
 }
 
 static void __guc_exec_queue_fini_async(struct work_struct *w)
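
For readers unfamiliar with the timeout path, the control flow described in the commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions: `sched_stat`, `model_job`, `model_sched` and `model_timedout_job` are hypothetical names that stand in for the kernel's status enum, job, scheduler and timeout handler, and the booleans replace the real fence and progress checks. It is not the driver's implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the scheduler status codes named in the commit. */
enum sched_stat {
	SCHED_STAT_RESET,	/* a reset was performed */
	SCHED_STAT_NO_HANG,	/* the job did not actually time out */
};

/* Hypothetical, simplified job state; the driver tests a fence flag bit. */
struct model_job {
	bool fence_signaled;	/* job completed before the handler ran */
	bool rearm_timeout;	/* handler chooses to re-arm the timer */
};

/* Hypothetical scheduler state tracked by this model only. */
struct model_sched {
	bool submission_stopped;
	int resets;
};

static enum sched_stat model_timedout_job(struct model_sched *sched,
					  struct model_job *job)
{
	/* First case: already signaled. Submission was never stopped,
	 * so there is nothing to restart. */
	if (job->fence_signaled)
		return SCHED_STAT_NO_HANG;

	/* From here on the handler stops submission while it inspects the job. */
	sched->submission_stopped = true;

	/* Second case: re-arm the timer. Submission was stopped above,
	 * so it must be restarted before reporting that nothing hung. */
	if (job->rearm_timeout) {
		sched->submission_stopped = false;
		return SCHED_STAT_NO_HANG;
	}

	/* Genuine timeout: this model just counts a reset and restarts. */
	sched->resets++;
	sched->submission_stopped = false;
	return SCHED_STAT_RESET;
}
```

Calling `model_timedout_job()` with `fence_signaled` set returns `SCHED_STAT_NO_HANG` without touching `submission_stopped`, which mirrors the note that the first case needs no restart. The re-arm case toggles the stop flag and clears it again, which is why the second hunk keeps the `xe_sched_submission_start()` call.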