drm/xe/vf: Requeue recovery on GuC MIGRATION error during VF post-migration

Handle the GuC response `XE_GUC_RESPONSE_VF_MIGRATED` as a special case in
the VF post-migration recovery flow. This error indicates that a new
migration was detected while the resource fixup was still in progress.
Instead of failing immediately, requeue the VF into the recovery path so
that the new migration event is handled properly.

This improves robustness of VF recovery in SR-IOV environments where
migrations can overlap with resource fixup steps.
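
The error mapping and retry described above can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not
the driver code: every `sketch_*` name is hypothetical, the response
constant is a placeholder rather than the real GuC ABI value, and the
loop stands in for requeueing the recovery work.

```c
#include <errno.h>

/* EREMCHG is Linux-specific; fall back to the asm-generic value elsewhere. */
#ifndef EREMCHG
#define EREMCHG 78
#endif

/* Placeholder value, NOT the real GuC ABI constant (assumption). */
#define SKETCH_RESPONSE_VF_MIGRATED 0xF001u

/* Translate a simulated GuC failure code into a negative errno. */
static int sketch_map_guc_error(unsigned int error)
{
	if (error == SKETCH_RESPONSE_VF_MIGRATED)
		return -EREMCHG;	/* new migration seen: caller should retry */
	return -ENXIO;			/* any other failure is treated as fatal */
}

/*
 * Run recovery passes until the simulated RESFIX_DONE step is accepted
 * or fails for good. responses[] holds one simulated reply per pass,
 * where 0 means success. Returns the number of passes taken, or a
 * negative errno.
 */
static int sketch_recover(const unsigned int *responses, int count)
{
	for (int pass = 0; pass < count; pass++) {
		int err = responses[pass] ?
			  sketch_map_guc_error(responses[pass]) : 0;

		if (!err)
			return pass + 1;
		if (err == -EREMCHG)
			continue;	/* stands in for "goto queue" */
		return err;
	}
	return -ETIMEDOUT;	/* ran out of simulated replies */
}
```

With the replies `{ SKETCH_RESPONSE_VF_MIGRATED, 0 }` the sketch succeeds
on the second pass, while any other failure code ends recovery with
-ENXIO on the first pass.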

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251201095011.21453-9-satyanarayana.k.v.p@intel.com
Satyanarayana K V P 2025-12-01 15:20:15 +05:30 committed by Michal Wajdeczko
parent b5fbb94341
commit 75e7d26281
2 changed files with 9 additions and 0 deletions

@@ -1268,6 +1268,9 @@ static void vf_post_migration_recovery(struct xe_gt *gt)
 	err = vf_post_migration_resfix_done(gt, marker);
 	if (err) {
+		if (err == -EREMCHG)
+			goto queue;
 		xe_gt_sriov_err(gt, "Recovery failed at GuC RESFIX_DONE step (%pe)\n",
 				ERR_PTR(err));
 		goto fail;

@@ -1484,6 +1484,12 @@ int xe_guc_mmio_send_recv(struct xe_guc *guc, const u32 *request,
 		u32 hint = FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, header);
 		u32 error = FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, header);
+		if (unlikely(error == XE_GUC_RESPONSE_VF_MIGRATED)) {
+			xe_gt_dbg(gt, "GuC mmio request %#x rejected due to MIGRATION (hint %#x)\n",
+				  request[0], hint);
+			return -EREMCHG;
+		}
 		xe_gt_err(gt, "GuC mmio request %#x: failure %#x hint %#x\n",
 			  request[0], error, hint);
 		return -ENXIO;