drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails

Add a fallback mechanism to attempt pipe reset when KCQ reset
fails to recover the ring. After performing the KCQ reset and
queue remapping, test the ring functionality. If the ring test
fails, initiate a pipe reset as an additional recovery step.

v2: fix the typo (Lijo)
v3: try pipeline reset when kiq mapping fails (Lijo)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit is contained in:
Jesse.Zhang 2025-09-16 13:11:06 +08:00 committed by Alex Deucher
parent 4c709ccc47
commit 7469567d88

View File

@ -3560,6 +3560,7 @@ static int gfx_v9_4_3_reset_kcq(struct amdgpu_ring *ring,
struct amdgpu_device *adev = ring->adev;
struct amdgpu_kiq *kiq = &adev->gfx.kiq[ring->xcc_id];
struct amdgpu_ring *kiq_ring = &kiq->ring;
int reset_mode = AMDGPU_RESET_TYPE_PER_QUEUE;
unsigned long flags;
int r;
@ -3597,6 +3598,7 @@ static int gfx_v9_4_3_reset_kcq(struct amdgpu_ring *ring,
if (!(adev->gfx.compute_supported_reset & AMDGPU_RESET_TYPE_PER_PIPE))
return -EOPNOTSUPP;
r = gfx_v9_4_3_reset_hw_pipe(ring);
reset_mode = AMDGPU_RESET_TYPE_PER_PIPE;
dev_info(adev->dev, "ring: %s pipe reset :%s\n", ring->name,
r ? "failed" : "successfully");
if (r)
@ -3619,10 +3621,20 @@ static int gfx_v9_4_3_reset_kcq(struct amdgpu_ring *ring,
r = amdgpu_ring_test_ring(kiq_ring);
spin_unlock_irqrestore(&kiq->ring_lock, flags);
if (r) {
if (reset_mode == AMDGPU_RESET_TYPE_PER_QUEUE)
goto pipe_reset;
dev_err(adev->dev, "fail to remap queue\n");
return r;
}
if (reset_mode == AMDGPU_RESET_TYPE_PER_QUEUE) {
r = amdgpu_ring_test_ring(ring);
if (r)
goto pipe_reset;
}
return amdgpu_ring_reset_helper_end(ring, timedout_fence);
}