Mirror of https://github.com/torvalds/linux.git (synced 2026-06-03 03:53:37 +02:00)
drm/i915/guc: Handle race condition where wakeref count drops below 0
There is a rare race condition when preparing for a reset where guc_lrc_desc_unpin() could be in the process of deregistering a context while a different thread is scrubbing outstanding contexts and it alters the context state and does a wakeref put. Then, if there is a failure with deregister_context(), a second wakeref put could occur. As a result the wakeref count could drop below 0 and fail an INTEL_WAKEREF_BUG_ON() check. Therefore if there is a failure with deregister_context(), undo the context state changes and do a wakeref put only if the context was set to be destroyed earlier.

v2: Expand comment to better explain change. (Daniele)
v3: Removed addition to the original comment. (Daniele)

Fixes: 2f2cc53b5f ("drm/i915/guc: Close deregister-context race against CT-loss")
Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Mousumi Jana <mousumi.jana@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250528230551.1855177-1-jesus.narvaez@intel.com
(cherry picked from commit f36a75aba1)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
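To make the interleaving in the message concrete, here is a minimal single-threaded C model of it. It is not i915 code: `struct ctx_model`, `unpin_begin()`, `reset_scrub()`, `unpin_fail_old()`, `unpin_fail_new()` and `replay()` are hypothetical stand-ins. The `destroyed` flag plays the role of `context_destroyed()`, and the integer `wakerefs` plays `gt->wakeref`. The model assumes, as the message describes, that the reset path clears the context's destroyed state and drops the reference it finds.

```c
/*
 * Hypothetical userspace model of the race (not kernel code).
 * destroyed ~ context_destroyed(ce), wakerefs ~ gt->wakeref.
 */
#include <assert.h>
#include <stdbool.h>

struct ctx_model {
	bool registered;
	bool destroyed;
	int wakerefs; /* references held on behalf of this context */
};

/* guc_lrc_desc_unpin() entry: take a ref, mark the context destroyed. */
static void unpin_begin(struct ctx_model *ce)
{
	ce->wakerefs++;
	ce->destroyed = true;
	ce->registered = false;
}

/* Reset-side scrub (assumed): clear the destroyed state and drop the ref. */
static void reset_scrub(struct ctx_model *ce)
{
	if (ce->destroyed) {
		ce->destroyed = false;
		ce->wakerefs--;
	}
}

/* Pre-fix failure path: always undo the state and always put. */
static void unpin_fail_old(struct ctx_model *ce)
{
	ce->registered = true;
	ce->destroyed = false;
	ce->wakerefs--;
}

/* Fixed failure path: undo and put only if still marked destroyed. */
static void unpin_fail_new(struct ctx_model *ce)
{
	bool pending_destroyed = ce->destroyed;

	if (pending_destroyed) {
		ce->registered = true;
		ce->destroyed = false;
	}
	/* In the kernel this put happens after the spinlock is released. */
	if (pending_destroyed)
		ce->wakerefs--;
	assert(ce->wakerefs >= 0); /* never underflows */
}

/*
 * Replay: unpin starts, optionally the reset scrub runs, then
 * deregister_context() fails. Returns the final wakeref count.
 */
static int replay(void (*fail_path)(struct ctx_model *), bool race)
{
	struct ctx_model ce = { .registered = true };

	unpin_begin(&ce);
	if (race)
		reset_scrub(&ce);
	fail_path(&ce);
	return ce.wakerefs;
}
```

With the reset interleaving, `replay(unpin_fail_old, true)` ends at -1. That is the underflow `INTEL_WAKEREF_BUG_ON()` catches. `replay(unpin_fail_new, true)` ends at 0. Without the interleaving, both paths end at 0.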
commit 0323a5127e
parent 57d63c6cd0
@@ -3443,18 +3443,29 @@ static inline int guc_lrc_desc_unpin(struct intel_context *ce)
 	 * GuC is active, lets destroy this context, but at this point we can still be racing
 	 * with suspend, so we undo everything if the H2G fails in deregister_context so
 	 * that GuC reset will find this context during clean up.
+	 *
+	 * There is a race condition where the reset code could have altered
+	 * this context's state and done a wakeref put before we try to
+	 * deregister it here. So check if the context is still set to be
+	 * destroyed before undoing earlier changes, to avoid two wakeref puts
+	 * on the same context.
 	 */
 	ret = deregister_context(ce, ce->guc_id.id);
 	if (ret) {
+		bool pending_destroyed;
 		spin_lock_irqsave(&ce->guc_state.lock, flags);
-		set_context_registered(ce);
-		clr_context_destroyed(ce);
+		pending_destroyed = context_destroyed(ce);
+		if (pending_destroyed) {
+			set_context_registered(ce);
+			clr_context_destroyed(ce);
+		}
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		/*
 		 * As gt-pm is awake at function entry, intel_wakeref_put_async merely decrements
 		 * the wakeref immediately but per function spec usage call this after unlock.
 		 */
-		intel_wakeref_put_async(&gt->wakeref);
+		if (pending_destroyed)
+			intel_wakeref_put_async(&gt->wakeref);
 	}
 
 	return ret;
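The fix relies on both sides reading the destroyed flag under the same lock. The sketch below exercises that guard under real concurrency. It is again a hypothetical userspace model assuming POSIX threads and C11 atomics, not i915 code. A pthread mutex stands in for `ce->guc_state.lock`. The model assumes the reset path reads and clears the destroyed flag under that lock before doing its put. Each side snapshots the flag under the lock, so only one of them can see it set, and exactly one put happens whatever the interleaving. As in the patch, the put is issued after unlocking.

```c
/* Hypothetical concurrent model of the fixed failure path (not kernel code). */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct ctx_model {
	pthread_mutex_t lock; /* stands in for ce->guc_state.lock */
	bool destroyed;       /* stands in for context_destroyed(ce) */
	bool registered;
	atomic_int wakerefs;  /* stands in for gt->wakeref */
};

/* Reset-side scrub (assumed behaviour): snapshot and clear under the lock. */
static void *reset_scrub(void *arg)
{
	struct ctx_model *ce = arg;
	bool destroyed;

	pthread_mutex_lock(&ce->lock);
	destroyed = ce->destroyed;
	ce->destroyed = false;
	pthread_mutex_unlock(&ce->lock);
	if (destroyed)
		atomic_fetch_sub(&ce->wakerefs, 1);
	return NULL;
}

/* Mirrors the patched deregister_context() failure branch. */
static void *unpin_fail_path(void *arg)
{
	struct ctx_model *ce = arg;
	bool pending_destroyed;

	pthread_mutex_lock(&ce->lock);
	pending_destroyed = ce->destroyed;
	if (pending_destroyed) {
		ce->registered = true;
		ce->destroyed = false;
	}
	pthread_mutex_unlock(&ce->lock);
	if (pending_destroyed) /* put only after unlocking */
		atomic_fetch_sub(&ce->wakerefs, 1);
	return NULL;
}

/* Race the two paths once; returns the final count (0 means balanced). */
static int run_race_once(void)
{
	struct ctx_model ce = { .destroyed = true, .registered = false };
	pthread_t a, b;
	int result;

	pthread_mutex_init(&ce.lock, NULL);
	atomic_init(&ce.wakerefs, 1); /* ref taken at unpin entry */
	if (pthread_create(&a, NULL, reset_scrub, &ce) != 0)
		abort();
	if (pthread_create(&b, NULL, unpin_fail_path, &ce) != 0)
		abort();
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	result = atomic_load(&ce.wakerefs);
	pthread_mutex_destroy(&ce.lock);
	return result;
}
```

`run_race_once()` should return 0 on every run. With the pre-fix branch (unconditional undo and put), the count would reach -1 whenever the scrub takes the lock first.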