linux/kernel/sched
John Stultz c2ae8b0df2 sched/core: Fix psi_dequeue() for Proxy Execution
Currently, if the sleep flag is set, psi_dequeue() doesn't
change any of the psi_flags.

This is because psi_task_switch() will clear TSK_ONCPU as well
as other potential flags (TSK_RUNNING), and the assumption is
that a voluntary sleep always consists of a task being dequeued
followed shortly thereafter by a psi_sched_switch() call.

Proxy Execution changes this expectation, as mutex-blocked tasks
that would normally sleep stay on the runqueue. But in the case
where the mutex-owning task goes to sleep, or the owner is on a
remote cpu, we will then deactivate the blocked task shortly
after.

In that situation, the mutex-blocked task will have had its
TSK_ONCPU cleared when it was switched off the cpu, but it will
stay TSK_RUNNING. Then if we later dequeue it (as currently done
if we hit a case find_proxy_task() can't yet handle, such as the
case of the owner being on another rq or a sleeping owner)
psi_dequeue() won't change any state (leaving it TSK_RUNNING),
as it incorrectly expects a psi_task_switch() call to
immediately follow.

Later on, when the task gets woken/re-enqueued and psi_flags are
set for TSK_RUNNING, we hit an error as the task is already
TSK_RUNNING:

  psi: inconsistent task state! task=188:kworker/28:0 cpu=28 psi_flags=4 clear=0 set=4

To resolve this, extend the logic in psi_dequeue() so that
if the sleep flag is set, we also check if psi_flags have
TSK_ONCPU set (meaning the psi_task_switch() call is imminent) before
we do the shortcut return.

If TSK_ONCPU is not set, that means we've already switched away,
and this psi_dequeue() call needs to clear the flags.
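
The sequence described above can be replayed in a small userspace model. This is only a sketch under stated assumptions, not the kernel code: every model_* name and MODEL_* constant is hypothetical, the flag values are picked for illustration (MODEL_TSK_RUNNING = 4 merely mirrors psi_flags=4 in the log line), and the real psi_dequeue() handles further cases that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model-only flag values; MODEL_TSK_RUNNING = 4 mirrors psi_flags=4 in the log. */
#define MODEL_TSK_RUNNING   0x04u
#define MODEL_TSK_ONCPU     0x10u  /* assumed value, illustration only */
#define MODEL_DEQUEUE_SLEEP 0x01u  /* assumed value, illustration only */

struct model_task {
	unsigned int psi_flags;
};

/* The task is switched off the CPU but stays queued: only ONCPU is cleared. */
static void model_switch_away_still_queued(struct model_task *p)
{
	assert(p != NULL);
	p->psi_flags &= ~MODEL_TSK_ONCPU;
}

/*
 * Dequeue. With check_oncpu == false this models the old shortcut (return
 * whenever the sleep flag is set); with true it models the extended check.
 */
static void model_psi_dequeue(struct model_task *p, unsigned int flags,
			      bool check_oncpu)
{
	assert(p != NULL);
	if (flags & MODEL_DEQUEUE_SLEEP) {
		if (!check_oncpu || (p->psi_flags & MODEL_TSK_ONCPU))
			return; /* leave the flags for the upcoming task switch */
	}
	p->psi_flags = 0; /* already switched away: clear the flags here */
}

/* Re-enqueue on wakeup. Returns false if TSK_RUNNING was already set. */
static bool model_psi_enqueue(struct model_task *p)
{
	assert(p != NULL);
	if (p->psi_flags & MODEL_TSK_RUNNING)
		return false; /* corresponds to "inconsistent task state" */
	p->psi_flags |= MODEL_TSK_RUNNING;
	return true;
}

/* Replays the mutex-blocked task sequence; true means no inconsistency. */
static bool model_blocked_task_scenario(bool check_oncpu)
{
	struct model_task t = { .psi_flags = MODEL_TSK_RUNNING | MODEL_TSK_ONCPU };

	model_switch_away_still_queued(&t);
	model_psi_dequeue(&t, MODEL_DEQUEUE_SLEEP, check_oncpu);
	return model_psi_enqueue(&t);
}
```

In this model, model_blocked_task_scenario(false) returns false because the stale MODEL_TSK_RUNNING bit survives the dequeue, while model_blocked_task_scenario(true) returns true because the dequeue clears the flags once MODEL_TSK_ONCPU is already gone.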

Fixes: be41bde4c3 ("sched: Add an initial sketch of the find_proxy_task() function")
Reported-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Haiyue Wang <haiyuewa@163.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://patch.msgid.link/20251205012721.756394-1-jstultz@google.com
Closes: https://lore.kernel.org/lkml/20251117185550.365156-1-kprateek.nayak@amd.com/
2025-12-06 10:13:16 +01:00
autogroup.c sched: Clean up and standardize #if/#else/#endif markers in sched/autogroup.[ch] 2025-06-13 08:47:14 +02:00
autogroup.h sched: Clean up and standardize #if/#else/#endif markers in sched/autogroup.[ch] 2025-06-13 08:47:14 +02:00
build_policy.c sched_ext: Move internal type and accessor definitions to ext_internal.h 2025-09-03 11:33:28 -10:00
build_utility.c sched/smp: Make SMP unconditional 2025-06-13 08:47:18 +02:00
clock.c sched: Clean up and standardize #if/#else/#endif markers in sched/clock.c 2025-06-13 08:47:14 +02:00
completion.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
core_sched.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
core.c sched/rt: Remove a preempt-disable section in rt_mutex_setprio() 2025-12-06 10:03:13 +01:00
cpuacct.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpudeadline.c sched/deadline: only set free_cpus for online runqueues 2025-10-16 11:13:49 +02:00
cpudeadline.h sched/deadline: only set free_cpus for online runqueues 2025-10-16 11:13:49 +02:00
cpufreq_schedutil.c sched: Clean up and standardize #if/#else/#endif markers in sched/cpufreq_schedutil.c 2025-06-13 08:47:15 +02:00
cpufreq.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpupri.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpupri.h sched/smp: Make SMP unconditional 2025-06-13 08:47:18 +02:00
cputime.c seqlock: Change thread_group_cputime() to use scoped_seqlock_read() 2025-10-21 12:31:57 +02:00
deadline.c sched/deadline: Minor cleanup in select_task_rq_dl() 2025-11-11 17:27:55 +01:00
debug.c sched/eevdf: Fix min_vruntime vs avg_vruntime 2025-11-11 12:33:38 +01:00
ext_idle.c sched_ext: Merge branch 'for-6.17-fixes' into for-6.18 2025-09-23 09:10:20 -10:00
ext_idle.h sched_ext: Always use SMP versions in kernel/sched/ext_idle.h 2025-06-13 14:47:59 -10:00
ext_internal.h sched_ext: Add SCX_EFLAG_INITIALIZED to indicate successful ops.init() 2025-09-23 09:03:26 -10:00
ext.c Scheduler changes for v6.19: 2025-12-01 21:04:45 -08:00
ext.h sched_ext: Use cgroup_lock/unlock() to synchronize against cgroup operations 2025-09-03 11:36:07 -10:00
fair.c sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out 2025-12-06 10:03:13 +01:00
features.h sched/fair: Proportional newidle balance 2025-11-17 17:13:16 +01:00
idle.c sched/deadline: Fix dl_server time accounting 2025-11-11 12:33:38 +01:00
isolation.c sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any 2025-11-20 20:17:31 +01:00
loadavg.c Merge branch 'tip/sched/urgent' 2025-07-14 17:16:28 +02:00
Makefile tracing: Disable branch profiling in noinstr code 2025-03-22 09:49:26 +01:00
membarrier.c rseq: Simplify the event notification 2025-11-04 08:30:09 +01:00
pelt.c sched: Clean up and standardize #if/#else/#endif markers in sched/pelt.[ch] 2025-06-13 08:47:17 +02:00
pelt.h sched/fair: Switch to task based throttle model 2025-09-03 10:03:14 +02:00
psi.c sched/psi: Fix psi_seq initialization 2025-08-04 10:51:22 -07:00
rq-offsets.c sched: Make migrate_{en,dis}able() inline 2025-09-25 09:57:16 +02:00
rt.c sched/proxy: Yield the donor task 2025-11-11 12:33:36 +01:00
sched-pelt.h sched: Make clangd usable 2025-06-11 11:20:53 +02:00
sched.h sched/headers: Remove whitespace noise from kernel/sched/sched.h 2025-12-06 10:03:13 +01:00
smp.h sched: Make clangd usable 2025-06-11 11:20:53 +02:00
stats.c sched/smp: Use the SMP version of schedstats 2025-06-13 08:47:21 +02:00
stats.h sched/core: Fix psi_dequeue() for Proxy Execution 2025-12-06 10:13:16 +01:00
stop_task.c sched: Add support to pick functions to take rf 2025-10-16 11:13:55 +02:00
swait.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
syscalls.c Updates for the interrupt core and treewide cleanups: 2025-12-02 09:14:26 -08:00
topology.c sched/fair: Proportional newidle balance 2025-11-17 17:13:16 +01:00
wait_bit.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
wait.c ARM: 2025-07-30 17:14:01 -07:00