linux/kernel/sched
Suren Baghdasaryan 26cd2564e1 FROMLIST: psi: stop relying on timer_pending for poll_work rescheduling
The psi polling mechanism tries to minimize the number of wakeups needed
to run psi_poll_work and currently relies on timer_pending() to detect
when this work is already scheduled. This leaves a window in which
psi_group_change can schedule an immediate psi_poll_work after
poll_timer_fn has been called but before psi_poll_work has rescheduled
itself. The entire window is depicted below:

poll_timer_fn
  wake_up_interruptible(&group->poll_wait);

psi_poll_worker
  wait_event_interruptible(group->poll_wait, ...)
  psi_poll_work
    psi_schedule_poll_work
      if (timer_pending(&group->poll_timer)) return;
      ...
      mod_timer(&group->poll_timer, jiffies + delay);

Prior to 461daba06b we relied on the poll_scheduled atomic, which was
reset and set back inside psi_poll_work, so this race window was much
smaller.
The larger window causes an increased number of wakeups, and our partners
report a visible power regression of ~10mA after applying 461daba06b.
Bring back the poll_scheduled atomic and make this race window even
narrower by resetting poll_scheduled only when the polling expiration
time is reached. This does not completely eliminate the possibility of
extra wakeups caused by a race with psi_group_change. However, it limits
them to at most one extra wakeup per tracking window (0.5s in the worst
case).
This patch also ensures correct ordering between clearing the
poll_scheduled flag and obtaining changed_states by using a memory
barrier. Correct ordering between updating changed_states and setting
poll_scheduled is ensured by the atomic_xchg operation, which implies a
full memory barrier.
By tracing the number of immediate rescheduling attempts performed by
psi_group_change and the number of those attempts that were blocked
because the psi monitor was already active, we can assess the effect of
this change:

Before the patch:
                                            Run#1    Run#2    Run#3
Immediate reschedules attempted:           684365  1385156  1261240
Immediate reschedules blocked:             682846  1381654  1258682
Immediate reschedules (delta):               1519     3502     2558
Immediate reschedules (% of attempted):     0.22%    0.25%    0.20%

After the patch:
                                            Run#1    Run#2    Run#3
Immediate reschedules attempted:           882244   770298   426218
Immediate reschedules blocked:             881996   769796   426074
Immediate reschedules (delta):                248      502      144
Immediate reschedules (% of attempted):     0.03%    0.07%    0.03%

The share of non-blocked immediate reschedules dropped from 0.20-0.25%
to 0.03-0.07%. The drop is attributed to the smaller race window and to
the fact that the race is now possible only when psi monitors reach
their polling window expiration time.

Fixes: 461daba06b ("psi: eliminate kthread_worker from psi trigger scheduling mechanism")
Reported-by: Kathleen Chang <yt.chang@mediatek.com>
Reported-by: Wenju Xu <wenju.xu@mediatek.com>
Reported-by: Jonathan Chen <jonathan.jmchen@mediatek.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: SH Chen <show-hong.chen@mediatek.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Link: https://lore.kernel.org/patchwork/patch/1455172/
Bug: 191127654
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ie61547ca043e702442a9c6db1468cfb60ff2e729
2021-07-14 20:52:04 -07:00
autogroup.c
autogroup.h
clock.c
completion.c
core.c ANDROID: sched: remove regular vendor hooks for 32bit execve 2021-07-01 22:32:03 -07:00
cpuacct.c sched/cpuacct: Fix charge cpuacct.usage_sys 2020-05-19 20:34:14 +02:00
cpudeadline.c sched/deadline: Implement fallback mechanism for !fit case 2020-06-15 14:10:05 +02:00
cpudeadline.h
cpufreq_schedutil.c ANDROID: sched: Add vendor hooks to compute new cpu freq. 2021-04-23 18:42:38 -07:00
cpufreq.c ANDROID: android: Export symbols for invoking cpufreq_update_util() 2021-06-29 10:44:12 +00:00
cpupri.c ANDROID: sched: Export symbol for vendor RT hook funcion 2021-01-12 12:57:37 -08:00
cpupri.h
cputime.c ANDROID: vendor_hooks: Add hooks for account irqtime process tick 2021-05-13 08:26:15 +00:00
deadline.c FROMGIT: sched/deadline: Reduce rq lock contention in dl_add_task_root_domain() 2021-02-10 10:48:24 +00:00
debug.c Merge 5.10.37 into android12-5.10 2021-05-15 09:28:55 +02:00
fair.c ANDROID: sched: Add vendor hook to select ilb cpu 2021-06-07 11:00:05 +00:00
features.h Revert "Revert "sched,fair: Alternative sched_slice()"" 2021-05-21 13:17:06 -07:00
idle.c Merge 5.10.20 into android12-5.10 2021-03-07 12:33:33 +01:00
isolation.c isolcpus: Affine unbound kernel threads to housekeeping cpus 2020-06-15 14:10:03 +02:00
loadavg.c ANDROID: GKI: loadavg: Export for get_avenrun 2020-08-12 15:08:42 +00:00
Makefile
membarrier.c sched/membarrier: fix missing local execution of ipi_sync_rq_state() 2021-03-17 17:06:35 +01:00
OWNERS ANDROID: Add OWNERS files referring to the respective android-mainline OWNERS 2021-04-03 14:11:30 +00:00
pelt.c ANDROID: sched: pelt: Fix the PELT arrays 2021-03-04 11:53:51 +00:00
pelt.h sched/pelt: Cleanup PELT divider 2020-06-15 14:10:06 +02:00
psi.c FROMLIST: psi: stop relying on timer_pending for poll_work rescheduling 2021-07-14 20:52:04 -07:00
rt.c ANDROID: sched/rt: Only enable RT sync for SMP targets 2021-03-01 18:50:27 +00:00
sched-pelt.h ANDROID: sched: pelt: Fix the PELT arrays 2021-03-04 11:53:51 +00:00
sched.h ANDROID: sched: Add vendor data in struct cfs_rq 2021-07-01 22:32:03 -07:00
smp.h sched/headers: Split out open-coded prototypes into kernel/sched/smp.h 2020-05-28 11:03:20 +02:00
stats.c
stats.h
stop_task.c treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
swait.c
topology.c ANDROID: vendor_hooks: Add hooks for scheduler 2021-03-16 09:08:22 +00:00
wait_bit.c
wait.c ANDROID: Add a vendor hook that allow a module to modify the wake flag 2021-03-04 16:19:04 +00:00