linux/kernel/time
Frederic Weisbecker de3ced72a7 timers/migration: Enforce group initialization visibility to tree walkers
Commit 2522c84db513 ("timers/migration: Fix another race between hotplug
and idle entry/exit") fixed yet another race between idle exit and CPU
hotplug up leading to a wrong "0" value migrator assigned to the top
level. However there is yet another situation that remains unhandled:

         [GRP0:0]
      migrator  = TMIGR_NONE
      active    = NONE
      groupmask = 1
      /     \      \
     0       1     2..7
   idle      idle   idle

0) The system is fully idle.

         [GRP0:0]
      migrator  = CPU 0
      active    = CPU 0
      groupmask = 1
      /     \      \
     0       1     2..7
   active   idle   idle

1) CPU 0 is activating. It has done the cmpxchg on the top's ->migr_state
but it hasn't yet returned to __walk_groups().

         [GRP0:0]
      migrator  = CPU 0
      active    = CPU 0, CPU 1
      groupmask = 1
      /     \      \
     0       1     2..7
   active  active  idle

2) CPU 1 is activating. CPU 0 stays the migrator (still stuck in
__walk_groups(), delayed by #VMEXIT for example).

                    [GRP1:0]
                migrator = TMIGR_NONE
                active   = NONE
                groupmask = 1
             /                   \
         [GRP0:0]                  [GRP0:1]
      migrator  = CPU 0           migrator = TMIGR_NONE
      active    = CPU 0, CPU1     active   = NONE
      groupmask = 1               groupmask = 2
      /     \      \
     0       1     2..7                   8
   active  active  idle                !online

3) CPU 8 is preparing to boot. CPUHP_TMIGR_PREPARE is being ran by CPU 1
which has created the GRP0:1 and the new top GRP1:0 connected to GRP0:1
and GRP0:0. CPU 1 hasn't yet propagated its activation up to GRP1:0.

                    [GRP1:0]
               migrator = GRP0:0
               active   = GRP0:0
               groupmask = 1
             /                   \
         [GRP0:0]                  [GRP0:1]
     migrator  = CPU 0           migrator = TMIGR_NONE
     active    = CPU 0, CPU1     active   = NONE
     groupmask = 1               groupmask = 2
     /     \      \
    0       1     2..7                   8
  active  active  idle                !online

4) CPU 0 finally resumed after its #VMEXIT. It's in __walk_groups()
returning from tmigr_cpu_active(). The new top GRP1:0 is visible and
fetched and the pre-initialized groupmask of GRP0:0 is also visible.
As a result tmigr_active_up() is called to GRP1:0 with GRP0:0 as active
and migrator. CPU 0 is returning to __walk_groups() but suffers again
a #VMEXIT.

                    [GRP1:0]
               migrator = GRP0:0
               active   = GRP0:0
               groupmask = 1
             /                   \
         [GRP0:0]                  [GRP0:1]
     migrator  = CPU 0           migrator = TMIGR_NONE
     active    = CPU 0, CPU1     active   = NONE
     groupmask = 1               groupmask = 2
     /     \      \
    0       1     2..7                   8
  active  active  idle                 !online

5) CPU 1 propagates its activation of GRP0:0 to GRP1:0. This has no
   effect since CPU 0 did it already.

                    [GRP1:0]
               migrator = GRP0:0
               active   = GRP0:0, GRP0:1
               groupmask = 1
             /                   \
         [GRP0:0]                  [GRP0:1]
     migrator  = CPU 0           migrator = CPU 8
     active    = CPU 0, CPU1     active   = CPU 8
     groupmask = 1               groupmask = 2
     /     \      \                     \
    0       1     2..7                   8
  active  active  idle                 active

6) CPU 1 links CPU 8 to its group. CPU 8 boots and goes through
   CPUHP_AP_TMIGR_ONLINE which propagates activation.

                                   [GRP2:0]
                              migrator = TMIGR_NONE
                              active   = NONE
                              groupmask = 1
                             /                \
                    [GRP1:0]                    [GRP1:1]
               migrator = GRP0:0              migrator = TMIGR_NONE
               active   = GRP0:0, GRP0:1      active   = NONE
               groupmask = 1                  groupmask = 2
             /                   \
         [GRP0:0]                  [GRP0:1]                [GRP0:2]
     migrator  = CPU 0           migrator = CPU 8        migrator = TMIGR_NONE
     active    = CPU 0, CPU1     active   = CPU 8        active   = NONE
     groupmask = 1               groupmask = 2           groupmask = 0
     /     \      \                     \
    0       1     2..7                   8                  64
  active  active  idle                 active             !online

7) CPU 64 is booting. CPUHP_TMIGR_PREPARE is being ran by CPU 1
which has created the GRP1:1, GRP0:2 and the new top GRP2:0 connected to
GRP1:1 and GRP1:0. CPU 1 hasn't yet propagated its activation up to
GRP2:0.

                                   [GRP2:0]
                              migrator = 0 (!!!)
                              active   = NONE
                              groupmask = 1
                             /                \
                    [GRP1:0]                    [GRP1:1]
               migrator = GRP0:0              migrator = TMIGR_NONE
               active   = GRP0:0, GRP0:1      active   = NONE
               groupmask = 1                  groupmask = 2
             /                   \
         [GRP0:0]                  [GRP0:1]                [GRP0:2]
     migrator  = CPU 0           migrator = CPU 8        migrator = TMIGR_NONE
     active    = CPU 0, CPU1     active   = CPU 8        active   = NONE
     groupmask = 1               groupmask = 2           groupmask = 0
     /     \      \                     \
    0       1     2..7                   8                  64
  active  active  idle                 active             !online

8) CPU 0 finally resumed after its #VMEXIT. It's in __walk_groups()
returning from tmigr_cpu_active(). The new top GRP2:0 is visible and
fetched but the pre-initialized groupmask of GRP1:0 is not because no
ordering made its initialization visible. As a result tmigr_active_up()
may be called to GRP2:0 with a "0" child's groumask. Leaving the timers
ignored for ever when the system is fully idle.

The race is highly theoretical and perhaps impossible in practice but
the groupmask of the child is not the only concern here as the whole
initialization of the child is not guaranteed to be visible to any
tree walker racing against hotplug (idle entry/exit, remote handling,
etc...). Although the current code layout seem to be resilient to such
hazards, this doesn't tell much about the future.

Fix this with enforcing address dependency between group initialization
and the write/read to the group's parent's pointer. Fortunately that
doesn't involve any barrier addition in the fast paths.

Fixes: 10a0e6f3d3 ("timers/migration: Move hierarchy setup into cpuhotplug prepare callback")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250114231507.21672-3-frederic@kernel.org
2025-01-16 12:47:11 +01:00
..
alarmtimer.c alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack() 2024-11-07 02:47:07 +01:00
clockevents.c clockevents: Shutdown and unregister current clockevents at CPUHP_AP_TICK_DYING 2024-10-31 10:41:42 +01:00
clocksource-wdtest.c time: Add MODULE_DESCRIPTION() to time test modules 2024-06-03 11:18:50 +02:00
clocksource.c clocksource: Make negative motion detection more robust 2024-12-05 16:03:24 +01:00
hrtimer.c A rather large update for timekeeping and timers: 2024-11-19 16:35:06 -08:00
itimer.c signal: Confine POSIX_TIMERS properly 2024-10-29 11:43:18 +01:00
jiffies.c
Kconfig timekeeping: Always check for negative motion 2024-11-02 10:14:31 +01:00
Makefile timers: Move *sleep*() and timeout functions into a separate file 2024-10-16 00:36:46 +02:00
namespace.c vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copy 2022-12-01 11:35:40 +01:00
ntp_internal.h ntp: Make sure RTC is synchronized when time goes backwards 2024-09-10 13:50:40 +02:00
ntp.c ntp: Remove invalid cast in time offset math 2024-11-28 12:02:38 +01:00
posix-clock.c posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime() 2024-10-23 16:05:01 +02:00
posix-cpu-timers.c posix-timers: Cleanup SIG_IGN workaround leftovers 2024-11-07 02:14:45 +01:00
posix-stubs.c posix-timers: Get rid of [COMPAT_]SYS_NI() uses 2023-12-20 21:30:27 -08:00
posix-timers.c posix-timers: Cleanup SIG_IGN workaround leftovers 2024-11-07 02:14:45 +01:00
posix-timers.h posix-timers: Cleanup SIG_IGN workaround leftovers 2024-11-07 02:14:45 +01:00
sched_clock.c seqlock, treewide: Switch to non-raw seqcount_latch interface 2024-11-05 12:55:35 +01:00
sleep_timeout.c timers: Switch to use hrtimer_setup_sleeper_on_stack() 2024-11-07 02:47:06 +01:00
test_udelay.c time: Add MODULE_DESCRIPTION() to time test modules 2024-06-03 11:18:50 +02:00
tick-broadcast-hrtimer.c time/tick-broadcast: Remove RCU_NONIDLE() usage 2023-01-13 11:48:16 +01:00
tick-broadcast.c tick/broadcast: Move per CPU pointer access into the atomic section 2024-07-31 12:37:43 +02:00
tick-common.c tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device() 2024-06-10 20:18:13 +02:00
tick-internal.h clockevents: Shutdown and unregister current clockevents at CPUHP_AP_TICK_DYING 2024-10-31 10:41:42 +01:00
tick-legacy.c
tick-oneshot.c time: Fix various kernel-doc problems 2023-01-03 11:07:58 +01:00
tick-sched.c A rather large update for timekeeping and timers: 2024-11-19 16:35:06 -08:00
tick-sched.h tick/sched: Fix struct tick_sched doc warnings 2024-04-01 10:36:35 +02:00
time_test.c time: Add MODULE_DESCRIPTION() to time test modules 2024-06-03 11:18:50 +02:00
time.c time: Fix references to _msecs_to_jiffies() handling of values 2024-10-25 19:50:10 +02:00
timeconst.bc
timeconv.c
timecounter.c
timekeeping_debug.c timekeeping: Add percpu counter for tracking floor swap events 2024-10-10 10:20:46 +02:00
timekeeping_internal.h clocksource: Make negative motion detection more robust 2024-12-05 16:03:24 +01:00
timekeeping.c clocksource: Make negative motion detection more robust 2024-12-05 16:03:24 +01:00
timekeeping.h
timer_list.c tick: Split nohz and highres features from nohz_mode 2024-02-26 11:37:32 +01:00
timer_migration.c timers/migration: Enforce group initialization visibility to tree walkers 2025-01-16 12:47:11 +01:00
timer_migration.h timers/migration: Rename childmask by groupmask to make naming more obvious 2024-07-22 18:03:34 +02:00
timer.c A rather large update for timekeeping and timers: 2024-11-19 16:35:06 -08:00
vsyscall.c A rather large update for timekeeping and timers: 2024-11-19 16:35:06 -08:00