Commit Graph

2912 Commits

Author SHA1 Message Date
Frederic Weisbecker
bd3c45dd01 timers/migration: Fix another hotplug activation race
The hotplug control CPU is assumed to be active in the hierarchy but
that doesn't imply that the root is active. If the current CPU is not
the one that activated the current hierarchy, and the CPU performing
this duty is still halfway through the tree, the root may still be
observed inactive. And this can break the activation of a new root as in
the following scenario:

1) Initially, the whole system has 64 CPUs and only CPU 63 is awake.

                   [GRP1:0]
                    active
                  /    |    \
                 /     |     \
         [GRP0:0]    [...]    [GRP0:7]
           idle      idle      active
         /   |   \               |
     CPU 0  CPU 1  ...         CPU 63
     idle   idle               active

2) CPU 63 goes idle _but_ due to a #VMEXIT it hasn't yet reached the
   [GRP1:0]->parent dereference (that would be NULL and stop the walk)
   in __walk_groups_from().

                   [GRP1:0]
                     idle
                  /    |    \
                 /     |     \
         [GRP0:0]    [...]    [GRP0:7]
           idle      idle       idle
         /   |   \                |
     CPU 0  CPU 1  ...         CPU 63
     idle   idle                idle

3) CPU 1 wakes up, activates GRP0:0 but hasn't yet managed to propagate
   up to GRP1:0 due to yet another #VMEXIT.

                   [GRP1:0]
                     idle
                  /    |    \
                 /     |     \
         [GRP0:0]    [...]    [GRP0:7]
         active      idle       idle
         /   |   \                |
     CPU 0  CPU 1  ...         CPU 63
     idle  active               idle

4) CPU 0 wakes up and doesn't need to walk above GRP0:0 as that is CPU 1's
   role.

                   [GRP1:0]
                     idle
                  /    |    \
                 /     |     \
         [GRP0:0]    [...]    [GRP0:7]
         active      idle       idle
         /   |   \                |
     CPU 0  CPU 1  ...         CPU 63
    active  active              idle

5) CPU 0 boots CPU 64. It creates a new root for it.

                             [GRP2:0]
                               idle
                           /          \
                          /            \
                   [GRP1:0]           [GRP1:1]
                   idle                 idle
                  /    |    \                \
                 /     |     \                \
         [GRP0:0]    [...]    [GRP0:7]      [GRP0:8]
         active      idle       idle          idle
         /   |   \                |            |
     CPU 0  CPU 1  ...         CPU 63        CPU 64
    active  active              idle         offline

6) CPU 0 activates the new root, but note that GRP1:0 is still idle,
   waiting for CPU 1 to resume from #VMEXIT and activate it.

                             [GRP2:0]
                              active
                           /          \
                          /            \
                   [GRP1:0]           [GRP1:1]
                   idle                 idle
                  /    |    \                \
                 /     |     \                \
         [GRP0:0]    [...]    [GRP0:7]      [GRP0:8]
         active      idle       idle          idle
         /   |   \                |            |
     CPU 0  CPU 1  ...         CPU 63        CPU 64
    active  active              idle         offline

7) CPU 63 resumes after #VMEXIT and sees the new GRP1:0 parent.
   Therefore it propagates the stale inactive state of GRP1:0 up to
   GRP2:0.

                             [GRP2:0]
                              idle
                           /          \
                          /            \
                   [GRP1:0]           [GRP1:1]
                   idle                 idle
                  /    |    \                \
                 /     |     \                \
         [GRP0:0]    [...]    [GRP0:7]      [GRP0:8]
         active      idle       idle          idle
         /   |   \                |            |
     CPU 0  CPU 1  ...         CPU 63        CPU 64
    active  active              idle         offline

8) CPU 1 resumes after #VMEXIT and finally activates GRP1:0. But it
   doesn't observe its new parent link because no ordering enforces that.
   Therefore GRP2:0 is spuriously left idle.

                             [GRP2:0]
                              idle
                           /          \
                          /            \
                   [GRP1:0]           [GRP1:1]
                   active                 idle
                  /    |    \                \
                 /     |     \                \
         [GRP0:0]    [...]    [GRP0:7]      [GRP0:8]
         active      idle       idle          idle
         /   |   \                |            |
     CPU 0  CPU 1  ...         CPU 63        CPU 64
    active  active              idle         offline

Such races are highly theoretical and the problem would solve itself
once the old root ever becomes idle again. But it still leaves a taste
of discomfort.

Fix it by enforcing a fully ordered atomic read of the old root state
before propagating the activation state up to the new root. This has an
ordering effect in two directions:

* Acquire + release of the latest old root state: If the hotplug control
  CPU is not the one that woke up the old root, make sure to acquire its
  active state and propagate it upwards through the ordered chain of
  activation (the acquire pairs with the cmpxchg() in tmigr_active_up()
  and subsequent releases will pair with atomic_read_acquire() and
  smp_mb__after_atomic() in tmigr_inactive_up()).

* Release: If the hotplug control CPU is not the one that must wake up
  the old root, but the CPU covering that is lagging behind its duty,
  publish the links from the old root to the new parents. This way the
  lagging CPU will propagate the active state itself.
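
A minimal C11 userspace model of the intended ordering (the group layout
and names below are illustrative only, not the kernel's tmigr code):

  #include <stdatomic.h>
  #include <stdbool.h>

  struct tmgrp {
          _Atomic bool active;
          struct tmgrp *_Atomic parent;
  };

  /* Hotplug path: link the old root below the new one and carry a possibly
   * concurrent activation of the old root up to the new root. */
  static void link_new_root(struct tmgrp *old_root, struct tmgrp *new_root)
  {
          /* Release: publish the parent link so a lagging CPU that activates
           * old_root later knows where to propagate its state. */
          atomic_store_explicit(&old_root->parent, new_root, memory_order_release);

          /* Fully ordered read of the old root state: if another CPU already
           * activated it, propagate that activation to the new root. */
          if (atomic_load_explicit(&old_root->active, memory_order_seq_cst))
                  atomic_store_explicit(&new_root->active, true, memory_order_release);
  }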

Fixes: 7ee9887703 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260423165354.95152-2-frederic@kernel.org
2026-05-06 08:21:12 +02:00
Thomas Gleixner
4096fd0e8e clockevents: Add missing resets of the next_event_forced flag
The prevention mechanism against timer interrupt starvation fails to reset
the next_event_forced flag in a couple of places:

    - When the clock event state changes. That can cause the flag to be
      stale over a shutdown/startup sequence

    - When a non-forced event is armed. The stale flag then prevents
      rearming before that event; if that event is far out in the future,
      this causes missed timer interrupts.

    - In the suspend wakeup handler.

That led to stalls which have been reported by several people.

Add the missing resets, which fixes the problems for the reporters.

Fixes: d6e152d905 ("clockevents: Prevent timer interrupt starvation")
Reported-by: Hanabishi <i.r.e.c.c.a.k.u.n+kernel.org@gmail.com>
Reported-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Hanabishi <i.r.e.c.c.a.k.u.n+kernel.org@gmail.com>
Tested-by: Eric Naim <dnaim@cachyos.org>
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/68d1e9ac-2780-4be3-8ee3-0788062dd3a4@gmail.com
Link: https://patch.msgid.link/87340xfeje.ffs@tglx
2026-04-16 21:22:04 +02:00
Linus Torvalds
334fbe734e mm.git review status for linus..mm-stable
Everything:
 
 Total patches:       368
 Reviews/patch:       1.56
 Reviewed rate:       74%
 
 Excluding DAMON:
 
 Total patches:       316
 Reviews/patch:       1.77
 Reviewed rate:       81%
 
 Excluding DAMON and zram:
 
 Total patches:       306
 Reviews/patch:       1.81
 Reviewed rate:       82%
 
 Excluding DAMON, zram and maple_tree:
 
 Total patches:       276
 Reviews/patch:       2.01
 Reviewed rate:       91%
 
 Significant patch series in this merge:
 
 - The 30 patch series "maple_tree: Replace big node with maple copy"
   from Liam Howlett is mainly preparatory work for ongoing development
   but it does reduce stack usage and is an improvement.
 
 - The 12 patch series "mm, swap: swap table phase III: remove swap_map"
   from Kairui Song offers memory savings by removing the static swap_map.
   It also yields some CPU savings and implements several cleanups.
 
 - The 2 patch series "mm: memfd_luo: preserve file seals" from Pratyush
   Yadav adds file seal preservation to LUO's memfd code.
 
 - The 2 patch series "mm: zswap: add per-memcg stat for incompressible
   pages" from Jiayuan Chen adds additional userspace stats reportng to
   zswap.
 
 - The 4 patch series "arch, mm: consolidate empty_zero_page" from Mike
   Rapoport implements some cleanups for our handling of ZERO_PAGE() and
   zero_pfn.
 
 - The 2 patch series "mm/kmemleak: Improve scan_should_stop()
   implementation" from Zhongqiu Han provides an robustness improvement and
   some cleanups in the kmemleak code.
 
 - The 4 patch series "Improve khugepaged scan logic" from Vernon Yang
   "improves the khugepaged scan logic and reduces CPU consumption by
   prioritizing scanning tasks that access memory frequently".
 
 - The 2 patch series "Make KHO Stateless" from Jason Miu simplifies
   Kexec Handover by "transitioning KHO from an xarray-based metadata
   tracking system with serialization to a radix tree data structure that
   can be passed directly to the next kernel"
 
 - The 3 patch series "mm: vmscan: add PID and cgroup ID to vmscan
   tracepoints" from Thomas Ballasi and Steven Rostedt enhances vmscan's
   tracepointing.
 
 - The 5 patch series "mm: arch/shstk: Common shadow stack mapping helper
   and VM_NOHUGEPAGE" from Catalin Marinas is a cleanup for the shadow
   stack code: remove per-arch code in favour of a generic implementation.
 
 - The 2 patch series "Fix KASAN support for KHO restored vmalloc
   regions" from Pasha Tatashin fixes a WARN() which can be emitted the KHO
   restores a vmalloc area.
 
 - The 4 patch series "mm: Remove stray references to pagevec" from Tal
   Zussman provides several cleanups, mainly updating references to "struct
   pagevec", which became folio_batch three years ago.
 
 - The 17 patch series "mm: Eliminate fake head pages from vmemmap
   optimization" from Kiryl Shutsemau simplifies the HugeTLB vmemmap
   optimization (HVO) by changing how tail pages encode their relationship
   to the head page.
 
 - The 2 patch series "mm/damon/core: improve DAMOS quota efficiency for
   core layer filters" from SeongJae Park improves two problematic
   behaviors of DAMOS that make it less efficient when core layer filters
   are used.
 
 - The 3 patch series "mm/damon: strictly respect min_nr_regions" from
   SeongJae Park improves DAMON usability by extending the treatment of the
   min_nr_regions user-settable parameter.
 
 - The 3 patch series "mm/page_alloc: pcp locking cleanup" from Vlastimil
   Babka is a proper fix for a previously hotfixed SMP=n issue.  Code
   simplifications and cleanups ensued.
 
 - The 16 patch series "mm: cleanups around unmapping / zapping" from
   David Hildenbrand implements "a bunch of cleanups around unmapping and
   zapping.  Mostly simplifications, code movements, documentation and
   renaming of zapping functions".
 
 - The 6 patch series "support batched checking of the young flag for
   MGLRU" from Baolin Wang supports batched checking of the young flag for
   MGLRU.  It's part cleanups; one benchmark shows large performance
   benefits for arm64.
 
 - The 5 patch series "memcg: obj stock and slab stat caching cleanups"
   from Johannes Weiner provides memcg cleanup and robustness improvements.
 
 - The 5 patch series "Allow order zero pages in page reporting" from
   Yuvraj Sakshith enhances page_reporting's free page reporting - it
   presently and undesirably skips order-0 pages when reporting free memory.
 
 - The 6 patch series "mm: vma flag tweaks" from Lorenzo Stoakes is
   cleanup work following from the recent conversion of the VMA flags to a
   bitmap.
 
 - The 10 patch series "mm/damon: add optional debugging-purpose sanity
   checks" from SeongJae Park adds some more developer-facing debug checks
   into DAMON core.
 
 - The 2 patch series "mm/damon: test and document power-of-2
   min_region_sz requirement" from SeongJae Park adds an additional DAMON
   kunit test and makes some adjustments to the addr_unit parameter
   handling.
 
 - The 3 patch series "mm/damon/core: make passed_sample_intervals
   comparisons overflow-safe" from SeongJae Park fixes a hard-to-hit time
   overflow issue in DAMON core.
 
 - The 7 patch series "mm/damon: improve/fixup/update ratio calculation,
   test and documentation" from SeongJae Park is a "batch of misc/minor
   improvements and fixups" for DAMON.
 
 - The 4 patch series "mm: move vma_(kernel|mmu)_pagesize() out of
   hugetlb.c" from David Hildenbrand fixes a possible issue with dax-device
   when CONFIG_HUGETLB=n.  Some code movement was required.
 
 - The 6 patch series "zram: recompression cleanups and tweaks" from
   Sergey Senozhatsky provides "a somewhat random mix of fixups,
   recompression cleanups and improvements" in the zram code.
 
 - The 11 patch series "mm/damon: support multiple goal-based quota
   tuning algorithms" from SeongJae Park extend DAMOS quotas goal
   auto-tuning to support multiple tuning algorithms that users can select.
 
 - The 4 patch series "mm: thp: reduce unnecessary
   start_stop_khugepaged()" from Breno Leitao fixes the khugepaged sysfs
   handling so we no longer spam the logs with reams of junk when
   starting/stopping khugepaged.
 
 - The 3 patch series "mm: improve map count checks" from Lorenzo Stoakes
   provides some cleanups and slight fixes in the mremap, mmap and vma
   code.
 
 - The 5 patch series "mm/damon: support addr_unit on default monitoring
   targets for modules" from SeongJae Park extends the use of DAMON core's
   addr_unit tunable.
 
 - The 5 patch series "mm: khugepaged cleanups and mTHP prerequisites"
   from Nico Pache provides cleanups in khugepaged and serves as a base for
   Nico's planned khugepaged mTHP support.
 
 - The 15 patch series "mm: memory hot(un)plug and SPARSEMEM cleanups"
   from David Hildenbrand implements code movement and cleanups in the
   memhotplug and sparsemem code.
 
 - The 2 patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and
   cleanup CONFIG_MIGRATION" from David Hildenbrand rationalizes some
   memhotplug Kconfig support.
 
 - The 6 patch series "change young flag check functions to return bool"
   from Baolin Wang is "a cleanup patchset to change all young flag check
   functions to return bool".
 
 - The 3 patch series "mm/damon/sysfs: fix memory leak and NULL
   dereference issues" from Josh Law and SeongJae Park fixes a few
   potential DAMON bugs.
 
 - The 25 patch series "mm/vma: convert vm_flags_t to vma_flags_t in vma
   code" from "converts a lot of the existing use of the legacy vm_flags_t
   data type to the new vma_flags_t type which replaces it".  Mainly in the
   vma code.
 
 - The 21 patch series "mm: expand mmap_prepare functionality and usage"
   from Lorenzo Stoakes "expands the mmap_prepare functionality, which is
   intended to replace the deprecated f_op->mmap hook which has been the
   source of bugs and security issues for some time".  Cleanups,
   documentation, extension of mmap_prepare into filesystem drivers.
 
 - The 13 patch series "mm/huge_memory: refactor zap_huge_pmd()" from
   Lorenzo Stoakes simplifies and cleans up zap_huge_pmd().  Additional
   cleanups around vm_normal_folio_pmd() and the softleaf functionality are
   performed.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad3HDQAKCRDdBJ7gKXxA
 jrUQAPwNhPk5nPSxnyxjAeQtOBHqgCdnICeEismLajPKd9aYRgEA0s2XAu3tSUYi
 GrBnWImHG3s4ePQxVcPCegWTsOUrXgQ=
 =1Q7o
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "maple_tree: Replace big node with maple copy" (Liam Howlett)

   Mainly preparatory work for ongoing development but it does reduce
   stack usage and is an improvement.

 - "mm, swap: swap table phase III: remove swap_map" (Kairui Song)

   Offers memory savings by removing the static swap_map. It also yields
   some CPU savings and implements several cleanups.

 - "mm: memfd_luo: preserve file seals" (Pratyush Yadav)

   File seal preservation to LUO's memfd code

 - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan
   Chen)

   Additional userspace stats reporting to zswap

 - "arch, mm: consolidate empty_zero_page" (Mike Rapoport)

   Some cleanups for our handling of ZERO_PAGE() and zero_pfn

 - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu
   Han)

   A robustness improvement and some cleanups in the kmemleak code

 - "Improve khugepaged scan logic" (Vernon Yang)

   Improve khugepaged scan logic and reduce CPU consumption by
   prioritizing scanning tasks that access memory frequently

 - "Make KHO Stateless" (Jason Miu)

   Simplify Kexec Handover by transitioning KHO from an xarray-based
   metadata tracking system with serialization to a radix tree data
   structure that can be passed directly to the next kernel

 - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
   Ballasi and Steven Rostedt)

   Enhance vmscan's tracepointing

 - "mm: arch/shstk: Common shadow stack mapping helper and
   VM_NOHUGEPAGE" (Catalin Marinas)

   Cleanup for the shadow stack code: remove per-arch code in favour of
   a generic implementation

 - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)

   Fix a WARN() which can be emitted when KHO restores a vmalloc area

 - "mm: Remove stray references to pagevec" (Tal Zussman)

   Several cleanups, mainly updating references to "struct pagevec",
   which became folio_batch three years ago

 - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
   Shutsemau)

   Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail
   pages encode their relationship to the head page

 - "mm/damon/core: improve DAMOS quota efficiency for core layer
   filters" (SeongJae Park)

   Improve two problematic behaviors of DAMOS that make it less
   efficient when core layer filters are used

 - "mm/damon: strictly respect min_nr_regions" (SeongJae Park)

   Improve DAMON usability by extending the treatment of the
   min_nr_regions user-settable parameter

 - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)

   The proper fix for a previously hotfixed SMP=n issue. Code
   simplifications and cleanups ensued

 - "mm: cleanups around unmapping / zapping" (David Hildenbrand)

   A bunch of cleanups around unmapping and zapping. Mostly
   simplifications, code movements, documentation and renaming of
   zapping functions

 - "support batched checking of the young flag for MGLRU" (Baolin Wang)

   Batched checking of the young flag for MGLRU. It's part cleanups; one
   benchmark shows large performance benefits for arm64

 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)

   memcg cleanup and robustness improvements

 - "Allow order zero pages in page reporting" (Yuvraj Sakshith)

   Enhance free page reporting - it presently and undesirably skips order-0
   pages when reporting free memory.

 - "mm: vma flag tweaks" (Lorenzo Stoakes)

   Cleanup work following from the recent conversion of the VMA flags to
   a bitmap

 - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae
   Park)

   Add some more developer-facing debug checks into DAMON core

 - "mm/damon: test and document power-of-2 min_region_sz requirement"
   (SeongJae Park)

   An additional DAMON kunit test and some adjustments to the
   addr_unit parameter handling

 - "mm/damon/core: make passed_sample_intervals comparisons
   overflow-safe" (SeongJae Park)

   Fix a hard-to-hit time overflow issue in DAMON core

 - "mm/damon: improve/fixup/update ratio calculation, test and
   documentation" (SeongJae Park)

   A batch of misc/minor improvements and fixups for DAMON

 - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
   Hildenbrand)

   Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code
   movement was required.

 - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)

   A somewhat random mix of fixups, recompression cleanups and
   improvements in the zram code

 - "mm/damon: support multiple goal-based quota tuning algorithms"
   (SeongJae Park)

   Extend DAMOS quotas goal auto-tuning to support multiple tuning
   algorithms that users can select

 - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)

   Fix the khugepaged sysfs handling so we no longer spam the logs with
   reams of junk when starting/stopping khugepaged

 - "mm: improve map count checks" (Lorenzo Stoakes)

   Provide some cleanups and slight fixes in the mremap, mmap and vma
   code

 - "mm/damon: support addr_unit on default monitoring targets for
   modules" (SeongJae Park)

   Extend the use of DAMON core's addr_unit tunable

 - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)

   Cleanups to khugepaged, serving as a base for Nico's planned khugepaged
   mTHP support

 - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)

   Code movement and cleanups in the memhotplug and sparsemem code

 - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
   CONFIG_MIGRATION" (David Hildenbrand)

   Rationalize some memhotplug Kconfig support

 - "change young flag check functions to return bool" (Baolin Wang)

   Cleanups to change all young flag check functions to return bool

 - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
   Law and SeongJae Park)

   Fix a few potential DAMON bugs

 - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
   Stoakes)

   Convert a lot of the existing use of the legacy vm_flags_t data type
   to the new vma_flags_t type which replaces it. Mainly in the vma
   code.

 - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)

   Expand the mmap_prepare functionality, which is intended to replace
   the deprecated f_op->mmap hook which has been the source of bugs and
   security issues for some time. Cleanups, documentation, extension of
   mmap_prepare into filesystem drivers

 - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)

   Simplify and clean up zap_huge_pmd(). Additional cleanups around
   vm_normal_folio_pmd() and the softleaf functionality are performed.

* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm: fix deferred split queue races during migration
  mm/khugepaged: fix issue with tracking lock
  mm/huge_memory: add and use has_deposited_pgtable()
  mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
  mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
  mm/huge_memory: separate out the folio part of zap_huge_pmd()
  mm/huge_memory: use mm instead of tlb->mm
  mm/huge_memory: remove unnecessary sanity checks
  mm/huge_memory: deduplicate zap deposited table call
  mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
  mm/huge_memory: add a common exit path to zap_huge_pmd()
  mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
  mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
  mm/huge: avoid big else branch in zap_huge_pmd()
  mm/huge_memory: simplify vma_is_specal_huge()
  mm: on remap assert that input range within the proposed VMA
  mm: add mmap_action_map_kernel_pages[_full]()
  uio: replace deprecated mmap hook with mmap_prepare in uio_info
  drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
  mm: allow handling of stacked mmap_prepare hooks in more drivers
  ...
2026-04-15 12:59:16 -07:00
Linus Torvalds
f21f7b5162 Update to the VDSO subsystem:
- Make the handling of compat functions consistent and more robust
 
      - Rework the underlying data store so that it is dynamically
        allocated, which allows the conversion of the last holdout SPARC64
        to the generic VDSO implementation
 
      - Rework the SPARC64 VDSO to utilize the generic implementation
 
      - Mop up the leftovers of the non-generic VDSO support in the core
        code.
 
      - Expand the VDSO selftests and make them more robust
 
      - Allow time namespaces to be enabled independently of the generic
        VDSO support, which was not possible before due to SPARC64 not
        using it.
 
      - Various cleanups and improvements in the related code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmnb0v8QHHRnbHhAa2Vy
 bmVsLm9yZwAKCRCmGPVMDXSYocfqD/9ywgnvwRH6B612mY4PI3qCbLHs6n9f78aH
 YwyXmmfBZ5vt1ZtptHD+BAxiIMm9GC+/exdj5zhcOWucnBVhorcloE6evxhkJAMn
 RhTQFKkEmcA/UV2Yfct9r+33kgZRyu4IIul4J7hgn2o5T1BqwZbOil0W/O5adr5P
 MDLxjT1OLV80ZZWI9qbWcR/aR7W7sHcdwfVPPqjhombRY7f391Mo3dZeM5C2y55x
 8TXCEqVpN1RJzFinWEgQN7QpP4OmF0rRuXSrDQpkH6pk/+RSqNlT/QGG7MJtmCQR
 E6CeBjNRUn318KiroaGyTKlM9xsL3gNoiCY24ZTwzZxx3g5gSAR3KTCTJhQU0hpu
 Svxj+ksqEAyW7fAOIsbce6W8fUPKC2KM+juXgPKcqZ5hjE2fALD+eEYMlq00jSiu
 sj71007cM9tZKOXPdWs3Fv7AY2Yj7iiRiRz9gv1wqS1z7ybxiaFjxjLYYakej0tr
 rmwBDEGhNow7msZZttr01BRZk9hDUWfIiJtL+0BrgRLNzst2A7WoagtZ2s0Z7Psl
 RjtWgYNBDJ878xK0J+Djqb9TyLraGWZShIIna9uYCAJX9i954xfKJ//NOnUkZhcl
 jslDLHhdttyJ+TmgIsc1ntUGvYvHqH5ywQpyDfWepMKyIYdaJLHOr2K6bwFnGHdw
 uocXvLrkXw==
 =8ixX
 -----END PGP SIGNATURE-----

Merge tag 'timers-vdso-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull vdso updates from Thomas Gleixner:

 - Make the handling of compat functions consistent and more robust

 - Rework the underlying data store so that it is dynamically allocated,
   which allows the conversion of the last holdout SPARC64 to the
   generic VDSO implementation

 - Rework the SPARC64 VDSO to utilize the generic implementation

 - Mop up the leftovers of the non-generic VDSO support in the core
   code

 - Expand the VDSO selftests and make them more robust

 - Allow time namespaces to be enabled independently of the generic VDSO
   support, which was not possible before due to SPARC64 not using it

 - Various cleanups and improvements in the related code

* tag 'timers-vdso-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  timens: Use task_lock guard in timens_get*()
  timens: Use mutex guard in proc_timens_set_offset()
  timens: Simplify some calls to put_time_ns()
  timens: Add a __free() wrapper for put_time_ns()
  timens: Remove dependency on the vDSO
  vdso/timens: Move functions to new file
  selftests: vDSO: vdso_test_correctness: Add a test for time()
  selftests: vDSO: vdso_test_correctness: Use facilities from parse_vdso.c
  selftests: vDSO: vdso_test_correctness: Handle different tv_usec types
  selftests: vDSO: vdso_test_correctness: Drop SYS_getcpu fallbacks
  selftests: vDSO: vdso_test_gettimeofday: Remove nolibc checks
  Revert "selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers"
  random: vDSO: Remove ifdeffery
  random: vDSO: Trim vDSO includes
  vdso/datapage: Trim down unnecessary includes
  vdso/datapage: Remove inclusion of gettimeofday.h
  vdso/helpers: Explicitly include vdso/processor.h
  vdso/gettimeofday: Add explicit includes
  random: vDSO: Add explicit includes
  MIPS: vdso: Explicitly include asm/vdso/vdso.h
  ...
2026-04-14 10:53:44 -07:00
Thomas Gleixner
ff1c0c5d07 Merge branch 'timers/urgent' into timers/core
to resolve the conflict with urgent fixes.
2026-04-11 07:58:33 +02:00
Thomas Gleixner
d6e152d905 clockevents: Prevent timer interrupt starvation
Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
up in user space. He provided a reproducer, which sets up a timerfd based
timer and then rearms it in a loop with an absolute expiry time of 1ns.

As the expiry time is in the past, the timer ends up as the first expiring
timer in the per CPU hrtimer base and the clockevent device is programmed
with the minimum delta value. If the machine is fast enough, this ends up
in a endless loop of programming the delta value to the minimum value
defined by the clock event device, before the timer interrupt can fire,
which starves the interrupt and consequently triggers the lockup detector
because the hrtimer callback of the lockup mechanism is never invoked.
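
A minimal userspace sketch of such a reproducer (not Calvin's actual
program, just the pattern described above):

  #include <sys/timerfd.h>
  #include <time.h>

  int main(void)
  {
          int fd = timerfd_create(CLOCK_MONOTONIC, 0);
          /* Absolute expiry of 1ns: already in the past when the timer is armed. */
          struct itimerspec its = { .it_value = { .tv_sec = 0, .tv_nsec = 1 } };

          if (fd < 0)
                  return 1;

          for (;;)        /* rearm in a tight loop */
                  timerfd_settime(fd, TFD_TIMER_ABSTIME, &its, NULL);
  }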

As a first step to prevent this, avoid reprogramming the clock event device
when:
     - a forced minimum delta event is pending
     - the new expiry delta is less than or equal to the minimum delta

Thanks to Calvin for providing the reproducer and to Borislav for testing
and providing data from his Zen5 machine.

The problem is not limited to Zen5, but depending on the underlying
clock event device (e.g. TSC deadline timer on Intel) and the CPU speed
it is not necessarily observable.

This change serves only as the last resort and further changes will be made
to prevent this scenario earlier in the call chain as far as possible.

[ tglx: Updated to restore the old behaviour vs. !force and delta <= 0 and
  	fixed up the tick-broadcast handlers as pointed out by Borislav ]

Fixes: d316c57ff6 ("[PATCH] clockevents: add core functionality")
Reported-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Calvin Owens <calvin@wbinvd.org>
Tested-by: Borislav Petkov <bp@alien8.de>
Link: https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/
Link: https://patch.msgid.link/20260407083247.562657657@kernel.org
2026-04-10 22:45:38 +02:00
Zhan Xusheng
09c04714cb alarmtimer: Access timerqueue node under lock in suspend
In alarmtimer_suspend(), timerqueue_getnext() is called under
base->lock, but next->expires is read after the lock is released.

This is safe because suspend freezes all relevant task contexts,
but reading the node while holding the lock makes the code easier
to reason about and avoids worrying about a theoretical UAF.

Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260407143627.19405-1-zhanxusheng@xiaomi.com
2026-04-07 19:14:26 +02:00
Josh Snyder
82b915051d tick/nohz: Fix inverted return value in check_tick_dependency() fast path
Commit 56534673ce ("tick/nohz: Optimize check_tick_dependency() with
early return") added a fast path that returns !val when the tick_stop
tracepoint is disabled.

This is inverted: the slow path returns true when a dependency IS found
(val != 0), but !val returns true when val is zero (no dependency).  The
result is that can_stop_full_tick() sees "dependency found" when there are
none, and the tick never stops on nohz_full CPUs.

Fix this by returning !!val instead of !val, matching the slow-path semantics.
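
A sketch of the corrected fast-path semantics (illustrative, not the actual
kernel function):

  #include <stdbool.h>

  /* val is the dependency mask: non-zero means "a dependency exists, do not
   * stop the tick", which is what the slow path reports. */
  static inline bool tick_dep_present(unsigned long val)
  {
          return !!val;   /* !val inverts this and claims a dependency when there is none */
  }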

Fixes: 56534673ce ("tick/nohz: Optimize check_tick_dependency() with early return")
Signed-off-by: Josh Snyder <josh@code406.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Assisted-by: Claude:claude-opus-4-6
Link: https://patch.msgid.link/20260402-fix-idle-tick2-v1-1-eecb589649d3@code406.com
2026-04-07 15:30:21 +02:00
Zhan Xusheng
c5283a1ffd hrtimer: Fix incorrect #endif comment for BITS_PER_LONG check
The #endif comment says "BITS_PER_LONG >= 64", but the corresponding #if
guard is "BITS_PER_LONG < 64".

The comment was originally correct when the block had a three-way
#if/#else/#endif structure, where the #else branch provided a 64-bit inline
version.  Commit 79bf2bb335 ("[PATCH] tick-management: dyntick / highres
functionality") removed the #else branch but did not update the #endif
comment, leaving it inconsistent with the remaining #if condition.

Fix the comment to match the preprocessor guard.

Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260331074811.26147-1-zhanxusheng@xiaomi.com
2026-04-01 18:48:15 +02:00
Thomas Weißschuh
7138a8698a timens: Use task_lock guard in timens_get*()
Simplify the logic in timens_get*() by converting the task_lock
usage to a guard().

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260330-timens-cleanup-v1-4-936e91c9dd30@linutronix.de
2026-04-01 17:13:36 +02:00
Thomas Weißschuh
6d89dc8b1c timens: Use mutex guard in proc_timens_set_offset()
Simplify the logic in proc_timens_set_offset() by converting the mutex
usage to a guard().
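
The guard() infrastructure is built on the compiler's cleanup attribute; a
minimal userspace model of the pattern (using a pthread mutex purely for
illustration):

  #include <pthread.h>
  #include <stdio.h>

  static pthread_mutex_t offset_lock = PTHREAD_MUTEX_INITIALIZER;

  static void unlock_cleanup(pthread_mutex_t **m)
  {
          pthread_mutex_unlock(*m);
  }

  static int set_offset(int new_offset, int *offset)
  {
          /* Dropped automatically on every return path - the point of guard(). */
          pthread_mutex_t *guard __attribute__((cleanup(unlock_cleanup))) = &offset_lock;

          pthread_mutex_lock(guard);
          if (new_offset < 0)
                  return -1;      /* early return still unlocks */
          *offset = new_offset;
          return 0;
  }

  int main(void)
  {
          int offset = 0;

          printf("%d %d\n", set_offset(5, &offset), offset);
          printf("%d %d\n", set_offset(-1, &offset), offset);
          return 0;
  }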

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260330-timens-cleanup-v1-3-936e91c9dd30@linutronix.de
2026-04-01 17:13:35 +02:00
Thomas Weißschuh
3fa3aeb4a5 timens: Simplify some calls to put_time_ns()
Use the new __free() based cleanup helpers to simplify some functions.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260330-timens-cleanup-v1-2-936e91c9dd30@linutronix.de
2026-04-01 17:13:35 +02:00
Zhan Xusheng
e9fb60a780 posix-timers: Fix stale function name in comment
The comment in exit_itimers() still refers to itimer_delete(),
which was replaced by posix_timer_delete(). Update the comment
accordingly.

Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260326142210.98632-1-zhanxusheng@xiaomi.com
2026-03-28 14:18:13 +01:00
Thomas Weißschuh
1b6c89285d timens: Remove dependency on the vDSO
Previously, missing time namespace support in the vDSO meant that time
namespaces needed to be disabled globally. This was expressed in a hard
dependency on the generic vDSO library. This also meant that architectures
without any vDSO or only a stub vDSO could not enable time namespaces.
Now that all architectures using a real vDSO are using the generic library,
that dependency is not necessary anymore.

Remove the dependency and let all architectures enable time namespaces.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260326-vdso-timens-decoupling-v2-2-c82693a7775f@linutronix.de
2026-03-26 15:44:23 +01:00
Thomas Weißschuh
5dc9cf835a vdso/timens: Move functions to new file
As a preparation of the untangling of time namespaces and the vDSO, move
the glue functions between those subsystems into a new file.

While at it, switch the mutex lock and mmap_read_lock() in the vDSO
namespace code to guard().

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260326-vdso-timens-decoupling-v2-1-c82693a7775f@linutronix.de
2026-03-26 15:44:22 +01:00
Shrikanth Hegde
551e49beb1 timers: Get this_cpu once while clearing the idle state
Calling smp_processor_id():

- With CONFIG_DEBUG_PREEMPT=y, if preemption/irqs are disabled, it does
  not print any warning.
- With CONFIG_DEBUG_PREEMPT=n, it doesn't do anything apart from reading
  __smp_processor_id().

So with both CONFIG_DEBUG_PREEMPT=y/n, it is better to cache the value in a
preemption disabled section. It saves a few cycles. Though tiny, repeated
calls add up.

timer_clear_idle() is called with interrupts disabled. So cache the value
once.

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com>
Link: https://patch.msgid.link/20260323193630.640311-5-sshegde@linux.ibm.com
2026-03-24 23:21:30 +01:00
Zhan Xusheng
5d16467ae5 alarmtimer: Fix argument order in alarm_timer_forward()
alarm_timer_forward() passes arguments to alarm_forward() in the wrong
order:

  alarm_forward(alarm, timr->it_interval, now);

However, alarm_forward() is defined as:

  u64 alarm_forward(struct alarm *alarm, ktime_t now, ktime_t interval);

and uses the second argument as the current time:

  delta = ktime_sub(now, alarm->node.expires);

Passing the interval as "now" results in incorrect delta computation,
which can lead to missed expirations or incorrect overrun accounting.

This issue has been present since the introduction of
alarm_timer_forward().

Fix this by swapping the arguments.
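
i.e. the call becomes:

  alarm_forward(alarm, now, timr->it_interval);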

Fixes: e7561f1633 ("alarmtimer: Implement forward callback")
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260323061130.29991-1-zhanxusheng@xiaomi.com
2026-03-24 23:17:14 +01:00
Ingo Molnar
f6472b1793 Linux 7.0-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmm3G/UeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGZJUH/R0vQ3Vha48QDEic
 1NREwaHxAoTFi0i3y7OPPklqrP2V09D1qg4Q6fExYQVTQgV6F2DRjVbyPKrmr4ay
 BA6aHrUdnFngYHpDlI1b1r7rJiAIN4WFHl7StO70bS+EB+UPsP9cfP3CKXUfKfqT
 kyHXzUrd5QnjYmlb9rQw1E6rzsRamNtGUtZf7TwDidJYjtm3sPeDHUkjyRy4xkYd
 UouIu6W7UXoicl38bJAgaWBY5BiYtjN6ktnY4/gcqDeqYd7mTM3Eb1B+OSXgFfip
 F0OYfJhfWn+63WnPA+1I5jXWC1UrdVXTMK/NTYjhmGlfdmkLcWDlNGtu+qKZbpwj
 fmF3Kyo=
 =6nX1
 -----END PGP SIGNATURE-----

Merge tag 'v7.0-rc4' into timers/core, to resolve conflict

Resolve conflict between this change in the upstream kernel:

  4c652a4772 ("rseq: Mark rseq_arm_slice_extension_timer() __always_inline")

... and this pending change in timers/core:

  0e98eb1481 ("entry: Prepare for deferred hrtimer rearming")

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2026-03-21 08:02:36 +01:00
Thomas Gleixner
763aacf86f clocksource: Rewrite watchdog code completely
The clocksource watchdog code has over time reached the state of an
impenetrable maze of duct tape and staples. The original design, which was
made in the context of systems far smaller than today, is based on the
assumption that the to-be-monitored clocksource (TSC) can be trivially
compared against a known-to-be-stable clocksource (HPET/ACPI-PM timer).

Over the years it turned out that this approach has major flaws:

  - Long delays between watchdog invocations can result in wrap arounds
    of the reference clocksource

  - Scalability of the reference clocksource readout can degrade on large
    multi-socket systems due to interconnect congestion

This was addressed with various heuristics which degraded the accuracy of
the watchdog to the point that it fails to detect actual TSC problems on
older hardware which exposes slow inter CPU drifts due to firmware
manipulating the TSC to hide SMI time.

To address this and bring back sanity to the watchdog, rewrite the code
completely with a different approach:

  1) Restrict the validation against a reference clocksource to the boot
     CPU, which is usually the CPU/Socket closest to the legacy block which
     contains the reference source (HPET/ACPI-PM timer). Validate that the
     reference readout is within a bound latency so that the actual
     comparison against the TSC stays within 500ppm as long as the clocks
     are stable.

  2) Compare the TSCs of the other CPUs in a round robin fashion against
     the boot CPU in the same way the TSC synchronization on CPU hotplug
     works. This still can suffer from delayed reaction of the remote CPU
     to the SMP function call and the latency of the control variable cache
     line. But this latency is not affecting correctness. It only affects
     the accuracy. With low contention the readout latency is in the low
     nanoseconds range, which detects even slight skews between CPUs. Under
     high contention this becomes obviously less accurate, but still
     detects slow skews reliably as it solely relies on subsequent readouts
     being monotonically increasing. It just can take slightly longer to
     detect the issue (a minimal model of this check follows below).

  3) Rewrite the watchdog test so it tests the various mechanisms one by
     one and validates the result against the expectation.
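
A minimal single-threaded model of the monotonicity check relied on in
point 2 (illustrative only, not the kernel implementation):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  static uint64_t last_seen;      /* shared maximum; atomic in a real multi-CPU setting */

  /* One probe: a CPU's readout must never be behind what any CPU saw before. */
  static bool readout_ok(uint64_t readout)
  {
          if (readout < last_seen)
                  return false;   /* counter appears to run backwards vs. another CPU */
          last_seen = readout;
          return true;
  }

  int main(void)
  {
          /* Interleaved readouts from two CPUs; the fourth one skews backwards. */
          uint64_t samples[] = { 100, 105, 110, 103, 120 };

          for (unsigned int i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
                  printf("%llu: %s\n", (unsigned long long)samples[i],
                         readout_ok(samples[i]) ? "ok" : "skew detected");
          return 0;
  }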

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Jiri Wiesner <jwiesner@suse.de>
Reviewed-by: Daniel J Blueman <daniel@quora.org>
Link: https://patch.msgid.link/20260123231521.926490888@kernel.org
Link: https://patch.msgid.link/87h5qeomm5.ffs@tglx
2026-03-20 13:36:32 +01:00
Thomas Gleixner
1432f9d4e8 clocksource: Don't use non-continuous clocksources as watchdog
Using a non-continuous aka untrusted clocksource as a watchdog for another
untrusted clocksource is equivalent to putting the fox in charge of the
henhouse.

That's especially true with the jiffies clocksource which depends on
interrupt delivery based on a periodic timer. Neither the frequency of that
timer is trustworthy nor the kernel's ability to react on it in a timely
manner and rearm it if it is not self rearming.

Just don't bother to deal with this. It's not worth the trouble and only
relevant to museum piece hardware.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260123231521.858743259@kernel.org
2026-03-12 12:23:27 +01:00
Thomas Weißschuh (Schneider Electric)
88c316ff76 hrtimer: Add a helper to retrieve a hrtimer from its timerqueue node
The container_of() call is open-coded multiple times.

Add a helper macro.

Use container_of_const() to preserve constness.
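
The helper is presumably along these lines (the macro name here is assumed,
not taken from the patch):

  #define hrtimer_of_node(__node) \
          container_of_const(__node, struct hrtimer, node)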

Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-12-095357392669@linutronix.de
2026-03-12 12:15:56 +01:00
Thomas Weißschuh (Schneider Electric)
bd803783df hrtimer: Drop unnecessary pointer indirection in hrtimer_expire_entry event
This pointer indirection is a remnant from when ktime_t was a struct;
today it is pointless.

Drop the pointer indirection.

Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-9-095357392669@linutronix.de
2026-03-12 12:15:55 +01:00
Thomas Weißschuh (Schneider Electric)
194675f16d hrtimer: Don't zero-initialize ret in hrtimer_nanosleep()
The value will be assigned to before any usage.
No other function in hrtimer.c does such a zero-initialization.

Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-7-095357392669@linutronix.de
2026-03-12 12:15:55 +01:00
Thomas Weißschuh (Schneider Electric)
112c685f02 timekeeping: Mark offsets array as const
Neither the array nor the offsets it is pointing to are meant to be
changed through the array.

Mark both the array and the values it points to as const.

Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-5-095357392669@linutronix.de
2026-03-12 12:15:54 +01:00
Thomas Weißschuh (Schneider Electric)
ba546d3d89 timekeeping/auxclock: Consistently use raw timekeeper for tk_setup_internals()
In aux_clock_enable() the clocksource from tkr_raw is used to call
tk_setup_internals(). Do the same in tk_aux_update_clocksource().  While
the clocksources will be the same in any case, this is less confusing.

Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-4-095357392669@linutronix.de
2026-03-12 12:15:54 +01:00
Thomas Weißschuh (Schneider Electric)
bb2705b4e0 timer_list: Print offset as signed integer
The offset of a hrtimer base may be negative.

Print those values correctly.

Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-3-095357392669@linutronix.de
2026-03-12 12:15:54 +01:00
Thomas Gleixner
1e4a70e0f6 Merge branch 'sched/hrtick' into timers/core
Pick up the hrtick related hrtimer changes so other unrelated changes can
be queued on top.
2026-03-11 21:14:32 +01:00
Peter Zijlstra
92f7ee408c hrtimer: Less agressive interrupt 'hang' handling
When the hrtimer_interrupt needs to restart more than 3 times and still has
expired timers, the interrupt is considered hung. To give the system a
little time to recover, the hardware timer is programmed a little into the
future.

Prior to commit 2889243848 ("hrtimer: Re-arrange hrtimer_interrupt()"),
this was relative to the amount of time spent serving the interrupt with a
max of 100 msec.

However, in order to simplify, and because this condition 'should' not
happen, the timeout was unconditionally set to 100 msec.

'Obviously' there is a benchmark that hits this hard, by programming a
ton of very short timers :-/

Since reprogramming is decoupled from the interrupt handling, the actual
execution time is lost; however, the code does track max_hang_time. Using
that rather than the 100 ms max restores performance.

  stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --timermix 64

                  bogo ops/s
 288924384856^1: 23715979.93
 288924384856:   11550049.77
 patched:        23361116.78

Additionally, Thomas noted that cpu_base->hang_detected should not be
cleared until the next interrupt, such that __hrtimer_reprogram() won't
undo the extra delay.

Fixes: 2889243848 ("hrtimer: Re-arrange hrtimer_interrupt()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311121500.GF652779@noisy.programming.kicks-ass.net
Closes: https://lore.kernel.org/oe-lkp/202603102229.74b9dee4-lkp@intel.com
2026-03-11 21:13:55 +01:00
Steven Rostedt
755a648e78 time/jiffies: Mark jiffies_64_to_clock_t() notrace
The trace_clock_jiffies() function that handles the "uptime" clock for
tracing calls jiffies_64_to_clock_t(). This causes the function tracer to
constantly recurse when the tracing clock is set to "uptime". Mark it
notrace to prevent unnecessary recursion when using the "uptime" clock.

Fixes: 58d4e21e50 ("tracing: Fix wraparound problems in "uptime" trace clock")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260306212403.72270bb2@robin
2026-03-11 10:33:12 +01:00
Arnd Bergmann
c453b9abb4 clocksource: Remove ARCH_CLOCKSOURCE_DATA
After sparc64, there are no remaining users of ARCH_CLOCKSOURCE_DATA
and it can just be removed.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Andreas Larsson <andreas@gaisler.com>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260304-vdso-sparc64-generic-2-v6-14-d8eb3b0e1410@linutronix.de

[Thomas: drop sparc64 bits from the patch]
2026-03-11 10:18:33 +01:00
Linus Torvalds
6ff1020c2f Make clock_adjtime() syscall timex validation slightly more
permissive for auxiliary clocks, to not reject syscalls
 based on the status field that do not try to modify the
 status field. This makes ABI behavior in clock_adjtime()
 consistent with CLOCK_REALTIME.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmsxzkRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hq8g//fRTp9p2pVfmRWUoxWELrT/bMK1r+D6F3
 6BYkwp68peRhchVrFxkI/Y37rjAIC8CXZSPuvkubqIROrH3gA7SCCQYCcZKdss+t
 i3lbpQF8IbagPIS5btpOAN2KRCu2S7aqjDdH0rWb9VhQdlW7fI71Z72Uz07YEA+q
 TWpy3gE531P/dgAqcvIAyMHnFZDCb1S6z8wZvT3SV4r4GkczfXpTFyNHHtETSu0V
 7isuOBfloM4HpDU50oUotlqBiwigH27J2Ad6aIrnCA7iaQPrzREysG+8E96ShhaB
 g6+qaQS5gTgFryA1bggA6LzGveLOI8bjy2kZ2SnZWuFPj46OReGIuwK4kyY07jz2
 xk0sd37alN16ETKhGVLfAgjmzVGoKVNnp4ak9J3VmMbxWEmXeObuOC8SmF9VImc1
 4bRaG9+Tlfd4DtOOz2+E4VcPE1D9A2tMw4esgUaXRrrp4GlEcKOJ5PRlWj0uGvrh
 xLPLbL0XIiWsjMsHdVs4Gq9Z0MvfRHc4VLOviIqLFtHox2DscZypPkyjKAv5inp0
 /VWyUYJkkr07RMQQ3nqHnP+lzAfO2aSeZ72D9NnHStL3RPbGC4jYvpoi8dnH0/TT
 PKJgj2jb7u3h+1cxKBi1RM0JbxUYD5+4N8zfJISa9uqkHZ3XY3VyuuT+2RHO6CQp
 d1BdX0V4oDA=
 =zjov
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "Make clock_adjtime() syscall timex validation slightly more permissive
  for auxiliary clocks, to not reject syscalls based on the status field
  that do not try to modify the status field.

  This makes the ABI behavior in clock_adjtime() consistent with
  CLOCK_REALTIME"

* tag 'timers-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Fix timex status validation for auxiliary clocks
2026-03-07 17:09:15 -08:00
Thomas Gleixner
53007d526e clocksource: Update clocksource::freq_khz on registration
Borislav reported a division by zero in the timekeeping code and random
hangs with the new coupled clocksource/clockevent functionality.

It turned out that the TSC clocksource is not always updating the
freq_khz field of the clocksource on registration. The coupled mode
conversion calculation requires the frequency and as it's not
initialized the resulting factor is zero or a random value. As a
consequence this causes a division by zero or random boot hangs.

Instead of chasing down all clocksources which fail to update that
member, fill it in at registration time where the caller has to supply
the frequency anyway. Except for special clocksources like jiffies which
never can have coupled mode.

To make this more robust put a check into the registration function to
validate that the caller supplied a frequency if the coupled mode
feature bit is set. If not, emit a warning and clear the feature bit.
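
A compact model of the described registration-time handling (struct and
flag names are illustrative, not the kernel's):

  struct cs { unsigned int flags; unsigned int freq_khz; };
  #define CS_FEAT_COUPLED 0x1             /* assumed name for the coupled-mode feature bit */

  static void cs_register(struct cs *cs, unsigned long freq_hz)
  {
          if (freq_hz)
                  cs->freq_khz = freq_hz / 1000;  /* fill it in centrally at registration */
          else if (cs->flags & CS_FEAT_COUPLED)
                  cs->flags &= ~CS_FEAT_COUPLED;  /* warn and degrade instead of dividing by zero later */
  }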

Fixes: cd38bdb8e6 ("timekeeping: Provide infrastructure for coupled clockevents")
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/87cy1jsa4m.ffs@tglx
Closes: https://lore.kernel.org/20260303213027.GA2168957@ax162
2026-03-05 17:41:06 +01:00
Thomas Gleixner
9d5e25b361 timekeeping: Initialize the coupled clocksource conversion completely
Nathan reported a boot failure after the coupled clocksource/event support
was enabled for the TSC deadline timer. It turns out that on the affected
test systems the TSC frequency is not refined against HPET, so it is
registered with the same frequency as the TSC-early clocksource.

As a consequence the update function which checks for a change of the
shift/mult pair of the clocksource fails to compute the conversion
limit, which is zero initialized. This check is there to avoid pointless
computations on every timekeeping update cycle (tick).

So the actual clockevent conversion function limits the delta expiry to
zero, which means the timer is always programmed to expire in the
past. This obviously results in a spectacular timer interrupt storm,
which goes unnoticed because the per CPU interrupts on x86 are not
exposed to the runaway detection mechanism and the NMI watchdog is not
yet functional. So the machine simply stops booting.

That did not show up in testing. All test machines refine the TSC frequency
so TSC has a different shift/mult pair than TSC-early and the conversion
limit is properly initialized.

Cure that by setting the conversion limit right at the point where the new
clocksource is installed.

Fixes: cd38bdb8e6 ("timekeeping: Provide infrastructure for coupled clockevents")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/87bjh4zies.ffs@tglx
Closes: https://lore.kernel.org/20260303012905.GA978396@ax162
2026-03-05 17:40:46 +01:00
Miroslav Lichvar
e48a869957 timekeeping: Fix timex status validation for auxiliary clocks
The timekeeping_validate_timex() function validates the timex status
of an auxiliary system clock even when the status is not to be changed,
which causes unexpected errors for applications that make read-only
clock_adjtime() calls, or set some other timex fields, but without
clearing the status field.

Do the AUX-specific status validation only when the modes field contains
ADJ_STATUS, i.e. the application is actually trying to change the
status. This makes the AUX-specific clock_adjtime() behavior consistent
with CLOCK_REALTIME.
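
For illustration, a read-only clock_adjtime() call of the kind that was
being rejected (CLOCK_REALTIME is used only so the sketch runs anywhere;
the affected case is an auxiliary clock id):

  #include <stdio.h>
  #include <sys/timex.h>
  #include <time.h>

  int main(void)
  {
          struct timex tx = { .modes = 0 };       /* no ADJ_* flags: pure read */
          int state = clock_adjtime(CLOCK_REALTIME, &tx);

          if (state < 0)
                  perror("clock_adjtime");
          else
                  printf("state %d freq %ld status 0x%x\n", state, tx.freq, tx.status);
          return 0;
  }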

Fixes: 4eca49d0b6 ("timekeeping: Prepare do_adtimex() for auxiliary clocks")
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260225085231.276751-1-mlichvar@redhat.com
2026-03-04 20:05:37 +01:00
Linus Torvalds
ecc64d2dc9 Summary
* Fix error when reporting jiffies converted values back to user space
 
   Return the converted value instead of "Invalid argument" error.
 
 * Testing
 
   Spent around a week in linux-next -enough for this small fix-
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmmoNL0ACgkQupfNUreW
 QU/dMQv/YL6Lpv76iFGOL8gP1tU3oebKWKGJYDcQtqPZfLnlFmWu+XliHCldqJ7J
 Ur9u2KleA0jM/Szq/v4FOyq2L7992dpKSkzM6ZsMyEfrz0e21WCZus40pcpE0L2j
 kMNo4Vf3bAP+18KNsxh6zUc9WeYJ3suySmme+je2WNkab/io9XNUxYv7LhnKWze7
 3iCXYZj/HtF3G9/xk0v3Ihlw6rNRVxNPfC3DpGXlvtnTSchlj9S9IK4pczcAmdw8
 CNTEGCi+yzZYCcyI310IoeH0d3L5k39daJqtSC0BlVp607kr57nt5Hygf08WdnG8
 2U+lvoWKp7odyu9/D1nqcpoQVY+9IzRkW+RM1bnYOmNYAiFrhiKTNCpZOhhqWn6P
 3f3zvRq3Wt9zuA8upGjT6adxTrPMkpiqQD4POExgzSvoqkZ31Lw1/A6INtdWng82
 +rFdL4PqdElrghVl07zydX5UWz/+fZKsQMz/j1cKROhKRQsaLXWIYHbo6OSlp6AC
 JLONWmgW
 =F0SK
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-7.00-fixes-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

Pull sysctl fix from Joel Granados:

 - Fix error when reporting jiffies converted values back to user space

   Return the converted value instead of "Invalid argument" error

* tag 'sysctl-7.00-fixes-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  time/jiffies: Fix sysctl file error on configurations where USER_HZ < HZ
2026-03-04 08:21:11 -08:00
Gerd Rausch
6932256d3a time/jiffies: Fix sysctl file error on configurations where USER_HZ < HZ
Commit 2dc164a48e ("sysctl: Create converter functions with two new
macros") incorrectly returns error to user space when jiffies sysctl
converter is used. The old overflow check got replaced with an
unconditional one:
     +    if (USER_HZ < HZ)
     +        return -EINVAL;
which will always be true on configurations with "USER_HZ < HZ".

Remove the check; it is no longer needed as clock_t_to_jiffies() returns
ULONG_MAX for the overflow case and proc_int_u2k_conv_uop() checks for
"> INT_MAX" after conversion

Fixes: 2dc164a48e ("sysctl: Create converter functions with two new macros")
Reported-by: Colm Harrington <colm.harrington@oracle.com>
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2026-03-04 13:48:31 +01:00
Linus Torvalds
0031c06807 cgroup: Fixes for v7.0-rc2
- Fix circular locking dependency in cpuset partition code by deferring
   housekeeping_update() calls to a workqueue instead of calling them
   directly under cpus_read_lock.
 
 - Fix null-ptr-deref in rebuild_sched_domains_cpuslocked() when
   generate_sched_domains() returns NULL due to kmalloc failure.
 
 - Fix incorrect cpuset behavior for effective_xcpus in
   partition_xcpus_del() and cpuset_update_tasks_cpumask() in
   update_cpumasks_hier().
 
 - Fix race between task migration and cgroup iteration.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaadVVQ4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGef0AQDLuJE3vzc2VeCBc4rGcj7ZSRmc3tc28lOqHRzi
 XEx1iwD+PeFcb9wt1CTqA5hAiIY1LGR/5iO1kTH7paRd16DBRAc=
 =S8WE
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - Fix circular locking dependency in cpuset partition code by
   deferring housekeeping_update() calls to a workqueue instead
   of calling them directly under cpus_read_lock

 - Fix null-ptr-deref in rebuild_sched_domains_cpuslocked() when
   generate_sched_domains() returns NULL due to kmalloc failure

 - Fix incorrect cpuset behavior for effective_xcpus in
   partition_xcpus_del() and cpuset_update_tasks_cpumask()
   in update_cpumasks_hier()

 - Fix race between task migration and cgroup iteration

* tag 'cgroup-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: fix null-ptr-deref in rebuild_sched_domains_cpuslocked
  cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock
  cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue
  cgroup/cpuset: Move housekeeping_update()/rebuild_sched_domains() together
  kselftest/cgroup: Simplify test_cpuset_prs.sh by removing "S+" command
  cgroup/cpuset: Set isolated_cpus_updating only if isolated_cpus is changed
  cgroup/cpuset: Clarify exclusion rules for cpuset internal variables
  cgroup/cpuset: Fix incorrect use of cpuset_update_tasks_cpumask() in update_cpumasks_hier()
  cgroup/cpuset: Fix incorrect change to effective_xcpus in partition_xcpus_del()
  cgroup: fix race between task migration and iteration
2026-03-03 14:25:18 -08:00
Linus Torvalds
f6542af922 Improve the inlining of jiffies_to_msecs() and jiffies_to_usecs()
 for the common HZ=100, 250 or 1000 cases, falling back to a function call
 only for odd HZ values like HZ=300.
 
 The function call overhead showed up in performance tests of the TCP code.
 
 (Marked as an RFC pull request, as it's not a regression.)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmmkA94RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ggmhAAqvNgH1vG+Ad1/r2aZ0iQNOJ3lLL9DKVU
 V0Q2iXiRPKvzjXaHB+C2B2GkGgBBxyvdl1wGPpknI/OkoC8aBzK8jSchmfLFfWdx
 C1MC0xoZRtaWDQaMYYQ83ZGQqdKHct4ZrwfsPu+g95NGvNg3m/W8p3cZjruknvH3
 bKZmbN2wQiSx6+PB7/FkEjV7eAlaskYYdKLNqZzd/62oQ6is4ppAjEAp+X/FidJv
 0lUVr7ILx6lZHjWnavUjex2iSvvZ8XqiaYVKlj5EE9cCaPhXDTpkaO8l/ELcjpHL
 ZBBV6rpbDo7+Mo3UnUiz+CaYi6XAAOwgj6wZBiBXhVQUAW0PIlMaefDjccWFlDw1
 oAcRCg5i3KF8cvrfQYUs4519W52eWOThqQ1fs5ql6P6ycHZ0KTsmaRAbNih811RN
 A2VMTkiyX25bXOUQ9e5Y7cYOvDMGGCWiocT6C7Is9gZQMfkj92NDCKURLwRYzzBr
 2XDjg46YekGXwy8OamMwXMRcdyUC5fAIaWOaq7IL1K3cgbS2qbZx55Y85+AOfngF
 DvFWDIfjslYMZqzQQ+4+MaJitRQ2V6CqdOP3kQbJ3Z6DmIcwi7DkOzqH1hEglb7O
 IjXCxjxosZv4iofpxr0FKJPx7KBVSzzxezjMLzeijM+zDdbF4GWFpqRD1ONh/4vm
 /1tfaC6TU/E=
 =duTI
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "Improve the inlining of jiffies_to_msecs() and jiffies_to_usecs(), for
  the common HZ=100, 250 or 1000 cases. Only use a function call for odd
  HZ values like HZ=300 that generate more code.

  The function call overhead showed up in performance tests of the TCP
  code"

* tag 'timers-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time/jiffies: Inline jiffies_to_msecs() and jiffies_to_usecs()
2026-03-01 12:15:58 -08:00
Thomas Gleixner
343f2f4dc5 hrtimer: Try to modify timers in place
When modifying the expiry of an armed timer, it is first dequeued, then the
expiry value is updated and then it is queued again.

This can be avoided when the new expiry value is within the range of the
previous and the next timer as that does not change the position in the RB
tree.

The linked timerqueue allows peeking ahead at the neighbours to check
whether the new expiry time is within the range of the previous and next
timer. If so, just modify the timer in place and spare the dequeue and
requeue effort, which might end up rotating the RB tree twice for nothing.

This significantly speeds up the handling of frequently rearmed hrtimers,
like the hrtick scheduler timer.
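
As a rough sketch of the idea (node_prev()/node_next() stand in for
whatever neighbour accessors the linked timerqueue provides; they are
assumptions, not the actual hrtimer API):

     static bool timer_set_expiry_in_place(struct timerqueue_node *node,
                                           ktime_t new_expiry)
     {
             struct timerqueue_node *prev = node_prev(node);
             struct timerqueue_node *next = node_next(node);

             if ((prev && new_expiry < prev->expires) ||
                 (next && new_expiry > next->expires))
                     return false;           /* position changes, requeue needed */

             node->expires = new_expiry;     /* position unchanged, no RB rotation */
             return true;
     }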

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.873359816@kernel.org
2026-02-27 16:40:17 +01:00
Thomas Gleixner
b7418e6e9b hrtimer: Use linked timerqueue
To prepare for optimizing the rearming of enqueued timers, switch to the
linked timerqueue. That allows checking whether the new expiry time changes
the position of the timer in the RB tree or not, by comparing it against
the previous and the next timers' expiry.
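
The shape of such a node might look roughly like this (illustrative only,
not the upstream definition):

     /* RB tree ordering plus a sorted doubly linked list, so that the
      * previous and next timer can be reached without walking the tree.
      */
     struct linked_timerqueue_node {
             struct rb_node          rbnode;    /* position in the RB tree */
             struct list_head        list;      /* sorted neighbour links */
             ktime_t                 expires;
     };

With the list links in place, peeking at the previous or next expiry is a
plain pointer dereference instead of an RB tree walk.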

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.806643179@kernel.org
2026-02-27 16:40:16 +01:00
Thomas Gleixner
3601a1d850 hrtimer: Optimize for_each_active_base()
Give the compiler some help to emit way better code.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.599804894@kernel.org
2026-02-27 16:40:15 +01:00
Thomas Gleixner
a64ad57e41 hrtimer: Simplify run_hrtimer_queues()
Replace the open coded container_of() orgy with a trivial
clock_base_next_timer() helper.
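
Such a helper boils down to roughly the following (the helper name comes
from the patch subject; the body shown here is an assumption):

     static struct hrtimer *clock_base_next_timer(struct hrtimer_clock_base *base)
     {
             struct timerqueue_node *next = timerqueue_getnext(&base->active);

             /* one container_of() in one place instead of open coded copies */
             return next ? container_of(next, struct hrtimer, node) : NULL;
     }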

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.532927977@kernel.org
2026-02-27 16:40:15 +01:00
Thomas Gleixner
2bd1cc24fa hrtimer: Rework next event evaluation
The per clock base cached expiry time allows a more efficient evaluation
of the next expiry on a CPU.

Separate the reprogramming evaluation from the NOHZ idle evaluation, which
needs to exclude the NOHZ timer, to keep the reprogramming path lean and
clean.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.468186893@kernel.org
2026-02-27 16:40:15 +01:00
Thomas Gleixner
eddffab828 hrtimer: Keep track of first expiring timer per clock base
Evaluating the next expiry time of all clock bases is cache-line expensive
as the expiry time of the first expiring timer is not cached in the base
and requires accessing the timer itself, which is definitely in a different
cache line.

It's way more efficient to keep track of the expiry time on enqueue and
dequeue operations as the relevant data is already in the cache at that
point.
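
A sketch of the enqueue side (the cached field name is an assumption):

     static void clock_base_enqueue(struct hrtimer_clock_base *base,
                                    struct hrtimer *timer)
     {
             /* timerqueue_add() returns true if the new node became the
              * head, i.e. the earliest expiring timer of this clock base.
              */
             if (timerqueue_add(&base->active, &timer->node))
                     base->first_expiry = timer->node.expires;
     }

The dequeue side refreshes the cached value from the new head in the same
way when the removed timer was the first expiring one.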

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.404839710@kernel.org
2026-02-27 16:40:14 +01:00
Thomas Gleixner
b95c4442b0 hrtimer: Avoid re-evaluation when nothing changed
Most of the time nothing changes between hrtimer_interrupt() deferring the
rearm and the invocation of hrtimer_rearm_deferred(). In those cases
re-evaluating the next expiring timer is a pointless exercise.

Cache the required data and use it if nothing changed.
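
Roughly along these lines (the cpu_base fields and the re-evaluation
helper are assumed names, only hrtimer_rearm_deferred() and
tick_program_event() are taken from the text above):

     static void hrtimer_rearm_deferred(struct hrtimer_cpu_base *cpu_base)
     {
             /* Nothing was armed or cancelled since hrtimer_interrupt()
              * deferred the rearm: reuse the expiry computed back then.
              */
             if (!cpu_base->next_expiry_changed) {
                     tick_program_event(cpu_base->cached_expires_next, 1);
                     return;
             }

             /* Something changed in between, do the full evaluation. */
             hrtimer_reprogram_next_event(cpu_base);
     }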

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.338569372@kernel.org
2026-02-27 16:40:14 +01:00
Peter Zijlstra
15dd3a9488 hrtimer: Push reprogramming timers into the interrupt return path
Currently hrtimer_interrupt() runs expired timers, which can re-arm
themselves, after which it computes the next expiration time and
re-programs the hardware.

However, things like HRTICK, a highres timer driving preemption, cannot
re-arm itself at the point of running, since the next task has not been
determined yet. The schedule() in the interrupt return path will switch to
the next task, which then causes a new hrtimer to be programmed.

This then results in reprogramming the hardware at least twice, once after
running the timers, and once upon selecting the new task.

Notably, *both* events happen in the interrupt.

By pushing the hrtimer reprogram all the way into the interrupt return
path, it runs after schedule() picks the new task and the double reprogram
can be avoided.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.273488269@kernel.org
2026-02-27 16:40:14 +01:00
Peter Zijlstra
a43b4856bc hrtimer: Prepare stubs for deferred rearming
The hrtimer interrupt expires timers and at the end of the interrupt it
rearms the clockevent device for the next expiring timer.

That's obviously correct, but in the case that an expired timer set
NEED_RESCHED, the return from interrupt ends up in schedule(). If HRTICK is
enabled, schedule() will modify the hrtick timer, which causes another
reprogramming of the hardware.

That can be avoided by deferring the rearming to the return from interrupt
path, and if the return results in an immediate schedule() invocation, it
can be deferred until the end of schedule().

To make this correct, the affected code parts need to be made aware of this.

Provide empty stubs for the deferred rearming mechanism, so that the
relevant code changes for entry, softirq and scheduler can be split up into
separate changes independent of the actual enablement in the hrtimer code.
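
The stubs might look as simple as the following (names are illustrative;
the point is that callers can be wired up before the real implementation
lands):

     /* No-op placeholders so that entry, softirq and scheduler code can
      * already call into the mechanism without any functional change.
      */
     static inline void hrtimer_defer_rearm(void) { }
     static inline bool hrtimer_deferred_rearm_pending(void) { return false; }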

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163431.000891171@kernel.org
2026-02-27 16:40:13 +01:00
Thomas Gleixner
9e07a9c980 hrtimer: Rename hrtimer_cpu_base::in_hrtirq to deferred_rearm
The upcoming deferred rearming scheme has the same effect as the deferred
rearming while the hrtimer interrupt is executing. So it can reuse the
in_hrtirq flag, but once the rearming gets deferred beyond the hrtimer
interrupt path, the name no longer makes sense.

Rename it to deferred_rearm upfront to keep the actual functional change
separate from the mechanical rename churn.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.935623347@kernel.org
2026-02-27 16:40:12 +01:00
Peter Zijlstra
2889243848 hrtimer: Re-arrange hrtimer_interrupt()
Rework hrtimer_interrupt() such that reprogramming is split out into an
independent function at the end of the interrupt.

This prepares for reprogramming getting delayed beyond the end of
hrtimer_interrupt().

Notably, this changes the hang handling to always wait 100ms instead of
trying to keep it proportional to the actual delay. This simplifies the
state; besides, such hangs really shouldn't be happening in the first place.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.870639266@kernel.org
2026-02-27 16:40:12 +01:00
Thomas Gleixner
85a690d1c1 hrtimer: Separate remove/enqueue handling for local timers
As the base switch can be avoided completely when the base stays the same,
the remove/enqueue handling can be more streamlined.

Split it out into a separate function which handles both in one go, which
is way more efficient and makes the code simpler to follow.
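
Conceptually something like this (sketch only; the function name is an
assumption and locking/reprogramming details are omitted):

     /* Remove and re-add on the same clock base in one go: no base switch
      * and no intermediate state to re-derive.
      */
     static bool hrtimer_requeue_local(struct hrtimer_clock_base *base,
                                       struct hrtimer *timer, ktime_t expires)
     {
             timerqueue_del(&base->active, &timer->node);

             timer->node.expires = expires;
             timer->_softexpires = expires;

             /* returns true if the timer is now the first expiring one */
             return timerqueue_add(&base->active, &timer->node);
     }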

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.737600486@kernel.org
2026-02-27 16:40:11 +01:00