Merge tag 'locking-core-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"Mutexes:
- Add killable flavor to guard definitions (Davidlohr Bueso)
- Remove the list_head from struct mutex (Matthew Wilcox)
- Rename mutex_init_lockep() (Davidlohr Bueso)
rwsems:
- Remove the list_head from struct rw_semaphore and
replace it with a single pointer (Matthew Wilcox)
- Fix logic error in rwsem_del_waiter() (Andrei Vagin)
Semaphores:
- Remove the list_head from struct semaphore (Matthew Wilcox)
Jump labels:
- Use ATOMIC_INIT() for initialization of .enabled (Thomas Weißschuh)
- Remove workaround for old compilers in initializations
(Thomas Weißschuh)
Lock context analysis changes and improvements:
- Add context analysis for rwsems (Peter Zijlstra)
- Fix rwlock and spinlock lock context annotations (Bart Van Assche)
- Fix rwlock support in <linux/spinlock_up.h> (Bart Van Assche)
- Add lock context annotations in the spinlock implementation
(Bart Van Assche)
- signal: Fix the lock_task_sighand() annotation (Bart Van Assche)
- ww-mutex: Fix the ww_acquire_ctx function annotations
(Bart Van Assche)
- Add lock context support in do_raw_{read,write}_trylock()
(Bart Van Assche)
- arm64, compiler-context-analysis: Permit alias analysis through
__READ_ONCE() with CONFIG_LTO=y (Marco Elver)
- Add __cond_releases() (Peter Zijlstra)
- Add context analysis for mutexes (Peter Zijlstra)
- Add context analysis for rtmutexes (Peter Zijlstra)
- Convert futexes to compiler context analysis (Peter Zijlstra)
Rust integration updates:
- Add atomic fetch_sub() implementation (Andreas Hindborg)
- Refactor various rust_helper_ methods for expansion (Boqun Feng)
- Add Atomic<*{mut,const} T> support (Boqun Feng)
- Add atomic operation helpers over raw pointers (Boqun Feng)
- Add performance-optimal Flag type for atomic booleans, to avoid
slow byte-sized RMWs on architectures that don't support them.
(FUJITA Tomonori)
- Misc cleanups and fixes (Andreas Hindborg, Boqun Feng, FUJITA
Tomonori)
LTO support updates:
- arm64: Optimize __READ_ONCE() with CONFIG_LTO=y (Marco Elver)
- compiler: Simplify generic RELOC_HIDE() (Marco Elver)
Miscellaneous fixes and cleanups by Peter Zijlstra, Randy Dunlap,
Thomas Weißschuh, Davidlohr Bueso and Mikhail Gavrilov"
* tag 'locking-core-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
compiler: Simplify generic RELOC_HIDE()
locking: Add lock context annotations in the spinlock implementation
locking: Add lock context support in do_raw_{read,write}_trylock()
locking: Fix rwlock support in <linux/spinlock_up.h>
lockdep: Raise default stack trace limits when KASAN is enabled
cleanup: Optimize guards
jump_label: remove workaround for old compilers in initializations
jump_label: use ATOMIC_INIT() for initialization of .enabled
futex: Convert to compiler context analysis
locking/rwsem: Fix logic error in rwsem_del_waiter()
locking/rwsem: Add context analysis
locking/rtmutex: Add context analysis
locking/mutex: Add context analysis
compiler-context-analysis: Add __cond_releases()
locking/mutex: Remove the list_head from struct mutex
locking/semaphore: Remove the list_head from struct semaphore
locking/rwsem: Remove the list_head from struct rw_semaphore
rust: atomic: Update a safety comment in impl of `fetch_add()`
rust: sync: atomic: Update documentation for `fetch_add()`
rust: sync: atomic: Add fetch_sub()
...
Fuzzing/stressing futexes triggered:
WARNING: kernel/futex/core.c:825 at wait_for_owner_exiting+0x7a/0x80, CPU#11: futex_lock_pi_s/524
When futex_lock_pi_atomic() sees the owner is exiting, it returns -EBUSY
and stores a refcounted task pointer in 'exiting'.
After wait_for_owner_exiting() consumes that reference, the local pointer
is never reset to NULL. Upon a retry, if futex_lock_pi_atomic() returns a
different error, the bogus pointer is passed to wait_for_owner_exiting().
CPU0                    CPU1                                  CPU2
futex_lock_pi(uaddr)
  // acquires the PI futex
exit()
  futex_cleanup_begin()
    futex_state = EXITING;
                        futex_lock_pi(uaddr)
                          futex_lock_pi_atomic()
                            attach_to_pi_owner()
                              // observes EXITING
                              *exiting = owner; // takes ref
                              return -EBUSY
                          wait_for_owner_exiting(-EBUSY, owner)
                            put_task_struct(); // drops ref
                          // exiting still points to owner
                          goto retry;
                          futex_lock_pi_atomic()
                            lock_pi_update_atomic()
                              cmpxchg(uaddr)
                                                              *uaddr ^= WAITERS // whatever
                              // value changed
                              return -EAGAIN;
                          wait_for_owner_exiting(-EAGAIN, exiting) // stale
                            WARN_ON_ONCE(exiting)
Fix this by resetting the pointer upon retry, essentially aligning it
with requeue_pi.
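For illustration only, a minimal userspace analogue of the bug and its
fix; struct task, try_lock() and wait_for_exit() are made-up stand-ins
for the kernel's futex_lock_pi_atomic() and wait_for_owner_exiting(),
not kernel code:

  #include <stdio.h>

  struct task { int refs; };

  static int try_lock(struct task **exiting, int attempt)
  {
          static struct task owner = { .refs = 1 };

          if (attempt == 0) {             /* owner observed exiting */
                  owner.refs++;           /* take a reference */
                  *exiting = &owner;
                  return -1;              /* stands in for -EBUSY */
          }
          return -2;                      /* stands in for -EAGAIN; *exiting untouched */
  }

  static void wait_for_exit(int err, struct task *exiting)
  {
          if (err != -1) {
                  if (exiting)            /* the WARN_ON_ONCE(exiting) analogue */
                          fprintf(stderr, "stale exiting pointer!\n");
                  return;
          }
          exiting->refs--;                /* consume the reference */
  }

  int main(void)
  {
          struct task *exiting = NULL;

          for (int attempt = 0; attempt < 2; attempt++) {
                  exiting = NULL;         /* the fix: reset on every retry */
                  int err = try_lock(&exiting, attempt);
                  wait_for_exit(err, exiting);
          }
          return 0;
  }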
Fixes: 3ef240eaff ("futex: Prevent exit livelock")
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260326001759.4129680-1-dave@stgolabs.net
This was done entirely with mindless brute force, using
  git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
      xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
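For example (hypothetical call sites; 'p', 'a', struct foo and struct
bar are made up for illustration), the script turns

  p = kmalloc_obj(struct foo, GFP_KERNEL);
  a = kvmalloc_objs(struct bar, count, GFP_KERNEL);

into

  p = kmalloc_obj(struct foo);
  a = kvmalloc_objs(struct bar, count);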
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
  Single allocations:       kmalloc(sizeof(TYPE), ...)
  are replaced with:        kmalloc_obj(TYPE, ...)

  Array allocations:        kmalloc_array(COUNT, sizeof(TYPE), ...)
  are replaced with:        kmalloc_objs(TYPE, COUNT, ...)

  Flex array allocations:   kmalloc(struct_size(PTR, FAM, COUNT), ...)
  are replaced with:        kmalloc_flex(*PTR, FAM, COUNT, ...)

  (where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
futex_lock_pi() and __fixup_pi_state_owner() acquire the
futex_q::lock_ptr without holding a reference, assuming the previously
obtained hash bucket and the assigned lock_ptr are still valid. This
assumption no longer holds once the private hash can be resized: it
becomes invalid after the reference drop.
Introduce futex_q_lockptr_lock() to lock the hash bucket recorded in
futex_q::lock_ptr. The lock pointer is read in an RCU section to ensure
that it does not go away even if the hash bucket has been replaced after
the old pointer was observed. After locking, the pointer needs to be
compared to check whether it changed. If so, the hash bucket has been
replaced, the user has been moved to the new one and lock_ptr has been
updated. The lock operation needs to be redone in this case.
The locked hash bucket is not returned.
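A minimal sketch of that retry pattern (simplified, and assuming the
guard(rcu)() scope helper from <linux/cleanup.h>; not necessarily the
exact kernel code):

  void futex_q_lockptr_lock(struct futex_q *q)
  {
          spinlock_t *lock_ptr;

          /*
           * The RCU section keeps a replaced hash bucket alive while
           * its lock pointer is still being observed.
           */
          guard(rcu)();
  retry:
          lock_ptr = READ_ONCE(q->lock_ptr);
          spin_lock(lock_ptr);
          if (unlikely(lock_ptr != READ_ONCE(q->lock_ptr))) {
                  /* Bucket was replaced; redo the lock operation. */
                  spin_unlock(lock_ptr);
                  goto retry;
          }
  }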
A special case is an early return in futex_lock_pi() (due to signal or
timeout) and a successful futex_wait_requeue_pi(). In both cases a valid
futex_q::lock_ptr (and its matching hash bucket) is expected, but since
the waiter has been removed from the hash this can no longer be
guaranteed. Therefore, before the waiter is removed, a reference is
acquired which is later dropped by the waiter to avoid a resize.
Add futex_q_lockptr_lock() and use it.
Acquire an additional reference in requeue_pi_wake_futex() and
futex_unlock_pi() while the futex_q is removed, denote this extra
reference in futex_q::drop_hb_ref and let the waiter drop the reference
in this case.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-11-bigeasy@linutronix.de
Create explicit scopes for hb variables; almost pure re-indent.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-6-bigeasy@linutronix.de
Getting the hash bucket and queuing it are two distinct actions. In
light of wanting to add a put hash bucket function later, untangle
them.
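Hypothetical before/after call sites to illustrate the split (the
exact signatures are simplifications, not the verbatim kernel diff):

  /* before: futex_q_lock() also looked up the hash bucket */
  hb = futex_q_lock(&q);

  /* after: lookup and locking are distinct steps */
  hb = futex_hash(&q.key);
  futex_q_lock(&q, hb);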
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-5-bigeasy@linutronix.de
futex_queue() -> __futex_queue() uses 'current' as the task to store in
the struct futex_q->task field. This is fine for synchronous usage of
the futex infrastructure, but it's not always correct when used by
io_uring where the task doing the initial futex_queue() might not be
available later on. This doesn't lead to any issues currently, as the
io_uring side doesn't support PI futexes, but it does leave a
potentially dangling pointer which is never a good idea.
Have futex_queue() take a task_struct argument, and have the regular
callers pass in 'current' for that. Meanwhile io_uring can just pass in
NULL, as the task should never be used off that path. In theory
req->tctx->task could be used here, but there's no point populating it
with a task field that will never be used anyway.
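A hedged sketch of the resulting interface (prototype simplified, and
'ifd' is an illustrative name for io_uring's futex data, not the exact
kernel diff):

  static inline void futex_queue(struct futex_q *q,
                                 struct futex_hash_bucket *hb,
                                 struct task_struct *task);

  futex_queue(&q, hb, current);     /* synchronous waiters */
  futex_queue(&ifd->q, hb, NULL);   /* io_uring: task never used */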
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/22484a23-542c-4003-b721-400688a0d055@kernel.dk
A common pattern seen when wake_qs are used to defer a wakeup
until after a lock is released is something like:
  preempt_disable();
  raw_spin_unlock(lock);
  wake_up_q(wake_q);
  preempt_enable();
So create some raw_spin_unlock*_wake() helper functions to clean
this up.
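A minimal sketch of one such helper (assuming the guard(preempt)()
scope helper; the _irq and _irqrestore variants follow the same shape):

  static inline void raw_spin_unlock_wake(raw_spinlock_t *lock,
                                          struct wake_q_head *wake_q)
  {
          guard(preempt)();
          raw_spin_unlock(lock);
          if (wake_q) {
                  wake_up_q(wake_q);
                  wake_q_init(wake_q);
          }
  }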
Applies on top of the fix I submitted here:
https://lore.kernel.org/lkml/20241212222138.2400498-1-jstultz@google.com/
NOTE: I recognise the unlock()/unlock_irq()/unlock_irqrestore()
variants create their own duplication, which we could avoid with a
macro that generates the similar functions, but I often dislike how
those generation macros make finding the actual implementation
harder, so I left the three functions as is. If folks would
prefer otherwise, let me know and I'll switch it.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241217040803.243420-1-jstultz@google.com
In preparation to nest mutex::wait_lock under rq::lock we need
to remove wakeups from under it.
Do this by utilizing wake_qs to defer the wakeup until after the
lock is dropped.
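As a hedged sketch, the deferral pattern looks roughly like this (not
the exact ww_mutex diff):

  DEFINE_WAKE_Q(wake_q);

  raw_spin_lock(&lock->wait_lock);
  /*
   * Instead of wake_up_process(waiter->task) while holding
   * wait_lock, the wakeup is queued:
   */
  wake_q_add(&wake_q, waiter->task);
  raw_spin_unlock(&lock->wait_lock);

  /* ... and issued only after the lock is dropped: */
  wake_up_q(&wake_q);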
[Heavily changed after 55f036ca7e ("locking: WW mutex cleanup") and
08295b3b5b ("locking: Implement an algorithm choice for Wound-Wait
mutexes")]
[jstultz: rebased to mainline, added extra wake_up_q & init
to avoid hangs, similar to Connor's rework of this patch]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Metin Kaya <metin.kaya@arm.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Metin Kaya <metin.kaya@arm.com>
Link: https://lore.kernel.org/r/20241009235352.1614323-2-jstultz@google.com
Jiri Slaby reported a futex state inconsistency resulting in -EINVAL during
a lock operation for a PI futex. It requires that a lock operation is
interrupted by a timeout or signal:
  T1  Owns the futex in user space.

  T2  Tries to acquire the futex in kernel (futex_lock_pi()). Allocates a
      pi_state and attaches itself to it.

  T2  Times out and removes its rt_waiter from the rt_mutex. Drops the
      rtmutex lock and tries to acquire the hash bucket lock to remove
      the futex_q. The lock is contended and T2 schedules out.

  T1  Unlocks the futex (futex_unlock_pi()). Finds a futex_q but no
      rt_waiter. Unlocks the futex (do_uncontended) and makes it available
      to user space.

  T3  Acquires the futex in user space.

  T4  Tries to acquire the futex in kernel (futex_lock_pi()). Finds the
      existing futex_q of T2 and tries to attach itself to the existing
      pi_state. This (attach_to_pi_state()) fails with -EINVAL because uval
      contains the TID of T3 but pi_state points to T1.
It's incorrect to unlock the futex and make it available for user space to
acquire as long as there is still an existing state attached to it in the
kernel.
T1 cannot hand over the futex to T2 because T2 already gave up and started
to clean up and is blocked on the hash bucket lock, so T2's futex_q with
the pi_state pointing to T1 is still queued.
T1 observes the futex_q, but ignores it as there is no waiter on the
corresponding rt_mutex and takes the uncontended path, which allows the
subsequent caller of futex_lock_pi() (T4) to observe that stale state.
To prevent this, the unlock path must dequeue all futex_q entries which
point to the same pi_state when there is no waiter on the rt_mutex. This
obviously requires making the dequeue conditional in the locking path to
prevent a double dequeue. With that it's guaranteed that user space cannot
observe an uncontended futex which has kernel state attached.
Fixes: fbeb558b0d ("futex/pi: Fix recursive rt_mutex waiter state")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20240118115451.0TkD_ZhB@linutronix.de
Closes: https://lore.kernel.org/all/4611bcf2-44d0-4c34-9b84-17406f881003@kernel.org
Some new assertions pointed out that the existing code has nested rt_mutex wait
state in the futex code.
Specifically, the futex_lock_pi() cancel case uses spin_lock() while there
still is a rt_waiter enqueued for this task, resulting in a state where there
are two waiters for the same task (and task_struct::pi_blocked_on gets
scrambled).
The reason to take hb->lock at this point is to avoid the wake_futex_pi()
EAGAIN case.
This happens when futex_top_waiter() and rt_mutex_top_waiter() state becomes
inconsistent. The current rules are such that this inconsistency will not be
observed.
Notably the case that needs to be avoided is where futex_lock_pi() and
futex_unlock_pi() interleave such that unlock will fail to observe a new
waiter.
*However* the case at hand is where a waiter is leaving; in this case the
race means a waiter that is going away is not observed -- which is harmless,
provided this race is explicitly handled.
This is a somewhat dangerous proposition because the converse race is not
observing a new waiter, which must absolutely not happen. But since the race is
valid this cannot be asserted.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20230915151943.GD6743@noisy.programming.kicks-ass.net
Have rt_mutex use the rt_mutex specific scheduler helpers to avoid
recursion vs rtlock on the PI state.
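A hedged sketch of the substitution (helper names from the rt_mutex
scheduler-helper series; surrounding slow-path code omitted):

  rt_mutex_pre_schedule();
  /* block on the lock; a plain schedule() becomes: */
  rt_mutex_schedule();
  rt_mutex_post_schedule();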
[[ peterz: adapted to new names ]]
Reported-by: Crystal Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230908162254.999499-6-bigeasy@linutronix.de
Earlier, the PREEMPT_RT patch had PREEMPT_RT_FULL and PREEMPT_RT_BASE
Kconfig options. The latter was a subset of the functionality that was
enabled with PREEMPT_RT_FULL and was mainly useful for debugging.
During the merging efforts the two Kconfig options were abandoned in the
v5.4.3-rt1 release and since then there is only PREEMPT_RT, which enables
the full feature set (as PREEMPT_RT_FULL did in earlier releases).
Replace the PREEMPT_RT_FULL reference with PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/YnvWUvq1vpqCfCU7@linutronix.de
Move the PI futex implementation into its own file.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@collabora.com>
Link: https://lore.kernel.org/r/20210923171111.300673-10-andrealmeid@collabora.com