Commit Graph

19 Commits

Author SHA1 Message Date
Yufan Chen
04fe9aeb4f io_uring/eventfd: reset deferred signal state
Recursive eventfd wakeups must defer io_uring eventfd signaling because
eventfd_signal_mask() rejects reentry from eventfd wakeup handlers. The
io_ev_fd ops bit tracks an outstanding deferred signal so that the same
rcu_head is not queued twice.

That bit is only set today. Once the first deferred callback runs, later
recursive notifications still see the bit set and skip queueing another
deferred signal. This can leave new completions without a matching
eventfd wake after the first recursive deferral.

Clear the pending bit before issuing the deferred signal. If the wakeup
path recurses while the callback runs, a new signal can be queued for
the next RCU grace period while the current callback keeps its reference
until it returns.

Signed-off-by: Yufan Chen <ericterminal@gmail.com>
Fixes: 60b6c075e8 ("io_uring/eventfd: move to more idiomatic RCU free usage")
Link: https://patch.msgid.link/20260503175710.37209-1-yufan.chen@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-03 23:21:40 -06:00
Jens Axboe
f1a424e21c io_uring: switch struct io_ring_ctx internal bitfields to flags
Bitfields cannot be set and checked atomically, and this makes it more
clear that these are indeed in shared storage and must be checked and
set in a sane fashion. This is in preparation for annotating a few of
the known racy, but harmless, flags checking.

No intended functional changes in this patch.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-16 15:32:59 -06:00
Jens Axboe
0e46cb553f Merge branch 'io_uring-7.0' into for-7.1/io_uring
Merge upstream io_uring fixes to avoid conflicts in later patches.

* io_uring-7.0:
  io_uring/kbuf: check if target buffer list is still legacy on recycle
  io_uring: fix physical SQE bounds check for SQE_MIXED 128-byte ops
  io_uring/eventfd: use ctx->rings_rcu for flags checking
  io_uring: ensure ctx->rings is stable for task work flags manipulation
  io_uring/bpf_filter: use bpf_prog_run_pin_on_cpu() to prevent migration
  io_uring/register: fix comment about task_no_new_privs
2026-03-14 08:57:15 -06:00
Jens Axboe
177c694321 io_uring/eventfd: use ctx->rings_rcu for flags checking
Similarly to what commit e78f7b70e837 did for local task work additions,
use ->rings_rcu under RCU rather than dereference ->rings directly. See
that commit for more details.

Cc: stable@vger.kernel.org
Fixes: 79cfe9e59c ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-11 14:35:19 -06:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Pavel Begunkov
f6da4fee69 io_uring/eventfd: open code io_eventfd_grab()
io_eventfd_grab() doesn't help wit understanding the path, it'll be
simpler to keep the helper open coded.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5cb53ce3876c2819db9e8055cf41dca4398521db.1745493845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-24 08:33:54 -06:00
Pavel Begunkov
da01f60f8a io_uring/eventfd: clean up rcu locking
Conditional locking is never welcome if there are better options. Move
rcu locking into io_eventfd_signal(), make it unconditional and use
guards. It also helps with sparse warnings.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/91a925e708ca8a5aa7fee61f96d29b24ea9adeaf.1745493845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-24 08:33:54 -06:00
Pavel Begunkov
62f666df76 io_uring/eventfd: dedup signalling helpers
Consolidate io_eventfd_flush_signal() and io_eventfd_signal(). Not much
of a difference for now, but it prepares it for following changes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5beecd4da65d8d2d83df499196f84b329387f6a2.1745493845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-24 08:33:54 -06:00
Jens Axboe
c9a40292a4 io_uring/eventfd: ensure io_eventfd_signal() defers another RCU period
io_eventfd_do_signal() is invoked from an RCU callback, but when
dropping the reference to the io_ev_fd, it calls io_eventfd_free()
directly if the refcount drops to zero. This isn't correct, as any
potential freeing of the io_ev_fd should be deferred another RCU grace
period.

Just call io_eventfd_put() rather than open-code the dec-and-test and
free, which will correctly defer it another RCU grace period.

Fixes: 21a091b970 ("io_uring: signal registered eventfd to process deferred task work")
Reported-by: Jann Horn <jannh@google.com>
Cc: stable@vger.kernel.org
Tested-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Li Zetao<lizetao1@huawei.com>
Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-09 07:16:45 -07:00
Jens Axboe
f4bb2f65bb io_uring/eventfd: move ctx->evfd_last_cq_tail into io_ev_fd
Everything else about the io_uring eventfd support is nicely kept
private to that code, except the cached_cq_tail tracking. With
everything else in place, move io_eventfd_flush_signal() to using
the ev_fd grab+release helpers, which then enables the direct use of
io_ev_fd for this tracking too.

Link: https://lore.kernel.org/r/20240921080307.185186-7-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29 13:43:26 -06:00
Jens Axboe
83a4f865e2 io_uring/eventfd: abstract out ev_fd grab + release helpers
In preparation for needing the ev_fd grabbing (and releasing) from
another path, abstract out two helpers for that.

Link: https://lore.kernel.org/r/20240921080307.185186-6-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29 13:43:26 -06:00
Jens Axboe
3ca5a35604 io_uring/eventfd: move trigger check into a helper
It's a bit hard to read what guards the triggering, move it into a
helper and add a comment explaining it too. This additionally moves
the ev_fd == NULL check in there as well.

Link: https://lore.kernel.org/r/20240921080307.185186-5-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29 13:43:26 -06:00
Jens Axboe
60c5f15800 io_uring/eventfd: move actual signaling part into separate helper
In preparation for using this from multiple spots, move the signaling
into a helper.

Link: https://lore.kernel.org/r/20240921080307.185186-4-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29 13:43:26 -06:00
Jens Axboe
3c90b80df5 io_uring/eventfd: check for the need to async notifier earlier
It's not necessary to do this post grabbing a reference. With that, we
can drop the out goto path as well.

Link: https://lore.kernel.org/r/20240921080307.185186-3-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29 13:43:26 -06:00
Jens Axboe
165126dc5e io_uring/eventfd: abstract out ev_fd put helper
We call this in two spot, have a helper for it. In preparation for
extending this part.

Link: https://lore.kernel.org/r/20240921080307.185186-2-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29 13:43:26 -06:00
Jens Axboe
0e0bcf07ec io_uring/eventfd: move refs to refcount_t
atomic_t for the struct io_ev_fd references and there are no issues with
it. While the ref getting and putting for the eventfd code is somewhat
performance critical for cases where eventfd signaling is used (news
flash, you should not...), it probably doesn't warrant using an atomic_t
for this. Let's just move to it to refcount_t to get the added
protection of over/underflows.

Link: https://lore.kernel.org/lkml/202409082039.hnsaIJ3X-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202409082039.hnsaIJ3X-lkp@intel.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-08 16:43:57 -06:00
Anuj Gupta
6cf52b42c4 io_uring: add new line after variable declaration
Fixes checkpatch warning

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/20240902062134.136387-2-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-02 09:39:57 -06:00
Jens Axboe
200f3abd14 io_uring/eventfd: move eventfd handling to separate file
This is pretty nicely abstracted already, but let's move it to a separate
file rather than have it in the main io_uring file. With that, we can
also move the io_ev_fd struct and enum out of global scope.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-16 14:54:55 -06:00