linux/fs
Peter Xu fe4cdc2c4e mm/userfaultfd: fix release hang over concurrent GUP
This patch should fix a possible userfaultfd release() hang during
concurrent GUP.

This problem was initially reported by Dimitris Siakavaras in July 2023
[1] in a Firecracker use case.  Firecracker has a separate process
handling page faults remotely, and when that process releases the
userfaultfd it can race with a concurrent GUP from KVM trying to fault in
a guest page during the secondary MMU page fault process.

A similar problem was reported again by Jinjiang Tu in March 2025 [2],
although this time the race involved an mlockall() operation, which does
GUP in a similar fashion.

In 2017, commit 656710a60e ("userfaultfd: non-cooperative: closing the
uffd without triggering SIGBUS") tried to fix this issue.  AFAIU, it
fixes the fault paths well but may not yet cover GUP.  In GUP, the issue
is that faultin_page() treats NOPAGE almost the same as "page fault
resolved": GUP then follows the page again, sees it still missing, and
keeps looping, which is the live lock situation reported.

This change makes core mm return RETRY instead of NOPAGE for both the GUP
and fault paths, proactively releasing the mmap read lock.  This should
guarantee that the release thread makes progress in taking the write
lock, avoiding the live lock even for GUP.
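
The difference between the two return values can be illustrated with a
small user-space toy model.  This is only a sketch, not kernel code: the
toy_* names, the struct fields and the iteration cap are all hypothetical,
and the model assumes the release thread gets one chance to run between
GUP iterations.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel's fault results; not the real VM_FAULT_* values. */
enum toy_fault { TOY_FAULT_NOPAGE, TOY_FAULT_RETRY, TOY_FAULT_RESOLVED };

/* Hypothetical, heavily simplified view of the state the race depends on. */
struct toy_mm {
	bool read_locked;    /* the faulting thread holds the mmap read lock */
	bool uffd_released;  /* release() has started on the userfaultfd */
	bool vma_registered; /* the VMA is still registered with the uffd */
};

/* Release thread: it needs the write lock to unregister the VMA. */
static bool toy_release_try_progress(struct toy_mm *mm)
{
	if (mm->read_locked)
		return false; /* blocked behind the reader */
	mm->vma_registered = false;
	return true;
}

/* Fault on a missing page in a VMA that may be uffd-registered. */
static enum toy_fault toy_handle_fault(struct toy_mm *mm, bool use_retry)
{
	if (!mm->vma_registered)
		return TOY_FAULT_RESOLVED; /* ordinary fault path fills the page */
	if (mm->uffd_released) {
		if (use_retry) {
			mm->read_locked = false; /* drop the lock before RETRY */
			return TOY_FAULT_RETRY;
		}
		return TOY_FAULT_NOPAGE; /* lock stays held, page still missing */
	}
	return TOY_FAULT_RESOLVED; /* waiting for userspace is not modeled */
}

/*
 * GUP-like loop.  Returns the number of iterations needed to get the
 * page, or -1 if max_iters is reached (the toy equivalent of a live lock).
 */
static int toy_gup(struct toy_mm *mm, bool use_retry, int max_iters)
{
	assert(max_iters > 0);
	for (int i = 1; i <= max_iters; i++) {
		mm->read_locked = true; /* take (or keep) the read lock */
		if (toy_handle_fault(mm, use_retry) == TOY_FAULT_RESOLVED) {
			mm->read_locked = false;
			return i;
		}
		/* Assumption: the release thread runs once between iterations. */
		toy_release_try_progress(mm);
	}
	return -1;
}
```

Starting from a state where release() has begun and the VMA is still
registered, toy_gup() with use_retry == false never finishes within the
cap, because the reader never lets go of the lock.  With use_retry == true
the lock is dropped, the release thread unregisters the VMA, and the
second iteration resolves the fault.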

While at it, rearrange the comments to keep them up to date.

[1] https://lore.kernel.org/r/79375b71-db2e-3e66-346b-254c90d915e2@cslab.ece.ntua.gr
[2] https://lore.kernel.org/r/20250307072133.3522652-1-tujinjiang@huawei.com

Link: https://lkml.kernel.org/r/20250312145131.1143062-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Dimitris Siakavaras <jimsiak@cslab.ece.ntua.gr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:14:42 -07:00
9p Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
adfs Merge patch series "adfs, affs, befs, hfs, hfsplus: convert to new mount api" 2024-10-08 14:41:53 +02:00
affs vfs-6.15-rc1.async.dir 2025-03-24 10:47:14 -07:00
afs vfs-6.15-rc1.afs 2025-03-24 13:15:16 -07:00
autofs vfs-6.15-rc1.async.dir 2025-03-24 10:47:14 -07:00
bcachefs - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
befs
bfs
btrfs - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
cachefiles vfs-6.15-rc1.async.dir 2025-03-24 10:47:14 -07:00
ceph vfs-6.15-rc1.ceph 2025-03-24 12:17:13 -07:00
coda Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
configfs Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
cramfs
crypto for-6.15/block-20250322 2025-03-26 18:08:55 -07:00
debugfs debugfs: Fix the missing initializations in __debugfs_file_get() 2025-01-30 08:22:31 +01:00
devpts vfs: Convert devpts to use the new mount API 2025-02-06 11:51:43 +01:00
dlm dlm: make tcp still work in multi-link env 2025-03-18 10:49:22 -05:00
ecryptfs vfs-6.15-rc1.async.dir 2025-03-24 10:47:14 -07:00
efivarfs EFI updates for v6.15 2025-03-29 11:36:19 -07:00
efs efs: fix the efs new mount api implementation 2024-10-15 15:58:36 +02:00
erofs erofs: enable 48-bit layout support 2025-03-17 14:02:16 +08:00
exfat exfat: call bh_read in get_block only when necessary 2025-03-29 22:03:11 +09:00
exportfs exportfs: remove locking around ->get_parent() call. 2025-03-14 11:39:59 +01:00
ext2 \n 2025-03-31 17:53:44 -07:00
ext4 - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
f2fs f2fs-for-6.15-rc1 2025-03-27 12:55:54 -07:00
fat Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
freevxfs freevxfs: Replace one-element array with flexible array member 2024-11-06 10:42:06 +01:00
fuse - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
gfs2 gfs2 changes 2025-03-27 12:09:25 -07:00
hfs Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
hfsplus Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
hostfs hostfs: store inode in dentry after mkdir if possible. 2025-02-27 20:00:17 +01:00
hpfs Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
hugetlbfs - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
iomap - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
isofs isofs: fix KMSAN uninit-value bug in do_isofs_readdir() 2025-02-12 14:25:19 +01:00
jbd2 jbd2: add a missing data flush during file and fs synchronization 2025-03-21 00:59:28 -04:00
jffs2 Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
jfs Various bug fixes and cleanups for JFS 2025-03-27 13:17:39 -07:00
kernfs Driver core updates for 6.15-rc1 2025-04-01 11:02:03 -07:00
lockd sysctl: Fixes nsm_local_state bounds 2025-03-10 09:11:13 -04:00
minix Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
netfs netfs: Fix netfs_unbuffered_read() to return ssize_t rather than int 2025-03-19 10:04:23 +01:00
nfs NFSD 6.15 Release Notes 2025-03-31 17:28:17 -07:00
nfs_common fs: nfs: acl: Avoid -Wflex-array-member-not-at-end warning 2025-03-10 09:11:04 -04:00
nfsd NFSD 6.15 Release Notes 2025-03-31 17:28:17 -07:00
nilfs2 Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
nls move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
notify vfs-6.15-rc1.mount 2025-03-24 09:34:10 -07:00
ntfs3 Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
ocfs2 - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
omfs Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
openpromfs
orangefs orangefs: one fixup 2025-03-27 13:14:39 -07:00
overlayfs vfs-6.15-rc1.async.dir 2025-03-24 10:47:14 -07:00
proc - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
pstore pstore update for v6.15-rc1 2025-03-24 15:43:28 -07:00
qnx4
qnx6 fs/qnx6: Fix building with GCC 15 2024-12-03 10:40:36 +01:00
quota treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
ramfs Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
romfs
smb 10 ksmbd SMB3 server fixes 2025-03-31 17:42:26 -07:00
squashfs squashfs: fix invalid pointer dereference in squashfs_cache_delete 2025-03-16 17:40:24 -07:00
sysfs kernfs: Use RCU to access kernfs_node::name. 2025-02-15 17:46:32 +01:00
tests
tracefs Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
ubifs This update includes the following changes: 2025-03-29 10:01:55 -07:00
udf - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
ufs Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
unicode unicode: kunit: change tests filename and path 2025-02-12 14:00:11 -08:00
vboxsf vfs-6.15-rc1.async.dir 2025-03-24 10:47:14 -07:00
verity Revert "fsverity: relax build time dependency on CRYPTO_SHA256" 2025-02-17 11:34:15 -08:00
xfs - The 7 patch series "powerpc/crash: use generic crashkernel 2025-04-01 10:06:52 -07:00
zonefs iomap: pass private data to iomap_page_mkwrite 2025-02-06 13:02:15 +01:00
aio.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
anon_inodes.c add a string-to-qstr constructor 2025-01-27 19:25:45 -05:00
attr.c fs: handle delegated timestamps in setattr_copy_mgtime 2024-10-10 10:20:51 +02:00
backing-file.c tree-wide: s/revert_creds_light()/revert_creds()/g 2024-12-02 11:25:09 +01:00
bad_inode.c Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
binfmt_elf_fdpic.c binfmt_elf_fdpic: fix variable set but not used warning 2025-03-07 20:07:33 -08:00
binfmt_elf.c binfmt_elf: Use note name macros 2025-02-10 16:47:07 -08:00
binfmt_flat.c binfmt_flat: Fix integer overflow bug on 32 bit systems 2025-01-10 08:49:05 -08:00
binfmt_misc.c execve updates for v6.14-rc1 2025-01-20 13:27:58 -08:00
binfmt_script.c
bpf_fs_kfuncs.c bpf: fs/xattr: Add BPF kfuncs to set and remove xattrs 2025-02-13 19:35:32 -08:00
buffer.c - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
char_dev.c fs: Reorganize kerneldoc parameter names 2024-10-22 11:16:57 +02:00
compat_binfmt_elf.c binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4 2024-10-17 18:38:49 +01:00
coredump.c Summary 2025-03-26 21:02:05 -07:00
d_path.c
dax.c - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
dcache.c Summary 2025-03-26 21:02:05 -07:00
direct-io.c
drop_caches.c fs: drop_caches: move sysctl to fs/drop_caches.c 2025-02-07 16:53:04 +01:00
eventfd.c make use of anon_inode_getfile_fmode() 2025-02-21 10:25:31 +01:00
eventpoll.c Networking changes for 6.15. 2025-03-26 21:48:21 -07:00
exec.c binfmt: Remove loader from linux_binprm struct 2025-02-24 11:30:16 -08:00
fcntl.c fs: get rid of __FMODE_NONOTIFY kludge 2024-12-09 11:34:29 +01:00
fhandle.c exportfs: add permission method 2024-12-17 09:16:11 +01:00
file_table.c vfs-6.15-rc1.file 2025-03-24 13:19:17 -07:00
file.c vfs-6.15-rc1.file 2025-03-24 13:19:17 -07:00
filesystems.c
fs_context.c fs: fc_log replace magic number 7 with ARRAY_SIZE() 2024-12-22 11:29:52 +01:00
fs_parser.c bcachefs: add support for true/false & yes/no in bool-type options 2024-12-21 01:36:17 -05:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c fs: fs-writeback: move sysctl to fs/fs-writeback.c 2025-02-07 16:53:04 +01:00
fsopen.c fs: support O_PATH fds with FSCONFIG_SET_FD 2025-02-12 10:02:10 +01:00
init.c VFS: Change vfs_mkdir() to return the dentry. 2025-03-05 11:52:50 +01:00
inode.c fs: call inode_sb_list_add() outside of inode hash lock 2025-03-20 13:06:51 +01:00
internal.h vfs-6.15-rc1.file 2025-03-24 13:19:17 -07:00
ioctl.c ioctl: Fix return type of several functions from long to int 2025-02-21 10:25:32 +01:00
Kconfig - The 6 patch series "Enable strict percpu address space checks" from 2025-04-01 09:29:18 -07:00
Kconfig.binfmt
kernel_read_file.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
libfs.c vfs-6.15-rc1.pidfs 2025-03-24 10:16:37 -07:00
locks.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
Makefile sysv: Remove the filesystem 2025-02-21 10:32:47 +01:00
mbcache.c
mnt_idmapping.c statmount: allow to retrieve idmappings 2025-02-12 12:12:27 +01:00
mount.h vfs-6.15-rc1.mount.namespace 2025-03-24 11:41:41 -07:00
mpage.c fs/buffer fs/mpage: remove large folio restriction 2025-02-24 11:44:44 +01:00
namei.c vfs-6.15-rc1.file 2025-03-24 13:19:17 -07:00
namespace.c vfs-6.15-rc1.mount.namespace 2025-03-24 11:41:41 -07:00
nsfs.c vfs-6.15-rc1.nsfs 2025-03-24 11:38:12 -07:00
open.c vfs-6.15-rc1.file 2025-03-24 13:19:17 -07:00
pidfs.c vfs-6.15-rc1.pidfs 2025-03-24 10:16:37 -07:00
pipe.c Merge patch series "pipe: Trivial cleanups" 2025-03-10 08:55:13 +01:00
pnode.c vfs-6.15-rc1.mount.namespace 2025-03-24 11:41:41 -07:00
pnode.h mount: handle mount propagation for detached mount trees 2025-03-04 09:29:54 +01:00
posix_acl.c acl: Annotate struct posix_acl with __counted_by() 2024-10-22 11:16:59 +02:00
proc_namespace.c
read_write.c fs: don't needlessly acquire f_lock 2025-02-21 10:25:32 +01:00
readdir.c introduce "fd_pos" class, convert fdget_pos() users to it. 2024-11-03 01:28:06 -05:00
remap_range.c convert vfs_dedupe_file_range(). 2024-11-03 01:28:07 -05:00
select.c select: Fix unbalanced user_access_end() 2025-01-13 16:24:16 +01:00
seq_file.c fs: Reorganize kerneldoc parameter names 2024-10-22 11:16:57 +02:00
signalfd.c make use of anon_inode_getfile_fmode() 2025-02-21 10:25:31 +01:00
splice.c fs/splice: Use pipe_buf() helper to retrieve pipe buffer 2025-03-10 08:55:05 +01:00
stack.c
stat.c fs/stat.c: avoid harmless garbage value problem in vfs_statx_path() 2025-02-07 10:27:24 +01:00
statfs.c fdget_raw() users: switch to CLASS(fd_raw) 2024-11-03 01:28:06 -05:00
super.c vfs-6.15-rc1.misc 2025-03-24 09:13:50 -07:00
sync.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
sysctls.c treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
timerfd.c A treewide hrtimer timer cleanup 2025-03-25 10:54:15 -07:00
userfaultfd.c mm/userfaultfd: fix release hang over concurrent GUP 2025-04-01 15:14:42 -07:00
utimes.c fdget(), more trivial conversions 2024-11-03 01:28:06 -05:00
xattr.c xattr: remove redundant check on variable err 2024-11-06 13:00:01 -05:00