Commit Graph

1786 Commits

Author SHA1 Message Date
Linus Torvalds
dd6c438c3e vfs-7.1-rc1.fixes
Please consider pulling these changes from the signed vfs-7.1-rc1.fixes tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaeqfYAAKCRCRxhvAZXjc
 oltyAP4y1SFYvmoy2mPM3jrSbYuT2rX0q4OZ/GDbuWOvir/bcgEAoPI9JHraS1+2
 xFj/7JJFWzuDXlFoaX6g+nv42pfatgU=
 =BnjA
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.1-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - eventpoll: fix ep_remove() UAF and follow-up cleanup

 - fs: aio: set VMA_DONTCOPY_BIT in mmap to fix NULL-pointer-dereference
   error

 - writeback: Fix use after free in inode_switch_wbs_work_fn()

 - fuse: reject oversized dirents in page cache

 - fs: aio: reject partial mremap to avoid Null-pointer-dereference
   error

 - nstree: fix func. parameter kernel-doc warnings

 - fs: Handle multiply claimed blocks more gracefully with mmb

* tag 'vfs-7.1-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  eventpoll: drop vestigial epi->dying flag
  eventpoll: drop dead bool return from ep_remove_epi()
  eventpoll: refresh eventpoll_release() fast-path comment
  eventpoll: move f_lock acquisition into ep_remove_file()
  eventpoll: fix ep_remove struct eventpoll / struct file UAF
  eventpoll: move epi_fget() up
  eventpoll: rename ep_remove_safe() back to ep_remove()
  eventpoll: drop vestigial __ prefix from ep_remove_{file,epi}()
  eventpoll: kill __ep_remove()
  eventpoll: split __ep_remove()
  eventpoll: use hlist_is_singular_node() in __ep_remove()
  fs: Handle multiply claimed blocks more gracefully with mmb
  nstree: fix func. parameter kernel-doc warnings
  fs: aio: reject partial mremap to avoid Null-pointer-dereference error
  fuse: reject oversized dirents in page cache
  writeback: Fix use after free in inode_switch_wbs_work_fn()
  fs: aio: set VMA_DONTCOPY_BIT in mmap to fix NULL-pointer-dereference error
2026-04-23 17:08:04 -07:00
Samuel Page
51a8de6c50
fuse: reject oversized dirents in page cache
fuse_add_dirent_to_cache() computes a serialized dirent size from the
server-controlled namelen field and copies the dirent into a single
page-cache page. The existing logic only checks whether the dirent fits
in the remaining space of the current page and advances to a fresh page
if not. It never checks whether the dirent itself exceeds PAGE_SIZE.

As a result, a malicious FUSE server can return a dirent with
namelen=4095, producing a serialized record size of 4120 bytes. On 4 KiB
page systems this causes memcpy() to overflow the cache page by 24 bytes
into the following kernel page.

Reject dirents that cannot fit in a single page before copying them into
the readdir cache.
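
The arithmetic above can be reproduced with a small userspace sketch. It is not the kernel patch itself: the `sketch_` names and the fixed 4 KiB page size are assumptions made for illustration, and the struct only mirrors the field layout of the FUSE dirent record.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DIRENT_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Mirrors the field layout of the FUSE dirent record (hypothetical name). */
struct sketch_dirent {
	uint64_t ino;
	uint64_t off;
	uint32_t namelen;
	uint32_t type;
	char name[];
};

/* Header plus name, rounded up to the next 8-byte boundary. */
static size_t sketch_dirent_size(uint32_t namelen)
{
	size_t raw = offsetof(struct sketch_dirent, name) + namelen;

	return (raw + sizeof(uint64_t) - 1) & ~(sizeof(uint64_t) - 1);
}

/* A record may be cached only if it fits in a single page on its own. */
static bool sketch_dirent_fits_page(uint32_t namelen)
{
	return sketch_dirent_size(namelen) <= SKETCH_DIRENT_PAGE_SIZE;
}
```

With namelen=4095 the helper yields 4120 bytes, which is 24 bytes more than one 4 KiB page, so such a record would be rejected by a check of this shape.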

Fixes: 69e3455115 ("fuse: allow caching readdir")
Cc: stable@vger.kernel.org # v6.16+
Assisted-by: Bynario AI
Signed-off-by: Samuel Page <sam@bynar.io>
Reported-by: Qi Tang <tpluszz77@gmail.com>
Reported-by: Zijun Hu <nightu@northwestern.edu>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260420090139.662772-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-24 00:34:58 +02:00
Linus Torvalds
acf6c670e4 fuse update for 7.1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCadzViQAKCRDh3BK/laaZ
 PLTDAP0e90Z6Dm7GvZ8+kbLrK9uHvgf9Lwu0HX2SaShpLpnESgEAu5K2r5NpMVEe
 1A6odzHuwZl8xU2tjc36o1hT0CGAkQk=
 =N3Gq
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse update from Miklos Szeredi:

 - Fix possible hang in virtiofs when cleaning up a DAX inode (Sergio
   Lopez)

 - Fix a warning when using large folio as the source of SPLICE_F_MOVE
   on the fuse device (Bernd)

 - Fix uninitialized value found by KMSAN (Luis Henriques)

 - Fix synchronous INIT hang (Miklos)

 - Fix race between inode initialization and FUSE_NOTIFY_INVAL_INODE
   (Horst)

 - Allow fd to be closed after passing fuse device fd to
   fsconfig(..., "fd", ...) (Miklos)

 - Support FSCONFIG_SET_FD for "fd" option (Miklos)

 - Misc fixes and cleanups

* tag 'fuse-update-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (21 commits)
  fuse: support FSCONFIG_SET_FD for "fd" option
  fuse: clean up device cloning
  fuse: don't require /dev/fuse fd to be kept open during mount
  fuse: add refcount to fuse_dev
  fuse: create fuse_dev on /dev/fuse open instead of mount
  fuse: check connection state on notification
  fuse: fuse_dev_ioctl_clone() should wait for device file to be initialized
  fuse: fix inode initialization race
  fuse: abort on fatal signal during sync init
  fuse: fix uninit-value in fuse_dentry_revalidate()
  fuse: use offset_in_page() for page offset calculations
  fuse: use DIV_ROUND_UP() for page count calculations
  fuse: simplify logic in fuse_notify_store() and fuse_retrieve()
  fuse: validate outarg offset and size in notify store/retrieve
  fuse: Check for large folio with SPLICE_F_MOVE
  fuse: quiet down complaints in fuse_conn_limit_write
  fuse: drop unnecessary argument from fuse_lookup_init()
  fuse: fix premature writetrhough request for large folio
  fuse: refactor duplicate queue teardown operation
  virtiofs: add FUSE protocol validation
  ...
2026-04-15 19:04:21 -07:00
Linus Torvalds
3ba310f2a3 lsm/stable-7.1 PR 20260410
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmnZeioUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXN15w//e6Tgou0wgffKb+vZP+9xC53bQzmL
 Z0en5gdfifbqeIvj6LzSdUlIlsDUw65S42eGhqIYwk5oYNd2lwFMXd16fggakc/n
 TDdF1/7WTwcwMFKJxtew5tcE3pjwC96F6bqF9YmDdcNycjuQ5cbsJ/56hQsWZYxo
 g8y5y3fmWrkQ28gst2NJiR6XQx7acFc3S2FRKZc8mldkDjrmb9gN9WdwWJ6/nw+A
 xnNm6BdVjZ/gnrQliO4eL4J5T1ijvy3gddW3rXdytcIoH8js/pcZh3BpfTVWlzs+
 5KLPy9Tm39g3BwNx5cHrmzz1Ug6fCqIJyJzPK0M3q3vr+7w1kEWZnM5IUQwdQPsg
 dVmLBvhrvnKNBnMXd53seQJm33UkcKPpfWbaYQFpUC1ZocUiQDvE3eCH+q4SIwjo
 kwB6Ycc27O3PjXzMtwQv5a1oKeOHuNKr0YCOAoOs1bshcJ4lTzVwqe4StuLeoN8z
 7G+sfoT4JpM/izPTQjF+tNRcDECvX7j8b42BJcGx8Zf+JiP4HC1xBmLOg4egLc6m
 hxwaT5ipLhBL95eNCp16bqVrw5vVzZ+HbtDCkXJmU+grdsuTsp0bUGmm1DjpsFVk
 l/PyMDCvMNzi3uuNI9v9usQblv57XK6oNVBqcuoqejEDkEuX3MBSkov7DZr9nLLO
 JO1EvsWAjQdyXeQ=
 =giHS
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull LSM updates from Paul Moore:
 "We only have five patches in the LSM tree, but three of the five are
  for an important bugfix relating to overlayfs and the mmap() and
  mprotect() access controls for LSMs. Highlights below:

   - Fix problems with the mmap() and mprotect() LSM hooks on overlayfs

     As we are dealing with problems both in mmap() and mprotect() there
     are essentially two components to this fix, spread across three
     patches with all marked for stable.

     The simplest portion of the fix is the creation of a new LSM hook,
     security_mmap_backing_file(), that is used to enforce LSM mmap()
     access controls on backing files in the stacked/overlayfs case. The
     existing security_mmap_file() does not have visibility past the
     user file. You can see from the associated SELinux hook callback
     the code is fairly straightforward.

     The mprotect() fix is a bit more complicated as there is no way in
     the mprotect() code path to inspect both the user and backing
     files, and bolting on a second file reference to vm_area_struct
     wasn't really an option.

     The solution taken here adds a LSM security blob and associated
     hooks to the backing_file struct that LSMs can use to capture and
     store relevant information from the user file. While the necessary
     SELinux information is relatively small, a single u32, I expect
     other LSMs to require more than that, and a dedicated backing_file
     LSM blob provides a storage mechanism without negatively impacting
     other filesystems.

     I want to note that other LSMs beyond SELinux have been involved in
     the discussion of the fixes presented here and they are working on
     their own related changes using these new hooks, but due to other
     issues those patches will be coming at a later date.

   - Use kstrdup_const()/kfree_const() for securityfs symlink targets

   - Resolve a handful of kernel-doc warnings in cred.h"

* tag 'lsm-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  selinux: fix overlayfs mmap() and mprotect() access checks
  lsm: add backing_file LSM hooks
  fs: prepare for adding LSM blob to backing_file
  securityfs: use kstrdup_const() to manage symlink targets
  cred: fix kernel-doc warnings in cred.h
2026-04-13 15:17:28 -07:00
Linus Torvalds
0f00132132 vfs-7.1-rc1.integrity
Please consider pulling these changes from the signed vfs-7.1-rc1.integrity tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCgAKCRCRxhvAZXjc
 ogmqAQCMD0V+pHwBAGwPQYBPc6Tf4LCxqNAmt/kypYsdVWkweQEAxbbeBPUigid5
 QeO0zCunikgIjMyJDXINpUPgsLYvyAo=
 =q2L6
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.1-rc1.integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs integrity updates from Christian Brauner:
 "This adds support to generate and verify integrity information (aka
  T10 PI) in the file system, instead of the automatic below the covers
  support that is currently used.

  The implementation is based on refactoring the existing block layer PI
  code to be reusable for this use case, and then adding relatively
  small wrappers for the file system use case. These are then used in
  iomap to implement the semantics, and wired up in XFS with a small
  amount of glue code.

  Compared to the baseline this does not change performance for writes,
  but increases read performance up to 15% for 4k I/O, with the benefit
  decreasing with larger I/O sizes as even the baseline maxes out the
  device quickly on my older enterprise SSD"

* tag 'vfs-7.1-rc1.integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  xfs: support T10 protection information
  iomap: support T10 protection information
  iomap: support ioends for buffered reads
  iomap: add a bioset pointer to iomap_read_folio_ops
  ntfs3: remove copy and pasted iomap code
  iomap: allow file systems to hook into buffered read bio submission
  iomap: only call into ->submit_read when there is a read_ctx
  iomap: pass the iomap_iter to ->submit_read
  iomap: refactor iomap_bio_read_folio_range
  block: pass a maxlen argument to bio_iov_iter_bounce
  block: add fs_bio_integrity helpers
  block: make max_integrity_io_size public
  block: prepare generation / verification helpers for fs usage
  block: add a bdev_has_integrity_csum helper
  block: factor out a bio_integrity_setup_default helper
  block: factor out a bio_integrity_action helper
2026-04-13 10:40:26 -07:00
Paul Moore
6af36aeb14 lsm: add backing_file LSM hooks
Stacked filesystems such as overlayfs do not currently provide the
necessary mechanisms for LSMs to properly enforce access controls on the
mmap() and mprotect() operations.  In order to resolve this gap, a LSM
security blob is being added to the backing_file struct and the following
new LSM hooks are being created:

 security_backing_file_alloc()
 security_backing_file_free()
 security_mmap_backing_file()

The first two hooks are to manage the lifecycle of the LSM security blob
in the backing_file struct, while the third provides a new mmap() access
control point for the underlying backing file.  It is also expected that
LSMs will likely want to update their security_file_mprotect() callback
to address issues with their mprotect() controls, but that does not
require a change to the security_file_mprotect() LSM hook.

There are three other small changes to support these new LSM hooks:
* Pass the user file associated with a backing file down to
alloc_empty_backing_file() so it can be included in the
security_backing_file_alloc() hook.
* Add getter and setter functions for the backing_file struct LSM blob
as the backing_file struct remains private to fs/file_table.c.
* Constify the file struct field in the LSM common_audit_data struct to
better support LSMs that need to pass a const file struct pointer into
the common LSM audit code.

Thanks to Arnd Bergmann for identifying the missing EXPORT_SYMBOL_GPL()
and supplying a fixup.

Cc: stable@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-unionfs@vger.kernel.org
Cc: linux-erofs@lists.ozlabs.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-04-03 16:53:50 -04:00
Miklos Szeredi
2339f9cc9f fuse: support FSCONFIG_SET_FD for "fd" option
This is not only cleaner to use in userspace (no need to sprintf the fd to
a string) but also allows userspace to detect that the devfd can be closed
after the fsconfig call.
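
A minimal userspace sketch of what passing the descriptor directly might look like, assuming Linux kernel headers that define `SYS_fsconfig` and `FSCONFIG_SET_FD`. The `sketch_` wrappers are hypothetical names; the raw syscall is used because not every C library ships an fsconfig() wrapper.

```c
#define _GNU_SOURCE
#include <linux/mount.h>	/* FSCONFIG_SET_FD */
#include <sys/syscall.h>	/* SYS_fsconfig */
#include <unistd.h>		/* syscall() */

/* Thin wrapper around the raw syscall (hypothetical helper name). */
static int sketch_fsconfig(int fs_fd, unsigned int cmd, const char *key,
			   const void *value, int aux)
{
	return (int)syscall(SYS_fsconfig, fs_fd, cmd, key, value, aux);
}

/*
 * Pass the /dev/fuse descriptor as a real fd: the key names the option,
 * the value pointer is NULL, and the fd travels in the aux argument.
 */
static int sketch_fuse_set_devfd(int fs_fd, int devfd)
{
	return sketch_fsconfig(fs_fd, FSCONFIG_SET_FD, "fd", NULL, devfd);
}
```

On kernels that do not accept FSCONFIG_SET_FD for this option the call is expected to fail, so a caller could fall back to the string form and keep the descriptor open.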

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2026-04-02 20:53:00 +02:00
Miklos Szeredi
4ae404afd9 fuse: clean up device cloning
 - fuse_mutex is not needed for device cloning, because fuse_dev_install()
   uses cmpxchg() to set fud->fc, which prevents races between clone/mount
   or clone/clone.  This makes the logic simpler.

 - Drop fc->dev_count.  This is only used to check in release if the device
   is the last clone, but checking list_empty(&fc->devices) is equivalent
   after removing the released device from the list.  Removing the fuse_dev
   before calling fuse_abort_conn() is okay, since the processing and io
   lists are now empty for this device.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-04-02 20:52:59 +02:00
Miklos Szeredi
d42eb23b2e fuse: don't require /dev/fuse fd to be kept open during mount
With the new mount API the sequence of syscalls would be:

	fs_fd = fsopen("fuse", 0);
	snprintf(opt, sizeof(opt), "%i", devfd);
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "fd", opt, 0);
	/* ... */
	fsconfig(fs_fd, FSCONFIG_CMD_CREATE, 0, 0, 0);

Current mount code just stores the value of devfd in the fs_context and
uses it during FSCONFIG_CMD_CREATE, which is inelegant.

Instead grab a reference to the underlying fuse_dev, and use that during
the filesystem creation.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-04-02 20:43:25 +02:00
Miklos Szeredi
e9bf38500e fuse: add refcount to fuse_dev
This will make it possible to grab the fuse_dev and subsequently release
the file that it came from.

In the above case, fud->fc will be set to FUSE_DEV_FC_DISCONNECTED to
indicate that this is no longer a functional device.

When trying to assign an fc to such a disconnected fuse_dev, the fc is set
to the disconnected state.

Use atomic operations xchg() and cmpxchg() to prevent races.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-04-02 20:43:24 +02:00
Miklos Szeredi
a8dd5f1b73 fuse: create fuse_dev on /dev/fuse open instead of mount
Allocate struct fuse_dev when opening the device.  This means that unlike
before, ->private_data is always set to a valid pointer.

The use of the FUSE_DEV_SYNC_INIT magic pointer for the private_data is now
replaced with a simple bool sync_init member.

If sync INIT is not set, I/O on the device returns an error before mount.
Keep this behavior by checking for the ->fc member.  If fud->fc is set, the
mount has succeeded.  Testing this used READ_ONCE(file->private_data) and
smp_mb() to try and provide the necessary semantics.  Switch this to
smp_store_release() and smp_load_acquire().
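
The release/acquire pairing can be sketched in userspace with C11 atomics, which offer comparable ordering guarantees. The `pub_` names and the `max_write` field are assumptions for illustration, not the kernel's data structures.

```c
#include <stdatomic.h>
#include <stddef.h>

struct pub_conn { int max_write; };

struct pub_dev {
	_Atomic(struct pub_conn *) fc;
};

/* Writer: fill in the connection first, then publish it with release order. */
static void pub_publish(struct pub_dev *dev, struct pub_conn *fc, int max_write)
{
	fc->max_write = max_write;
	atomic_store_explicit(&dev->fc, fc, memory_order_release);
}

/* Reader: an acquire load that sees the pointer also sees its contents. */
static int pub_read_max_write(struct pub_dev *dev)
{
	struct pub_conn *fc = atomic_load_explicit(&dev->fc, memory_order_acquire);

	return fc ? fc->max_write : -1;	/* -1: nothing published yet */
}
```

The reader needs no separate full barrier: the ordering is attached to the load and store of the pointer itself.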

Setting fud->fc is protected by fuse_mutex, this is unchanged.

Will need this later so the /dev/fuse open file reference is not held
during FSCONFIG_CMD_CREATE.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2026-04-02 20:43:24 +02:00
Miklos Szeredi
e45f591f70 fuse: check connection state on notification
Check if the connection is fully initialized and connected before trying to
process a notification from the fuse server.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-04-02 20:43:24 +02:00
Miklos Szeredi
da6fcc6dbd fuse: fuse_dev_ioctl_clone() should wait for device file to be initialized
Use fuse_get_dev() not __fuse_get_dev() on the old fd, since in the case of
synchronous INIT the caller will want to wait for the device file to be
available for cloning, just like I/O wants to wait instead of returning an
error.

Fixes: dfb84c3307 ("fuse: allow synchronous FUSE_INIT")
Cc: stable@vger.kernel.org # v6.18
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-04-02 20:29:34 +02:00
Horst Birthelmer
aff12041b4 fuse: fix inode initialization race
Fix a race between fuse_iget() and fuse_reverse_inval_inode() where
invalidation can arrive while an inode is being initialized, causing
the invalidation to be lost.

By keeping the inode in the I_NEW state as long as the attributes are not
valid, the invalidation can wait until the inode is fully initialized.

Suggested-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-04-01 12:12:58 +02:00
Miklos Szeredi
204aa22a68 fuse: abort on fatal signal during sync init
When sync init is used and the server exits for some reason (error, crash)
while processing FUSE_INIT, the filesystem creation will hang.  The reason
is that while all other threads will exit, the mounting thread (or process)
will keep the device fd open, which will prevent an abort from happening.

This is a regression from the async mount case, where the mount was done
first, and the FUSE_INIT processing afterwards, in which case there's no
such recursive syscall keeping the fd open.

Fixes: dfb84c3307 ("fuse: allow synchronous FUSE_INIT")
Cc: stable@vger.kernel.org # v6.18
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bernd@bsbernd.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-03-24 15:26:32 +01:00
Joanne Koong
76f9377cd2
writeback: don't block sync for filesystems with no data integrity guarantees
Add a SB_I_NO_DATA_INTEGRITY superblock flag for filesystems that cannot
guarantee data persistence on sync (e.g. fuse). For superblocks with this
flag set, sync kicks off writeback of dirty inodes but does not wait
for the flusher threads to complete the writeback.

This replaces the per-inode AS_NO_DATA_INTEGRITY mapping flag added in
commit f9a49aa302 ("fs/writeback: skip AS_NO_DATA_INTEGRITY mappings
in wait_sb_inodes()"). The flag belongs at the superblock level because
data integrity is a filesystem-wide property, not a per-inode one.
Having this flag at the superblock level also allows us to skip having
to iterate every dirty inode in wait_sb_inodes() only to skip each inode
individually.

Prior to this commit, mappings with no data integrity guarantees skipped
waiting on writeback completion but still waited on the flusher threads
to finish initiating the writeback. Waiting on the flusher threads is
unnecessary. This commit kicks off writeback but does not wait on the
flusher threads. This change properly addresses a recent report [1] for
a suspend-to-RAM hang seen on fuse-overlayfs that was caused by waiting
on the flusher threads to finish:

Workqueue: pm_fs_sync pm_fs_sync_work_fn
Call Trace:
 <TASK>
 __schedule+0x457/0x1720
 schedule+0x27/0xd0
 wb_wait_for_completion+0x97/0xe0
 sync_inodes_sb+0xf8/0x2e0
 __iterate_supers+0xdc/0x160
 ksys_sync+0x43/0xb0
 pm_fs_sync_work_fn+0x17/0xa0
 process_one_work+0x193/0x350
 worker_thread+0x1a1/0x310
 kthread+0xfc/0x240
 ret_from_fork+0x243/0x280
 ret_from_fork_asm+0x1a/0x30
 </TASK>

On fuse this is problematic because there are paths that may cause the
flusher thread to block. For example, systemd may freeze the user session
cgroups first, which freezes the fuse daemon, before invoking the kernel
suspend. The kernel suspend triggers ->write_inode(), which on fuse issues
a synchronous setattr request that cannot be processed since the daemon
is frozen. Alternatively, if the daemon is buggy and cannot properly
complete writeback, initiating writeback on a dirty folio already under
writeback leads to writeback_get_folio() -> folio_prepare_writeback() ->
an unconditional wait on writeback to finish, which will cause a hang.
This commit restores fuse to its prior behavior before tmp folios were
removed, where sync was essentially a no-op.

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1a-asuvfrbKXbEwwDSctvemF+6zfhdnuzO65Pt8HsFSRw@mail.gmail.com/T/#m632c4648e9cafc4239299887109ebd880ac6c5c1

Fixes: 0c58a97f91 ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: John <therealgraysky@proton.me>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260320005145.2483161-2-joannelkoong@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-20 14:18:56 +01:00
Christoph Hellwig
4d25c7d688
iomap: pass the iomap_iter to ->submit_read
This provides additional context for file systems.

Rename the fuse instance to match the method name while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260223132021.292832-10-hch@lst.de
Tested-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-10 10:29:03 +01:00
Luis Henriques
5a6baf2046 fuse: fix uninit-value in fuse_dentry_revalidate()
fuse_dentry_revalidate() may be called with a dentry that didn't have
->d_time initialised.  The issue was found with KMSAN, where lookup_open()
calls __d_alloc(), followed by d_revalidate(), as shown below:

=====================================================
BUG: KMSAN: uninit-value in fuse_dentry_revalidate+0x150/0x13d0 fs/fuse/dir.c:394
 fuse_dentry_revalidate+0x150/0x13d0 fs/fuse/dir.c:394
 d_revalidate fs/namei.c:1030 [inline]
 lookup_open fs/namei.c:4405 [inline]
 open_last_lookups fs/namei.c:4583 [inline]
 path_openat+0x1614/0x64c0 fs/namei.c:4827
 do_file_open+0x2aa/0x680 fs/namei.c:4859
[...]

Uninit was created at:
 slab_post_alloc_hook mm/slub.c:4466 [inline]
 slab_alloc_node mm/slub.c:4788 [inline]
 kmem_cache_alloc_lru_noprof+0x382/0x1280 mm/slub.c:4807
 __d_alloc+0x55/0xa00 fs/dcache.c:1740
 d_alloc_parallel+0x99/0x2740 fs/dcache.c:2604
 lookup_open fs/namei.c:4398 [inline]
 open_last_lookups fs/namei.c:4583 [inline]
 path_openat+0x135f/0x64c0 fs/namei.c:4827
 do_file_open+0x2aa/0x680 fs/namei.c:4859
[...]
=====================================================

Reported-by: syzbot+fdebb2dc960aa56c600a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69917e0d.050a0220.340abe.02e2.GAE@google.com
Fixes: 2396356a94 ("fuse: add more control over cache invalidation behaviour")
Signed-off-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-03-03 17:43:34 +01:00
Joanne Koong
8d306cbffc fuse: use offset_in_page() for page offset calculations
Replace open-coded (x & ~PAGE_MASK) with offset_in_page().
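
A userspace sketch of why the two spellings are equivalent, assuming a 4 KiB page size. The `OIP_` macros and `oip_` helpers are illustrative stand-ins, not the kernel definitions.

```c
#define OIP_PAGE_SIZE 4096UL			/* assumption: 4 KiB pages */
#define OIP_PAGE_MASK (~(OIP_PAGE_SIZE - 1))	/* clears the in-page bits */

/* Open-coded form: invert the mask to keep only the in-page bits. */
static unsigned long oip_open_coded(unsigned long x)
{
	return x & ~OIP_PAGE_MASK;
}

/* Helper form: same result, but the intent is stated by the name. */
static unsigned long oip_offset_in_page(unsigned long x)
{
	return x & (OIP_PAGE_SIZE - 1);
}
```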

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Horst Birthelmer <hbirthelmer@ddn.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-03-03 17:43:33 +01:00
Joanne Koong
dcfd95cb50 fuse: use DIV_ROUND_UP() for page count calculations
Use DIV_ROUND_UP() instead of manually computing round-up division
calculations.
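
For illustration, a userspace sketch of the round-up division pattern, assuming a 4 KiB page size. `DRU_DIV_ROUND_UP` is a local stand-in written for this example and `dru_page_count()` is a hypothetical helper.

```c
#include <stddef.h>

#define DRU_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */
#define DRU_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of pages needed to hold the given number of bytes. */
static size_t dru_page_count(size_t bytes)
{
	return DRU_DIV_ROUND_UP(bytes, (size_t)DRU_PAGE_SIZE);
}
```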

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Horst Birthelmer <hbirthelmer@ddn.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-03-03 17:43:33 +01:00
Joanne Koong
25307ca50b fuse: simplify logic in fuse_notify_store() and fuse_retrieve()
Simplify the folio parsing logic in fuse_notify_store() and
fuse_retrieve().

In particular, calculate the index by tracking pos, which allows us to
remove calculating nr_pages, and use "pos" in place of outarg's offset
field.

Suggested-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-03-03 17:43:26 +01:00
Joanne Koong
65161470f9 fuse: validate outarg offset and size in notify store/retrieve
Add validation checking for outarg offset and outarg size values passed
in by the server. MAX_LFS_FILESIZE is the maximum file size supported.
The fuse_notify_store_out and fuse_notify_retrieve_out structs take in
a uint64_t offset.

Add logic to ensure:
* outarg.offset is less than MAX_LFS_FILESIZE
* outarg.offset + outarg.size cannot exceed MAX_LFS_FILESIZE
* potential uint64_t overflow is fixed when adding outarg.offset and
  outarg.size.
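
The three checks can be sketched as a single overflow-safe helper. This is an illustration only: `VR_MAX_FILESIZE` is a stand-in constant (the kernel's MAX_LFS_FILESIZE depends on the architecture) and `vr_range_valid()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in limit chosen for this sketch. */
#define VR_MAX_FILESIZE ((uint64_t)INT64_MAX)

static bool vr_range_valid(uint64_t offset, uint64_t size)
{
	if (offset >= VR_MAX_FILESIZE)
		return false;
	/* Compare against the remaining room so offset + size never wraps. */
	return size <= VR_MAX_FILESIZE - offset;
}
```

Subtracting from the limit instead of adding the two inputs is what keeps the comparison free of uint64_t wraparound.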

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-03-03 10:05:39 +01:00
Bernd Schubert
59ba47b6be fuse: Check for large folio with SPLICE_F_MOVE
xfstests generic/074 and generic/075 result in kernel warning
messages / page dumps.
This is easily reproducible (on 6.19) with
CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_ALWAYS=y
CONFIG_TRANSPARENT_HUGEPAGE_TMPFS_HUGE_ALWAYS=y

This just adds a test for large folios to fuse_try_move_folio(),
using the same page copy fallback, to avoid the warnings
from fuse_check_folio().

Cc: stable@vger.kernel.org
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-03-03 10:05:39 +01:00
Darrick J. Wong
129a45f975 fuse: quiet down complaints in fuse_conn_limit_write
gcc 15 complains about an uninitialized variable val that is passed by
reference into fuse_conn_limit_write:

 control.c: In function ‘fuse_conn_congestion_threshold_write’:
 include/asm-generic/rwonce.h:55:37: warning: ‘val’ may be used uninitialized [-Wmaybe-uninitialized]
    55 |         *(volatile typeof(x) *)&(x) = (val);                            \
       |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
 include/asm-generic/rwonce.h:61:9: note: in expansion of macro ‘__WRITE_ONCE’
    61 |         __WRITE_ONCE(x, val);                                           \
       |         ^~~~~~~~~~~~
 control.c:178:9: note: in expansion of macro ‘WRITE_ONCE’
   178 |         WRITE_ONCE(fc->congestion_threshold, val);
       |         ^~~~~~~~~~
 control.c:166:18: note: ‘val’ was declared here
   166 |         unsigned val;
       |                  ^~~

Unfortunately there's enough macro spew involved in kstrtoul_from_user
that I think gcc gives up on its analysis and sprays the above warning.
AFAICT it's not actually a bug, but we could just zero-initialize the
variable to enable using -Wmaybe-uninitialized to find real problems.

Previously we would use some weird uninitialized_var annotation to quiet
down the warnings, so clearly this code has been like this for quite
some time.

Cc: stable@vger.kernel.org # v5.9
Fixes: 3f649ab728 ("treewide: Remove uninitialized_var() usage")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-03-03 10:05:39 +01:00
Luis Henriques
f595dda929 fuse: drop unnecessary argument from fuse_lookup_init()
Remove the fuse_conn argument from function fuse_lookup_init() as it isn't
used since commit 21f621741a ("fuse: fix LOOKUP vs INIT compat
handling").

Signed-off-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-03-03 10:05:39 +01:00
Jingbo Xu
5223e0470e fuse: fix premature writetrhough request for large folio
When large folios are enabled and the initial folio offset exceeds
PAGE_SIZE, e.g. the position resides in the second page of a large
folio, the offset (in the page) won't be updated to 0 after the folio
copy even though the expected range is successfully copied until the end
of the folio.  In this case fuse_fill_write_pages() exits prematurely
before the request has reached the max_write/max_pages limit.

Fix this by eliminating the page offset entirely and using the folio
offset instead.

Fixes: d60a6015e1 ("fuse: support large folios for writethrough writes")
Reviewed-by: Horst Birthelmer <hbirthelmer@ddn.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-03-03 10:05:39 +01:00
Yuto Ohnuki
9587fde0da fuse: refactor duplicate queue teardown operation
Extract common queue iteration and teardown logic into
fuse_uring_teardown_all_queues() helper function to eliminate code
duplication between fuse_uring_async_stop_queues() and
fuse_uring_stop_queues().

This is a pure refactoring with no functional changes, intended to
improve maintainability.

Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
Reviewed-by: Bernd Schubert <bernd@bsbernd.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-02-27 15:16:34 +01:00
Yuto Ohnuki
68b69fa0ed virtiofs: add FUSE protocol validation
Add virtio_fs_verify_response() to validate that the server properly
follows the FUSE protocol by checking:

- Response length is at least sizeof(struct fuse_out_header).
- oh.len matches the actual response length.
- oh.unique matches the request's unique identifier.

On validation failure, set error to -EIO and normalize oh.len to prevent
underflow in copy_args_from_argbuf().
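
The three checks can be sketched in userspace as follows. The struct mirrors the field layout of the FUSE reply header, but the `vfr_` names are hypothetical and the function only reports validity; it does not reproduce the error handling described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the FUSE reply header (hypothetical name). */
struct vfr_out_header {
	uint32_t len;
	int32_t error;
	uint64_t unique;
};

static bool vfr_response_ok(const struct vfr_out_header *oh,
			    size_t received, uint64_t req_unique)
{
	if (received < sizeof(*oh))
		return false;		/* too short to hold a header */
	if (oh->len != received)
		return false;		/* header disagrees with the transport */
	return oh->unique == req_unique;	/* reply belongs to this request */
}
```

The length check comes first so that the header fields are only read once the buffer is known to be large enough to contain them.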

Addresses the TODO comment in virtio_fs_request_complete().

Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-02-27 15:00:18 +01:00
Sergio Lopez
42fbb31310 fuse: mark DAX inode releases as blocking
Commit 26e5c67deb ("fuse: fix livelock in synchronous file put from
fuseblk workers") made fputs on closing files always asynchronous.

As cleaning up DAX inodes may require issuing a number of synchronous
requests for releasing the mappings, completing the release request from
the worker thread may lead to it hanging like this:

[   21.386751] Workqueue: events virtio_fs_requests_done_work
[   21.386769] Call trace:
[   21.386770]  __switch_to+0xe4/0x140
[   21.386780]  __schedule+0x294/0x72c
[   21.386787]  schedule+0x24/0x90
[   21.386794]  request_wait_answer+0x184/0x298
[   21.386799]  __fuse_simple_request+0x1f4/0x320
[   21.386805]  fuse_send_removemapping+0x80/0xa0
[   21.386810]  dmap_removemapping_list+0xac/0xfc
[   21.386814]  inode_reclaim_dmap_range.constprop.0+0xd0/0x204
[   21.386820]  fuse_dax_inode_cleanup+0x28/0x5c
[   21.386825]  fuse_evict_inode+0x120/0x190
[   21.386834]  evict+0x188/0x320
[   21.386847]  iput_final+0xb0/0x20c
[   21.386854]  iput+0xa0/0xbc
[   21.386862]  fuse_release_end+0x18/0x2c
[   21.386868]  fuse_request_end+0x9c/0x2c0
[   21.386872]  virtio_fs_request_complete+0x150/0x384
[   21.386879]  virtio_fs_requests_done_work+0x18c/0x37c
[   21.386885]  process_one_work+0x15c/0x2e8
[   21.386891]  worker_thread+0x278/0x480
[   21.386898]  kthread+0xd0/0xdc
[   21.386902]  ret_from_fork+0x10/0x20

Here, the virtio-fs worker_thread is waiting on request_wait_answer()
for a reply from the virtio-fs server that is already in the virtqueue
but will never be processed, since that same worker thread is the one
in charge of consuming the elements from the virtqueue.

To address this issue, when releasing a DAX inode mark the operation as
potentially blocking. Doing this will ensure these release requests are
processed on a different worker thread.

Signed-off-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2026-02-27 15:00:17 +01:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Linus Torvalds
8934827db5 kmalloc_obj treewide refactoring for v7.0-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaZl14wAKCRA2KwveOeQk
 uz8aAQCBFLYlij3Y3ivVADkBxuVF3xECaznFya41ENYsBwlHdwEArXqMyNrw+DiG
 TvWCK/tiddNmGIRpI2sxBFzyRpsHfAY=
 =rVD3
 -----END PGP SIGNATURE-----

Merge tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kmalloc_obj conversion from Kees Cook:
 "This does the tree-wide conversion to kmalloc_obj() and friends using
  coccinelle, with a subsequent small manual cleanup of whitespace
  alignment that coccinelle does not handle.

  This uncovered a clang bug in __builtin_counted_by_ref(), so the
  conversion is preceded by disabling that for current versions of
  clang.  The imminent clang 22.1 release has the fix.

  I've done allmodconfig build tests for x86_64, arm64, i386, and arm. I
  did defconfig builds for alpha, m68k, mips, parisc, powerpc, riscv,
  s390, sparc, sh, arc, csky, xtensa, hexagon, and openrisc"

* tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kmalloc_obj: Clean up after treewide replacements
  treewide: Replace kmalloc with kmalloc_obj for non-scalar types
  compiler_types: Disable __builtin_counted_by_ref for Clang
2026-02-21 11:02:58 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
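The effect of the kmalloc_obj() conversion above (allocations that return
"TYPE *" instead of "void *") can be illustrated with a minimal userspace
sketch. The demo_alloc_obj() and demo_alloc_objs() macros and struct
demo_point are hypothetical stand-ins built on calloc() and the GNU
__typeof__ extension; they are not the kernel's definitions and take no GFP
argument.

```c
#include <stdlib.h>

/*
 * Hypothetical userspace stand-ins, NOT the kernel macros.
 * The argument may be a type name or an expression such as *ptr.
 * calloc() is used so the count * size multiplication is overflow-checked;
 * zeroing the memory is a side effect of that choice.
 */
#define demo_alloc_obj(T)      ((__typeof__(T) *)calloc(1, sizeof(T)))
#define demo_alloc_objs(T, n)  ((__typeof__(T) *)calloc((n), sizeof(T)))

struct demo_point {
	int x;
	int y;
};

/* Allocate one object; the macro result is already "struct demo_point *". */
static struct demo_point *demo_new_point(int x, int y)
{
	struct demo_point *p = demo_alloc_obj(*p);

	if (!p)
		return NULL;
	p->x = x;
	p->y = y;
	return p;
}
```

Because the macro result carries the element type, assigning it to a pointer
of an unrelated type draws a compiler diagnostic, which a plain "void *"
return would not.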
Govindarajulu Varadarajan
ea129e55c9 io_uring: Add size check for sqe->cmd
For SQE128, sqe->cmd provides 80 bytes for uring_cmd. Add a macro that
checks at compile time that the size of the user struct does not exceed
80 bytes, so the user doesn't have to track this manually during
development.

Replace the io_uring_sqe_cmd() inline function with a macro and add
io_uring_sqe128_cmd(); they check the struct size against the 16-byte
and 80-byte cmd areas respectively.

Signed-off-by: Govindarajulu Varadarajan <govind.varadar@gmail.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-19 07:26:26 -07:00
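The compile-time size check described in the io_uring commit above can be
sketched in portable C roughly as follows. All DEMO_ and demo_ names are
hypothetical, and the negative-array-size trick is only one way to get a
build failure; the commit does not say which mechanism the real macros use.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_CMD_SIZE     16  /* command area of a regular entry */
#define DEMO_CMD128_SIZE  80  /* command area of a 128-byte entry */

/* Hypothetical stand-in for the command area of a 128-byte entry. */
struct demo_sqe {
	unsigned char cmd[DEMO_CMD128_SIZE];
};

/*
 * Evaluates to 0, but fails to compile when `type` is larger than `limit`,
 * because the array size becomes -1.
 */
#define DEMO_CMD_FITS(type, limit) \
	(0 * sizeof(char[(sizeof(type) <= (limit)) ? 1 : -1]))

/* Copy a command struct into the entry; the size check is part of the call. */
#define DEMO_SQE128_SET_CMD(sqe, type, src) \
	memcpy((sqe)->cmd + DEMO_CMD_FITS(type, DEMO_CMD128_SIZE), (src), sizeof(type))

/* Example payload: 8 bytes on common ABIs, so it fits both limits. */
struct demo_small_cmd {
	unsigned int opcode;
	unsigned int arg;
};
```

Passing a struct larger than 80 bytes to DEMO_SQE128_SET_CMD() would stop
the build instead of silently overrunning the command area at run time.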
Arnd Bergmann
b29a7a8eee fs: fuse: fix max() of incompatible types
The 'max()' value of a 'long long' and an 'unsigned int' is problematic
if the former is negative:

In function 'fuse_wr_pages',
    inlined from 'fuse_perform_write' at fs/fuse/file.c:1347:27:
include/linux/compiler_types.h:652:45: error: call to '__compiletime_assert_390' declared with attribute error: min(((pos + len - 1) >> 12) - (pos >> 12) + 1, max_pages) signedness error
  652 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |                                             ^

Use a temporary variable to make it clearer what is going on here.

Fixes: 0f5bb0cfb0 ("fs: use min() or umin() instead of min_t()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-09 15:19:43 -08:00
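The "temporary variable" approach from the fuse commit above might look
like the following userspace sketch. It is not the actual patch:
demo_wr_pages() and DEMO_PAGE_SHIFT are hypothetical, and the sketch assumes
a non-negative offset and a non-zero length.

```c
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12  /* 4 KiB pages, matching the ">> 12" in the error text */

/*
 * Number of pages touched by a write of `len` bytes at offset `pos`,
 * clamped to `max_pages`. The page count is computed once into an
 * unsigned temporary, so the final comparison is unsigned on both sides.
 * Assumes len > 0.
 */
static unsigned int demo_wr_pages(unsigned long long pos, size_t len,
				  unsigned int max_pages)
{
	unsigned long long first = pos >> DEMO_PAGE_SHIFT;
	unsigned long long last = (pos + len - 1) >> DEMO_PAGE_SHIFT;
	unsigned long long pages = last - first + 1;

	return pages < max_pages ? (unsigned int)pages : max_pages;
}
```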
Linus Torvalds
9e355113f0 vfs-7.0-rc1.misc
Please consider pulling these changes from the signed vfs-7.0-rc1.misc tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49QAKCRCRxhvAZXjc
 ojrZAQD1VJzY46r5FnAVf4jlEHyjIbDnZCP/n+c4x6XnqpU6EQEAgB0yAtAGP6+u
 SBuytElqHoTT5VtmEXTAabCNQ9Ks8wo=
 =JwZz
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains a mix of VFS cleanups, performance improvements, API
  fixes, documentation, and a deprecation notice.

  Scalability and performance:

   - Rework pid allocation to only take pidmap_lock once instead of
     twice during alloc_pid(), improving thread creation/teardown
     throughput by 10-16% depending on false-sharing luck. Pad the
     namespace refcount to reduce false-sharing

   - Track file lock presence via a flag in ->i_opflags instead of
     reading ->i_flctx, avoiding false-sharing with ->i_readcount on
     open/close hot paths. Measured 4-16% improvement on 24-core
     open-in-a-loop benchmarks

   - Use a consume fence in locks_inode_context() to match the
     store-release/load-consume idiom, eliminating a hardware fence on
     some architectures

   - Annotate cdev_lock with __cacheline_aligned_in_smp to prevent
     false-sharing

   - Remove a redundant DCACHE_MANAGED_DENTRY check in
     __follow_mount_rcu() that never fires since the caller already
     verifies it, eliminating a 100% mispredicted branch

   - Fix a 100% mispredicted likely() in devcgroup_inode_permission()
     that became wrong after a prior code reorder

  Bug fixes and correctness:

   - Make insert_inode_locked() wait for inode destruction instead of
     skipping, fixing a corner case where two matching inodes could
     exist in the hash

   - Move f_mode initialization before file_ref_init() in alloc_file()
     to respect the SLAB_TYPESAFE_BY_RCU ordering contract

   - Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with
     no buffers attached, preventing a null pointer dereference when
     AS_RELEASE_ALWAYS is set but no release_folio op exists

   - Fix select restart_block to store end_time as timespec64, avoiding
     truncation of tv_sec on 32-bit architectures

   - Make dump_inode() use get_kernel_nofault() to safely access inode
     and superblock fields, matching the dump_mapping() pattern

  API modernization:

   - Make posix_acl_to_xattr() allocate the buffer internally since
     every single caller was doing it anyway. Reduces boilerplate and
     unnecessary error checking across ~15 filesystems

   - Replace deprecated simple_strtoul() with kstrtoul() for the
     ihash_entries, dhash_entries, mhash_entries, and mphash_entries
     boot parameters, adding proper error handling

   - Convert chardev code to use guard(mutex) and __free(kfree) cleanup
     patterns

   - Replace min_t() with min() or umin() in VFS code to avoid silently
     truncating unsigned long to unsigned int

   - Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers
     already check the flag

  Deprecation:

   - Begin deprecating legacy BSD process accounting (acct(2)). The
     interface has numerous footguns and better alternatives exist
     (eBPF)

  Documentation:

   - Fix and complete kernel-doc for struct export_operations, removing
     duplicated documentation between ReST and source

   - Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait()

  Testing:

   - Add a kunit test for initramfs cpio handling of entries with
     filesize > PATH_MAX

  Misc:

   - Add missing <linux/init_task.h> include in fs_struct.c"

* tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
  posix_acl: make posix_acl_to_xattr() alloc the buffer
  fs: make insert_inode_locked() wait for inode destruction
  initramfs_test: kunit test for cpio.filesize > PATH_MAX
  fs: improve dump_inode() to safely access inode fields
  fs: add <linux/init_task.h> for 'init_fs'
  docs: exportfs: Use source code struct documentation
  fs: move initializing f_mode before file_ref_init()
  exportfs: Complete kernel-doc for struct export_operations
  exportfs: Mark struct export_operations functions at kernel-doc
  exportfs: Fix kernel-doc output for get_name()
  acct(2): begin the deprecation of legacy BSD process accounting
  device_cgroup: remove branch hint after code refactor
  VFS: fix __start_dirop() kernel-doc warnings
  fs: Describe @isnew parameter in ilookup5_nowait()
  fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu
  fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS
  select: store end_time as timespec64 in restart block
  chardev: Switch to guard(mutex) and __free(kfree)
  namespace: Replace simple_strtoul with kstrtoul to parse boot params
  dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries
  ...
2026-02-09 15:13:05 -08:00
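The move from simple_strtoul() to kstrtoul() for the hash-size boot
parameters in the pull above is about rejecting malformed input instead of
silently accepting a numeric prefix. A rough userspace analogue, assuming a
hypothetical demo_parse_ulong() built on strtoul() (this is not the kernel
implementation and is somewhat stricter about the first character):

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a decimal unsigned long strictly.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow.
 * A single trailing newline is tolerated; any other trailing text is not.
 */
static int demo_parse_ulong(const char *s, unsigned long *out)
{
	char *end;
	unsigned long val;

	/* strtoul() would accept leading spaces and signs; refuse them here. */
	if (!isdigit((unsigned char)s[0]))
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = val;
	return 0;
}
```

With a lenient parser, an input such as "12abc" would yield 12 with no
error; here the caller gets -EINVAL and can fall back to a default.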
Linus Torvalds
3304b3fedd vfs-7.0-rc1.iomap
Please consider pulling these changes from the signed vfs-7.0-rc1.iomap tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc
 oqSJAP43kijhiHYTVRurju8VWzLuY2yWweL5z/2i/w4b0Vh4TgD+OfeOnf/zSYvR
 HEvf5iq1QtlaYZq8njSYOc8DlWkQvQ4=
 =OKKM
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.0-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs iomap updates from Christian Brauner:

 - Erofs page cache sharing preliminaries:

   Plumb a void *private parameter through iomap_read_folio() and
   iomap_readahead() into iomap_iter->private, matching iomap DIO. Erofs
   uses this to replace a bogus kmap_to_page() call, as preparatory work
   for page cache sharing.

 - Fix for invalid folio access:

   Fix an invalid folio access when a folio without iomap_folio_state
   is fully submitted to the IO helper — the helper may call
   folio_end_read() at any time, so ctx->cur_folio must be invalidated
   after full submission.

* tag 'vfs-7.0-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: fix invalid folio access after folio_end_read()
  erofs: hold read context in iomap_iter if needed
  iomap: stash iomap read ctx in the private field of iomap_iter
2026-02-09 15:08:16 -08:00
Linus Torvalds
aa2a0fcd4c vfs-7.0-rc1.leases
Please consider pulling these changes from the signed vfs-7.0-rc1.leases tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc
 olR/AP40iNOTRn7LosXbRWqGGZqzy9v64QYoLzk3QdsWuGmbRAD/egNQzof8mkAf
 IscefWTOjY7xyDzmEBEBnfHftgMiEwM=
 =zre0
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.0-rc1.leases' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs lease updates from Christian Brauner:
 "This contains updates for lease support to require filesystems to
  explicitly opt-in to lease support

  Currently kernel_setlease() falls through to generic_setlease() when
  a filesystem does not define ->setlease(), silently granting lease
  support to every filesystem regardless of whether it is prepared for
  it.

  This is a poor default: most filesystems never intended to support
  leases, and the silent fallthrough makes it impossible to distinguish
  "supports leases" from "never thought about it".

  This inverts the default. It adds explicit

	.setlease = generic_setlease;

  assignments to every in-tree filesystem that should retain lease
  support, then changes kernel_setlease() to return -EINVAL when
  ->setlease is NULL.

  With the new default in place, simple_nosetlease() is redundant and
  is removed along with all references to it"

* tag 'vfs-7.0-rc1.leases' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
  fuse: add setlease file operation
  fs: remove simple_nosetlease()
  filelock: default to returning -EINVAL when ->setlease operation is NULL
  xfs: add setlease file operation
  ufs: add setlease file operation
  udf: add setlease file operation
  tmpfs: add setlease file operation
  squashfs: add setlease file operation
  overlayfs: add setlease file operation
  orangefs: add setlease file operation
  ocfs2: add setlease file operation
  ntfs3: add setlease file operation
  nilfs2: add setlease file operation
  jfs: add setlease file operation
  jffs2: add setlease file operation
  gfs2: add a setlease file operation
  fat: add setlease file operation
  f2fs: add setlease file operation
  exfat: add setlease file operation
  ext4: add setlease file operation
  ...
2026-02-09 11:59:07 -08:00
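The inverted default described in the lease pull above (a NULL ->setlease
now means "no lease support") can be modelled in userspace as below. The
demo_ types and functions are hypothetical simplifications; only the
-EINVAL-on-NULL behaviour and the explicit generic assignment are taken from
the pull message.

```c
#include <errno.h>
#include <stddef.h>

struct demo_file;

/* Hypothetical, heavily reduced operations table. */
struct demo_file_operations {
	int (*setlease)(struct demo_file *file, int arg);
};

struct demo_file {
	const struct demo_file_operations *f_op;
};

/* Stand-in for the generic lease implementation; always succeeds here. */
static int demo_generic_setlease(struct demo_file *file, int arg)
{
	(void)file;
	(void)arg;
	return 0;
}

/* New default: no ->setlease means the filesystem did not opt in. */
static int demo_kernel_setlease(struct demo_file *file, int arg)
{
	if (!file->f_op->setlease)
		return -EINVAL;
	return file->f_op->setlease(file, arg);
}
```

A filesystem that wants leases sets ".setlease = demo_generic_setlease" in
its table; one that leaves the field unset gets -EINVAL without needing a
dedicated "no lease" helper.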
Linus Torvalds
fcb70a56f4 vfs-6.19-rc8.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaXc4IwAKCRCRxhvAZXjc
 oo0jAQDOV580l4wHiY6eT1QGY2QYa7u8fYDOi6mqfgHa+EH5twD9ETnQ0xQHIKYP
 oruFJXLf3ihBBsum+pTpAO2XFVjM7Qs=
 =pM8o
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

 - Fix the buggy conversion of fuse_reverse_inval_entry() introduced
   during the creation rework

 - Disallow nfs delegation requests for directories by setting
   simple_nosetlease()

 - Require an opt-in for getting readdir flag bits outside of S_DT_MASK
   set in d_type

 - Fix scheduling delayed writeback work by only scheduling when the
   dirty time expiry interval is non-zero and cancel the delayed work if
   the interval is set to zero

 - Use round_jiffies_relative() for dirty time work

 - Check the return value of sb_set_blocksize() for romfs

 - Wait for batched folios to be stable in __iomap_get_folio()

 - Use private naming for fuse hash size

 - Fix the stale dentry cleanup to prevent a race that causes a UAF

* tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: document d_dispose_if_unused()
  fuse: shrink once after all buckets have been scanned
  fuse: clean up fuse_dentry_tree_work()
  fuse: add need_resched() before unlocking bucket
  fuse: make sure dentry is evicted if stale
  fuse: fix race when disposing stale dentries
  fuse: use private naming for fuse hash size
  writeback: use round_jiffies_relative for dirtytime_work
  iomap: wait for batched folios to be stable in __iomap_get_folio
  romfs: check sb_set_blocksize() return value
  docs: clarify that dirtytime_expire_seconds=0 disables writeback
  writeback: fix 100% CPU usage when dirtytime_expire_interval is 0
  readdir: require opt-in for d_type flags
  vboxsf: don't allow delegations to be set on directories
  ceph: don't allow delegations to be set on directories
  gfs2: don't allow delegations to be set on directories
  9p: don't allow delegations to be set on directories
  smb/client: properly disallow delegations on directories
  nfs: properly disallow delegation requests on directories
  fuse: fix conversion of fuse_reverse_inval_entry() to start_removing()
2026-01-26 09:30:48 -08:00
Joanne Koong
f9a49aa302 fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
Above the while() loop in wait_sb_inodes(), we document that we must wait
for all pages under writeback for data integrity.  Consequently, if a
mapping, like fuse, traditionally does not have data integrity semantics,
there is no need to wait at all; we can simply skip these inodes.

This restores fuse back to prior behavior where syncs are no-ops.  This
fixes a user regression where if a system is running a faulty fuse server
that does not reply to issued write requests, this causes wait_sb_inodes()
to wait forever.

Link: https://lkml.kernel.org/r/20260105211737.4105620-2-joannelkoong@gmail.com
Fixes: 0c58a97f91 ("fuse: remove tmp folio for writebacks and internal rb tree")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Tested-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Bernd Schubert <bschubert@ddn.com>
Cc: Bonaccorso Salvatore <carnil@debian.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-19 12:30:01 -08:00
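The skip logic from the writeback commit above can be pictured with a small
userspace model. struct demo_mapping and demo_wait_sb_inodes() are
hypothetical; the real function walks the superblock's inode list and waits
on writeback rather than counting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a mapping's state. */
struct demo_mapping {
	bool no_data_integrity;  /* models AS_NO_DATA_INTEGRITY being set */
	bool under_writeback;    /* models pages still under writeback */
};

/*
 * Return how many mappings a sync would have to wait on.
 * Mappings without data integrity semantics are skipped entirely, so a
 * server that never completes their writes cannot stall the caller.
 */
static size_t demo_wait_sb_inodes(const struct demo_mapping *maps, size_t n)
{
	size_t waited = 0;

	for (size_t i = 0; i < n; i++) {
		if (maps[i].no_data_integrity)
			continue;
		if (maps[i].under_writeback)
			waited++;  /* the real code would block here */
	}
	return waited;
}
```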
Miklos Szeredi
fa79401a9c fuse: shrink once after all buckets have been scanned
In fuse_dentry_tree_work() move the shrink_dentry_list() out from the loop.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260114145344.468856-6-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16 19:15:14 +01:00
Miklos Szeredi
3926746b55 fuse: clean up fuse_dentry_tree_work()
- Change time_after64() to time_before64(), since the latter is exclusively
  used in this file to compare dentry/inode timeout with current time.

- Move the break statement from the else branch to the if branch, reducing
  indentation.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260114145344.468856-5-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16 19:15:14 +01:00
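For readers unfamiliar with the helper named in the cleanup above, a
wraparound-safe "before" comparison on 64-bit time values can be sketched in
userspace as follows. The demo_ names are hypothetical, the expiry check is
an illustration rather than the fuse code, and the sketch relies on the
usual two's-complement conversion from uint64_t to int64_t.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if time a is before time b, even across counter wraparound:
 * the unsigned difference is reinterpreted as signed, so "a slightly
 * behind b" is negative regardless of where the counter currently is.
 */
static bool demo_time_before64(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Illustrative use: an entry has expired once its timeout lies before now. */
static bool demo_entry_expired(uint64_t timeout, uint64_t now)
{
	return demo_time_before64(timeout, now);
}
```

Using one comparison direction consistently, as the commit does, avoids
having to mentally invert the condition at each call site.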
Miklos Szeredi
09f7a43ae5 fuse: add need_resched() before unlocking bucket
In fuse_dentry_tree_work() there is no need to unlock/lock
dentry_hash[i].lock on each iteration.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260114145344.468856-4-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16 19:15:14 +01:00
Miklos Szeredi
1e2c1af1be fuse: make sure dentry is evicted if stale
d_dispose_if_unused() may find the dentry with a positive refcount, in
which case it won't be put on the dispose list even though it has already
timed out.

"Reinstall" the d_delete() callback, which was optimized out in
fuse_dentry_settime().  This will result in the dentry being evicted as
soon as the refcount hits zero.

Fixes: ab84ad5973 ("fuse: new work queue to periodically invalidate expired dentries")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260114145344.468856-3-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16 19:15:14 +01:00
Miklos Szeredi
cb8d2bdcb8 fuse: fix race when disposing stale dentries
In fuse_dentry_tree_work() just before d_dispose_if_unused() the dentry
could get evicted, resulting in UAF.

Move unlocking dentry_hash[i].lock to after the dispose.  To do this,
fuse_dentry_tree_del_node() needs to be moved from fuse_dentry_prune() to
fuse_dentry_release() to prevent an ABBA deadlock.

The lock ordering becomes:

 -> dentry_bucket.lock
    -> dentry.d_lock

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Closes: https://lore.kernel.org/all/20251206014242.GO1712166@ZenIV/
Fixes: ab84ad5973 ("fuse: new work queue to periodically invalidate expired dentries")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260114145344.468856-2-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16 19:15:14 +01:00
Jens Axboe
4973d95679 fuse: use private naming for fuse hash size
With a mix of include dependencies, the compiler warns that:

fs/fuse/dir.c:35:9: warning: 'HASH_BITS' redefined
   35 | #define HASH_BITS       5
      |         ^~~~~~~~~
In file included from ./include/linux/io_uring_types.h:5,
                 from ./include/linux/bpf.h:34,
                 from ./include/linux/security.h:35,
                 from ./include/linux/fs_context.h:14,
                 from fs/fuse/dir.c:13:
./include/linux/hashtable.h:28:9: note: this is the location of the previous definition
   28 | #define HASH_BITS(name) ilog2(HASH_SIZE(name))
      |         ^~~~~~~~~
fs/fuse/dir.c:36:9: warning: 'HASH_SIZE' redefined
   36 | #define HASH_SIZE       (1 << HASH_BITS)
      |         ^~~~~~~~~
./include/linux/hashtable.h:27:9: note: this is the location of the previous definition
   27 | #define HASH_SIZE(name) (ARRAY_SIZE(name))
      |         ^~~~~~~~~

Hence rename the HASH_SIZE/HASH_BITS in fuse, by prefixing them with
FUSE_ instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://patch.msgid.link/195c9525-281c-4302-9549-f3d9259416c6@kernel.dk
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16 10:55:44 +01:00
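The rename in the commit above avoids a clash between an object-like macro
private to fuse and a function-like macro of the same name from a generic
header. A minimal sketch of the two coexisting after the rename; the
DEMO_ARRAY_SIZE() helper, struct demo_bucket and demo_dentry_hash are
hypothetical, while the value 5 and the FUSE_ prefix come from the commit
message.

```c
#include <stddef.h>

/* Mimics the generic, function-like macro from the warning text. */
#define DEMO_ARRAY_SIZE(arr)  (sizeof(arr) / sizeof((arr)[0]))
#define HASH_SIZE(name)       (DEMO_ARRAY_SIZE(name))

/* Private, prefixed names no longer collide with the generic ones. */
#define FUSE_HASH_BITS  5
#define FUSE_HASH_SIZE  (1 << FUSE_HASH_BITS)

/* Hypothetical bucket table sized with the private constant. */
struct demo_bucket {
	int count;
};

static struct demo_bucket demo_dentry_hash[FUSE_HASH_SIZE];
```

Both macros can now be used in the same translation unit: HASH_SIZE() on an
array and FUSE_HASH_SIZE as a plain constant.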
Miklos Szeredi
6cbfdf8947 posix_acl: make posix_acl_to_xattr() alloc the buffer
Without exception all callers do that.  So move the allocation into the
helper.

This reduces boilerplate and removes unnecessary error checking.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260115122341.556026-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16 10:51:12 +01:00
Hongbo Li
8806f27924 iomap: stash iomap read ctx in the private field of iomap_iter
It's useful to get filesystem-specific information using the
existing private field in the @iomap_iter passed to iomap_{begin,end}
for advanced usage for iomap buffered reads, which is much like the
current iomap DIO.

For example, EROFS needs it to:

 - implement an efficient page cache sharing feature, since iomap
   needs to apply to anon inode page cache but we'd like to get the
   backing inode/fs instead, so filesystem-specific private data is
   needed to keep such information;

 - pass in both struct page * and void * for inline data to avoid
   kmap_to_page() usage (which is bogus).

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://patch.msgid.link/20260109102856.598531-2-lihongbo22@huawei.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-14 16:31:41 +01:00
Jeff Layton
056a96e65f fuse: add setlease file operation
Add the setlease file_operation to fuse_file_operations, pointing to
generic_setlease.  A future patch will change the default behavior to
reject lease attempts with -EINVAL when there is no setlease file
operation defined. Add generic_setlease to retain the ability to set
leases on this filesystem.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260112130121.25965-1-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13 09:56:11 +01:00
Jeff Layton
51e49111c0 fs: remove simple_nosetlease()
Setting ->setlease() to a NULL pointer now has the same effect as
setting it to simple_nosetlease(). Remove all of the setlease
file_operations that are set to simple_nosetlease, and the function
itself.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260108-setlease-6-20-v1-24-ea4dec9b67fa@kernel.org
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 10:55:48 +01:00
NeilBrown
cab0123751 fuse: fix conversion of fuse_reverse_inval_entry() to start_removing()
The recent conversion of fuse_reverse_inval_entry() to use
start_removing() was wrong.
As Val Packett points out the original code did not call ->lookup
while the new code does.  This can lead to a deadlock.

Rather than using full_name_hash() and d_lookup() as the old code
did, we can use try_lookup_noperm() which combines these.  Then
the result can be given to start_removing_dentry() to get the required
locks for removal.  We then double check that the name hasn't
changed.

As 'dir' needs to be used several times now, we defer the dput() until
the end, and initialise to NULL so dput() is always safe.

Reported-by: Val Packett <val@packett.cool>
Closes: https://lore.kernel.org/all/6713ea38-b583-4c86-b74a-bea55652851d@packett.cool
Fixes: c9ba789dad ("VFS: introduce start_creating_noperm() and start_removing_noperm()")
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/176454037897.634289.3566631742434963788@noble.neil.brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 10:39:58 +01:00