linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-12 16:18:45 +02:00

Author	SHA1	Message	Date
Linus Torvalds	bdcb864c71	- 9p access flag fix (cannot change access flag since new mount API implem) - some minor cleanup -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmnrRHgACgkQq06b7GqY 5nDgGw/7BG4ry89+Q2UR5T+/zqNj/kISpemz2F0Q62lPiQEIv3IvAAGErsGR6DiR dGvIzvsPLUAWv8gzwhfCn9zFCO92gmMRav5BxOhFpaHfqy6TTJIqe7DoRjbjjjhR 7wUId3WZ8U1V9Z7Ea3oE2gpf/rR+iwACu/O0U1ZYhuJIcmJRsBIWGnWsaZr+PWSf VuM74YrdGcdrgMkB2hGxezjZ16MBGekWmVBjbWbQVWiUg1FBiewug9syUlJxgEnv WatEdPu/gWcfGj9bY3RAAlUnP5YJRX22xfmLNVayBOV1LThvKDRd7RKicyDFLIWe 2NbvIaEtyMvupS8w+n0fsDFqL4/yRTwO6p26YV5nDOOMWr5yrQ6bNNmMJrJvJroP rtG5ww8c7mKLv7wBDJua7m6IIwHxAjzpmWbvX51+ap6uN2oQL7BWbGT3NrDuRVFj CblbXS4GzTMo5EbmOjuqO0HbA9a2gAPn6g8ElxFfKBUOJx6U86hDJ3CDfRs5i8ft SUXXJQYJqyhniK7INPvINSoIYn/+5cQavyFay+LgMBPZ8yNY9qSZVskb/Q5AugWI LdPoJ9DthzMzaSEiEn/sJcq8FGt8w7cOlOABYCMRazSAGWEGIwzHxOtXZp8UfifG ULbtc9uAGxZClbaO7p2T7cWHveAz+kvvhQn0+LxLJZ+/RzuRc/w= =miGY -----END PGP SIGNATURE----- Merge tag '9p-for-7.1-rc1' of https://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: - 9p access flag fix (cannot change access flag since new mount API implem) - some minor cleanup * tag '9p-for-7.1-rc1' of https://github.com/martinetd/linux: 9p/trans_xen: replace simple_strto* with kstrtouint 9p/trans_xen: make cleanup idempotent after dataring alloc errors 9p: document missing enum values in kernel-doc comments 9p: fix access mode flags being ORed instead of replaced 9p: fix memory leak in v9fs_init_fs_context error path	2026-04-24 13:37:26 -07:00
Pierre Barre	da2346a48a	9p: fix access mode flags being ORed instead of replaced Since commit `1f3e4142c0` ("9p: convert to the new mount API"), v9fs_apply_options() applies parsed mount flags with \|= onto flags already set by v9fs_session_init(). For 9P2000.L, session_init sets V9FS_ACCESS_CLIENT as the default, so when the user mounts with "access=user", both bits end up set. Access mode checks compare against exact values, so having both bits set matches neither mode. This causes v9fs_fid_lookup() to fall through to the default switch case, using INVALID_UID (nobody/65534) instead of current_fsuid() for all fid lookups. Root is then unable to chown or perform other privileged operations. Fix by clearing the access mask before applying the user's choice. Fixes: `1f3e4142c0` ("9p: convert to the new mount API") Signed-off-by: Pierre Barre <pierre@barre.sh> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Message-ID: <0ddc72da-d196-4f01-8755-0086f670e779@app.fastmail.com> Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2026-04-16 02:56:29 +00:00
Sasha Levin	0fd76f1be2	9p: fix memory leak in v9fs_init_fs_context error path Move the assignments of fc->ops and fc->fs_private to right after the kzalloc, before any fallible operations. Previously these were assigned at the end of the function, after the kstrdup calls for uname and aname. If either kstrdup failed, the error path would set fc->need_free but leave fc->ops NULL, so put_fs_context() would never call v9fs_free_fc() to free the allocated context and any already-duplicated strings. Fixes: `1f3e4142c0` ("9p: convert to the new mount API") Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Sasha Levin <sashal@kernel.org> Message-ID: <20260225135745.351984-1-sashal@kernel.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2026-04-16 01:53:42 +00:00
Jeff Layton	0b2600f81c	treewide: change inode->i_ino from unsigned long to u64 On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-03-06 14:31:28 +01:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/\(alloc_objs(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Linus Torvalds	9e355113f0	vfs-7.0-rc1.misc Please consider pulling these changes from the signed vfs-7.0-rc1.misc tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49QAKCRCRxhvAZXjc ojrZAQD1VJzY46r5FnAVf4jlEHyjIbDnZCP/n+c4x6XnqpU6EQEAgB0yAtAGP6+u SBuytElqHoTT5VtmEXTAabCNQ9Ks8wo= =JwZz -----END PGP SIGNATURE----- Merge tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains a mix of VFS cleanups, performance improvements, API fixes, documentation, and a deprecation notice. Scalability and performance: - Rework pid allocation to only take pidmap_lock once instead of twice during alloc_pid(), improving thread creation/teardown throughput by 10-16% depending on false-sharing luck. Pad the namespace refcount to reduce false-sharing - Track file lock presence via a flag in ->i_opflags instead of reading ->i_flctx, avoiding false-sharing with ->i_readcount on open/close hot paths. Measured 4-16% improvement on 24-core open-in-a-loop benchmarks - Use a consume fence in locks_inode_context() to match the store-release/load-consume idiom, eliminating a hardware fence on some architectures - Annotate cdev_lock with __cacheline_aligned_in_smp to prevent false-sharing - Remove a redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu() that never fires since the caller already verifies it, eliminating a 100% mispredicted branch - Fix a 100% mispredicted likely() in devcgroup_inode_permission() that became wrong after a prior code reorder Bug fixes and correctness: - Make insert_inode_locked() wait for inode destruction instead of skipping, fixing a corner case where two matching inodes could exist in the hash - Move f_mode initialization before file_ref_init() in alloc_file() to respect the SLAB_TYPESAFE_BY_RCU ordering contract - Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with no buffers attached, preventing a null pointer dereference when AS_RELEASE_ALWAYS is set but no release_folio op exists - Fix select restart_block to store end_time as timespec64, avoiding truncation of tv_sec on 32-bit architectures - Make dump_inode() use get_kernel_nofault() to safely access inode and superblock fields, matching the dump_mapping() pattern API modernization: - Make posix_acl_to_xattr() allocate the buffer internally since every single caller was doing it anyway. Reduces boilerplate and unnecessary error checking across ~15 filesystems - Replace deprecated simple_strtoul() with kstrtoul() for the ihash_entries, dhash_entries, mhash_entries, and mphash_entries boot parameters, adding proper error handling - Convert chardev code to use guard(mutex) and __free(kfree) cleanup patterns - Replace min_t() with min() or umin() in VFS code to avoid silently truncating unsigned long to unsigned int - Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers already check the flag Deprecation: - Begin deprecating legacy BSD process accounting (acct(2)). The interface has numerous footguns and better alternatives exist (eBPF) Documentation: - Fix and complete kernel-doc for struct export_operations, removing duplicated documentation between ReST and source - Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait() Testing: - Add a kunit test for initramfs cpio handling of entries with filesize > PATH_MAX Misc: - Add missing <linux/init_task.h> include in fs_struct.c" * tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) posix_acl: make posix_acl_to_xattr() alloc the buffer fs: make insert_inode_locked() wait for inode destruction initramfs_test: kunit test for cpio.filesize > PATH_MAX fs: improve dump_inode() to safely access inode fields fs: add <linux/init_task.h> for 'init_fs' docs: exportfs: Use source code struct documentation fs: move initializing f_mode before file_ref_init() exportfs: Complete kernel-doc for struct export_operations exportfs: Mark struct export_operations functions at kernel-doc exportfs: Fix kernel-doc output for get_name() acct(2): begin the deprecation of legacy BSD process accounting device_cgroup: remove branch hint after code refactor VFS: fix __start_dirop() kernel-doc warnings fs: Describe @isnew parameter in ilookup5_nowait() fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS select: store end_time as timespec64 in restart block chardev: Switch to guard(mutex) and __free(kfree) namespace: Replace simple_strtoul with kstrtoul to parse boot params dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries ...	2026-02-09 15:13:05 -08:00
Miklos Szeredi	6cbfdf8947	posix_acl: make posix_acl_to_xattr() alloc the buffer Without exception all caller do that. So move the allocation into the helper. This reduces boilerplate and removes unnecessary error checking. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260115122341.556026-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-01-16 10:51:12 +01:00
Jeff Layton	51e49111c0	fs: remove simple_nosetlease() Setting ->setlease() to a NULL pointer now has the same effect as setting it to simple_nosetlease(). Remove all of the setlease file_operations that are set to simple_nosetlease, and the function itself. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260108-setlease-6-20-v1-24-ea4dec9b67fa@kernel.org Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-01-12 10:55:48 +01:00
Jeff Layton	5d65a70bd0	9p: don't allow delegations to be set on directories With the advent of directory leases, it's necessary to set the ->setlease() handler in directory file_operations to properly deny them. Fixes: `e6d28ebc17` ("filelock: push the S_ISREG check down to ->setlease handlers") Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260107-setlease-6-19-v1-3-85f034abcc57@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-01-12 10:54:47 +01:00
Linus Torvalds	bbbf7f3284	- fix a bug with O_APPEND in cached mode causing data to be written multiple times on server - use kvmalloc for trans_fd to avoid problems with large msize and fragmented memory This should hopefully be used in more transports when time allows - convert to new mount API - minor cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmk1dvwACgkQq06b7GqY 5nCRWA//a4qCTs/8FRS7N0Mz5Jg84VZ2JPnVN6iydLKbFDkgUL8JXI723VmApb6D wR21yRm7VuWpnGVdfPF6BtjZV7cYXEEwukfLqkXtOwx/WaRREKpN0sOciXMd0htg ZgnhhrabCOiSAYJYb9/29sNwhvfweQi0BeAFdIEAPrVonFUYXRzFS0v4AiBCs5PY n8X6aoshViAG05MZycB2VYaCxT45I+8YNCXtSsT/uX+3BP1FuRYMRluAYCLu/TU8 oKyFjkpIri01211OEORx6gs5CDeCv0LpELfk5EW2QF/mz4oW3/4bAchg22NgNV6x 0OCbgTqwlSJVETCZfZso/TV8efMlk1rLxSA0xjQY9r1lA26BTubrNadnC2W2nhSv GpPbDu6s3Cj3WD7P2InGtXxzUIZDCm8kHfHjzbbzgwOS6jl8SaDSnc6J2HtKNESL T55hqzqzv4POFQKrgznaQcaDW7imftOA+9xv+k+j5DTDKS9LLxiKS7+dyyzYAyyX sCjOd/T7JMvXIp4TxwbROn2+VBVYVO3ZaaKdK8e+qVKvWooP3iorGXAg0O0wZYJV tLE5zioZiRngzhqAczDIpJAe5Qd/SIi6W+sAOpKLSvVVKGf/akr0wH0KxI5z7ox8 uBStVXTOoh52qQr0L7nnBIUJ2VJLJUt9TCzwwvaLM5B1TyPgxh8= =7ST9 -----END PGP SIGNATURE----- Merge tag '9p-for-6.19-rc1' of https://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: - fix a bug with O_APPEND in cached mode causing data to be written multiple times on server - use kvmalloc for trans_fd to avoid problems with large msize and fragmented memory This should hopefully be used in more transports when time allows - convert to new mount API - minor cleanups * tag '9p-for-6.19-rc1' of https://github.com/martinetd/linux: 9p: fix new mount API cache option handling 9p: fix cache/debug options printing in v9fs_show_options 9p: convert to the new mount API 9p: create a v9fs_context structure to hold parsed options net/9p: move structures and macros to header files fs/fs_parse: add back fsparam_u32hex fs/9p: delete unnnecessary condition fs/9p: Don't open remote file with APPEND mode when writeback cache is used net/9p: cleanup: change p9_trans_module->def to bool 9p: Use kvmalloc for message buffers on supported transports	2025-12-07 08:29:09 -08:00
Eric Sandeen	3e281113f8	9p: fix new mount API cache option handling After commit `4eb3117888`, 9p needs to be able to accept numerical cache= mount options as well as the string "shortcuts" because the option is printed numerically in /proc/mounts rather than by string. This was missed in the mount API conversion, which used an enum for the shortcuts and therefore could not handle a numeric equivalent as an argument to the cache option. Fix this by removing the enum and reverting to the slightly more open-coded option handling for Opt_cache, with the reinstated get_cache_mode() helper. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Message-ID: <48cdeec9-5bb9-4c7a-a203-39bb8e0ef443@redhat.com> Tested-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-12-05 12:54:05 +00:00
Eric Sandeen	f044561331	9p: fix cache/debug options printing in v9fs_show_options commit `4eb3117888` changed the cache= option to accept either string shortcuts or bitfield values. It also changed /proc/mounts to emit the option as the hexadecimal numeric value rather than the shortcut string. However, by printing "cache=%x" without the leading 0x, shortcuts such as "cache=loose" will emit "cache=f" and 'f' is not a string that is parseable by kstrtoint(), so remounting may fail if a remount with "cache=f" is attempted. debug=%x has had the same problem since options have been displayed in `c4fac91004` ("9p: Implement show_options") Fix these by adding the 0x prefix to the hexadecimal value shown in /proc/mounts. Fixes: `4eb3117888` ("fs/9p: Rework cache modes and add new options to Documentation") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Message-ID: <54b93378-dcf1-4b04-922d-c8b4393da299@redhat.com> [Dominique: use %#x at Al Viro's suggestion, also handle debug] Tested-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-12-05 12:53:16 +00:00
Linus Torvalds	afdf0fb340	vfs-6.19-rc1.fs_header -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZgAKCRCRxhvAZXjc oq2EAQD09y/qVU81E7Qg7Cn4n5/3WTlnQjx0aSvhb4p6dFUcFwD+K9uVJNP8x8tA xTaPt59nZbEX9BIAwtLChSPa4CZsnwM= =XrvE -----END PGP SIGNATURE----- Merge tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fs header updates from Christian Brauner: "This contains initial work to start splitting up fs.h. Begin the long-overdue work of splitting up the monolithic fs.h header. The header has grown to over 3000 lines and includes types and functions for many different subsystems, making it difficult to navigate and causing excessive compilation dependencies. This series introduces new focused headers for superblock-related code: - Rename fs_types.h to fs_dirent.h to better reflect its actual content (directory entry types) - Add fs/super_types.h containing superblock type definitions - Add fs/super.h containing superblock function declarations This is the first step in a longer effort to modularize the VFS headers. Cleanups: - Inode Field Layout Optimization (Mateusz Guzik) Move inode fields used during fast path lookup closer together to improve cache locality during path resolution. - current_umask() Optimization (Mateusz Guzik) Inline current_umask() and move it to fs_struct.h. This improves performance by avoiding function call overhead for this frequently-used function, and places it in a more appropriate header since it operates on fs_struct" * tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: move inode fields used during fast path lookup closer together fs: inline current_umask() and move it to fs_struct.h fs: add fs/super.h header fs: add fs/super_types.h header fs: rename fs_types.h to fs_dirent.h	2025-12-01 14:18:01 -08:00
Linus Torvalds	ebaeabfa5a	vfs-6.19-rc1.writeback -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc or4UAP9FbpFsZd0DpsYnKuv7kFepl291PuR0x2dKmseJ/wcf8AEAzI8FR5wd/fey 25ZNdExoUojAOj5wVn+jUep3u54jBws= =/toi -----END PGP SIGNATURE----- Merge tag 'vfs-6.19-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull writeback updates from Christian Brauner: "Features: - Allow file systems to increase the minimum writeback chunk size. The relatively low minimal writeback size of 4MiB means that written back inodes on rotational media are switched a lot. Besides introducing additional seeks, this also can lead to extreme file fragmentation on zoned devices when a lot of files are cached relative to the available writeback bandwidth. This adds a superblock field that allows the file system to override the default size, and sets it to the zone size for zoned XFS. - Add logging for slow writeback when it exceeds sysctl_hung_task_timeout_secs. This helps identify tasks waiting for a long time and pinpoint potential issues. Recording the starting jiffies is also useful when debugging a crashed vmcore. - Wake up waiting tasks when finishing the writeback of a chunk Cleanups: - filemap_* writeback interface cleanups. Adding filemap_fdatawrite_wbc ended up being a mistake, as all but the original btrfs caller should be using better high level interfaces instead. This series removes all these low-level interfaces, switches btrfs to a more specific interface, and cleans up other too low-level interfaces. With this the writeback_control that is passed to the writeback code is only initialized in three places. - Remove __filemap_fdatawrite, __filemap_fdatawrite_range, and filemap_fdatawrite_wbc - Add filemap_flush_nr helper for btrfs - Push struct writeback_control into start_delalloc_inodes in btrfs - Rename filemap_fdatawrite_range_kick to filemap_flush_range - Stop opencoding filemap_fdatawrite_range in 9p, ocfs2, and mm - Make wbc_to_tag() inline and use it in fs" * tag 'vfs-6.19-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Make wbc_to_tag() inline and use it in fs. xfs: set s_min_writeback_pages for zoned file systems writeback: allow the file system to override MIN_WRITEBACK_PAGES writeback: cleanup writeback_chunk_size mm: rename filemap_fdatawrite_range_kick to filemap_flush_range mm: remove __filemap_fdatawrite_range mm: remove filemap_fdatawrite_wbc mm: remove __filemap_fdatawrite mm,btrfs: add a filemap_flush_nr helper btrfs: push struct writeback_control into start_delalloc_inodes btrfs: use the local tmp_inode variable in start_delalloc_inodes ocfs2: don't opencode filemap_fdatawrite_range in ocfs2_journal_submit_inode_data_buffers 9p: don't opencode filemap_fdatawrite_range in v9fs_mmap_vm_close mm: don't opencode filemap_fdatawrite_range in filemap_invalidate_inode writeback: Add logging for slow writeback (exceeds sysctl_hung_task_timeout_secs) writeback: Wake up waiting tasks when finishing the writeback of a chunk.	2025-12-01 09:20:51 -08:00
Linus Torvalds	9368f0f941	vfs-6.19-rc1.inode -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZAAKCRCRxhvAZXjc omMSAP9GLhavxyWQ24Q+49CNWWRQWDY1wTOiUK2BwtIvZ0YEcAD8D1dAiMckL5pC RwEAVA5p+y+qi+bZP0KXCBxQddoTIQM= =zo/J -----END PGP SIGNATURE----- Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "Features: - Hide inode->i_state behind accessors. Open-coded accesses prevent asserting they are done correctly. One obvious aspect is locking, but significantly more can be checked. For example it can be detected when the code is clearing flags which are already missing, or is setting flags when it is illegal (e.g., I_FREEING when ->i_count > 0) - Provide accessors for ->i_state, converts all filesystems using coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2, overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to compile - Rework I_NEW handling to operate without fences, simplifying the code after the accessor infrastructure is in place Cleanups: - Move wait_on_inode() from writeback.h to fs.h - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb for clarity - Cosmetic fixes to LRU handling - Push list presence check into inode_io_list_del() - Touch up predicts in __d_lookup_rcu() - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage - Assert on ->i_count in iput_final() - Assert ->i_lock held in __iget() Fixes: - Add missing fences to I_NEW handling" * tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) dcache: touch up predicts in __d_lookup_rcu() fs: push list presence check into inode_io_list_del() fs: cosmetic fixes to lru handling fs: rework I_NEW handling to operate without fences fs: make plain ->i_state access fail to compile xfs: use the new ->i_state accessors nilfs2: use the new ->i_state accessors overlayfs: use the new ->i_state accessors gfs2: use the new ->i_state accessors f2fs: use the new ->i_state accessors smb: use the new ->i_state accessors ceph: use the new ->i_state accessors btrfs: use the new ->i_state accessors Manual conversion to use ->i_state accessors of all places not covered by coccinelle Coccinelle-based conversion to use ->i_state accessors fs: provide accessors for ->i_state fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb fs: move wait_on_inode() from writeback.h to fs.h fs: add missing fences to I_NEW handling ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage ...	2025-12-01 09:02:34 -08:00
Mateusz Guzik	5b8ed52866	fs: inline current_umask() and move it to fs_struct.h There is no good reason to have this as a func call, other than avoiding the churn of adding fs_struct.h as needed. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251104170448.630414-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-05 22:51:23 +01:00
Eric Sandeen	1f3e4142c0	9p: convert to the new mount API Convert 9p to the new mount API. This patch consolidates all parsing into fs/9p/v9fs.c, which stores all results into a filesystem context which can be passed to the various transports as needed. Some of the parsing helper functions such as get_cache_mode() have been eliminated in favor of using the new mount API's enum param type, for simplicity. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Message-ID: <20251010214222.1347785-5-sandeen@redhat.com> [ Dominique: handled source explicitly as per follow-up discussion ] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-11-03 16:49:53 +09:00
Dan Carpenter	52df783f33	fs/9p: delete unnnecessary condition We already know that "retval" is negative, so there is no need to check again. Also the statement is not indented far enough. Delete it. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: `43c36a56cc` ("Revert "fs/9p: Refresh metadata in d_revalidate for uncached mode too"") Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Message-ID: <aPtiSJl8EwSfVvqN@stanley.mountain> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-11-03 16:49:53 +09:00
Tingmao Wang	a63dd8fd13	fs/9p: Don't open remote file with APPEND mode when writeback cache is used When page cache is used, writebacks are done on a page granularity, and it is expected that the underlying filesystem (such as v9fs) should respect the write position. However, currently v9fs will passthrough O_APPEND to the server even on cached mode. This causes data corruption if a sync or fstat gets between two writes to the same file. This patch removes the APPEND flag from the open request we send to the server when writeback caching is involved. I believe keeping server-side APPEND is probably fine for uncached mode (even if two fds are opened, one without O_APPEND and one with it, this should still be fine since they would use separate fid for the writes). Signed-off-by: Tingmao Wang <m@maowtm.org> Fixes: `4eb3117888` ("fs/9p: Rework cache modes and add new options to Documentation") Message-ID: <20251102235631.8724-1-m@maowtm.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-11-03 16:49:53 +09:00
Christoph Hellwig	3c2e5cee5e	9p: don't opencode filemap_fdatawrite_range in v9fs_mmap_vm_close Use filemap_fdatawrite_range instead of opencoding the logic using filemap_fdatawrite_wbc. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251024080431.324236-3-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-10-29 15:50:41 +01:00
Dominique Martinet	43c36a56cc	Revert "fs/9p: Refresh metadata in d_revalidate for uncached mode too" This reverts commit `290434474c`. That commit broke cache=mmap, a mode that doesn't cache metadata, but still has writeback cache. In commit `290434474c` ("fs/9p: Refresh metadata in d_revalidate for uncached mode too") we considered metadata cache to be enough to not look at the server, but in writeback cache too looking at the server size would make the vfs consider the file has been truncated before the data has been flushed out, making the following repro fail (nothing is ever read back, the resulting file ends up with no data written) ``` #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> char buf[4096]; int main(int argc, char *argv[]) { int ret, i; int fdw, fdr; if (argc < 2) return 1; fdw = openat(AT_FDCWD, argv[1], O_RDWR\|O_CREAT\|O_EXCL\|O_CLOEXEC, 0600); if (fdw < 0) { fprintf(stderr, "cannot open fdw\n"); return 1; } write(fdw, buf, sizeof(buf)); fdr = openat(AT_FDCWD, argv[1], O_RDONLY\|O_CLOEXEC); if (fdr < 0) { fprintf(stderr, "cannot open fdr\n"); close(fdw); return 1; } for (i = 0; i < 10; i++) { ret = read(fdr, buf, sizeof(buf)); fprintf(stderr, "i: %d, read returns %d\n", i, ret); } close(fdr); close(fdw); return 0; } ``` There is a fix for this particular reproducer but it looks like there are other problems around metadata refresh (e.g. around file rename), so revert this to avoid d_revalidate in uncached mode for now. Reported-by: Song Liu <song@kernel.org> Link: https://lkml.kernel.org/r/CAHzjS_u_SYdt5=2gYO_dxzMKXzGMt-TfdE_ueowg-Hq5tRCAiw@mail.gmail.com Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Link: https://lore.kernel.org/bpf/CAEf4BzZbCE4tLoDZyUf_aASpgAGFj75QMfSXX4a4dLYixnOiLg@mail.gmail.com/ Fixes: `290434474c` ("fs/9p: Refresh metadata in d_revalidate for uncached mode too") Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-10-22 14:25:27 +09:00
Mateusz Guzik	b4dbfd8653	Coccinelle-based conversion to use ->i_state accessors All places were patched by coccinelle with the default expecting that ->i_lock is held, afterwards entries got fixed up by hand to use unlocked variants as needed. The script: @@ expression inode, flags; @@ - inode->i_state & flags + inode_state_read(inode) & flags @@ expression inode, flags; @@ - inode->i_state &= ~flags + inode_state_clear(inode, flags) @@ expression inode, flag1, flag2; @@ - inode->i_state &= ~flag1 & ~flag2 + inode_state_clear(inode, flag1 \| flag2) @@ expression inode, flags; @@ - inode->i_state \|= flags + inode_state_set(inode, flags) @@ expression inode, flags; @@ - inode->i_state = flags + inode_state_assign(inode, flags) @@ expression inode, flags; @@ - flags = inode->i_state + flags = inode_state_read(inode) @@ expression inode, flags; @@ - READ_ONCE(inode->i_state) & flags + inode_state_read(inode) & flags Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-10-20 20:22:26 +02:00
Linus Torvalds	80b7065ec1	Bunch of unrelated fixes - polling fix for trans fd that ought to have been fixed otherwise back in March, but apparently came back somewhere else... - USB transport buffer overflow fix - Some dentry lifetime rework to handle metadata update for currently opened files in uncached mode, or inode type change in cached mode - a double-put on invalid flush found by syzbot - and finally /sys/fs/9p/caches not advancing buffer and overwriting itself for large contents Thanks to everyone involved! -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmjns5cACgkQq06b7GqY 5nDGLg/+Ltsiwj51CDwyJxFPX4ki1jM3j2iA8BNpdjjzCev6Jp1vuWBPiV+sTtgM zKmIw9joyvzaIXKzA3h7tMe7h7y8H4p7KZyD3YSDSNW5XIb5UYXeQwYVw39OxENk nVQkQ41H+bPepmz3R0t/F4la9myVLBtfdcPWJN+JfkOLgdeemCGf+TShJ6F+9Qaq PTd3UKfdv8JRlKQmeLH7pPlBoWYRoC2Tq3pZbyC3D1A50bWkTfxusW0k3MUzRyHO t/jNbhHt0ai+densQ6O5M+ALMI7KIIzR+obVaBHfzSUcwFeGhmW2x84Mo0qX6YMF 0B/fY3mCXi8QO6my1hfwmbRtY5mEV967wM/uF7E/SuvyNVizOMBZn8FgWfVPVszr ix1iNBgQJdoOEMbHvDfCAtGNorFrMiybyLvoabL7YCwFo68LdYqj2br3/feajoYM 5g46nUcidH4fVBLtiBGE8w0ko8aa3zLB4iu/fdATuEZ1r+gPt40SWo1mKahoCgY3 BOtYxEZTnd4l4GPJDQIaYgVckgSiex0xcE9pRJ5LtBwdyw9g4jwfb39WHx4MT2sE u5xWDbadU/ZM6ppLJ/iBkotvU9D7ohKTviN1IfCwjqL7IMBNz5c+nBwLQZWK4YTR 2ySKpv1VOMROglt1KvnHfdpmdJ+CjAJLLNE+XSz9AHXinK5v8aM= =bUN1 -----END PGP SIGNATURE----- Merge tag '9p-for-6.18-rc1' of https://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: "A bunch of unrelated fixes: - polling fix for trans fd that ought to have been fixed otherwise back in March, but apparently came back somewhere else... - USB transport buffer overflow fix - Some dentry lifetime rework to handle metadata update for currently opened files in uncached mode, or inode type change in cached mode - a double-put on invalid flush found by syzbot - and finally /sys/fs/9p/caches not advancing buffer and overwriting itself for large contents Thanks to everyone involved!" * tag '9p-for-6.18-rc1' of https://github.com/martinetd/linux: 9p: sysfs_init: don't hardcode error to ENOMEM 9p: fix /sys/fs/9p/caches overwriting itself 9p: clean up comment typos 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN net/9p: fix double req put in p9_fd_cancelled net/9p: Fix buffer overflow in USB transport layer fs/9p: Add p9_debug(VFS) in d_revalidate fs/9p: Invalidate dentry if inode type change detected in cached mode fs/9p: Refresh metadata in d_revalidate for uncached mode too	2025-10-09 11:56:59 -07:00
Linus Torvalds	829745b75a	finish_no_open calling conventions change Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaNh82wAKCRBZ7Krx/gZQ 67qmAP9l34sMvmagYDve+C+BtndI5FO2N8vHCB0VuWpgDvM7XgD/Zy4Iuup0fePh paGQgpwjYQ7/ghc4KjOUAw4EpfJ+SAE= =cjj2 -----END PGP SIGNATURE----- Merge tag 'pull-finish_no_open' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull finish_no_open updates from Al Viro: "finish_no_open calling conventions change to simplify callers" * tag 'pull-finish_no_open' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: slightly simplify nfs_atomic_open() simplify gfs2_atomic_open() simplify fuse_atomic_open() simplify nfs_atomic_open_v23() simplify vboxsf_dir_atomic_open() simplify cifs_atomic_open() 9p: simplify v9fs_vfs_atomic_open_dotl() 9p: simplify v9fs_vfs_atomic_open() allow finish_no_open(file, ERR_PTR(-E...))	2025-10-03 10:59:31 -07:00
Randall P. Embry	528f218b31	9p: sysfs_init: don't hardcode error to ENOMEM v9fs_sysfs_init() always returned -ENOMEM on failure; return the actual sysfs_create_group() error instead. Signed-off-by: Randall P. Embry <rpembry@gmail.com> Message-ID: <20250926-v9fs_misc-v1-3-a8b3907fc04d@codewreck.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-09-27 21:44:38 +09:00
Randall P. Embry	86db0c32f1	9p: fix /sys/fs/9p/caches overwriting itself caches_show() overwrote its buffer on each iteration, so only the last cache tag was visible in sysfs output. Properly append with snprintf(buf + count, …). Signed-off-by: Randall P. Embry <rpembry@gmail.com> Message-ID: <20250926-v9fs_misc-v1-2-a8b3907fc04d@codewreck.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-09-27 21:44:36 +09:00
Randall P. Embry	623fa18f6c	9p: clean up comment typos Fix a few minor typos in comments (e.g. "trasnport" → "transport"). Signed-off-by: Randall P. Embry <rpembry@gmail.com> Message-ID: <20250926-v9fs_misc-v1-1-a8b3907fc04d@codewreck.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-09-27 21:44:31 +09:00
Al Viro	f681e72e27	9p: simplify v9fs_vfs_atomic_open_dotl() again, preexisting aliases will always be positive Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-16 23:59:38 -04:00
Al Viro	fb3d71972b	9p: simplify v9fs_vfs_atomic_open() if v9fs_vfs_lookup() returns a preexisting alias, it is guaranteed to be positive. IOW, in that case we will immediately return finish_no_open(), leaving only the case res == NULL past that point. Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-16 23:59:38 -04:00
Mateusz Guzik	f99b391778	fs: rename generic_delete_inode() and generic_drop_inode() generic_delete_inode() is rather misleading for what the routine is doing. inode_just_drop() should be much clearer. The new naming is inconsistent with generic_drop_inode(), so rename that one as well with inode_ as the suffix. No functional changes. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-15 16:09:42 +02:00
Tingmao Wang	c667c54c58	fs/9p: Add p9_debug(VFS) in d_revalidate This was a useful debugging / validation aid, and can explain why a GETATTR request is made. Signed-off-by: Tingmao Wang <m@maowtm.org> Message-ID: <00829a99549e33d26139fa4d756c466629f13e00.1743956147.git.m@maowtm.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-08-23 15:34:46 +09:00
Tingmao Wang	0172a93474	fs/9p: Invalidate dentry if inode type change detected in cached mode This is an extension of the last commit to cached mode as well. While server-side changes when using cached mode is not expected, when it does happen we can get things like -EOPNOTSUPP. With this change at least when we realize the inode has updated (for example after a `touch` on the client), we can get a new one. Signed-off-by: Tingmao Wang <m@maowtm.org> Message-ID: <01afd3c77d5cda181780dc931baa8f3fc54562c8.1743956147.git.m@maowtm.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-08-23 15:34:46 +09:00
Tingmao Wang	290434474c	fs/9p: Refresh metadata in d_revalidate for uncached mode too Currently if another process keeps a file open, due to existing dentry in the dcache, other processes will not see updated metadata of that file if it is changed on the server, even in uncached mode. This can also manifest as -ENODATA when reading a file that has shrunk on the server (even if it's re-opened in another process), or -ENOTSUPP if the file has changed type (e.g. regular file to directory) on the server. We can end up in a situation where both `readdir` or `read` fails until the file is closed by all processes using it. This commit fixes that, and invalidates the dentry altogether if the inode type is changed (for uncached mode). Signed-off-by: Tingmao Wang <m@maowtm.org> Message-ID: <bfac417f65cc1d6812be822f8913f0d4ba0c1052.1743956147.git.m@maowtm.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-08-23 15:34:46 +09:00
Linus Torvalds	7031769e10	vfs-6.17-rc1.mmap_prepare -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCgQAKCRCRxhvAZXjc os+nAP9LFHUwWO6EBzHJJGEVjJvvzsbzqeYrRFamYiMc5ulPJwD+KW4RIgJa/MWO pcYE40CacaekD8rFWwYUyszpgmv6ewc= =wCwp -----END PGP SIGNATURE----- Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull mmap_prepare updates from Christian Brauner: "Last cycle we introduce f_op->mmap_prepare() in `c84bf6dd2b` ("mm: introduce new .mmap_prepare() file callback"). This is preferred to the existing f_op->mmap() hook as it does require a VMA to be established yet, thus allowing the mmap logic to invoke this hook far, far earlier, prior to inserting a VMA into the virtual address space, or performing any other heavy handed operations. This allows for much simpler unwinding on error, and for there to be a single attempt at merging a VMA rather than having to possibly reattempt a merge based on potentially altered VMA state. Far more importantly, it prevents inappropriate manipulation of incompletely initialised VMA state, which is something that has been the cause of bugs and complexity in the past. The intent is to gradually deprecate f_op->mmap, and in that vein this series coverts the majority of file systems to using f_op->mmap_prepare. Prerequisite steps are taken - firstly ensuring all checks for mmap capabilities use the file_has_valid_mmap_hooks() helper rather than directly checking for f_op->mmap (which is now not a valid check) and secondly updating daxdev_mapping_supported() to not require a VMA parameter to allow ext4 and xfs to be converted. Commit `bb666b7c27` ("mm: add mmap_prepare() compatibility layer for nested file systems") handles the nasty edge-case of nested file systems like overlayfs, which introduces a compatibility shim to allow f_op->mmap_prepare() to be invoked from an f_op->mmap() callback. This allows for nested filesystems to continue to function correctly with all file systems regardless of which callback is used. Once we finally convert all file systems, this shim can be removed. As a result, ecryptfs, fuse, and overlayfs remain unaltered so they can nest all other file systems. We additionally do not update resctl - as this requires an update to remap_pfn_range() (or an alternative to it) which we defer to a later series, equally we do not update cramfs which needs a mixed mapping insertion with the same issue, nor do we update procfs, hugetlbfs, syfs or kernfs all of which require VMAs for internal state and hooks. We shall return to all of these later" * tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: update porting, vfs documentation to describe mmap_prepare() fs: replace mmap hook with .mmap_prepare for simple mappings fs: convert most other generic_file_mmap() users to .mmap_prepare() fs: convert simple use of generic_file__mmap() to .mmap_prepare() mm/filemap: introduce generic_file_*_mmap_prepare() helpers fs/xfs: transition from deprecated .mmap hook to .mmap_prepare fs/ext4: transition from deprecated .mmap hook to .mmap_prepare fs/dax: make it possible to check dev dax support without a VMA fs: consistently use can_mmap_file() helper mm/nommu: use file_has_valid_mmap_hooks() helper mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare	2025-07-28 13:43:25 -07:00
Lorenzo Stoakes	9d5403b103	fs: convert most other generic_file_*mmap() users to .mmap_prepare() Update nearly all generic_file_mmap() and generic_file_readonly_mmap() callers to use generic_file_mmap_prepare() and generic_file_readonly_mmap_prepare() respectively. We update blkdev, 9p, afs, erofs, ext2, nfs, ntfs3, smb, ubifs and vboxsf file systems this way. Remaining users we cannot yet update are ecryptfs, fuse and cramfs. The former two are nested file systems that must support any underlying file ssytem, and cramfs inserts a mixed mapping which currently requires a VMA. Once all file systems have been converted to mmap_prepare(), we can then update nested file systems. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/08db85970d89b17a995d2cffae96fb4cc462377f.1750099179.git.lorenzo.stoakes@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-19 13:56:57 +02:00
Lorenzo Stoakes	951ea2f484	fs: convert simple use of generic_file_*_mmap() to .mmap_prepare() Since commit `c84bf6dd2b` ("mm: introduce new .mmap_prepare() file callback"), the f_op->mmap() hook has been deprecated in favour of f_op->mmap_prepare(). We have provided generic .mmap_prepare() equivalents, so update all file systems that specify these directly in their file_operations structures. This updates 9p, adfs, affs, bfs, fat, hfs, hfsplus, hostfs, hpfs, jffs2, jfs, minix, omfs, ramfs and ufs file systems directly. It updates generic_ro_fops which impacts qnx4, cramfs, befs, squashfs, frebxfs, qnx6, efs, romfs, erofs and isofs file systems. There are remaining file systems which use generic hooks in a less direct way which we address in a subsequent commit. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/c7dc90e44a9e75e750939ea369290d6e441a18e6.1750099179.git.lorenzo.stoakes@oracle.com Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-06-17 13:47:45 +02:00
Al Viro	61a4fa39a3	9p: don't bother with always_delete_dentry just set DCACHE_DONTCACHE for "don't cache" mounts... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-06-11 13:41:05 -04:00
Al Viro	05fb0e6664	new helper: set_default_d_op() ... to be used instead of manually assigning to ->s_d_op. All in-tree filesystem converted (and field itself is renamed, so any out-of-tree ones in need of conversion will be caught by compiler). Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-06-10 22:21:16 -04:00
Linus Torvalds	0fb34422b5	vfs-6.16-rc1.netfs -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBPUAAKCRCRxhvAZXjc ouMEAQCrviYPG/WMtPTH7nBIbfVQTfNEXt/TvN7u7OjXb+RwRAEAwe9tLy4GrS/t GuvUPWAthbhs77LTvxj6m3Gf49BOVgQ= =6FqN -----END PGP SIGNATURE----- Merge tag 'vfs-6.16-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull netfs updates from Christian Brauner: - The main API document has been extensively updated/rewritten - Fix an oops in write-retry due to mis-resetting the I/O iterator - Fix the recording of transferred bytes for short DIO reads - Fix a request's work item to not require a reference, thereby avoiding the need to get rid of it in BH/IRQ context - Fix waiting and waking to be consistent about the waitqueue used - Remove NETFS_SREQ_SEEK_DATA_READ, NETFS_INVALID_WRITE, NETFS_ICTX_WRITETHROUGH, NETFS_READ_HOLE_CLEAR, NETFS_RREQ_DONT_UNLOCK_FOLIOS, and NETFS_RREQ_BLOCKED - Reorder structs to eliminate holes - Remove netfs_io_request::ractl - Only provide proc_link field if CONFIG_PROC_FS=y - Remove folio_queue::marks3 - Fix undifferentiation of DIO reads from unbuffered reads * tag 'vfs-6.16-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: netfs: Fix undifferentiation of DIO reads from unbuffered reads netfs: Fix wait/wake to be consistent about the waitqueue used netfs: Fix the request's work item to not require a ref netfs: Fix setting of transferred bytes with short DIO reads netfs: Fix oops in write-retry from mis-resetting the subreq iterator fs/netfs: remove unused flag NETFS_RREQ_BLOCKED fs/netfs: remove unused flag NETFS_RREQ_DONT_UNLOCK_FOLIOS folio_queue: remove unused field `marks3` fs/netfs: declare field `proc_link` only if CONFIG_PROC_FS=y fs/netfs: remove `netfs_io_request.ractl` fs/netfs: reorder struct fields to eliminate holes fs/netfs: remove unused enum choice NETFS_READ_HOLE_CLEAR fs/netfs: remove unused flag NETFS_ICTX_WRITETHROUGH fs/netfs: remove unused source NETFS_INVALID_WRITE fs/netfs: remove unused flag NETFS_SREQ_SEEK_DATA_READ	2025-06-02 15:04:06 -07:00
David Howells	db26d62d79	netfs: Fix undifferentiation of DIO reads from unbuffered reads On cifs, "DIO reads" (specified by O_DIRECT) need to be differentiated from "unbuffered reads" (specified by cache=none in the mount parameters). The difference is flagged in the protocol and the server may behave differently: Windows Server will, for example, mandate that DIO reads are block aligned. Fix this by adding a NETFS_UNBUFFERED_READ to differentiate this from NETFS_DIO_READ, parallelling the write differentiation that already exists. cifs will then do the right thing. Fixes: `016dc8516a` ("netfs: Implement unbuffered/DIO read support") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/3444961.1747987072@warthog.procyon.org.uk Reviewed-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> cc: Steve French <sfrench@samba.org> cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-23 10:35:03 +02:00
David Howells	20d72b00ca	netfs: Fix the request's work item to not require a ref When the netfs_io_request struct's work item is queued, it must be supplied with a ref to the work item struct to prevent it being deallocated whilst on the queue or whilst it is being processed. This is tricky to manage as we have to get a ref before we try and queue it and then we may find it's already queued and is thus already holding a ref - in which case we have to try and get rid of the ref again. The problem comes if we're in BH or IRQ context and need to drop the ref: if netfs_put_request() reduces the count to 0, we have to do the cleanup - but the cleanup may need to wait. Fix this by adding a new work item to the request, ->cleanup_work, and dispatching that when the refcount hits zero. That can then synchronously cancel any outstanding work on the main work item before doing the cleanup. Adding a new work item also deals with another problem upstream where it's sometimes changing the work func in the put function and requeuing it - which has occasionally in the past caused the cleanup to happen incorrectly. As a bonus, this allows us to get rid of the 'was_async' parameter from a bunch of functions. This indicated whether the put function might not be permitted to sleep. Fixes: `3d3c950467` ("netfs: Provide readahead and readpage netfs helpers") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250519090707.2848510-4-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Steve French <stfrench@microsoft.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-21 14:35:20 +02:00
Matthew Wilcox (Oracle)	03ddd7725e	9p: Add a migrate_folio method The migration code used to be able to migrate dirty 9p folios by writing them back using writepage. When the writepage method was removed, we neglected to add a migrate_folio method, which means that dirty 9p folios have been unmovable ever since. This reduced our success at defragmenting memory on machines which use 9p heavily. Fixes: `80105ed2fd` (9p: Use netfslib read/write_iter) Cc: stable@vger.kernel.org Cc: David Howells <dhowells@redhat.com> Cc: v9fs@lists.linux.dev Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250402150005.2309458-2-willy@infradead.org Acked-by: Dominique Martinet <asmadeus@codewreck.org> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 09:36:48 +02:00
Linus Torvalds	bdafff62ae	9p update for 6.15-rc1 - fix handling of bogus (negative/too long) replies - fix crash on mkdir with ACLs (... looks like nobody is using ACLs with semi-recent kernels...) - ipv6 support for trans=tcp - minor concurrency fix to make syzbot happy - minor cleanup -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmfuAoIACgkQq06b7GqY 5nBb9w/+K9WnU4MdSTFSXDJ+ZZTY//fPpFaUTqHl1hTeRjmIBtBdngy9ASvnPrPj n6DHnd+qkdFV6cMvs5wPUskRxJZuRDugzZMAd6yzjJoRNPmNFN2Ux7EXWEdFwvFG mk4EJtzgiZhp7XWlNzQeMziuDmMZJijzLsd4zVYNo9fNKEh5jLKjKWyHTVRxfuCc i22Y8oUgcghK0YSSLoL59xF4nRrvn57DBF3wnrW6pqVvVQ05NJRH4fNgXp4wW497 jxQq01ela7IgNUoMgib7F0ov1fu8pSEd95T+fzcqynZCePQ9rzDbvt3MR7rjJuqo /VXwW7N3KT6DrQG6Wu21B9VcfBeWjdbtJ/GWGVp8d2iP04Sv0escx53qETZSD0iZ pMIZLthJuXlq9dmxZ/j+BPLlbm7uAFPbP15/O9Un5xVvrisANFm1TPvM77btnrEP KovWfooheoUrK6DmkKbkzS5HJH2ko4CASAG7c8GL+R1hXwVDswC06cecyvXaKQQK Um4nOe59hRqbqWXmIEs4jssoUjfg8MfuX71DvX0p6+r1WR+eySieG2HiTz/mTj0q /27cCWlAvjYxa42opxASAD1/HvW2tZfcPKtSQbh/3s0FBpTVqbof3fxmnTjcb0Po V7WpuRSD7DnmawjbQQLXznUQokagO23/ySO1vARnluKyGwsn5yI= =Q0mE -----END PGP SIGNATURE----- Merge tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: - fix handling of bogus (negative/too long) replies - fix crash on mkdir with ACLs (... looks like nobody is using ACLs with semi-recent kernels...) - ipv6 support for trans=tcp - minor concurrency fix to make syzbot happy - minor cleanup * tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux: docs: fs/9p: Add missing "not" in cache documentation 9p: Use hashtable.h for hash_errmap Documentation/fs/9p: fix broken link 9p/trans_fd: mark concurrent read and writes to p9_conn->err 9p/net: return error on bogus (longer than requested) replies 9p/net: fix improper handling of bogus negative read/write replies fs/9p: fix NULL pointer dereference on mkdir net/9p/fd: support ipv6 for trans=tcp	2025-04-03 15:35:46 -07:00
Christian Schoenebeck	3f61ac7c65	fs/9p: fix NULL pointer dereference on mkdir When a 9p tree was mounted with option 'posixacl', parent directory had a default ACL set for its subdirectories, e.g.: setfacl -m default:group:simpsons:rwx parentdir then creating a subdirectory crashed 9p client, as v9fs_fid_add() call in function v9fs_vfs_mkdir_dotl() sets the passed 'fid' pointer to NULL (since `dafbe68973`) even though the subsequent v9fs_set_create_acl() call expects a valid non-NULL 'fid' pointer: [ 37.273191] BUG: kernel NULL pointer dereference, address: 0000000000000000 ... [ 37.322338] Call Trace: [ 37.323043] <TASK> [ 37.323621] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) [ 37.324448] ? page_fault_oops (arch/x86/mm/fault.c:714) [ 37.325532] ? search_module_extables (kernel/module/main.c:3733) [ 37.326742] ? p9_client_walk (net/9p/client.c:1165) 9pnet [ 37.328006] ? search_bpf_extables (kernel/bpf/core.c:804) [ 37.329142] ? exc_page_fault (./arch/x86/include/asm/paravirt.h:686 arch/x86/mm/fault.c:1488 arch/x86/mm/fault.c:1538) [ 37.330196] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:574) [ 37.331330] ? p9_client_walk (net/9p/client.c:1165) 9pnet [ 37.332562] ? v9fs_fid_xattr_get (fs/9p/xattr.c:30) 9p [ 37.333824] v9fs_fid_xattr_set (fs/9p/fid.h:23 fs/9p/xattr.c:121) 9p [ 37.335077] v9fs_set_acl (fs/9p/acl.c:276) 9p [ 37.336112] v9fs_set_create_acl (fs/9p/acl.c:307) 9p [ 37.337326] v9fs_vfs_mkdir_dotl (fs/9p/vfs_inode_dotl.c:411) 9p [ 37.338590] vfs_mkdir (fs/namei.c:4313) [ 37.339535] do_mkdirat (fs/namei.c:4336) [ 37.340465] __x64_sys_mkdir (fs/namei.c:4354) [ 37.341455] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) [ 37.342447] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Fix this by simply swapping the sequence of these two calls in v9fs_vfs_mkdir_dotl(), i.e. calling v9fs_set_create_acl() before v9fs_fid_add(). Fixes: `dafbe68973` ("9p fid refcount: cleanup p9_fid_put calls") Reported-by: syzbot+5b667f9a1fee4ba3775a@syzkaller.appspotmail.com Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com> Message-ID: <E1tsiI6-002iMG-Kh@kylie.crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>	2025-03-17 07:03:11 +09:00
NeilBrown	88d5baf690	Change inode_operations.mkdir to return struct dentry * Some filesystems, such as NFS, cifs, ceph, and fuse, do not have complete control of sequencing on the actual filesystem (e.g. on a different server) and may find that the inode created for a mkdir request already exists in the icache and dcache by the time the mkdir request returns. For example, if the filesystem is mounted twice the directory could be visible on the other mount before it is on the original mount, and a pair of name_to_handle_at(), open_by_handle_at() calls could instantiate the directory inode with an IS_ROOT() dentry before the first mkdir returns. This means that the dentry passed to ->mkdir() may not be the one that is associated with the inode after the ->mkdir() completes. Some callers need to interact with the inode after the ->mkdir completes and they currently need to perform a lookup in the (rare) case that the dentry is no longer hashed. This lookup-after-mkdir requires that the directory remains locked to avoid races. Planned future patches to lock the dentry rather than the directory will mean that this lookup cannot be performed atomically with the mkdir. To remove this barrier, this patch changes ->mkdir to return the resulting dentry if it is different from the one passed in. Possible returns are: NULL - the directory was created and no other dentry was used ERR_PTR() - an error occurred non-NULL - this other dentry was spliced in This patch only changes file-systems to return "ERR_PTR(err)" instead of "err" or equivalent transformations. Subsequent patches will make further changes to some file-systems to return a correct dentry. Not all filesystems reliably result in a positive hashed dentry: - NFS, cifs, hostfs will sometimes need to perform a lookup of the name to get inode information. Races could result in this returning something different. Note that this lookup is non-atomic which is what we are trying to avoid. Placing the lookup in filesystem code means it only happens when the filesystem has no other option. - kernfs and tracefs leave the dentry negative and the ->revalidate operation ensures that lookup will be called to correctly populate the dentry. This could be fixed but I don't think it is important to any of the users of vfs_mkdir() which look at the dentry. The recommendation to use d_drop();d_splice_alias() is ugly but fits with current practice. A planned future patch will change this. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-27 20:00:17 +01:00
Linus Torvalds	d3d90cc289	Provide stable parent and name to ->d_revalidate() instances Most of the filesystem methods where we care about dentry name and parent have their stability guaranteed by the callers; ->d_revalidate() is the major exception. It's easy enough for callers to supply stable values for expected name and expected parent of the dentry being validated. That kills quite a bit of boilerplate in ->d_revalidate() instances, along with a bunch of races where they used to access ->d_name without sufficient precautions. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZ5gkoQAKCRBZ7Krx/gZQ 6w9FAP4nyxNNWMjE1TwuWR/DNDMYYuw/qn/miZ88B5BUM8hzqgD/W2SjRvcbSaIm xSIYpbtKgtqNU34P1PU+dBvL8Utz2AE= =TWY8 -----END PGP SIGNATURE----- Merge tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs d_revalidate updates from Al Viro: "Provide stable parent and name to ->d_revalidate() instances Most of the filesystem methods where we care about dentry name and parent have their stability guaranteed by the callers; ->d_revalidate() is the major exception. It's easy enough for callers to supply stable values for expected name and expected parent of the dentry being validated. That kills quite a bit of boilerplate in ->d_revalidate() instances, along with a bunch of races where they used to access ->d_name without sufficient precautions" * tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: 9p: fix ->rename_sem exclusion orangefs_d_revalidate(): use stable parent inode and name passed by caller ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller nfs: fix ->d_revalidate() UAF on ->d_name accesses nfs{,4}_lookup_validate(): use stable parent inode passed by caller gfs2_drevalidate(): use stable parent inode and name passed by caller fuse_dentry_revalidate(): use stable parent inode and name passed by caller vfat_revalidate{,_ci}(): use stable parent inode passed by caller exfat_d_revalidate(): use stable parent inode passed by caller fscrypt_d_revalidate(): use stable parent inode passed by caller ceph_d_revalidate(): propagate stable name down into request encoding ceph_d_revalidate(): use stable parent inode passed by caller afs_d_revalidate(): use stable name and parent inode passed by caller Pass parent directory inode and expected name to ->d_revalidate() generic_ci_d_compare(): use shortname_storage ext4 fast_commit: make use of name_snapshot primitives dissolve external_name.u into separate members make take_dentry_name_snapshot() lockless dcache: back inline names with a struct-wrapped array of unsigned long make sure that DNAME_INLINE_LEN is a multiple of word size	2025-01-30 09:13:35 -08:00
Al Viro	30d61efe11	9p: fix ->rename_sem exclusion 9p wants to be able to build a path from given dentry to fs root and keep it valid over a blocking operation. ->s_vfs_rename_mutex would be a natural candidate, but there are places where we need that and where we have no way to tell if ->s_vfs_rename_mutex is already held deeper in callchain. Moreover, it's only held for cross-directory renames; name changes within the same directory happen without it. Solution: * have d_move() done in ->rename() rather than in its caller * maintain a 9p-private rwsem (per-filesystem) * hold it exclusive over the relevant part of ->rename() * hold it shared over the places where we want the path. That almost works. FS_RENAME_DOES_D_MOVE is enough to put all d_move() and d_exchange() calls under filesystem's control. However, there's also __d_unalias(), which isn't covered by any of that. If ->lookup() hits a directory inode with preexisting dentry elsewhere (due to e.g. rename done on server behind our back), d_splice_alias() called by ->lookup() will move/rename that alias. Add a couple of optional methods, so that __d_unalias() would do if alias->d_op->d_unalias_trylock != NULL if (!alias->d_op->d_unalias_trylock(alias)) fail (resulting in -ESTALE from lookup) __d_move(...) if alias->d_op->d_unalias_unlock != NULL alias->d_unalias_unlock(alias) where it currently does __d_move(). 9p instances do down_write_trylock() and up_write() of ->rename_mutex. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-01-27 19:25:24 -05:00
Al Viro	5be1fa8abd	Pass parent directory inode and expected name to ->d_revalidate() ->d_revalidate() often needs to access dentry parent and name; that has to be done carefully, since the locking environment varies from caller to caller. We are not guaranteed that dentry in question will not be moved right under us - not unless the filesystem is such that nothing on it ever gets renamed. It can be dealt with, but that results in boilerplate code that isn't even needed - the callers normally have just found the dentry via dcache lookup and want to verify that it's in the right place; they already have the values of ->d_parent and ->d_name stable. There is a couple of exceptions (overlayfs and, to less extent, ecryptfs), but for the majority of calls that song and dance is not needed at all. It's easier to make ecryptfs and overlayfs find and pass those values if there's a ->d_revalidate() instance to be called, rather than doing that in the instances. This commit only changes the calling conventions; making use of supplied values is left to followups. NOTE: some instances need more than just the parent - things like CIFS may need to build an entire path from filesystem root, so they need more precautions than the usual boilerplate. This series doesn't do anything to that need - these filesystems have to keep their locking mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem a-la v9fs). One thing to keep in mind when using name is that name->name will normally point into the pathname being resolved; the filename in question occupies name->len bytes starting at name->name, and there is NUL somewhere after it, but it the next byte might very well be '/' rather than '\0'. Do not ignore name->len. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-01-27 19:25:23 -05:00
David Howells	e2d46f2ec3	netfs: Change the read result collector to only use one work item Change the way netfslib collects read results to do all the collection for a particular read request using a single work item that walks along the subrequest queue as subrequests make progress or complete, unlocking folios progressively rather than doing the unlock in parallel as parallel requests come in. The code is remodelled to be more like the write-side code, though only using a single stream. This makes it more directly comparable and thus easier to duplicate fixes between the two sides. This has a number of advantages: (1) It's simpler. There doesn't need to be a complex donation mechanism to handle mismatches between the size and alignment of subrequests and folios. The collector unlocks folios as the subrequests covering each complete. (2) It should cause less scheduler overhead as there's a single work item in play unlocking pages in parallel when a read gets split up into a lot of subrequests instead of one per subrequest. Whilst the parallellism is nice in theory, in practice, the vast majority of loads are sequential reads of the whole file, so committing a bunch of threads to unlocking folios out of order doesn't help in those cases. (3) It should make it easier to implement content decryption. A folio cannot be decrypted until all the requests that contribute to it have completed - and, again, most loads are sequential and so, most of the time, we want to begin decryption sequentially (though it's great if the decryption can happen in parallel). There is a disadvantage in that we're losing the ability to decrypt and unlock things on an as-things-arrive basis which may affect some applications. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-28-dhowells@redhat.com cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-12-20 22:34:08 +01:00

1 2 3 4 5 ...

861 Commits