linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-12 16:18:45 +02:00

Author	SHA1	Message	Date
Linus Torvalds	53b6156308	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmnvhacACgkQnJ2qBz9k QNn5hwgAvh1eiRgLrIQ/dd+jpOQ2SbwAN0XiD/SI+verBhF+SC7Y6idGNz0LvYEp OQ+OZmMJK/HOKzcUHAK13LtDUmBU/qTtfEFysiMqJ48QSmHlrYVGuDMSdWhjhZ9r 2rcQ/aGf4C/rLxSGEsH9Rv3WOp/7ZSUTBLB1RojIJmYnjmXsFr1EHpUZyh7ACun7 at18okN4HIjxpwtuB/A15So3vQIaI6tTkH01v11nMSYh1OAaZgs9m+4p3+ZzuW1Y ecAWtuHLDfuLqblg0FxhxGx0tiHODCTWmFghTgVAaXs8OjTpLAEzzOQ7VloGqKdB a/er1rYJOeyV8x3R+rvTL4R2b+d4xA== =2Au7 -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fixes from Jan Kara: "Three fixes for fsnotify / fanotify" * tag 'fsnotify_for_v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: fix inode reference leak in fsnotify_recalc_mask() fanotify: Fix spelling mistake "enforecement" -> "enforcement" fanotify: fix false positive on permission events	2026-04-27 16:40:24 -07:00
Linus Torvalds	292a2bcd17	Fixing livelocks in shrink_dcache_tree() If shrink_dcache_tree() finds a dentry in the middle of being killed by another thread, it has to wait until the victim finishes dying, gets detached from the tree and ceases to pin its parent. The way we used to deal with that amounted to busy-wait; unfortunately, it's not just inefficient but can lead to reliably reproducible hard livelocks. Solved by having shrink_dentry_tree() attach a completion to such dentry, with dentry_unlist() calling complete() on all objects attached to it. With a bit of care it can be done without growing struct dentry or adding overhead in normal case. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaec/ugAKCRBZ7Krx/gZQ 6y77AP9l/IL36Ic45p45FCTirA6LFyIvZ5Gixm3Xk64Pi1Y3nAEAqQ5UOVnJc907 RyrCpcI6vnO8a67MptchOxK9d1bIxQw= =A8JL -----END PGP SIGNATURE----- Merge tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache busy loop updates from Al Viro: "Fix livelocks in shrink_dcache_tree() If shrink_dcache_tree() finds a dentry in the middle of being killed by another thread, it has to wait until the victim finishes dying, gets detached from the tree and ceases to pin its parent. The way we used to deal with that amounted to busy-wait; unfortunately, it's not just inefficient but can lead to reliably reproducible hard livelocks. Solved by having shrink_dentry_tree() attach a completion to such dentry, with dentry_unlist() calling complete() on all objects attached to it. With a bit of care it can be done without growing struct dentry or adding overhead in normal case" * tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: get rid of busy-waiting in shrink_dcache_tree() dcache.c: more idiomatic "positives are not allowed" sanity checks struct dentry: make ->d_u anonymous for_each_alias(): helper macro for iterating through dentries of given inode	2026-04-21 07:30:44 -07:00
Amir Goldstein	4aca914ac1	fsnotify: fix inode reference leak in fsnotify_recalc_mask() fsnotify_recalc_mask() fails to handle the return value of __fsnotify_recalc_mask(), which may return an inode pointer that needs to be released via fsnotify_drop_object() when the connector's HAS_IREF flag transitions from set to cleared. This manifests as a hung task with the following call trace: INFO: task umount:1234 blocked for more than 120 seconds. Call Trace: __schedule schedule fsnotify_sb_delete generic_shutdown_super kill_anon_super cleanup_mnt task_work_run do_exit do_group_exit The race window that triggers the iref leak: Thread A (adding mark) Thread B (removing mark) ────────────────────── ──────────────────────── fsnotify_add_mark_locked(): fsnotify_add_mark_list(): spin_lock(conn->lock) add mark_B(evictable) to list spin_unlock(conn->lock) return /* ---- gap: no lock held ---- / fsnotify_detach_mark(mark_A): spin_lock(mark_A->lock) clear ATTACHED flag on mark_A spin_unlock(mark_A->lock) fsnotify_put_mark(mark_A) fsnotify_recalc_mask(): spin_lock(conn->lock) __fsnotify_recalc_mask(): / mark_A skipped: ATTACHED cleared / / only mark_B(evictable) remains / want_iref = false has_iref = true / not yet cleared / -> HAS_IREF transitions true -> false -> returns inode pointer spin_unlock(conn->lock) / BUG: return value discarded! * iput() and fsnotify_put_sb_watched_objects() * are never called */ Fix this by deferring the transition true -> false of HAS_IREF flag from fsnotify_recalc_mask() (Thread A) to fsnotify_put_mark() (thread B). Fixes: `c3638b5b13` ("fsnotify: allow adding an inode mark without pinning inode") Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://patch.msgid.link/CAOQ4uxiPsbHb0o5voUKyPFMvBsDkG914FYDcs4C5UpBMNm0Vcg@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz>	2026-04-20 19:16:55 +02:00
Ethan Carter Edwards	ae974ca6f0	fanotify: Fix spelling mistake "enforecement" -> "enforcement" There is a spelling mistake in a comment. Fix it. Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Link: https://patch.msgid.link/20260418-fanotify-typo-v1-1-03ea48cb44ba@ethancedwards.com Signed-off-by: Jan Kara <jack@suse.cz>	2026-04-20 09:40:32 +02:00
Miklos Szeredi	7746e3bd4c	fanotify: fix false positive on permission events fsnotify_get_mark_safe() may return false for a mark on an unrelated group, which results in bypassing the permission check. Fix by skipping over detached marks that are not in the current group. CC: stable@vger.kernel.org Fixes: `abc77577a6` ("fsnotify: Provide framework for dropping SRCU lock in ->handle_event") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260410144950.156160-1-mszeredi@redhat.com Signed-off-by: Jan Kara <jack@suse.cz>	2026-04-16 13:31:04 +02:00
Linus Torvalds	c4ef28fe97	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmnfYyQACgkQnJ2qBz9k QNnbiAgAvGkEHJpyQDD+Z1MmqK4S49vpCapM8D9W+xuGXaHWrVRey5Lt1Z+NeNch zOjWllYMlsp38Dyr9Btl+C2egBBrVOQDPoZxWOWrqFiduMug6TIJajImiIFovVue XQ/fbhVrJGKGGDpgFsWztGq5Oef8vjHJ/jXeQ9OWn8gS0jvC8caZCYWct58fOgq8 coT3VLJDI+h/3cf+Bfe3M3Lc62iLfiUChytGhFgh09xmyStigGR/uWVKhw+nfKrh NwB5EQDL8HR+yPlBuFSP4Gyxf/G/GKZFqqYilgqOS4Ms4oepxq17MH4DwyJmePIP yR3H5jiSR077JqtFF4W6x3UXr7+6fg== =avB+ -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "A couple of small fsnotify fixes and cleanups" * tag 'fsnotify_for_v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: replace deprecated strcpy in fanotify_info_copy_{name,name2} fsnotify: inotify: pass mark connector to fsnotify_recalc_mask() fanotify: call fanotify_events_supported() before path_permission() and security_path_notify() fanotify: avoid/silence premature LSM capability checks inotify: fix watch count leak when fsnotify_add_inode_mark_locked() fails	2026-04-15 19:18:51 -07:00
Al Viro	408d8af01f	for_each_alias(): helper macro for iterating through dentries of given inode Most of the places using d_alias are loops iterating through all aliases for given inode; introduce a helper macro (for_each_alias(dentry, inode)) and convert open-coded instances of such loop to it. They are easier to read that way and it reduces the noise on the next steps. You _must_ hold inode->i_lock over that thing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-04-02 03:45:02 -04:00
Thorsten Blum	0fdbe84553	fanotify: replace deprecated strcpy in fanotify_info_copy_{name,name2} strcpy() has been deprecated [1] because it performs no bounds checking on the destination buffer, which can lead to buffer overflows. Replace it with the safer strscpy(). Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1] Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20260321210544.519259-4-thorsten.blum@linux.dev Signed-off-by: Jan Kara <jack@suse.cz>	2026-03-23 10:10:59 +01:00
Jeff Layton	0b2600f81c	treewide: change inode->i_ino from unsigned long to u64 On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-03-06 14:31:28 +01:00
Sun Jian	4520b96b81	fsnotify: inotify: pass mark connector to fsnotify_recalc_mask() fsnotify_recalc_mask() expects a plain struct fsnotify_mark_connector *, but inode->i_fsnotify_marks is an __rcu pointer. Use fsn_mark->connector instead to avoid sparse "different address spaces" warnings. Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com> Link: https://patch.msgid.link/20260214051217.1381363-1-sun.jian.kdev@gmail.com Signed-off-by: Jan Kara <jack@suse.cz>	2026-02-26 15:19:31 +01:00
Ondrej Mosnacek	66052a768d	fanotify: call fanotify_events_supported() before path_permission() and security_path_notify() The latter trigger LSM (e.g. SELinux) checks, which will log a denial when permission is denied, so it's better to do them after validity checks to avoid logging a denial when the operation would fail anyway. Fixes: `0b3b094ac9` ("fanotify: Disallow permission events for proc filesystem") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20260216150625.793013-3-omosnace@redhat.com Signed-off-by: Jan Kara <jack@suse.cz>	2026-02-26 15:18:41 +01:00
Ondrej Mosnacek	0d5ee33734	fanotify: avoid/silence premature LSM capability checks Make sure calling capable()/ns_capable() actually leads to access denied when false is returned, because these functions emit an audit record when a Linux Security Module denies the capability, which makes it difficult to avoid allowing/silencing unnecessary permissions in security policies (namely with SELinux). Where the return value just used to set a flag, use the non-auditing ns_capable_noaudit() instead. Fixes: `7cea2a3c50` ("fanotify: support limited functionality for unprivileged users") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Link: https://patch.msgid.link/20260216150625.793013-2-omosnace@redhat.com Signed-off-by: Jan Kara <jack@suse.cz>	2026-02-26 15:18:31 +01:00
Chia-Ming Chang	6a320935fa	inotify: fix watch count leak when fsnotify_add_inode_mark_locked() fails When fsnotify_add_inode_mark_locked() fails in inotify_new_watch(), the error path calls inotify_remove_from_idr() but does not call dec_inotify_watches() to undo the preceding inc_inotify_watches(). This leaks a watch count, and repeated failures can exhaust the max_user_watches limit with -ENOSPC even when no watches are active. Prior to commit `1cce1eea0a` ("inotify: Convert to using per-namespace limits"), the watch count was incremented after fsnotify_add_mark_locked() succeeded, so this path was not affected. The conversion moved inc_inotify_watches() before the mark insertion without adding the corresponding rollback. Add the missing dec_inotify_watches() call in the error path. Fixes: `1cce1eea0a` ("inotify: Convert to using per-namespace limits") Cc: stable@vger.kernel.org Signed-off-by: Chia-Ming Chang <chiamingc@synology.com> Signed-off-by: robbieko <robbieko@synology.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://patch.msgid.link/20260224093442.3076294-1-chiamingc@synology.com Signed-off-by: Jan Kara <jack@suse.cz>	2026-02-26 15:11:50 +01:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Jan Kara	a05fc7edd9	fsnotify: Use connector list for destroying inode marks Instead of iterating all inodes belonging to a superblock to find inode marks and remove them on umount, iterate all inode connectors for the superblock. This may be substantially faster since there are generally much less inodes with fsnotify marks than all inodes. It also removes one use of sb->s_inodes list which we strive to ultimately remove. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz>	2026-01-23 13:26:37 +01:00
Jan Kara	94bd01253c	fsnotify: Track inode connectors for a superblock Introduce a linked list tracking all inode connectors for a superblock. We will use this list when the superblock is getting shutdown to properly clean up all the inode marks instead of relying on scanning all inodes in the superblock which can get rather slow. Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz>	2026-01-23 13:26:20 +01:00
Linus Torvalds	9a903e6d96	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmlEKB4ACgkQnJ2qBz9k QNkY9gf6Av2Dz1zJiPdICxLBWxFYIWmw+tqzV9ZjpKkSV8K0jJ2wqfoqbh2LZ8AN Lh0uUMw8wvxQYtnEcvrKHVwd6zjng2GtzIi8nO6IxeBOQTwyuxxGvj6YfKxD9ffg AgpJ1oPmJz6/UiBeRGX/IobXkh3ZHHbP8M094RLjoHUekbzz0bIMTBpkXXZK04Bs iysFptvASPQ14D/bXou5HwP/egET+VprCgyGfQzsyQELK+Cijt9P07aVk7mdMyv2 E45atP97TjtgJE018WMKL6LpO8j2mma7a2K/CosL9MslucuLfL8+QX+i2ZVhyuNo akchA3L1ugAfkxUDRVMrbim/rDBAGA== =tktL -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fixes from Jan Kara: "Two fsnotify fixes. The fix from Ahelenia makes sure we generate event when modifying inode flags, the fix from Amir disables sending of events from device inodes to their parent directory as it could concievably create a usable side channel attack in case of some devices and so far we aren't aware of anybody depending on the functionality" * tag 'fsnotify_for_v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fs: send fsnotify_xattr()/IN_ATTRIB from vfs_fileattr_set()/chattr(1) fsnotify: do not generate ACCESS/MODIFY events on child for special files	2025-12-19 07:41:17 +12:00
Amir Goldstein	635bc4def0	fsnotify: do not generate ACCESS/MODIFY events on child for special files inotify/fanotify do not allow users with no read access to a file to subscribe to events (e.g. IN_ACCESS/IN_MODIFY), but they do allow the same user to subscribe for watching events on children when the user has access to the parent directory (e.g. /dev). Users with no read access to a file but with read access to its parent directory can still stat the file and see if it was accessed/modified via atime/mtime change. The same is not true for special files (e.g. /dev/null). Users will not generally observe atime/mtime changes when other users read/write to special files, only when someone sets atime/mtime via utimensat(). Align fsnotify events with this stat behavior and do not generate ACCESS/MODIFY events to parent watchers on read/write of special files. The events are still generated to parent watchers on utimensat(). This closes some side-channels that could be possibly used for information exfiltration [1]. [1] https://snee.la/pdf/pubs/file-notification-attacks.pdf Reported-by: Sudheendra Raghav Neela <sneela@tugraz.at> CC: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2025-12-15 10:18:46 +01:00
Linus Torvalds	1b5dd29869	vfs-6.19-rc1.fd_prepare.fs -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZwAKCRCRxhvAZXjc op0AAP4oNVJkFyvgKoPos5K2EGFB8M7merGhpYtsOoeg8UK6OwD/UySQErHsXQDR sUDDa5uFOhfrkcfM8REtAN4wF8p5qAc= =QgFD -----END PGP SIGNATURE----- Merge tag 'vfs-6.19-rc1.fd_prepare.fs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fd prepare updates from Christian Brauner: "This adds the FD_ADD() and FD_PREPARE() primitive. They simplify the common pattern of get_unused_fd_flags() + create file + fd_install() that is used extensively throughout the kernel and currently requires cumbersome cleanup paths. FD_ADD() - For simple cases where a file is installed immediately: fd = FD_ADD(O_CLOEXEC, vfio_device_open_file(device)); if (fd < 0) vfio_device_put_registration(device); return fd; FD_PREPARE() - For cases requiring access to the fd or file, or additional work before publishing: FD_PREPARE(fdf, O_CLOEXEC, sync_file->file); if (fdf.err) { fput(sync_file->file); return fdf.err; } data.fence = fd_prepare_fd(fdf); if (copy_to_user((void __user )arg, &data, sizeof(data))) return -EFAULT; return fd_publish(fdf); The primitives are centered around struct fd_prepare. FD_PREPARE() encapsulates all allocation and cleanup logic and must be followed by a call to fd_publish() which associates the fd with the file and installs it into the caller's fdtable. If fd_publish() isn't called, both are deallocated automatically. FD_ADD() is a shorthand that does fd_publish() immediately and never exposes the struct to the caller. I've implemented this in a way that it's compatible with the cleanup infrastructure while also being usable separately. IOW, it's centered around struct fd_prepare which is aliased to class_fd_prepare_t and so we can make use of all the basica guard infrastructure" tag 'vfs-6.19-rc1.fd_prepare.fs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits) io_uring: convert io_create_mock_file() to FD_PREPARE() file: convert replace_fd() to FD_PREPARE() vfio: convert vfio_group_ioctl_get_device_fd() to FD_ADD() tty: convert ptm_open_peer() to FD_ADD() ntsync: convert ntsync_obj_get_fd() to FD_PREPARE() media: convert media_request_alloc() to FD_PREPARE() hv: convert mshv_ioctl_create_partition() to FD_ADD() gpio: convert linehandle_create() to FD_PREPARE() pseries: port papr_rtas_setup_file_interface() to FD_ADD() pseries: convert papr_platform_dump_create_handle() to FD_ADD() spufs: convert spufs_gang_open() to FD_PREPARE() papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE() spufs: convert spufs_context_open() to FD_PREPARE() net/socket: convert __sys_accept4_file() to FD_ADD() net/socket: convert sock_map_fd() to FD_ADD() net/kcm: convert kcm_ioctl() to FD_PREPARE() net/handshake: convert handshake_nl_accept_doit() to FD_PREPARE() secretmem: convert memfd_secret() to FD_ADD() memfd: convert memfd_create() to FD_ADD() bpf: convert bpf_token_create() to FD_PREPARE() ...	2025-12-01 17:32:07 -08:00
Linus Torvalds	9368f0f941	vfs-6.19-rc1.inode -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZAAKCRCRxhvAZXjc omMSAP9GLhavxyWQ24Q+49CNWWRQWDY1wTOiUK2BwtIvZ0YEcAD8D1dAiMckL5pC RwEAVA5p+y+qi+bZP0KXCBxQddoTIQM= =zo/J -----END PGP SIGNATURE----- Merge tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "Features: - Hide inode->i_state behind accessors. Open-coded accesses prevent asserting they are done correctly. One obvious aspect is locking, but significantly more can be checked. For example it can be detected when the code is clearing flags which are already missing, or is setting flags when it is illegal (e.g., I_FREEING when ->i_count > 0) - Provide accessors for ->i_state, converts all filesystems using coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2, overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to compile - Rework I_NEW handling to operate without fences, simplifying the code after the accessor infrastructure is in place Cleanups: - Move wait_on_inode() from writeback.h to fs.h - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb for clarity - Cosmetic fixes to LRU handling - Push list presence check into inode_io_list_del() - Touch up predicts in __d_lookup_rcu() - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage - Assert on ->i_count in iput_final() - Assert ->i_lock held in __iget() Fixes: - Add missing fences to I_NEW handling" * tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) dcache: touch up predicts in __d_lookup_rcu() fs: push list presence check into inode_io_list_del() fs: cosmetic fixes to lru handling fs: rework I_NEW handling to operate without fences fs: make plain ->i_state access fail to compile xfs: use the new ->i_state accessors nilfs2: use the new ->i_state accessors overlayfs: use the new ->i_state accessors gfs2: use the new ->i_state accessors f2fs: use the new ->i_state accessors smb: use the new ->i_state accessors ceph: use the new ->i_state accessors btrfs: use the new ->i_state accessors Manual conversion to use ->i_state accessors of all places not covered by coccinelle Coccinelle-based conversion to use ->i_state accessors fs: provide accessors for ->i_state fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb fs: move wait_on_inode() from writeback.h to fs.h fs: add missing fences to I_NEW handling ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage ...	2025-12-01 09:02:34 -08:00
Christian Brauner	7129098f4f	fanotify: convert fanotify_init() to FD_PREPARE() Christian Brauner <brauner@kernel.org> says: The fix sent in [1] was squashed into this commit. Link: https://lore.kernel.org/20251127201618.2115275-1-kuniyu@google.com [1] Reported-by: syzbot+321168dfa622eda99689@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/6928b121.a70a0220.d98e3.0110.GAE@google.com Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-8-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-11-28 12:42:31 +01:00
Mateusz Guzik	b4dbfd8653	Coccinelle-based conversion to use ->i_state accessors All places were patched by coccinelle with the default expecting that ->i_lock is held, afterwards entries got fixed up by hand to use unlocked variants as needed. The script: @@ expression inode, flags; @@ - inode->i_state & flags + inode_state_read(inode) & flags @@ expression inode, flags; @@ - inode->i_state &= ~flags + inode_state_clear(inode, flags) @@ expression inode, flag1, flag2; @@ - inode->i_state &= ~flag1 & ~flag2 + inode_state_clear(inode, flag1 \| flag2) @@ expression inode, flags; @@ - inode->i_state \|= flags + inode_state_set(inode, flags) @@ expression inode, flags; @@ - inode->i_state = flags + inode_state_assign(inode, flags) @@ expression inode, flags; @@ - flags = inode->i_state + flags = inode_state_read(inode) @@ expression inode, flags; @@ - READ_ONCE(inode->i_state) & flags + inode_state_read(inode) & flags Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-10-20 20:22:26 +02:00
Jakub Acs	a7c4bb43bf	fs/notify: call exportfs_encode_fid with s_umount Calling intotify_show_fdinfo() on fd watching an overlayfs inode, while the overlayfs is being unmounted, can lead to dereferencing NULL ptr. This issue was found by syzkaller. Race Condition Diagram: Thread 1 Thread 2 -------- -------- generic_shutdown_super() shrink_dcache_for_umount sb->s_root = NULL \| \| vfs_read() \| inotify_fdinfo() \| * inode get from mark * \| show_mark_fhandle(m, inode) \| exportfs_encode_fid(inode, ..) \| ovl_encode_fh(inode, ..) \| ovl_check_encode_origin(inode) \| * deref i_sb->s_root * \| \| v fsnotify_sb_delete(sb) Which then leads to: [ 32.133461] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 32.134438] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] [ 32.135032] CPU: 1 UID: 0 PID: 4468 Comm: systemd-coredum Not tainted 6.17.0-rc6 #22 PREEMPT(none) <snip registers, unreliable trace> [ 32.143353] Call Trace: [ 32.143732] ovl_encode_fh+0xd5/0x170 [ 32.144031] exportfs_encode_inode_fh+0x12f/0x300 [ 32.144425] show_mark_fhandle+0xbe/0x1f0 [ 32.145805] inotify_fdinfo+0x226/0x2d0 [ 32.146442] inotify_show_fdinfo+0x1c5/0x350 [ 32.147168] seq_show+0x530/0x6f0 [ 32.147449] seq_read_iter+0x503/0x12a0 [ 32.148419] seq_read+0x31f/0x410 [ 32.150714] vfs_read+0x1f0/0x9e0 [ 32.152297] ksys_read+0x125/0x240 IOW ovl_check_encode_origin derefs inode->i_sb->s_root, after it was set to NULL in the unmount path. Fix it by protecting calling exportfs_encode_fid() from show_mark_fhandle() with s_umount lock. This form of fix was suggested by Amir in [1]. [1]: https://lore.kernel.org/all/CAOQ4uxhbDwhb+2Brs1UdkoF0a3NSdBAOQPNfEHjahrgoKJpLEw@mail.gmail.com/ Fixes: `c45beebfde` ("ovl: support encoding fid from inode with no alias") Signed-off-by: Jakub Acs <acsjakub@amazon.de> Cc: Jan Kara <jack@suse.cz> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-unionfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>	2025-10-06 16:31:52 +02:00
Linus Torvalds	67f5f11cdf	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmjdDqAACgkQnJ2qBz9k QNmivQf+JFxnxjW1qYfIsQjuQlFDqOH0RYsnhBFTS1oqGKqbbDYEaNsGft3zbhLW 0VbWjUDvO9nRvIFi7SWzNtONuGhwnNEjZBD+q3zrtssNKbk/px/s1Hjexp1P5xl5 pUrqrlDaC3n25pqJygTMIb4DjPAY8kAIrKA2EEtqTTpbDdeHe+gvOtOjhTIkqVza DA85iq9Ytm3PpqDKPLLwUNkCEsfabpxESuXGz4hvoZu72YffKkNZDRfEnkCu3CW9 RniQZvDugw6SKBuqoqGumk6vaxqHgqBZV3Y4Tdi7+hjeZo9NI9KmDmaNL5sDm3pe StTEIa/YIZQiJGIQfTPOTlOT7jAybw== =/Dff -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: - a couple of small cleanups and fixes - implement fanotify watchdog that reports processes that fail to respond to fanotify permission events in a given time * tag 'fsnotify_for_v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: add watchdog for permission events fanotify: Validate the return value of mnt_ns_from_dentry() before dereferencing fsnotify: fix "rewriten"->"rewritten"	2025-10-03 13:23:10 -07:00
Linus Torvalds	b786405685	vfs-6.18-rc1.workqueue -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQYgAKCRCRxhvAZXjc olgGAQDWr4sD7kUt8TxifdAXsQNgyGG8qOUkb/BHHSqJ/5mKvAEAlTwJ+81tgNKT hYYdPyvWdbgW6CnWeiQLi0JjpFvUPQU= =uHwG -----END PGP SIGNATURE----- Merge tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs workqueue updates from Christian Brauner: "This contains various workqueue changes affecting the filesystem layer. Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This replaces the use of system_wq and system_unbound_wq. system_wq is a per-CPU workqueue which isn't very obvious from the name and system_unbound_wq is to be used when locality is not required. So this renames system_wq to system_percpu_wq, and system_unbound_wq to system_dfl_wq. This also adds a new WQ_PERCPU flag to allow the fs subsystem users to explicitly request the use of per-CPU behavior. Both WQ_UNBOUND and WQ_PERCPU flags coexist for one release cycle to allow callers to transition their calls. WQ_UNBOUND will be removed in a next release cycle" * tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: WQ_PERCPU added to alloc_workqueue users fs: replace use of system_wq with system_percpu_wq fs: replace use of system_unbound_wq with system_dfl_wq	2025-09-29 10:27:17 -07:00
Marco Crivellari	7a4f92d39f	fs: replace use of system_unbound_wq with system_dfl_wq Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. system_unbound_wq should be the default workqueue so as not to enforce locality constraints for random work whenever it's not required. Adding system_dfl_wq to encourage its use when unbound work should be used. The old system_unbound_wq will be kept for a few release cycles. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://lore.kernel.org/20250916082906.77439-2-marco.crivellari@suse.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-19 16:15:07 +02:00
Miklos Szeredi	b8cf8fda52	fanotify: add watchdog for permission events This is to make it easier to debug issues with AV software, which time and again deadlocks with no indication of where the issue comes from, and the kernel being blamed for the deadlock. Then we need to analyze dumps to prove that the kernel is not in fact at fault. The deadlock comes from recursion: handling the event triggers another permission event, in some roundabout way, obviously, otherwise it would have been found in testing. With this patch a warning is printed when permission event is received by userspace but not answered for more than the timeout specified in /proc/sys/fs/fanotify/watchdog_timeout. The watchdog can be turned off by setting the timeout to zero (which is the default). The timeout is very coarse (T <= t < 2T) but I guess it's good enough for the purpose. Overhead should be minimal. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Link: https://patch.msgid.link/20250909143053.112171-1-mszeredi@redhat.com Signed-off-by: Jan Kara <jack@suse.cz>	2025-09-11 16:34:50 +02:00
Anderson Nascimento	62e59ffe87	fanotify: Validate the return value of mnt_ns_from_dentry() before dereferencing The function do_fanotify_mark() does not validate if mnt_ns_from_dentry() returns NULL before dereferencing mntns->user_ns. This causes a NULL pointer dereference in do_fanotify_mark() if the path is not a mount namespace object. Fix this by checking mnt_ns_from_dentry()'s return value before dereferencing it. Before the patch $ gcc fanotify_nullptr.c -o fanotify_nullptr $ mkdir A $ ./fanotify_nullptr Fanotify fd: 3 fanotify_mark: Operation not permitted $ unshare -Urm Fanotify fd: 3 Killed int main(void){ int ffd; ffd = fanotify_init(FAN_CLASS_NOTIF \| FAN_REPORT_MNT, 0); if(ffd < 0){ perror("fanotify_init"); exit(EXIT_FAILURE); } printf("Fanotify fd: %d\n",ffd); if(fanotify_mark(ffd, FAN_MARK_ADD \| FAN_MARK_MNTNS, FAN_MNT_ATTACH, AT_FDCWD, "A") < 0){ perror("fanotify_mark"); exit(EXIT_FAILURE); } return 0; } After the patch $ gcc fanotify_nullptr.c -o fanotify_nullptr $ mkdir A $ ./fanotify_nullptr Fanotify fd: 3 fanotify_mark: Operation not permitted $ unshare -Urm Fanotify fd: 3 fanotify_mark: Invalid argument [ 25.694973] BUG: kernel NULL pointer dereference, address: 0000000000000038 [ 25.695006] #PF: supervisor read access in kernel mode [ 25.695012] #PF: error_code(0x0000) - not-present page [ 25.695017] PGD 109a30067 P4D 109a30067 PUD 142b46067 PMD 0 [ 25.695025] Oops: Oops: 0000 [#1] SMP NOPTI [ 25.695032] CPU: 4 UID: 1000 PID: 1478 Comm: fanotify_nullpt Not tainted 6.17.0-rc4 #1 PREEMPT(lazy) [ 25.695040] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 [ 25.695049] RIP: 0010:do_fanotify_mark+0x817/0x950 [ 25.695066] Code: 04 00 00 e9 45 fd ff ff 48 8b 7c 24 48 4c 89 54 24 18 4c 89 5c 24 10 4c 89 0c 24 e8 b3 11 fc ff 4c 8b 54 24 18 4c 8b 5c 24 10 <48> 8b 78 38 4c 8b 0c 24 49 89 c4 e9 13 fd ff ff 8b 4c 24 28 85 c9 [ 25.695081] RSP: 0018:ffffd31c469e3c08 EFLAGS: 00010203 [ 25.695104] RAX: 0000000000000000 RBX: 0000000001000000 RCX: ffff8eb48aebd220 [ 25.695110] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8eb4835e8180 [ 25.695115] RBP: 0000000000000111 R08: 0000000000000000 R09: 0000000000000000 [ 25.695142] R10: ffff8eb48a7d56c0 R11: ffff8eb482bede00 R12: 00000000004012a7 [ 25.695148] R13: 0000000000000110 R14: 0000000000000001 R15: ffff8eb48a7d56c0 [ 25.695154] FS: 00007f8733bda740(0000) GS:ffff8eb61ce5f000(0000) knlGS:0000000000000000 [ 25.695162] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 25.695170] CR2: 0000000000000038 CR3: 0000000136994006 CR4: 00000000003706f0 [ 25.695201] Call Trace: [ 25.695209] <TASK> [ 25.695215] __x64_sys_fanotify_mark+0x1f/0x30 [ 25.695222] do_syscall_64+0x82/0x2c0 ... Fixes: `58f5fbeb36` ("fanotify: support watching filesystems and mounts inside userns") Link: https://patch.msgid.link/CAPhRvkw4ONypNsJrCnxbKnJbYmLHTDEKFC4C_num_5sVBVa8jg@mail.gmail.com Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz>	2025-09-05 16:02:55 +02:00
Xichao Zhao	6746c36c94	fsnotify: fix "rewriten"->"rewritten" Trivial fix to spelling mistake in comment text. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Link: https://patch.msgid.link/20250808084213.230592-1-zhao.xichao@vivo.com Signed-off-by: Jan Kara <jack@suse.cz>	2025-09-02 14:37:33 +02:00
Josef Bacik	37b27bd5d6	fs: add an icount_read helper Instead of doing direct access to ->i_count, add a helper to handle this. This will make it easier to convert i_count to a refcount later. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/9bc62a84c6b9d6337781203f60837bd98fbc4a96.1756222464.git.josef@toxicpanda.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-01 12:41:09 +02:00
Linus Torvalds	d6084bb815	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmiHoygACgkQnJ2qBz9k QNnQCwf9Et0BcGgtcE9bUPJMlK5AEJ2nsEGT24ajRPcJ9DtcpO5KY8htshPwvRBc KeSW3bjYaBluKK6LmbvDbaDi4lBwFNKh+dfi8PeVwBECNn2DLNv+EOB6VPgxREWw zmC3SL2oh7eGTtJq0h1XeI/s/AFYukOsjKcjoKmT3NyxkCwxYUoY3E1mePutBsW6 /2CjVdQoMUgw1kLcezdccOJi6IkAk5Vfr8gIRFucQiR49YIICYk17bhYBH/Gx5PJ eJkqncq8FLv8gLaaHq5W8lC05nVnpNDFxUxKiN6YcuYUrUC7jFi7ad1EriZP3Zsr y8pDAmVUHgzpiMYiPHhP2zG8xK06TQ== =ZyXo -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "A couple of small improvements for fsnotify subsystem. The most interesting is probably Amir's change modifying the meaning of fsnotify fmode bits (and I spell it out specifically because I know you care about those). There's no change for the common cases of no fsnotify watches or no permission event watches. But when there are permission watches (either for open or for pre-content events) but no FAN_ACCESS_PERM watch (which nobody uses in practice) we are now able optimize away unnecessary cache loads from the read path" * tag 'fsnotify_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: optimize FMODE_NONOTIFY_PERM for the common cases fsnotify: merge file_set_fsnotify_mode_from_watchers() with open perm hook samples: fix building fs-monitor on musl systems fanotify: sanitize handle_type values when reporting fid	2025-07-31 10:31:00 -07:00
Amir Goldstein	0d4c4d4ea4	fsnotify: optimize FMODE_NONOTIFY_PERM for the common cases The most unlikely watched permission event is FAN_ACCESS_PERM, because at the time that it was introduced there were no evictable ignore mark, so subscribing to FAN_ACCESS_PERM would have incured a very high overhead. Yet, when we set the fmode to FMODE_NOTIFY_HSM(), we never skip trying to send FAN_ACCESS_PERM, which is almost always a waste of cycles. We got to this logic because of bundling FAN_OPEN_PERM and FAN_ACCESS_PERM in the same category and because FAN_OPEN_PERM is a commonly used event. By open coding fsnotify_open_perm() in fsnotify_open_perm_and_set_mode(), we no longer need to regard FAN_OPEN_PERM when calculating fmode. This leaves the case of having pre-content events and not having any other permission event in the object masks a more likely case than the other way around. Rework the fmode macros and code so that their meaning now refers only to hooks on an already open file: - FMODE_NOTIFY_NONE() skip all events - FMODE_NOTIFY_ACCESS_PERM() send all permission events including FAN_ACCESS_PERM - FMODE_NOTIFY_HSM() send pre-content permission events Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250708143641.418603-3-amir73il@gmail.com	2025-07-28 18:14:38 +02:00
Amir Goldstein	08da98e1b2	fsnotify: merge file_set_fsnotify_mode_from_watchers() with open perm hook Create helper fsnotify_open_perm_and_set_mode() that moves the fsnotify_open_perm() hook into file_set_fsnotify_mode_from_watchers(). This will allow some more optimizations. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250708143641.418603-2-amir73il@gmail.com	2025-07-28 18:14:38 +02:00
Al Viro	fdfe013347	fix a leak in fcntl_dirnotify() [into #fixes, unless somebody objects] Lifetime of new_dn_mark is controlled by that of its ->fsn_mark, pointed to by new_fsn_mark. Unfortunately, a failure exit had been inserted between the allocation of new_dn_mark and the call of fsnotify_init_mark(), ending up with a leak. Fixes: `1934b21261` "file: reclaim 24 bytes from f_owner" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/20250712171843.GB1880847@ZenIV Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-07-14 10:13:31 +02:00
Amir Goldstein	8631e01c2c	fanotify: sanitize handle_type values when reporting fid Unlike file_handle, type and len of struct fanotify_fh are u8. Traditionally, filesystem return handle_type < 0xff, but there is no enforecement for that in vfs. Add a sanity check in fanotify to avoid truncating handle_type if its value is > 0xff. Fixes: `7cdafe6cc4` ("exportfs: check for error return value from exportfs_encode_*()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250627104835.184495-1-amir73il@gmail.com	2025-06-27 19:17:26 +02:00
Linus Torvalds	db340159f1	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmg4dIsACgkQnJ2qBz9k QNlf7ggAycYHUp9GEkIKtM+kDxSwjcOjJ581/wA3zi3HsgGt/lDDhgeYmJObvoSq g2XcScoMo3ZwjmsO9W5xmr+M9F42y6JIU3ZS4HxD8+TEelRDpL7134+ZIYll2Mdu Z+6TUknX5ve+caNPmJBE6fGYd0TiqKJknrZE4XB5g+1RF0J6/oFbwlW7n83/uM60 MRzj5FyNAkYpL+qijAfXE/tZ4MCIvoi1aZoyQQ9bytRG8VJF4WBxPCWNlchceZoW ncLvXfiHm4W6wsyO5RHbtbyiEVPU//V/BH0blXyy9xDvPUDT50yplzR6XSlypxqO k67z7PG8Bm0afivqM5Yv8DNFnK/0gQ== =LuNr -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "Two fanotify cleanups and support for watching namespace-owned filesystems by namespace admins (most useful for being able to watch for new mounts / unmounts happening within a user namespace)" * tag 'fsnotify_for_v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: support watching filesystems and mounts inside userns fanotify: remove redundant permission checks fanotify: Drop use of flex array in fanotify_fh	2025-05-29 10:34:26 -07:00
Amir Goldstein	58f5fbeb36	fanotify: support watching filesystems and mounts inside userns An unprivileged user is allowed to create an fanotify group and add inode marks, but not filesystem, mntns and mount marks. Add limited support for setting up filesystem, mntns and mount marks by an unprivileged user under the following conditions: 1. User has CAP_SYS_ADMIN in the user ns where the group was created 2.a. User has CAP_SYS_ADMIN in the user ns where the sb was created OR (in case setting up a mntns mark) 2.b. User has CAP_SYS_ADMIN in the user ns associated with the mntns Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250516192803.838659-3-amir73il@gmail.com	2025-05-19 22:46:34 +02:00
Amir Goldstein	90d1238047	fanotify: remove redundant permission checks FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARK flags are already checked as part of the CAP_SYS_ADMIN check for any FANOTIFY_ADMIN_INIT_FLAGS. Remove the individual CAP_SYS_ADMIN checks for these flags. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250516192803.838659-2-amir73il@gmail.com	2025-05-19 22:46:34 +02:00
Jan Kara	b9b410cc18	fanotify: Drop use of flex array in fanotify_fh We use flex array 'buf' in fanotify_fh to contain the file handle content. However the buffer is not a proper flex array. It has a preconfigured fixed size. Furthermore if fixed size is not big enough, we use external separately allocated buffer. Hence don't pretend buf is a flex array since we have to use special accessors anyway and instead just modify the accessors to do the pointer math without flex array. This fixes warnings that flex array is not the last struct member in fanotify_fid_event or fanotify_error_event. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Link: https://patch.msgid.link/20250513131745.2808-2-jack@suse.cz	2025-05-14 11:06:56 +02:00
Amir Goldstein	c73c67026f	fanotify: fix flush of mntns marks fanotify_mark(fd, FAN_MARK_FLUSH \| FAN_MARK_MNTNS, ...) incorrectly ends up causing removal inode marks. Fixes: `0f46d81f2b` ("fanotify: notify on mount attach and detach") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250418193903.2607617-2-amir73il@gmail.com	2025-04-24 10:58:59 +02:00
Linus Torvalds	fd101da676	vfs-6.15-rc1.mount -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90qAwAKCRCRxhvAZXjc on7lAP0akpIsJMWREg9tLwTNTySI1b82uKec0EAgM6T7n/PYhAD/T4zoY8UYU0Pr qCxwTXHUVT6bkNhjREBkfqq9OkPP8w8= =GxeN -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: - Mount notifications The day has come where we finally provide a new api to listen for mount topology changes outside of /proc/<pid>/mountinfo. A mount namespace file descriptor can be supplied and registered with fanotify to listen for mount topology changes. Currently notifications for mount, umount and moving mounts are generated. The generated notification record contains the unique mount id of the mount. The listmount() and statmount() api can be used to query detailed information about the mount using the received unique mount id. This allows userspace to figure out exactly how the mount topology changed without having to generating diffs of /proc/<pid>/mountinfo in userspace. - Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount api - Support detached mounts in overlayfs Since last cycle we support specifying overlayfs layers via file descriptors. However, we don't allow detached mounts which means userspace cannot user file descriptors received via open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to attach them to a mount namespace via move_mount() first. This is cumbersome and means they have to undo mounts via umount(). Allow them to directly use detached mounts. - Allow to retrieve idmappings with statmount Currently it isn't possible to figure out what idmapping has been attached to an idmapped mount. Add an extension to statmount() which allows to read the idmapping from the mount. - Allow creating idmapped mounts from mounts that are already idmapped So far it isn't possible to allow the creation of idmapped mounts from already idmapped mounts as this has significant lifetime implications. Make the creation of idmapped mounts atomic by allow to pass struct mount_attr together with the open_tree_attr() system call allowing to solve these issues without complicating VFS lookup in any way. The system call has in general the benefit that creating a detached mount and applying mount attributes to it becomes an atomic operation for userspace. - Add a way to query statmount() for supported options Allow userspace to query which mount information can be retrieved through statmount(). - Allow superblock owners to force unmount * tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits) umount: Allow superblock owners to force umount selftests: add tests for mount notification selinux: add FILE__WATCH_MOUNTNS samples/vfs: fix printf format string for size_t fs: allow changing idmappings fs: add kflags member to struct mount_kattr fs: add open_tree_attr() fs: add copy_mount_setattr() helper fs: add vfs_open_tree() helper statmount: add a new supported_mask field samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP selftests: add tests for using detached mount with overlayfs samples/vfs: check whether flag was raised statmount: allow to retrieve idmappings uidgid: add map_id_range_up() fs: allow detached mounts in clone_private_mount() selftests/overlayfs: test specifying layers as O_PATH file descriptors fs: support O_PATH fds with FSCONFIG_SET_FD vfs: add notifications for mount attach and detach fanotify: notify on mount attach and detach ...	2025-03-24 09:34:10 -07:00
Amir Goldstein	95101401bb	fsnotify: use accessor to set FMODE_NONOTIFY_* The FMODE_NONOTIFY_* bits are a 2-bits mode. Open coding manipulation of those bits is risky. Use an accessor file_set_fsnotify_mode() to set the mode. Rename file_set_fsnotify_mode() => file_set_fsnotify_mode_from_watchers() to make way for the simple accessor name. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250203223205.861346-2-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-07 10:27:26 +01:00
Miklos Szeredi	0f46d81f2b	fanotify: notify on mount attach and detach Add notifications for attaching and detaching mounts. The following new event masks are added: FAN_MNT_ATTACH - Mount was attached FAN_MNT_DETACH - Mount was detached If a mount is moved, then the event is reported with (FAN_MNT_ATTACH \| FAN_MNT_DETACH). These events add an info record of type FAN_EVENT_INFO_TYPE_MNT containing these fields identifying the affected mounts: __u64 mnt_id - the ID of the mount (see statmount(2)) FAN_REPORT_MNT must be supplied to fanotify_init() to receive these events and no other type of event can be received with this report type. Marks are added with FAN_MARK_MNTNS, which records the mount namespace from an nsfs file (e.g. /proc/self/ns/mnt). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250129165803.72138-3-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-05 17:21:07 +01:00
Miklos Szeredi	b944249bce	fsnotify: add mount notification infrastructure This is just the plumbing between the event source (fs/namespace.c) and the event consumer (fanotify). In itself it does nothing. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250129165803.72138-2-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-02-04 11:14:47 +01:00
Joel Granados	1751f872cc	treewide: const qualify ctl_tables where applicable Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit `78eb4ea25c` ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu <song@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/ Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Joel Granados <joel.granados@kernel.org>	2025-01-28 13:48:37 +01:00
Linus Torvalds	8883957b3c	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmePs7oACgkQnJ2qBz9k QNmHuAf9GkLnY5u1/81xP5V9ukZ4N2yeMW0dydLS5cjWj/St5ELeMAza3jeqtJtD j36vbnmy2c5pPaGLAK8BJpMXT/R2TkmmKD004zcfqF2S3SgbGzdgO1zMZzq9KJpM woRKZtLuglDajedsDEBBcKotBhlN2+C/sQlFuL1mX4zitk9ajr0qYUB1+JqOeg5f qwPsDLT077ADpxd7lVIMcm+OqbduP5KWkBKYHpn7lJcLe1eqVMMzceJroW42zhVG Dq8Iln26bbU9Wx6FSPFCUcHEzHRHUfXmu07HN9U0X++0QgWjrmBQQLooGFB/bR4a edBrPpVas6xE4/brjgFX3gOKtv8xYg== =ewDV -----END PGP SIGNATURE----- Merge tag 'fsnotify_hsm_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify pre-content notification support from Jan Kara: "This introduces a new fsnotify event (FS_PRE_ACCESS) that gets generated before a file contents is accessed. The event is synchronous so if there is listener for this event, the kernel waits for reply. On success the execution continues as usual, on failure we propagate the error to userspace. This allows userspace to fill in file content on demand from slow storage. The context in which the events are generated has been picked so that we don't hold any locks and thus there's no risk of a deadlock for the userspace handler. The new pre-content event is available only for users with global CAP_SYS_ADMIN capability (similarly to other parts of fanotify functionality) and it is an administrator responsibility to make sure the userspace event handler doesn't do stupid stuff that can DoS the system. Based on your feedback from the last submission, fsnotify code has been improved and now file->f_mode encodes whether pre-content event needs to be generated for the file so the fast path when nobody wants pre-content event for the file just grows the additional file->f_mode check. As a bonus this also removes the checks whether the old FS_ACCESS event needs to be generated from the fast path. Also the place where the event is generated during page fault has been moved so now filemap_fault() generates the event if and only if there is no uptodate folio in the page cache. Also we have dropped FS_PRE_MODIFY event as current real-world users of the pre-content functionality don't really use it so let's start with the minimal useful feature set" * tag 'fsnotify_hsm_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (21 commits) fanotify: Fix crash in fanotify_init(2) fs: don't block write during exec on pre-content watched files fs: enable pre-content events on supported file systems ext4: add pre-content fsnotify hook for DAX faults btrfs: disable defrag on pre-content watched files xfs: add pre-content fsnotify hook for DAX faults fsnotify: generate pre-content permission event on page fault mm: don't allow huge faults for files with pre content watches fanotify: disable readahead if we have pre-content watches fanotify: allow to set errno in FAN_DENY permission response fanotify: report file range info with pre-content events fanotify: introduce FAN_PRE_ACCESS permission event fsnotify: generate pre-content permission event on truncate fsnotify: pass optional file access range in pre-content event fsnotify: introduce pre-content permission events fanotify: reserve event bit of deprecated FAN_DIR_MODIFY fanotify: rename a misnamed constant fanotify: don't skip extra event info if no info_mode is set fsnotify: check if file is actually being watched for pre-content events on open fsnotify: opt-in for permission events at file open time ...	2025-01-23 13:36:06 -08:00
Linus Torvalds	113385c5cc	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmePsi0ACgkQnJ2qBz9k QNmxrAgA0ySsXJkCqU1D1dmRqcM6q5VrpBMFbfzylGsZH0/hwPWZIUllaIdYoQGb l4l1LOUFxOdbbmcxq1smfLuTrfm+6JVIfJnj5TIl6UmvmVd+nJ6S/Wn9BJQEZrMA 0e4wENTfH0sTpyStOypEP2NzBFD1g8VlDzXRLfP6Wfc53FqfeqVAUzT+4IHCtuwW 07FUUYvbbUPzLE6n5mtJCKrd0R1Y+Ooy5pUzMJmln7AYudzjZNdW7gBWX26MpoWL DWENt+fJM/JQfHigDmeSBGclqMRYoQMG+MMrTFVHawEnJCG8ye54KX0+cOFse8so MyecuxK1DZdItObX2ztVrrWLRm0FvQ== =fwuH -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull inotify update from Jan Kara: "A small inotify strcpy() cleanup" * tag 'fsnotify_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify: Use strscpy() for event->name copies	2025-01-23 13:10:38 -08:00
Jan Kara	0c0214df28	fanotify: Fix crash in fanotify_init(2) The rrror handling in fanotify_init(2) is buggy and overwrites 'fd' before calling put_unused_fd() leading to possible access beyond the end of fd bitmap. Fix it. Reported-by: syzbot+6a3aa63412255587b21b@syzkaller.appspotmail.com Fixes: `ebe559609d` ("fs: get rid of __FMODE_NONOTIFY kludge") Signed-off-by: Jan Kara <jack@suse.cz>	2025-01-06 12:08:42 +01:00
Amir Goldstein	974e3fe0ac	fs: relax assertions on failure to encode file handles Encoding file handles is usually performed by a filesystem >encode_fh() method that may fail for various reasons. The legacy users of exportfs_encode_fh(), namely, nfsd and name_to_handle_at(2) syscall are ready to cope with the possibility of failure to encode a file handle. There are a few other users of exportfs_encode_{fh,fid}() that currently have a WARN_ON() assertion when ->encode_fh() fails. Relax those assertions because they are wrong. The second linked bug report states commit `16aac5ad1f` ("ovl: support encoding non-decodable file handles") in v6.6 as the regressing commit, but this is not accurate. The aforementioned commit only increases the chances of the assertion and allows triggering the assertion with the reproducer using overlayfs, inotify and drop_caches. Triggering this assertion was always possible with other filesystems and other reasons of ->encode_fh() failures and more particularly, it was also possible with the exact same reproducer using overlayfs that is mounted with options index=on,nfs_export=on also on kernels < v6.6. Therefore, I am not listing the aforementioned commit as a Fixes commit. Backport hint: this patch will have a trivial conflict applying to v6.6.y, and other trivial conflicts applying to stable kernels < v6.6. Reported-by: syzbot+ec07f6f5ce62b858579f@syzkaller.appspotmail.com Tested-by: syzbot+ec07f6f5ce62b858579f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@google.com/ Reported-by: Dmitry Safonov <dima@arista.com> Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3F+Juqy_o6oP8uw@mail.gmail.com/ Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-12-19 15:18:27 +01:00

1 2 3 4 5 ...

819 Commits