linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-24 15:12:13 +02:00

Author	SHA1	Message	Date
Linus Torvalds	b85900e91c	NFS client updates for Linux 7.1 Highlights include: Bugfixes: - NFS: Fix handling of ENOSPC so that if we have to resend writes, they are written synchronously. - SUNRPC: RDMA transport fixes from Chuck - NFSv4.2: Several fixes for delegated timestamps - NFSv4: Failure to obtain a directory delegation should not cause stat() to fail. - NFSv4: Rename was failing to update timestamps when a directory delegation is held. - NFSv4: Ensure we check rsize/wsize after crossing a NFSv4 filesystem boundary. - NFSv4/pnfs: If the server is down, retry the layout returns on reboot - NFSv4/pnfs: Fallback to MDS could result in a short write being incorrectly logged. Cleanups: - NFS: use memcpy_and_pad in decode_fh -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR8xgHcVzJNfOYElJo6EXfx2a6V0QUCaevSUgAKCRA6EXfx2a6V 0ewIAQD+23uMo5sxY10btKATcBBxswY5YMtN1qQBMyn88N0XfwEAz0+zoEbRv4L2 39goJ/WeJ0/gqhfJV9F+Oe2U1DbsEgM= =l9y/ -----END PGP SIGNATURE----- Merge tag 'nfs-for-7.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Bugfixes: - Fix handling of ENOSPC so that if we have to resend writes, they are written synchronously - SUNRPC RDMA transport fixes from Chuck - Several fixes for delegated timestamps in NFSv4.2 - Failure to obtain a directory delegation should not cause stat() to fail with NFSv4 - Rename was failing to update timestamps when a directory delegation is held on NFSv4 - Ensure we check rsize/wsize after crossing a NFSv4 filesystem boundary - NFSv4/pnfs: - If the server is down, retry the layout returns on reboot - Fallback to MDS could result in a short write being incorrectly logged Cleanups: - Use memcpy_and_pad in decode_fh" * tag 'nfs-for-7.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (21 commits) NFS: Fix RCU dereference of cl_xprt in nfs_compare_super_address NFS: remove redundant __private attribute from nfs_page_class NFSv4.2: fix CLONE/COPY attrs in presence of delegated attributes NFS: fix writeback in presence of errors nfs: use memcpy_and_pad in decode_fh NFSv4.1: Apply session size limits on clone path NFSv4: retry GETATTR if GET_DIR_DELEGATION failed NFS: fix RENAME attr in presence of directory delegations pnfs/flexfiles: validate ds_versions_cnt is non-zero NFS/blocklayout: print each device used for SCSI layouts xprtrdma: Post receive buffers after RPC completion xprtrdma: Scale receive batch size with credit window xprtrdma: Replace rpcrdma_mr_seg with xdr_buf cursor xprtrdma: Decouple frwr_wp_create from frwr_map xprtrdma: Close lost-wakeup race in xprt_rdma_alloc_slot xprtrdma: Avoid 250 ms delay on backlog wakeup xprtrdma: Close sendctx get/put race that can block a transport nfs: update inode ctime after removexattr operation nfs: fix utimensat() for atime with delegated timestamps NFS: improve "Server wrote zero bytes" error ...	2026-04-24 14:20:03 -07:00
Sean Chang	e6614b88d5	NFS: Fix RCU dereference of cl_xprt in nfs_compare_super_address The cl_xprt pointer in struct rpc_clnt is marked as __rcu. Accessing it directly in nfs_compare_super_address() is unsafe and triggers Sparse warnings. Fix this by using rcu_dereference() within an RCU read-side critical section to retrieve the transport pointer. This addresses the sparse warning and ensures atomic access to the pointer, as the transport can be updated via transport switching even while the superblock remains active under sb_lock. Fixes: `7e3fcf61ab` ("nfs: don't share mounts between network namespaces") Signed-off-by: Sean Chang <seanwascoding@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-22 08:53:23 -04:00
Sean Chang	e8a44ae87b	NFS: remove redundant __private attribute from nfs_page_class The nfs_page_class tracepoint uses a pointer for the 'req' field marked with the __private attribute. This causes Sparse to complain about dereferencing a private pointer within the trace ring buffer context, specifically during the TP_fast_assign() operation. This fixes a Sparse warning introduced in commit `b6ef079fd9` ("nfs: more in-depth tracing of writepage events") by removing the redundant __private attribute from the 'req' field. Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Signed-off-by: Sean Chang <seanwascoding@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-22 08:53:23 -04:00
Olga Kornievskaia	6e7daa3dad	NFSv4.2: fix CLONE/COPY attrs in presence of delegated attributes xfstest generic/407 is failing in 2 ways. It detects that after doing a clone the client does not update it's mtime and it's ctime. CLONE always sends a GETATTR operation and then calls nfs_post_op_update_inode() based on the returned attributes. Because of the delegated attributes the client ignores updating the mtime. Then also, when delegated attributes are present, for the change_attr the server replies with the same values as what the client cached before and thus the generic/407 would flag that. Instead, make sure we invalidate the blocks attr. By adding updating delegated attributes in nfs42_copy_dest_done() both COPY and CLONE would update mtime appropriately. Fixes: `e12912d941` ("NFSv4: Add support for delegated atime and mtime attributes") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-22 08:53:23 -04:00
Olga Kornievskaia	5d3869a41f	NFS: fix writeback in presence of errors After running xfstest generic/751, in certain conditions, can have a writeback IO stuck while experiencing one of the two patterns. Pattern#1: writeback IO experiences ENOSPC on an offset smaller than the filesize. Example, write offset=0 len=4096 how=unstable OK write offset=8192 len=4096 how=unstable OK write offset=12288 len=4096 how=unstable ENOSPC write offset=4096 len=4096 how=unstable ENOSPC client sends a commit and receives a verifier which is different from the last successful write. It marks pages dirty and writeback retries. But it again send writes unstable and gets into the same pattern, running into the ENOSPC error and sending a commit because writes were sent at unstable. Pattern#2: an unstable write followed by a short write and ENOSPC. write offset=0 len=4096 how=unstable OK write offset=4096 len=4096 how=unstable returns OK but count=100 write offset=4197 len=3996 how=stable returns ENOSPC client send a commit and receives a verifier different from the last unstable write. The same behaviour is retried in a loop. Instead, this patch proposes to identify those conditions and mark requests to be done synchronously instead. Previous solution tried to mark it in the nfs_page, however that's not persistent thus instead mark it in the nfs_open_context. Furthermore, the same problem occurs during localio code path so recognize that IO needs to be done sync in that case as well. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-22 08:53:23 -04:00
Thorsten Blum	43ea7036ee	nfs: use memcpy_and_pad in decode_fh Use memcpy_and_pad() instead of memcpy() followed by memset() to simplify decode_fh(). Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-22 08:53:23 -04:00
Linus Torvalds	292a2bcd17	Fixing livelocks in shrink_dcache_tree() If shrink_dcache_tree() finds a dentry in the middle of being killed by another thread, it has to wait until the victim finishes dying, gets detached from the tree and ceases to pin its parent. The way we used to deal with that amounted to busy-wait; unfortunately, it's not just inefficient but can lead to reliably reproducible hard livelocks. Solved by having shrink_dentry_tree() attach a completion to such dentry, with dentry_unlist() calling complete() on all objects attached to it. With a bit of care it can be done without growing struct dentry or adding overhead in normal case. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaec/ugAKCRBZ7Krx/gZQ 6y77AP9l/IL36Ic45p45FCTirA6LFyIvZ5Gixm3Xk64Pi1Y3nAEAqQ5UOVnJc907 RyrCpcI6vnO8a67MptchOxK9d1bIxQw= =A8JL -----END PGP SIGNATURE----- Merge tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache busy loop updates from Al Viro: "Fix livelocks in shrink_dcache_tree() If shrink_dcache_tree() finds a dentry in the middle of being killed by another thread, it has to wait until the victim finishes dying, gets detached from the tree and ceases to pin its parent. The way we used to deal with that amounted to busy-wait; unfortunately, it's not just inefficient but can lead to reliably reproducible hard livelocks. Solved by having shrink_dentry_tree() attach a completion to such dentry, with dentry_unlist() calling complete() on all objects attached to it. With a bit of care it can be done without growing struct dentry or adding overhead in normal case" * tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: get rid of busy-waiting in shrink_dcache_tree() dcache.c: more idiomatic "positives are not allowed" sanity checks struct dentry: make ->d_u anonymous for_each_alias(): helper macro for iterating through dentries of given inode	2026-04-21 07:30:44 -07:00
Linus Torvalds	36d179fd6b	NFSD 7.1 Release Notes Benjamin Coddington contributed filehandle signing to defend against filehandle-guessing attacks. The server now appends a SipHash-2-4 MAC to each filehandle when the new "sign_fh" export option is enabled. NFSD then verifies filehandles received from clients against the expected MAC; mismatches return NFS error STALE. Chuck Lever converted the entire NLMv4 server-side XDR layer from hand-written C to xdrgen-generated code, spanning roughly thirty patches. XDR functions are generally boilerplate code and are easy to get wrong. The goals of this conversion are improved memory safety, lower maintenance burden, and groundwork for eventual Rust code generation for these functions. Dai Ngo improved pNFS block/SCSI layout robustness with two related changes. SCSI persistent reservation fencing is now tracked per client and per device via an xarray, to avoid both redundant preempt operations on devices already fenced and a potential NFSD deadlock when all nfsd threads are waiting for a layout return. The remaining patches deliver scalability and infrastructure improvements. Sincere thanks to all contributors, reviewers, testers, and bug reporters who participated in the v7.1 NFSD development cycle. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmnlF50ACgkQM2qzM29m f5dfHBAAi2o1i9/RA6fmxi2qSV7tkg79viuGFRj3c4cjiW8ZqQXos63zmy6BNMFG joEoirdryUETkrrckXP81HKGSWBQqYjaXeklOw8dggQ8g72HGiqcoT3Ua7L9S7A8 /Db6IwZnJcehHO8XwHV4jSAfIZuvC0iiK02tVrVe/l/9GWcG+bS340GgE9Es2IAW copBGlTwQah+eRvy2hP+Eo3vUTP8Rdebp9iYFI12xqx2x3LquFR01PpjCzotqAvV AcvCPa/AGoSOjcL8idloL8F8mSaOCyx15YJH0lm3hRsPtS/VyXWjKvcejWUh/7PH gHi+5VTsSKbUBj3PJQZU6rBQ67KnwVLZ33KkIF2ZNGllvK0yDGM0UfX/TuaEPjUV 6N0UkRprCHJdrULt9XMXmX3Ddnz1xbYT8CaeIDObw3Ix7SJKedvlLTjvsYCYtsQn 5pkHUuHmr/YAF4AQi/JI4ubZhZ+K3YytNS8YiMUkBWDbPoKzo2yrkzwjGjHdUp0y l8LfEjePAcIpuFQZegERA9CnjIeKb66DJe8da0EwtreY+sejm/S8zbBUhMkXjo6u QwdXXeLX3/zni6Op8vRA5JH//S5ovlQFnkUSvHRItSUrDBRVm+wXD7Vnp9bykKcN leqbSvehnV4PIi0URMvN5ox1WNmsOFIZkv9nv8amyOX8PlRmLoA= =iFl6 -----END PGP SIGNATURE----- Merge tag 'nfsd-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: - filehandle signing to defend against filehandle-guessing attacks (Benjamin Coddington) The server now appends a SipHash-2-4 MAC to each filehandle when the new "sign_fh" export option is enabled. NFSD then verifies filehandles received from clients against the expected MAC; mismatches return NFS error STALE - convert the entire NLMv4 server-side XDR layer from hand-written C to xdrgen-generated code, spanning roughly thirty patches (Chuck Lever) XDR functions are generally boilerplate code and are easy to get wrong. The goals of this conversion are improved memory safety, lower maintenance burden, and groundwork for eventual Rust code generation for these functions. - improve pNFS block/SCSI layout robustness with two related changes (Dai Ngo) SCSI persistent reservation fencing is now tracked per client and per device via an xarray, to avoid both redundant preempt operations on devices already fenced and a potential NFSD deadlock when all nfsd threads are waiting for a layout return. - scalability and infrastructure improvements Sincere thanks to all contributors, reviewers, testers, and bug reporters who participated in the v7.1 NFSD development cycle. * tag 'nfsd-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (83 commits) NFSD: Docs: clean up pnfs server timeout docs nfsd: fix comment typo in nfsxdr nfsd: fix comment typo in nfs3xdr NFSD: convert callback RPC program to per-net namespace NFSD: use per-operation statidx for callback procedures svcrdma: Use contiguous pages for RDMA Read sink buffers SUNRPC: Add svc_rqst_page_release() helper SUNRPC: xdr.h: fix all kernel-doc warnings svcrdma: Factor out WR chain linking into helper svcrdma: Add Write chunk WRs to the RPC's Send WR chain svcrdma: Clean up use of rdma->sc_pd->device svcrdma: Clean up use of rdma->sc_pd->device in Receive paths svcrdma: Add fair queuing for Send Queue access SUNRPC: Optimize rq_respages allocation in svc_alloc_arg SUNRPC: Track consumed rq_pages entries svcrdma: preserve rq_next_page in svc_rdma_save_io_pages SUNRPC: Handle NULL entries in svc_rqst_release_pages SUNRPC: Allocate a separate Reply page array SUNRPC: Tighten bounds checking in svc_rqst_replace_page NFSD: Sign filehandles ...	2026-04-20 10:44:02 -07:00
Linus Torvalds	334fbe734e	mm.git review status for linus..mm-stable Everything: Total patches: 368 Reviews/patch: 1.56 Reviewed rate: 74% Excluding DAMON: Total patches: 316 Reviews/patch: 1.77 Reviewed rate: 81% Excluding DAMON and zram: Total patches: 306 Reviews/patch: 1.81 Reviewed rate: 82% Excluding DAMON, zram and maple_tree: Total patches: 276 Reviews/patch: 2.01 Reviewed rate: 91% Significant patch series in this merge: - The 30 patch series "maple_tree: Replace big node with maple copy" from Liam Howlett is mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - The 12 patch series "mm, swap: swap table phase III: remove swap_map" from Kairui Song offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - The 2 patch series "mm: memfd_luo: preserve file seals" from Pratyush Yadav adds file seal preservation to LUO's memfd code. - The 2 patch series "mm: zswap: add per-memcg stat for incompressible pages" from Jiayuan Chen adds additional userspace stats reportng to zswap. - The 4 patch series "arch, mm: consolidate empty_zero_page" from Mike Rapoport implements some cleanups for our handling of ZERO_PAGE() and zero_pfn. - The 2 patch series "mm/kmemleak: Improve scan_should_stop() implementation" from Zhongqiu Han provides an robustness improvement and some cleanups in the kmemleak code. - The 4 patch series "Improve khugepaged scan logic" from Vernon Yang "improves the khugepaged scan logic and reduces CPU consumption by prioritizing scanning tasks that access memory frequently". - The 2 patch series "Make KHO Stateless" from Jason Miu simplifies Kexec Handover by "transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel" - The 3 patch series "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" from Thomas Ballasi and Steven Rostedt enhances vmscan's tracepointing. - The 5 patch series "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" from Catalin Marinas is a cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation. - The 2 patch series "Fix KASAN support for KHO restored vmalloc regions" from Pasha Tatashin fixes a WARN() which can be emitted the KHO restores a vmalloc area. - The 4 patch series "mm: Remove stray references to pagevec" from Tal Zussman provides several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago. - The 17 patch series "mm: Eliminate fake head pages from vmemmap optimization" from Kiryl Shutsemau simplifies the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page. - The 2 patch series "mm/damon/core: improve DAMOS quota efficiency for core layer filters" from SeongJae Park improves two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used. - The 3 patch series "mm/damon: strictly respect min_nr_regions" from SeongJae Park improves DAMON usability by extending the treatment of the min_nr_regions user-settable parameter. - The 3 patch series "mm/page_alloc: pcp locking cleanup" from Vlastimil Babka is a proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ennsed. - The 16 patch series "mm: cleanups around unmapping / zapping" from David Hildenbrand implements "a bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions". - The 6 patch series "support batched checking of the young flag for MGLRU" from Baolin Wang supports batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64. - The 5 patch series "memcg: obj stock and slab stat caching cleanups" from Johannes Weiner provides memcg cleanup and robustness improvements. - The 5 patch series "Allow order zero pages in page reporting" from Yuvraj Sakshith enhances page_reporting's free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - The 6 patch series "mm: vma flag tweaks" from Lorenzo Stoakes is cleanup work following from the recent conversion of the VMA flags to a bitmap. - The 10 patch series "mm/damon: add optional debugging-purpose sanity checks" from SeongJae Park adds some more developer-facing debug checks into DAMON core. - The 2 patch series "mm/damon: test and document power-of-2 min_region_sz requirement" from SeongJae Park adds an additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling. - The 3 patch series "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" from SeongJae Park fixes a hard-to-hit time overflow issue in DAMON core. - The 7 patch series "mm/damon: improve/fixup/update ratio calculation, test and documentation" from SeongJae Park is a "batch of misc/minor improvements and fixups" for DAMON. - The 4 patch series "mm: move vma_(kernel\|mmu)_pagesize() out of hugetlb.c" from David Hildenbrand fixes a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - The 6 patch series "zram: recompression cleanups and tweaks" from Sergey Senozhatsky provides "a somewhat random mix of fixups, recompression cleanups and improvements" in the zram code. - The 11 patch series "mm/damon: support multiple goal-based quota tuning algorithms" from SeongJae Park extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select. - The 4 patch series "mm: thp: reduce unnecessary start_stop_khugepaged()" from Breno Leitao fixes the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged. - The 3 patch series "mm: improve map count checks" from Lorenzo Stoakes provides some cleanups and slight fixes in the mremap, mmap and vma code. - The 5 patch series "mm/damon: support addr_unit on default monitoring targets for modules" from SeongJae Park extends the use of DAMON core's addr_unit tunable. - The 5 patch series "mm: khugepaged cleanups and mTHP prerequisites" from Nico Pache provides cleanups in the khugepaged and is a base for Nico's planned khugepaged mTHP support. - The 15 patch series "mm: memory hot(un)plug and SPARSEMEM cleanups" from David Hildenbrand implements code movement and cleanups in the memhotplug and sparsemem code. - The 2 patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" from David Hildenbrand rationalizes some memhotplug Kconfig support. - The 6 patch series "change young flag check functions to return bool" from Baolin Wang is "a cleanup patchset to change all young flag check functions to return bool". - The 3 patch series "mm/damon/sysfs: fix memory leak and NULL dereference issues" from Josh Law and SeongJae Park fixes a few potential DAMON bugs. - The 25 patch series "mm/vma: convert vm_flags_t to vma_flags_t in vma code" from "converts a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it". Mainly in the vma code. - The 21 patch series "mm: expand mmap_prepare functionality and usage" from Lorenzo Stoakes "expands the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time". Cleanups, documentation, extension of mmap_prepare into filesystem drivers. - The 13 patch series "mm/huge_memory: refactor zap_huge_pmd()" from Lorenzo Stoakes simplifies and cleans up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad3HDQAKCRDdBJ7gKXxA jrUQAPwNhPk5nPSxnyxjAeQtOBHqgCdnICeEismLajPKd9aYRgEA0s2XAu3tSUYi GrBnWImHG3s4ePQxVcPCegWTsOUrXgQ= =1Q7o -----END PGP SIGNATURE----- Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel\|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...	2026-04-15 12:59:16 -07:00
Tushar Sariya	8c787b286f	NFSv4.1: Apply session size limits on clone path nfs4_clone_server() builds a child nfs_server for same-server automounted submounts but never calls nfs4_session_limit_rwsize() or nfs4_session_limit_xasize() after nfs_clone_server(). This means the child mount can end up with rsize/wsize values that exceed the negotiated session channel limits, causing NFS4ERR_REQ_TOO_BIG and EIO on servers that enforce tight max_request_size budgets. Top-level mounts go through nfs4_server_common_setup() which calls these limiters after nfs_probe_server(). Apply the same clamping on the clone path for consistency. Fixes: `2b092175f5` ("NFS: Fix inheritance of the block sizes when automounting") Cc: stable@vger.kernel.org Signed-off-by: Tushar Sariya <tushar.97@hotmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-13 14:56:09 -07:00
Olga Kornievskaia	515af10044	NFSv4: retry GETATTR if GET_DIR_DELEGATION failed Currently, getting a directory delegation is opportinistic and gets added to an existing GETATTR that's trying to retrieve some needed attributes. However, GET_DIRDELEGATION can fail and that currently causes a GETATTR to fail and an error is propagated to the user. Instead, the original GETATTR should be retried without requesting a directory delegation. Also, now chosing to clear asking for the direct delegation for this specific inode. Fixes: `156b094829` ("NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-13 14:45:24 -07:00
Olga Kornievskaia	4fa7ab8d29	NFS: fix RENAME attr in presence of directory delegations Since commit `6f9bda2337` ("NFS: Fix directory delegation verifier checks") xfstest generic/309 is failing because after the rename (mv) operation, client's mtime/ctime is the same. Update the delegated mtime when directory delegations are present in rename. Fixes: `6f9bda2337` ("NFS: Fix directory delegation verifier checks") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-13 14:43:28 -07:00
Jenny Guanni Qu	94545ffc0a	pnfs/flexfiles: validate ds_versions_cnt is non-zero nfs4_ff_alloc_deviceid_node() reads version_count from XDR without checking it is non-zero. When a malicious NFS server sends a pNFS LAYOUTGET response with version_count=0, kcalloc(0, ...) returns ZERO_SIZE_PTR (0x10). The subsequent ds_versions[0] access in nfs4_ff_layout_ds_version() and other callers dereferences this invalid pointer, causing an out-of-bounds read. Add a check for version_count == 0 after parsing it from XDR, before the allocation. The OOB read was confirmed with KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] from accessing ZERO_SIZE_PTR. Fixes: `d67ae825a5` ("pnfs/flexfiles: Add the FlexFile Layout Driver") Reported-by: Klaudia Kloc <klaudia@vidocsecurity.com> Reported-by: Dawid Moczadło <dawid@vidocsecurity.com> Tested-by: Jenny Guanni Qu <qguanni@gmail.com> Signed-off-by: Jenny Guanni Qu <qguanni@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-13 14:04:16 -07:00
Christoph Hellwig	b0ed12538f	NFS/blocklayout: print each device used for SCSI layouts We already print device uses for block layouts, do the same for SCSI layouts as that greatly helps understanding the operation of the client. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-13 14:00:22 -07:00
Linus Torvalds	b7d74ea0fd	vfs-7.1-rc1.kino Please consider pulling these changes from the signed vfs-7.1-rc1.kino tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCgAKCRCRxhvAZXjc otmnAP4sbsxZQdz2TG2hJuOwnEZOkkxZQOUMc3ERVyZaWXIeTAEA7e5M+8FpoG9n 8ipO76UoaXdGLESrqVdp9EOhLqOW7QY= =uMeJ -----END PGP SIGNATURE----- Merge tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs i_ino updates from Christian Brauner: "For historical reasons, the inode->i_ino field is an unsigned long, which means that it's 32 bits on 32 bit architectures. This has caused a number of filesystems to implement hacks to hash a 64-bit identifier into a 32-bit field, and deprives us of a universal identifier field for an inode. This changes the inode->i_ino field from an unsigned long to a u64. This shouldn't make any material difference on 64-bit hosts, but 32-bit hosts will see struct inode grow by at least 4 bytes. This could have effects on slabcache sizes and field alignment. The bulk of the changes are to format strings and tracepoints, since the kernel itself doesn't care that much about the i_ino field. The first patch changes some vfs function arguments, so check that one out carefully. With this change, we may be able to shrink some inode structures. For instance, struct nfs_inode has a fileid field that holds the 64-bit inode number. With this set of changes, that field could be eliminated. I'd rather leave that sort of cleanups for later just to keep this simple" * tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group() EVM: add comment describing why ino field is still unsigned long vfs: remove externs from fs.h on functions modified by i_ino widening treewide: fix missed i_ino format specifier conversions ext4: fix signed format specifier in ext4_load_inode trace event treewide: change inode->i_ino from unsigned long to u64 nilfs2: widen trace event i_ino fields to u64 f2fs: widen trace event i_ino fields to u64 ext4: widen trace event i_ino fields to u64 zonefs: widen trace event i_ino fields to u64 hugetlbfs: widen trace event i_ino fields to u64 ext2: widen trace event i_ino fields to u64 cachefiles: widen trace event i_ino fields to u64 vfs: widen trace event i_ino fields to u64 net: change sock.sk_ino and sock_i_ino() to u64 audit: widen ino fields to u64 vfs: widen inode hash/lookup functions to u64	2026-04-13 12:19:01 -07:00
Jeff Layton	9c332d7f63	nfs: update inode ctime after removexattr operation xfstest generic/728 fails with delegated timestamps. The client does a removexattr and then a stat to test the ctime, which doesn't change. The stat() doesn't trigger a GETATTR because of the delegated timestamps, so it relies on the cached ctime, which is wrong. The setxattr compound has a trailing GETATTR, which ensures that its ctime gets updated. Follow the same strategy with removexattr. Fixes: `3e1f02123f` ("NFSv4.2: add client side XDR handling for extended attributes") Reported-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-13 11:46:08 -07:00
Jeff Layton	16d99dce93	nfs: fix utimensat() for atime with delegated timestamps xfstest generic/221 is failing with delegated timestamps enabled. When the client holds a WRITE_ATTRS_DELEG delegation, and a userland process does a utimensat() for only the atime, the ctime is not properly updated. The problem is that the client tries to cache the atime update, but there is no mtime update, so the delegated attribute update never updates the ctime. Delegated timestamps don't have a mechanism to update the ctime in accordance with atime-only changes due to utimensat() and the like. Change the client to issue an RPC in this case, so that the ctime gets properly updated alongside the atime. Fixes: `40f45ab381` ("NFS: Further fixes to attribute delegation a/mtime changes") Reported-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-13 11:46:01 -07:00
Olga Kornievskaia	3a06bac55b	NFS: improve "Server wrote zero bytes" error When a pnfs error occurs, the IO is retried against the MDS. However, the initial IO leads to the kernel logging "Serer wrote zero bytes" when in fact the MDS IO will not fail and thus the error misleads administrators that the system is experiencing issues. When pnfs IO fails which triggers pnfs_write_done_resent_to_mds() which would end up clearing nfs_pgio_header's pages structure (copying the content into a new one to do new RPC calls to the MDS). Thus, in nfs_writeback_result() when we have no pages to work with no need to try and also therefore skip logging the message about 0bytes. Fixes: `6c75dc0d49` ("NFS: merge _full and _partial write rpc_ops") Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-13 11:17:31 -07:00
Linus Torvalds	0e58e3f1c5	vfs-7.1-rc1.writeback Please consider pulling these changes from the signed vfs-7.1-rc1.writeback tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCgAKCRCRxhvAZXjc oh/7AP9kfWK3ElRTV/+EfRs3tOAXD2b7VfO5jMbbSIfRjWgW5AEAiMULmQFpLOY8 cPEY/c5E9t8GLZqdsOJL8LQwz8YYAAk= =BtbD -----END PGP SIGNATURE----- Merge tag 'vfs-7.1-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs writeback updates from Christian Brauner: "This introduces writeback helper APIs and converts f2fs, gfs2 and nfs to stop accessing writeback internals directly" * tag 'vfs-7.1-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nfs: stop using writeback internals for WB_WRITEBACK accounting gfs2: stop using writeback internals for dirty_exceeded check f2fs: stop using writeback internals for dirty_exceeded checks writeback: prep helpers for dirty-limit and writeback accounting	2026-04-13 10:08:01 -07:00
Trond Myklebust	1805e6b2f4	NFSv4/pnfs: If the server is down, retry the layout returns on reboot If a layout return is embedded in a CLOSE or DELEGRETURN rpc call, and the metadata server reboots, the expectation now is that the client should resend the layout return once the server comes back up. This patch changes the current behaviour of dropping the layouts on the floor, and instead queues them up for retrying. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2026-04-13 09:26:19 -07:00
Christian Brauner	e3b2cf6e5d	kernfs: pass struct ns_common instead of const void * for namespace tags kernfs has historically used const void * to pass around namespace tags used for directory-level namespace filtering. The only current user of this is sysfs network namespace tagging where struct net pointers are cast to void . Replace all const void namespace parameters with const struct ns_common * throughout the kernfs, sysfs, and kobject namespace layers. This includes the kobj_ns_type_operations callbacks, kobject_namespace(), and all sysfs/kernfs APIs that accept or return namespace tags. Passing struct ns_common is needed because various codepaths require access to the underlying namespace. A struct ns_common can always be converted back to the concrete namespace type (e.g., struct net) via container_of() or to_ns_common() in the reverse direction. This is a preparatory change for switching to ns_id-based directory iteration to prevent a KASLR pointer leak through the current use of raw namespace pointers as hash seeds and comparison keys. Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-04-09 14:36:52 +02:00
Tal Zussman	ab5193e919	fs: remove unncessary pagevec.h includes Remove unused pagevec.h includes from .c files. These were found with the following command: grep -rl '#include.pagevec\.h' --include='.c' \| while read f; do grep -qE 'PAGEVEC_SIZE\|folio_batch' "$f" \|\| echo "$f" done There are probably more removal candidates in .h files, but those are more complex to analyze. Link: https://lkml.kernel.org/r/20260225-pagevec_cleanup-v2-2-716868cc2d11@columbia.edu Signed-off-by: Tal Zussman <tz2294@columbia.edu> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Chris Li <chrisl@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:53:06 -07:00
Al Viro	2420067cec	struct dentry: make ->d_u anonymous Making ->d_rcu and (then) ->d_child overlapping dates back to 2006; anon unions support had been added to gcc only in 4.6 (2011) and the minimal gcc version hadn't been bumped to that until 4.19 (2018). These days there's no reason not to keep that union named. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-04-02 03:47:31 -04:00
Al Viro	408d8af01f	for_each_alias(): helper macro for iterating through dentries of given inode Most of the places using d_alias are loops iterating through all aliases for given inode; introduce a helper macro (for_each_alias(dentry, inode)) and convert open-coded instances of such loop to it. They are easier to read that way and it reduces the noise on the next steps. You _must_ hold inode->i_lock over that thing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-04-02 03:45:02 -04:00
Andy Shevchenko	f83c8dda45	nfs/blocklayout: Fix compilation error (`make W=1`) in bl_write_pagelist() Clang compiler is not happy about set but unused variable (when dprintk() is no-op): .../blocklayout/blocklayout.c:384:9: error: variable 'count' set but not used [-Werror,-Wunused-but-set-variable] Remove a leftover from the previous cleanup. Fixes: `3a6fd1f004` ("pnfs/blocklayout: remove read-modify-write handling in bl_write_pagelist") Acked-by: Anna Schumaker <anna.schumkaer@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2026-03-29 21:25:09 -04:00
Chuck Lever	615384a24b	lockd: Move xdr.h from include/linux/lockd/ to fs/lockd/ The lockd subsystem unnecessarily exposes internal NLM XDR type definitions through the global include path. These definitions are not used by any code outside fs/lockd/, making them inappropriate for include/linux/lockd/. Moving xdr.h to fs/lockd/ narrows the API surface and clarifies that these types are internal implementation details. The comment in linux/lockd/bind.h stating xdr.h was needed for "xdr-encoded error codes" is stale: no lockd API consumers use those codes. Forward declarations for struct nfs_fh and struct file_lock are added to bind.h because their definitions were previously pulled in transitively through xdr.h. Additionally, nfs3proc.c and proc.c need explicit includes of filelock.h for FL_CLOSE and for accessing struct file_lock members, respectively. Built and tested with lockd client/server operations. No functional change. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2026-03-29 21:25:09 -04:00
Chuck Lever	840621fd2f	NFS: Use nlmclnt_shutdown_rpc_clnt() to safely shut down NLM A race condition exists in shutdown_store() when writing to the sysfs "shutdown" file concurrently with nlm_shutdown_hosts_net(). Without synchronization, the following sequence can occur: 1. shutdown_store() reads server->nlm_host (non-NULL) 2. nlm_shutdown_hosts_net() acquires nlm_host_mutex, calls rpc_shutdown_client(), sets h_rpcclnt to NULL, and potentially frees the host via nlm_gc_hosts() 3. shutdown_store() dereferences the now-stale or freed host Introduce nlmclnt_shutdown_rpc_clnt(), which acquires nlm_host_mutex before accessing h_rpcclnt. This synchronizes with nlm_shutdown_hosts_net() and ensures the rpc_clnt pointer remains valid during the shutdown operation. This change also improves API layering: NFS client code no longer needs to include the internal lockd header to access nlm_host fields. The new helper resides in bind.h alongside other public lockd interfaces. Reported-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2026-03-29 21:25:09 -04:00
Jeff Layton	0b2600f81c	treewide: change inode->i_ino from unsigned long to u64 On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-03-06 14:31:28 +01:00
Anna Schumaker	4529e00154	NFS: Fix NFS KConfig typos Two issues were noticed after the NFS v4.0 KConfig changes were merged upstream. First, the text of CONFIG_NFS_V4 should not encourage people to select it if they are unsure. Second, the new CONFIG_NFS_V4_0 option should default to "on" instead of "off" to avoid breaking people's setups if they are using NFS v4.0. Reported-by: Niklas Cassel <cassel@kernel.org> Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Fixes: `4e02693525` ("NFS: Add a way to disable NFS v4.0 via KConfig") Fixes: `7537db2480` ("NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4") Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-02-27 15:42:14 -05:00
Roberto Bergantinos Corpas	410666a298	nfs: return EISDIR on nfs3_proc_create if d_alias is a dir If we found an alias through nfs3_do_create/nfs_add_or_obtain /d_splice_alias which happens to be a dir dentry, we don't return any error, and simply forget about this alias, but the original dentry we were adding and passed as parameter remains negative. This later causes an oops on nfs_atomic_open_v23/finish_open since we supply a negative dentry to do_dentry_open. This has been observed running lustre-racer, where dirs and files are created/removed concurrently with the same name and O_EXCL is not used to open files (frequent file redirection). While d_splice_alias typically returns a directory alias or NULL, we explicitly check d_is_dir() to ensure that we don't attempt to perform file operations (like finish_open) on a directory inode, which triggers the observed oops. Fixes: `7c6c5249f0` ("NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.") Reviewed-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-02-23 11:41:38 -05:00
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Kundan Kumar	fd15b9c6ec	nfs: stop using writeback internals for WB_WRITEBACK accounting Convert NFS WB_WRITEBACK accounting to writeback helper, eliminating direct access to writeback. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Link: https://patch.msgid.link/20260213054634.79785-5-kundan.kumar@samsung.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-02-17 13:25:14 +01:00
Linus Torvalds	7449f86baf	NFS Client Updates for Linux 7.0 New Features: * Use an LRU list for returning unused delegations * Introduce a KConfig option to disable NFS v4.0 and make NFS v4.1 the default Bugfixes: * NFS/localio: Handle short writes by retrying * NFS/localio: prevent direct reclaim recursion into NFS via nfs_writepages * NFS/localio: use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit * NFS/localio: remove -EAGAIN handling in nfs_local_doio() * pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN * fs/nfs: Fix a readdir slow-start regression * SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path Other Cleanups and Improvements: * A few other NFS/localio cleanups * Various other delegation handling cleanups from Christoph * Unify security_inode_listsecurity() calls * Improvements to NFSv4 lease handling * Clean up SUNRPC _debug fields when CONFIG_SUNRPC_DEBUG is not set -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmmORX0ACgkQ18tUv7Cl QOtFORAAyCwTst5iEPRJ9rKZ/Kl39zHbA/QUn3CmmVkGlOBj0j7mWRyU5X0vlIQ9 mUF3Ikm1XYpsxPTKBEELVumPkggT2nfsFx5518BrpRTODibzc/CZ10/z7q4qarvI UhdFlt9SRG4RhhOdAaThF6XVUsRSwGwVZo/YyYemCc/evjNVyXa0wfwbDl9l4Nzr 1Sxt2/zeq3Eu4IfrxQpFM+0UuSScmVODSe8Jm4GnmlU/Q7x+onW35IvyuzTkgDwG 8PAeH4b5uADY9VWnTHpvr1fQNnBoEw8b4qr9a7AXQKRIcPGMvgKkdK+f6hOh1cEs +O+L4+uixo7QXudnWC27brZSyHwDIVVaJGPF/kNv4O2GKDyEcbsHtQv2G1+1+PtR FCtRFGpLq2pZxb9SY/s73FKp6a8bd81FAtzAL7iYU+2FDtvEDKss1nG6sQNG1+Z4 G8rI79PoimR4I6Jr5hk4sl8pM8wJVLZdcW+ytrEKl9FC+rFDrP9lVzHYArTFgIky N/IjEflejRfZ9bYIZ9/CYnFZC3Htrm8K9zerCRDsf96tvhxkX8FZM8tuZpHEMIbx Cx8XKCk+ubqLIF2mT+FKOc5T6CUmMiGRNagLkx0h0mbvRSI8HTpgQZGrbYkMk0Hs abUhvH73pRi0LRvkzPHfcNaZ7Y/mFBYfwBMwTUWJzh6CEgXnpks= =CwZq -----END PGP SIGNATURE----- Merge tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "New Features: - Use an LRU list for returning unused delegations - Introduce a KConfig option to disable NFS v4.0 and make NFS v4.1 the default Bugfixes: - NFS/localio: - Handle short writes by retrying - Prevent direct reclaim recursion into NFS via nfs_writepages - Use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit - Remove -EAGAIN handling in nfs_local_doio() - pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN - fs/nfs: Fix a readdir slow-start regression - SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path Other cleanups and improvements: - A few other NFS/localio cleanups - Various other delegation handling cleanups from Christoph - Unify security_inode_listsecurity() calls - Improvements to NFSv4 lease handling - Clean up SUNRPC _debug fields when CONFIG_SUNRPC_DEBUG is not set" * tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (60 commits) SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path nfs: nfs4proc: Convert comma to semicolon SUNRPC: Change list definition method sunrpc: rpc_debug and others are defined even if CONFIG_SUNRPC_DEBUG unset NFSv4: limit lease period in nfs4_set_lease_period() NFSv4: pass lease period in seconds to nfs4_set_lease_period() nfs: unify security_inode_listsecurity() calls fs/nfs: Fix readdir slow-start regression pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN NFS: fix delayed delegation return handling NFS: simplify error handling in nfs_end_delegation_return NFS: fold nfs_abort_delegation_return into nfs_end_delegation_return NFS: remove the delegation == NULL check in nfs_end_delegation_return NFS: use bool for the issync argument to nfs_end_delegation_return NFS: return void from ->return_delegation NFS: return void from nfs4_inode_make_writeable NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4 NFS: Add a way to disable NFS v4.0 via KConfig NFS: Move sequence slot operations into minorversion operations NFS: Pass a struct nfs_client to nfs4_init_sequence() ...	2026-02-12 17:49:33 -08:00
Linus Torvalds	2831fa8b8b	NFSD 7.0 Release Notes Neil Brown and Jeff Layton contributed a dynamic thread pool sizing mechanism for NFSD. The sunrpc layer now tracks minimum and maximum thread counts per pool, and NFSD adjusts running thread counts based on workload: idle threads exit after a timeout when the pool exceeds its minimum, and new threads spawn automatically when all threads are busy. Administrators control this behavior via the nfsdctl netlink interface. Rick Macklem, FreeBSD NFS maintainer, generously contributed server- side support for the POSIX ACL extension to NFSv4, as specified in draft-ietf-nfsv4-posix-acls. This extension allows NFSv4 clients to get and set POSIX access and default ACLs using native NFSv4 operations, eliminating the need for sideband protocols. The feature is gated by a Kconfig option since the IETF draft has not yet been ratified. Chuck Lever delivered numerous improvements to the xdrgen tool. Error reporting now covers parsing, AST transformation, and invalid declarations. Generated enum decoders validate incoming values against valid enumerator lists. New features include pass-through line support for embedding C directives in XDR specifications, 16-bit integer types, and program number definitions. Several code generation issues were also addressed. When an administrator revokes NFSv4 state for a filesystem via the unlock_fs interface, ongoing async COPY operations referencing that filesystem are now cancelled, with CB_OFFLOAD callbacks notifying affected clients. The remaining patches in this pull request are clean-ups and minor optimizations. Sincere thanks to all contributors, reviewers, testers, and bug reporters who participated in the v7.0 NFSD development cycle. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmmJ8kAACgkQM2qzM29m f5ejCQ//RdoWNgN1VZdNoUrh1tm1Fhi1YN/RJS26G25OxgTBc3/qtGxrpW+ZAW6+ mIAJ2bT/l66741drki4/x6WJU4OMI/4mJxrLd0WCb1POaeRQWnL1MdzNY+IP/QZv 3DgcTv1T6FKE7pFmAqW0nFPCgaK+vlR+fo4uJognbB6+hZB3HlrLkfeZOWMAmchC y3U6nzrtP+IljAtdzKZ120E+LHp0PtTbJwPCPSt3/FR/dkA0DcjnOS9jybIYlJOu 0ByX24BcrW/c3rJUdL8lL4G7gsPWjdARqczFiN8sufI9Q3zlHOxtYdUT7BNjd+04 jcSKLlAXwcbNcK2f54B/QFKmNxllvoHLB3wo2hfEPig4LQELuxcUHYxmmD4vNKen lp6zmaLq3PiRGlew6eLRFxRxbdLds+9l0xjXV+J+rtQmjppXdXUoVNMm+D+tD6bF T5TUq4WNCGJIrpkR7pdF7uMD51s8fphvaDxOCjhSi3WHAtZAhOR8HFUU97qddM34 KqF6Gph3tN/C4oNb8kKvzxBRpRhHIzKHZbreiu5fZr9pPe9IRBHnn/Dg4p/yYQcw K3/y1EnKrIlprfbFFkY1LzNFpf309uoZTVzwBcMfSJVsFgUqWD7KHJ/rmCJQ/pS6 k0+YLRoUmtUHDYk2QNlstlt7r6FwA0d2GjT8n7viGoNQ3PA7rJQ= =hqla -----END PGP SIGNATURE----- Merge tag 'nfsd-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd updates from Chuck Lever: "Neil Brown and Jeff Layton contributed a dynamic thread pool sizing mechanism for NFSD. The sunrpc layer now tracks minimum and maximum thread counts per pool, and NFSD adjusts running thread counts based on workload: idle threads exit after a timeout when the pool exceeds its minimum, and new threads spawn automatically when all threads are busy. Administrators control this behavior via the nfsdctl netlink interface. Rick Macklem, FreeBSD NFS maintainer, generously contributed server- side support for the POSIX ACL extension to NFSv4, as specified in draft-ietf-nfsv4-posix-acls. This extension allows NFSv4 clients to get and set POSIX access and default ACLs using native NFSv4 operations, eliminating the need for sideband protocols. The feature is gated by a Kconfig option since the IETF draft has not yet been ratified. Chuck Lever delivered numerous improvements to the xdrgen tool. Error reporting now covers parsing, AST transformation, and invalid declarations. Generated enum decoders validate incoming values against valid enumerator lists. New features include pass-through line support for embedding C directives in XDR specifications, 16-bit integer types, and program number definitions. Several code generation issues were also addressed. When an administrator revokes NFSv4 state for a filesystem via the unlock_fs interface, ongoing async COPY operations referencing that filesystem are now cancelled, with CB_OFFLOAD callbacks notifying affected clients. The remaining patches in this pull request are clean-ups and minor optimizations. Sincere thanks to all contributors, reviewers, testers, and bug reporters who participated in the v7.0 NFSD development cycle" * tag 'nfsd-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits) NFSD: Add POSIX ACL file attributes to SUPPATTR bitmasks NFSD: Add POSIX draft ACL support to the NFSv4 SETATTR operation NFSD: Add support for POSIX draft ACLs for file creation NFSD: Add support for XDR decoding POSIX draft ACLs NFSD: Refactor nfsd_setattr()'s ACL error reporting NFSD: Do not allow NFSv4 (N)VERIFY to check POSIX ACL attributes NFSD: Add nfsd4_encode_fattr4_posix_access_acl NFSD: Add nfsd4_encode_fattr4_posix_default_acl NFSD: Add nfsd4_encode_fattr4_acl_trueform_scope NFSD: Add nfsd4_encode_fattr4_acl_trueform Add RPC language definition of NFSv4 POSIX ACL extension NFSD: Add a Kconfig setting to enable support for NFSv4 POSIX ACLs xdrgen: Implement pass-through lines in specifications nfsd: cancel async COPY operations when admin revokes filesystem state nfsd: add controls to set the minimum number of threads per pool nfsd: adjust number of running nfsd threads based on activity sunrpc: allow svc_recv() to return -ETIMEDOUT and -EBUSY sunrpc: split new thread creation into a separate function sunrpc: introduce the concept of a minimum number of threads per pool sunrpc: track the max number of requested threads in a pool ...	2026-02-12 08:23:53 -08:00
Linus Torvalds	aa2a0fcd4c	vfs-7.0-rc1.leases Please consider pulling these changes from the signed vfs-7.0-rc1.leases tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc olR/AP40iNOTRn7LosXbRWqGGZqzy9v64QYoLzk3QdsWuGmbRAD/egNQzof8mkAf IscefWTOjY7xyDzmEBEBnfHftgMiEwM= =zre0 -----END PGP SIGNATURE----- Merge tag 'vfs-7.0-rc1.leases' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs lease updates from Christian Brauner: "This contains updates for lease support to require filesystems to explicitly opt-in to lease support Currently kernel_setlease() falls through to generic_setlease() when a a filesystem does not define ->setlease(), silently granting lease support to every filesystem regardless of whether it is prepared for it. This is a poor default: most filesystems never intended to support leases, and the silent fallthrough makes it impossible to distinguish "supports leases" from "never thought about it". This inverts the default. It adds explicit .setlease = generic_setlease; assignments to every in-tree filesystem that should retain lease support, then changes kernel_setlease() to return -EINVAL when ->setlease is NULL. With the new default in place, simple_nosetlease() is redundant and is removed along with all references to it" * tag 'vfs-7.0-rc1.leases' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits) fuse: add setlease file operation fs: remove simple_nosetlease() filelock: default to returning -EINVAL when ->setlease operation is NULL xfs: add setlease file operation ufs: add setlease file operation udf: add setlease file operation tmpfs: add setlease file operation squashfs: add setlease file operation overlayfs: add setlease file operation orangefs: add setlease file operation ocfs2: add setlease file operation ntfs3: add setlease file operation nilfs2: add setlease file operation jfs: add setlease file operation jffs2: add setlease file operation gfs2: add a setlease file operation fat: add setlease file operation f2fs: add setlease file operation exfat: add setlease file operation ext4: add setlease file operation ...	2026-02-09 11:59:07 -08:00
Linus Torvalds	74554251df	vfs-7.0-rc1.nonblocking_timestamps Please consider pulling these changes from the signed vfs-7.0-rc1.nonblocking_timestamps tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaYX49gAKCRCRxhvAZXjc oqNMAQCjHw9iwYDu63n96QAipWopJb8onqc0rTEvi0OOl1zDNwEAufN3EqTzV3uQ JbNgSwBWD/+ICd2aUOuAX0GgU6teyAQ= =lJlI -----END PGP SIGNATURE----- Merge tag 'vfs-7.0-rc1.nonblocking_timestamps' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This contains the changes to support non-blocking timestamp updates. Since commit `66fa3cedf1` ("fs: Add async write file modification handling") file_update_time_flags() unconditionally returns -EAGAIN when any timestamp needs updating and IOCB_NOWAIT is set. This makes non-blocking direct writes impossible on file systems with granular enough timestamps, which in practice means all of them. This reworks the timestamp update path to propagate IOCB_NOWAIT through ->update_time so that file systems which can update timestamps without blocking are no longer penalized. With that groundwork in place, the core change passes IOCB_NOWAIT into ->update_time and returns -EAGAIN only when the file system indicates it would block. XFS implements non-blocking timestamp updates by using the new ->sync_lazytime and open-coding generic_update_time without the S_NOWAIT check, since the lazytime path through the generic helpers can never block in XFS" * tag 'vfs-7.0-rc1.nonblocking_timestamps' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: enable non-blocking timestamp updates xfs: implement ->sync_lazytime fs: refactor file_update_time_flags fs: add support for non-blocking timestamp updates fs: add a ->sync_lazytime method fs: factor out a sync_lazytime helper fs: refactor ->update_time handling fat: cleanup the flags for fat_truncate_time nfs: split nfs_update_timestamps fs: allow error returns from generic_update_time fs: remove inode_update_time	2026-02-09 11:25:01 -08:00
Chen Ni	1075e8e826	nfs: nfs4proc: Convert comma to semicolon Replace comma between expressions with semicolons. Using a ',' in place of a ';' can have unintended side effects. Although that is not the case here, it is seems best to use ';' unless ',' is intended. Found by inspection. No functional change intended. Compile tested only. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-02-09 14:24:19 -05:00
Sergey Shtylyov	e29a3e61ee	NFSv4: limit lease period in nfs4_set_lease_period() In nfs4_set_lease_period(), the passed 32-bit lease period in seconds is multiplied by HZ -- that might overflow before being implicitly cast to unsigned long (32/64-bit type), while initializing the lease variable. Cap the lease period at MAX_LEASE_PERIOD (#define'd to 1 hour for now), before multipying to avoid such overflow... Found by Linux Verification Center (linuxtesting.org) with the Svace static analysis tool. Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Suggested-by: Trond Myklebust <trondmy@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-02-09 13:39:39 -05:00
Sergey Shtylyov	3d57c44e91	NFSv4: pass lease period in seconds to nfs4_set_lease_period() There's no need to multiply the lease period by HZ at all the call sites of nfs4_set_lease_period() -- it makes more sense to do that only once, inside that function, by passing to it lease period as 32-bit # of seconds instead of 32/64-bit unsigned long # of jiffies... Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-02-09 13:39:39 -05:00
Stephen Smalley	fdc0396b3c	nfs: unify security_inode_listsecurity() calls commit `243fea1346` ("NFSv4.2: fix listxattr to return selinux security label") introduced a direct call to security_inode_listsecurity() in nfs4_listxattr(). However, nfs4_listxattr() already indirectly called security_inode_listsecurity() via nfs4_listxattr_nfs4_label() if CONFIG_NFS_V4_SECURITY_LABEL is enabled and the server has the NFS_CAP_SECURITY_LABEL capability enabled. This duplication was fixed by commit `9acb237def` ("NFSv4.2: another fix for listxattr") by making the second call conditional on NFS_CAP_SECURITY_LABEL not being set by the server. However, the combination of the two changes effectively makes one call to security_inode_listsecurity() in every case - which is the desired behavior since getxattr() always returns a security xattr even if it has to synthesize one. Further, the two different calls produce different xattr name ordering between security.* and user.* xattr names. Unify the two separate calls into a single call and get rid of nfs4_listxattr_nfs4_label() altogether. Link: https://lore.kernel.org/selinux/CAEjxPJ6e8z__=MP5NfdUxkOMQ=EnUFSjWFofP4YPwHqK=Ki5nw@mail.gmail.com/ Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-02-09 13:39:39 -05:00
Sagi Grimberg	42e7c876b1	fs/nfs: Fix readdir slow-start regression Commit `580f236737` ("NFS: Adjust the amount of readahead performed by NFS readdir") reduces the amount of readahead names caching done by the client. The downside of this approach is READDIR now may suffer from a slow-start issue, where initially it will fetch names that fit in a single page, then in 2, 4, 8 until the maximum supported transfer size (usually 1M). This patch tries to take a balanced approach between mitigating the slow-start issue still maintaining some efficiency gains. Fixes: `580f236737` ("NFS: Adjust the amount of readahead performed by NFS readdir") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-02-09 13:39:38 -05:00
Linus Torvalds	bcc8fd3e15	lsm/stable-7.0 PR 20260203 -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmmCurkUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNDWA//RZxjjyY1I0GRDepJXJ8UFEVt4Fdr VsnSKL3o7sf0SAsQj2HCJsJPiwD5fHm2C2gdxh9rFC0bPpMbTVAkwUL7WhP+nkAt LA+UZKYurrk1XF6OctILoY3JcXmynb1Oe3lg6uVcWX5b1uEriqRgGKNcMYLb5fmr D1vZ9LMuZe8WwGTScprQID9FMrZ0TDbdI/vqG7si1W/PCFH7630MPJkmzmjPWvnV xJISKLOG+qbyWoNGLr+VaNjkmA+jPfsXAKWbfNXUGfikP8g/OHpFd70nIzJs8p7J dxZD7w6/kqSGhauQjcX8ov0zKxn83Z2Xt0+4Ldl5vOCWI3r4T3Y8WdarmULbq65n jIN8djDgmCJPqa5zuPmik+womaPk2GmSy1viEJdT4W0iHggTC1snOz1J+BbD+nkh uEZkmcCZbaeEQmfefxIyHDirrFsJvrunWupGrkfxvfFr+QU8H1xNLfMd6CQzvtI4 P5p/KrnP2e58tJqvPxSY315ewUMy73kZU5DUl+Rq6Y4ai415R7vtwwEEkSKWnyja LMdEumc9IrsiBMcLmsj8QwobCr7XJtdCQV5ohR8CPxxcsI/G0pR99e1pckD7l7Qm OG461BKHntU3SFWSiZw+rNWlJuyPcSy5nmUxQvxQHP9pShZPu8rTfYX+CBzrHJk2 OFjAwNJn1N/NfYI= =cCyp -----END PGP SIGNATURE----- Merge tag 'lsm-pr-20260203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Unify the security_inode_listsecurity() calls in NFSv4 While looking at security_inode_listsecurity() with an eye towards improving the interface, we realized that the NFSv4 code was making multiple calls to the LSM hook that could be consolidated into one. - Mark the LSM static branch keys as static - this helps resolve some sparse warnings - Add __rust_helper annotations to the LSM and cred wrapper functions - Remove the unsused set_security_override_from_ctx() function - Minor fixes to some of the LSM kdoc comment blocks * tag 'lsm-pr-20260203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: make keys for static branch static cred: remove unused set_security_override_from_ctx() rust: security: add __rust_helper to helpers rust: cred: add __rust_helper to helpers nfs: unify security_inode_listsecurity() calls lsm: fix kernel-doc struct member names	2026-02-09 10:16:48 -08:00
Olga Kornievskaia	5248d8474e	pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN It is possible to have a task get stuck on waiting on the NFS_LAYOUT_DRAIN in the following scenario 1. cpu a: waiter test NFS_LAYOUT_DRAIN (1) and plh_outstanding (1) 2. cpu b: atomic_dec_and_test() -> clear bit -> wake up 3. cpu c: sets NFS_LAYOUT_DRAIN again 4. cpu a: calls wait_on_bit() sleeps forever. To expand on this we have say 2 outstanding pnfs write IO that get ESTALE which causes both to call pnfs_destroy_layout() and set the NFS_LAYOUT_DRAIN bit but the 1st one doesn't call the pnfs_put_layout_hdr() yet (as that would prevent the 2nd ESTALE write from trying to call pnfs_destroy_layout()). If the 1st ESTALE write is the one that initially sets the NFS_LAYOUT_DRAIN so that new IO on this file initiates new LAYOUTGET. Another new write would find NFS_LAYOUT_DRAIN set and phl_outstanding>0 (step 1) and would wait_on_bit(). LAYOUTGET completes doing step 2. Now, the 2nd of ESTALE writes is calling pnfs_destory_layout() and set the NFS_LAYOUT_DRAIN bit (step 3). Finally, the waiting write wakes up to check the bit and goes back to sleep. The problem revolves around the fact that if NFS_LAYOUT_INVALID_STID was already set, it should not do the work of pnfs_mark_layout_stateid_invalid(), thus NFS_LAYOUT_DRAIN will not be set more than once for an invalid layout. Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> Fixes: `880265c77a` ("pNFS: Avoid a live lock condition in pnfs_update_layout()") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-02-02 13:17:11 -05:00
Christoph Hellwig	4039fbedcb	NFS: fix delayed delegation return handling Rework this code that was totally busted at least as of my most recent changes. Introduce a separate list for delayed delegations so that they can't get lost and don't clutter up the returns list. Add a missing spin_unlock in the helper marking it as a regular pending return. Fixes: `0ebe655bd0` ("NFS: add a separate delegation return list") Reported-by: Chris Mason <clm@meta.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-01-30 16:49:15 -05:00
Christoph Hellwig	94b8886510	NFS: simplify error handling in nfs_end_delegation_return Drop the pointless delegation->lock held over setting multiple atomic bits in different structures, and use separate labels for the delay vs abort cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-01-30 16:49:15 -05:00
Christoph Hellwig	f7550318b2	NFS: fold nfs_abort_delegation_return into nfs_end_delegation_return This will allow to simplify the error handling flow. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-01-30 16:49:15 -05:00
Christoph Hellwig	438c3e47c2	NFS: remove the delegation == NULL check in nfs_end_delegation_return All callers now pass a non-NULL delegation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-01-30 16:49:15 -05:00
Christoph Hellwig	2bd7ebcf9b	NFS: use bool for the issync argument to nfs_end_delegation_return Replace the integer used as boolean with a bool type, and tidy up the prototype and top of function comment. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2026-01-30 16:49:15 -05:00

1 2 3 4 5 ...

7302 Commits