This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
Rework this code that was totally busted at least as of my most
recent changes. Introduce a separate list for delayed delegations
so that they can't get lost and don't clutter up the returns list.
Add a missing spin_unlock in the helper marking it as a regular
pending return.
Fixes: 0ebe655bd0 ("NFS: add a separate delegation return list")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Drop the pointless delegation->lock held over setting multiple
atomic bits in different structures, and use separate labels
for the delay vs abort cases.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
This will allow to simplify the error handling flow.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Replace the integer used as boolean with a bool type, and tidy up
the prototype and top of function comment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
The caller doesn't check the return value, so drop it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
None of the callers checks the return value, so drop it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Currently nfs_mark_return_unreferenced_delegations marks all open but
not referenced delegations (i.e., those were found by a previous pass)
as return on close, which means that we'll return them on close without
a way out.
Replace this with only iterating delegations that are on the LRU list,
and avoid delegations that are in use by an open files to avoid this.
Delegations that were never referenced while open still are be prime
candidates for return from the LRU if the number of delegations is over
the watermark, or otherwise will be returned by the next
nfs_mark_return_unreferenced_delegations pass after they are closed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Directly returning delegations on close when over the watermark is
rather suboptimal as these delegations are much more likely to be reused
than those that have been unused for a long time. Switch to returning
unused delegations from a new LRU list when we are above the threshold and
there are reclaimable delegations instead.
Pass over referenced delegations during the first pass to give delegations
that aren't in active used by frequently used for stat() or similar another
chance to not be instantly reclaimed. This scheme works the same as the
referenced flags in the VFS inode and dentry caches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Searching for returnable delegations in the per-server delegations list
can be very expensive. While commit e04bbf6b1b ("NFS: Avoid quadratic
search when freeing delegations.") reduced the overhead a bit, the
fact that all the non-returnable delegations have to be searched limits
the amount of optimizations that can be done.
Fix this by introducing a separate list that only contains delegations
scheduled for return.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Remove a level of indentation for the main code path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Nested RCU critical sections are fine and very cheap. Have a local one
in nfs_start_delegation_return so that the function is self-contained
and to prepare for simplifying the callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Using the unconditional reference increment means we can take a
reference to a delegation already in the RCU grace period, which could
cause a use after free under very unlikely conditions. Switch to use
refcount_inc_not_zero instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
All callers now hold references to the delegation as part of the lookup,
removing the need for an extra reference for those that are actually
returned which is then dropped in nfs_end_delegation_return.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Currently most work on struct nfs_delegation happens directly under RCU
protection. This is generally fine, despite that long RCU sections are
not good for performance. But for operations later taking a reference
to the delegation to perform blocking work, refcount_inc is used, which
can be racy against dropping the last reference and thus lead to use
after frees in extremely rare cases.
Fix this by taking a reference in nfs4_get_valid_delegation using
refcount_inc_not_zero so that the callers have a stabilized reference
they can work with and can be moved outside the RCU critical section.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
nfs_inode_set_delegation as the only direct caller of
nfs_detach_delegation_locked already check this under cl_lock, so
don't repeat it.
Replace the lockdep coverage for the lock that was implicitly provided by
the rcu_dereference_protected call that is removed with an explicit
lockdep assert to keep the coverage.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
nfs_detach_delegation always returns either the passed in delegation
or NULL, simplify this to a bool return.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Renewal only cares for active delegations and not revoked ones. Replace
the list empty check with reading the active delegation counter to
implement this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Open code nfs_free_delegation in the callers, because having a "free"
function that wraps a revoke and put operation is a bit confusing,
especially when the __free version does the actual freeing triggered by
the last put.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
There is only a single caller, and the function can be condensed into a
single if statement, making it more clear what is being tested there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
This essentially reverts commit 6f9449be53 ("NFS: Fix a soft lockup in
the delegation recovery code") because the code walking the per-server
delegation list has been fixed to just skip inodes for which
nfs_delegation_grab_inode fails, instead of having to restart the entire
series in commit f92214e4c3 ("NFS: Avoid unnecessary rescanning of the
per-server delegation list").
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Now that nfs_start_delegation_return_locked is gone, and we have
RCU locking asserts, drop the extra postfix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
And clean up the dereference of the delegation a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
There is only one caller, so fold it into that. With that,
nfs_start_delegation_return
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
The only caller dereferences a field in the inode just before calling
nfs4_inode_return_delegation_on_close.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Fold nfs_client_mark_return_all_delegations into
nfs_expire_all_delegations, which is the only caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
nfs_client_mark_return_unused_delegation_types is only called by
nfs_expire_unused_delegation_types, so merge the two.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
The function nfs_inode_evict_delegation() immediately and synchronously
returns a delegation when called. This means we can't call it from
nfs4_have_delegation(), since that function could be called under a
lock. Instead we should mark the delegation for return and let the state
manager handle it for us.
Fixes: b6d2a520f4 ("NFS: Add a module option to disable directory delegations")
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The state recovery code in nfs_end_delegation_return() is intended to
allow regular files to recover cached open and lock state. It has no
function for directory delegations, and may cause corruption.
Fixes: 156b094829 ("NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When this option is disabled then the client will not request directory
delegations or check if we have one during the revalidation paths.
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This patch adds a new flag: NFS_INO_REQ_DIR_DELEG to signal that a
directory wants to request a directory delegation the next time it does
a GETATTR. I have the client request a directory delegation when doing
an access, create, or unlink call since these calls indicate that a user
is working with a directory.
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
nfs_delegation_find_inode currently has to walk the entire list of
delegations per inode, which can become pretty large, and can become even
larger when increasing the delegation watermark.
Add a hash table to speed up the delegation lookup, sized as a fraction
of the delegation watermark.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250718081509.2607553-6-hch@lst.de
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The active delegation watermark was added to avoid overloading servers.
Track the active delegation per-server instead of globally so that clients
talking to multiple servers aren't limited by the global limit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250718081509.2607553-5-hch@lst.de
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Keep the module_param_named next to the variable declaration instead of
somewhere unrelated, following the best practice in the rest of the
kernel.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250718081509.2607553-4-hch@lst.de
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reduce a level of indentation for most of the code in this function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250718081509.2607553-3-hch@lst.de
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Add a tracepoint in the function that decides whether to return a
delegation to the server.
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250618-nfs-tracepoints-v2-3-540c9fb48da2@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We have tracepoints for setting a delegation and reclaiming them. Add a
tracepoint for when the delegation is being detached from the inode.
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20250618-nfs-tracepoints-v2-2-540c9fb48da2@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The NFS client's list of delegations can grow quite large (well beyond the
delegation watermark) if the server is revoking or there are repeated
events that expire state. Once this happens, the revoked delegations can
cause a performance problem for subsequent walks of the
servers->delegations list when the client tries to test and free state.
If we can determine that the FREE_STATEID operation has completed without
error, we can prune the delegation from the list.
Since the NFS client combines TEST_STATEID with FREE_STATEID in its minor
version operations, there isn't an easy way to communicate success of
FREE_STATEID. Rather than re-arrange quite a number of calling paths to
break out the separate procedures, let's signal the success of FREE_STATEID
by setting the stateid's type.
Set NFS4_FREED_STATEID_TYPE for stateids that have been successfully
discarded from the server, and use that type to signal that the delegation
can be cleaned up.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Check that the delegation is still attached after taking the spin lock
in nfs_start_delegation_return_locked().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The amount of looping through the list of delegations is occasionally
leading to soft lockups. If the state manager was asked to manage the
delayed return of delegations, then only scan those filesystems
containing delegations that were marked as being delayed.
Fixes: be20037725 ("NFSv4: Fix delegation return in cases where we have to retry")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The amount of looping through the list of delegations is occasionally
leading to soft lockups. If the state manager was asked to reap the
expired delegations, it should scan only those filesystems that hold
delegations that need to be reaped.
Fixes: 7f156ef0bf ("NFSv4: Clean up nfs_delegation_reap_expired()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The amount of looping through the list of delegations is occasionally
leading to soft lockups. If the state manager was asked to return
delegations asynchronously, it should only scan those filesystems that
hold delegations that need to be returned.
Fixes: af3b61bf61 ("NFSv4: Clean up nfs_client_return_marked_delegations()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The amount of looping through the list of delegations is occasionally
leading to soft lockups. Avoid at least some loops by not requiring the
NFSv4 state manager to scan for delegations that are marked for
return-on-close. Instead, either mark them for immediate return (if
possible) or else leave it up to nfs4_inode_return_delegation_on_close()
to return them once the file is closed by the application.
Fixes: b757144fd7 ("NFSv4: Be less aggressive about returning delegations for open files")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the file is sillyrenamed, and slated for delete on close, it is
possible for a server reboot to triggeer an open reclaim, with can again
race with the application call to close(). When that happens, the call
to put_nfs_open_context() can trigger a synchronous delegreturn call
which deadlocks because it is not marked as privileged.
Instead, ensure that the call to nfs4_inode_return_delegation_on_close()
catches the delegreturn, and schedules it asynchronously.
Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
Fixes: adb4b42d19 ("Return the delegation when deleting sillyrenamed files")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
After the delegation is returned to the NFS server remove it
from the server's delegations list to reduce the time it takes
to scan this list.
Network trace captured while running the below script shows the
time taken to service the CB_RECALL increases gradually due to
the overhead of traversing the delegation list in
nfs_delegation_find_inode_server.
The NFS server in this test is a Solaris server which issues
CB_RECALL when receiving the all-zero stateid in the SETATTR.
mount=/mnt/data
for i in $(seq 1 20)
do
echo $i
mkdir $mount/testtarfile$i
time tar -C $mount/testtarfile$i -xf 5000_files.tar
done
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
If the call to nfs_delegation_grab_inode() fails, we will not have
dropped any locks that require us to rescan the list.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If the atime or mtime attributes were delegated, then we need to
propagate their new values back to the server when returning the
delegation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>