linux/Documentation/filesystems
Al Viro 1c18edd1b7 __dentry_kill(): new locking scheme
Currently we enter __dentry_kill() with parent (along with the victim
dentry and victim's inode) held locked.  Then we
	mark dentry refcount as dead
	call ->d_prune()
	remove dentry from hash
	remove it from the parent's list of children
	unlock the parent, don't need it from that point on
	detach dentry from inode, unlock dentry and drop the inode
(via ->d_iput())
	call ->d_release()
	regain the lock on dentry
	check if it's on a shrink list (in which case freeing its empty husk
has to be left to shrink_dentry_list()) or not (in which case we can free it
ourselves).  In the former case, mark it as an empty husk, so that
shrink_dentry_list() would know it can free the sucker.
	drop the lock on dentry
... and usually the caller proceeds to drop a reference on the parent,
possibly retaking the lock on it.

That is painful for a bunch of reasons, starting with the need to take locks
out of order, but not limited to that - the parent of positive dentry can
change if we drop its ->d_lock, so getting these locks has to be done with
care.  Moreover, as soon as dentry is out of the parent's list of children,
shrink_dcache_for_umount() won't see it anymore, making it appear as if
the parent is inexplicably busy.  We do work around that by having
shrink_dentry_list() decrement the parent's refcount first and put it on
shrink list to be evicted once we are done with __dentry_kill() of child,
but that may in some cases lead to ->d_iput() on child called after the
parent got killed.  That doesn't happen in cases where in-tree ->d_iput()
instances might want to look at the parent, but that's brittle as hell.

Solution: do removal from the parent's list of children in the very
end of __dentry_kill().  As the result, the callers do not need to
lock the parent and by the time we really need the parent locked,
dentry is negative and is guaranteed not to be moved around.

It does mean that ->d_prune() will be called with parent not locked.
It also means that we might see dentries in process of being torn
down while going through the parent's list of children; those dentries
will be unhashed, negative and with refcount marked dead.  In practice,
that's enough for in-tree code that looks through the list of children
to do the right thing as-is.  Out-of-tree code might need to be adjusted.

Calling conventions: __dentry_kill(dentry) is called with dentry->d_lock
held, along with ->i_lock of its inode (if any).  It either returns
the parent (locked, with refcount decremented to 0) or NULL (if there'd
been no parent or if refcount decrement for parent hadn't reached 0).

lock_for_kill() is adjusted for new requirements - it doesn't touch
the parent's ->d_lock at all.

Callers adjusted.  Note that for dput() we don't need to bother with
fast_dput() for the parent - we just need to check retain_dentry()
for it, since its ->d_lock is still held since the moment when
__dentry_kill() had taken it to remove the victim from the list of
children.

The kludge with early decrement of parent's refcount in
shrink_dentry_list() is no longer needed - shrink_dcache_for_umount()
sees the half-killed dentries in the list of children for as long
as they are pinning the parent.  They are easily recognized and
accounted for by select_collect(), so we know we are not done yet.

As the result, we always have the expected ordering for ->d_iput()/->d_release()
vs. __dentry_kill() of the parent, no exceptions.  Moreover, the current
rules for shrink lists (one must make sure that shrink_dcache_for_umount()
won't happen while any dentries from the superblock in question are on
any shrink lists) are gone - shrink_dcache_for_umount() will do the
right thing in all cases, taking such dentries out.  Their empty
husks (memory occupied by struct dentry itself + its external name,
if any) will remain on the shrink lists, but they are no obstacles
to filesystem shutdown.  And such husks will get freed as soon as
shrink_dentry_list() of the list they are on gets to them.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-25 02:34:49 -05:00
..
caching Documentation: Fix typos 2023-08-18 11:29:03 -06:00
ext4 Documentation: Fix typos 2023-08-18 11:29:03 -06:00
nfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
smb smb3: move Documentation/filesystems/cifs to Documentation/filesystems/smb 2023-05-24 16:29:21 -05:00
spufs Documentation: spufs: correct a duplicate word typo 2022-09-27 13:21:44 -06:00
9p.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
adfs.rst
affs.rst affs: fix basic permission bits to actually work 2020-08-31 12:20:31 +02:00
afs.rst afs: Documentation: correct reference to CONFIG_AFS_FS 2023-07-21 13:46:02 -06:00
api-summary.rst block: move fs/block_dev.c to block/bdev.c 2021-09-07 08:39:40 -06:00
autofs-mount-control.rst autofs: use flexible array in ioctl structure 2023-05-30 16:42:00 -07:00
autofs.rst autofs: use flexible array in ioctl structure 2023-05-30 16:42:00 -07:00
automount-support.rst
befs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
bfs.rst
btrfs.rst MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.org 2023-09-08 14:21:27 +02:00
ceph.rst ceph: update documentation regarding snapshot naming limitations 2023-08-24 11:24:36 +02:00
coda.rst Documentation: coda: annotate duplicated words 2020-07-13 10:02:32 -06:00
configfs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
cramfs.rst
dax.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
debugfs.rst debugfs: small Documentation cleaning 2022-11-09 13:58:55 -07:00
devpts.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
directory-locking.rst fs: Lock moved directories 2023-06-02 15:00:18 +02:00
dlmfs.rst docs: update ocfs2-devel mailing list address 2023-07-08 09:29:29 -07:00
dnotify.rst
ecryptfs.rst
efivarfs.rst
erofs.rst erofs: fix inode metadata space layout description in documentation 2023-10-12 11:04:14 +08:00
ext2.rst ext2: remove nobh support 2022-08-02 12:34:04 -04:00
ext3.rst
f2fs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
fiemap.rst
files.rst file: convert to SLAB_TYPESAFE_BY_RCU 2023-10-19 11:02:48 +02:00
fscrypt.rst fscrypt: track master key presence separately from secret 2023-10-16 21:23:45 -07:00
fsverity.rst ovl: Add framework for verity support 2023-08-12 19:02:38 +03:00
fuse-io.rst
fuse.rst fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other 2022-07-21 16:06:19 +02:00
gfs2-glocks.rst gfs2 fixes 2023-09-05 13:00:28 -07:00
gfs2-uevents.rst
gfs2.rst Documentation: Update filesystems/gfs2.rst 2020-12-01 00:25:20 +01:00
hfs.rst
hfsplus.rst
hpfs.rst
idmappings.rst Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
index.rst docs: add maintainer entry profile for XFS 2023-08-10 07:47:53 -07:00
inotify.rst
isofs.rst
journalling.rst jbd2: drop jbd2_fc_init documentation 2020-11-06 23:01:03 -05:00
locking.rst Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
locks.rst docs: fs: locks.rst: update comment about mandatory file locking 2021-10-19 06:48:21 -04:00
mount_api.rst fs_context: drop the unused lsm_flags member 2023-03-16 14:38:28 +01:00
netfs_library.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
nilfs2.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
ntfs.rst
ntfs3.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
ocfs2-online-filecheck.rst
ocfs2.rst docs: update ocfs2-devel mailing list address 2023-07-08 09:29:29 -07:00
omfs.rst Replace HTTP links with HTTPS ones: OMFS 2020-07-13 11:24:43 -06:00
orangefs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
overlayfs.rst ovl: add support for appending lowerdirs one by one 2023-10-31 00:13:02 +02:00
path-lookup.rst Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
path-lookup.txt
porting.rst __dentry_kill(): new locking scheme 2023-11-25 02:34:49 -05:00
proc.rst doc: Add /proc/bootconfig to proc.rst 2023-10-31 18:46:43 +09:00
qnx6.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
quota.rst quota: Fixup http links in quota doc 2020-07-09 08:14:01 +02:00
ramfs-rootfs-initramfs.rst Documentation/filesystems: ramfs-rootfs-initramfs: use :Author: 2023-05-16 12:55:35 -06:00
relay.rst
romfs.rst
seq_file.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
sharedsubtree.rst Documentation/filesystems: sharedsubtree: add section headings 2023-05-16 12:50:05 -06:00
splice.rst
squashfs.rst
sysfs.rst driver core: bus: mark the struct bus_type for sysfs callbacks as constant 2023-03-23 13:20:40 +01:00
sysv-fs.rst
tmpfs.rst tmpfs,xattr: enable limited user extended attributes 2023-08-10 12:06:04 +02:00
ubifs-authentication.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
ubifs.rst Documentation: ubifs: Fix compression idiom 2022-10-10 13:01:10 -06:00
udf.rst udf: Replace HTTP links with HTTPS ones 2020-07-14 14:37:39 +02:00
vfat.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
vfs.rst Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
virtiofs.rst
xfs-delayed-logging-design.rst Documentation: filesystems: correct possessive "its" 2022-09-27 13:21:44 -06:00
xfs-maintainer-entry-profile.rst docs: add maintainer entry profile for XFS 2023-08-10 07:47:53 -07:00
xfs-online-fsck-design.rst Documentation: xfs: Remove repeated word in comments 2023-09-22 05:20:09 -06:00
xfs-self-describing-metadata.rst xfs: document the filesystem metadata checking strategy 2023-04-11 18:59:47 -07:00
zonefs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00