linux/fs
Eric W. Biederman f07288cfb0 mnt: Make propagate_umount less slow for overlapping mount propagation trees
commit 296990deb3 upstream.

Andrei Vagin pointed out that time to executue propagate_umount can go
non-linear (and take a ludicrious amount of time) when the mount
propogation trees of the mounts to be unmunted by a lazy unmount
overlap.

Make the walk of the mount propagation trees nearly linear by
remembering which mounts have already been visited, allowing
subsequent walks to detect when walking a mount propgation tree or a
subtree of a mount propgation tree would be duplicate work and to skip
them entirely.

Walk the list of mounts whose propgatation trees need to be traversed
from the mount highest in the mount tree to mounts lower in the mount
tree so that odds are higher that the code will walk the largest trees
first, allowing later tree walks to be skipped entirely.

Add cleanup_umount_visitation to remover the code's memory of which
mounts have been visited.

Add the functions last_slave and skip_propagation_subtree to allow
skipping appropriate parts of the mount propagation tree without
needing to change the logic of the rest of the code.

A script to generate overlapping mount propagation trees:

$ cat runs.h
set -e
mount -t tmpfs zdtm /mnt
mkdir -p /mnt/1 /mnt/2
mount -t tmpfs zdtm /mnt/1
mount --make-shared /mnt/1
mkdir /mnt/1/1

iteration=10
if [ -n "$1" ] ; then
	iteration=$1
fi

for i in $(seq $iteration); do
	mount --bind /mnt/1/1 /mnt/1/1
done

mount --rbind /mnt/1 /mnt/2

TIMEFORMAT='%Rs'
nr=$(( ( 2 ** ( $iteration + 1 ) ) + 1 ))
echo -n "umount -l /mnt/1 -> $nr        "
time umount -l /mnt/1

nr=$(cat /proc/self/mountinfo | grep zdtm | wc -l )
time umount -l /mnt/2

$ for i in $(seq 9 19); do echo $i; unshare -Urm bash ./run.sh $i; done

Here are the performance numbers with and without the patch:

     mhash |  8192   |  8192  | 1048576 | 1048576
    mounts | before  | after  |  before | after
    ------------------------------------------------
      1025 |  0.040s | 0.016s |  0.038s | 0.019s
      2049 |  0.094s | 0.017s |  0.080s | 0.018s
      4097 |  0.243s | 0.019s |  0.206s | 0.023s
      8193 |  1.202s | 0.028s |  1.562s | 0.032s
     16385 |  9.635s | 0.036s |  9.952s | 0.041s
     32769 | 60.928s | 0.063s | 44.321s | 0.064s
     65537 |         | 0.097s |         | 0.097s
    131073 |         | 0.233s |         | 0.176s
    262145 |         | 0.653s |         | 0.344s
    524289 |         | 2.305s |         | 0.735s
   1048577 |         | 7.107s |         | 2.603s

Andrei Vagin reports fixing the performance problem is part of the
work to fix CVE-2016-6213.

Fixes: a05964f391 ("[PATCH] shared mounts handling: umount")
Reported-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:44:58 +02:00
..
9p 9p: fix a potential acl leak 2017-05-14 13:32:54 +02:00
adfs
affs affs: fix remount failure when there are no options changed 2016-06-07 18:14:32 -07:00
afs
autofs4 autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL 2017-06-29 12:48:50 +02:00
befs
bfs
btrfs Btrfs: fix truncate down when no_holes feature is enabled 2017-07-05 14:37:18 +02:00
cachefiles FS-Cache: Add missing initialization of ret in cachefiles_write_page() 2015-11-16 20:38:43 -05:00
ceph fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
cifs CIFS: Improve readdir verbosity 2017-06-29 12:48:51 +02:00
coda fs/coda: fix readlink buffer overflow 2015-09-10 13:29:01 -07:00
configfs configfs: Fix race between create_link and configfs_rmdir 2017-06-26 07:13:08 +02:00
cramfs
debugfs debugfs: Make automount point inodes permanently empty 2016-05-04 14:48:41 -07:00
devpts devpts: clean up interface to pty drivers 2016-08-16 09:30:49 +02:00
dlm dlm: free workqueues after the connections 2016-10-22 12:26:56 +02:00
ecryptfs ecryptfs: fix handling of directory opening 2016-09-15 08:27:47 +02:00
efivarfs efi: Make efivarfs entries immutable by default 2016-03-03 15:07:09 -08:00
efs
exofs osd fs: __r4w_get_page rely on PageUptodate for uptodate 2015-12-12 10:15:34 -08:00
exportfs
ext2 posix_acl: Clear SGID bit when setting file permissions 2016-10-31 04:13:58 -06:00
ext4 ext4: check return value of kstrtoull correctly in reserved_clusters_store 2017-07-15 11:57:50 +02:00
f2fs fscrypt: avoid collisions when presenting long encrypted filenames 2017-05-25 14:30:11 +02:00
fat fat: fix using uninitialized fields of fat_inode/fsinfo_inode 2017-03-15 09:57:15 +08:00
freevxfs
fscache FS-Cache: Initialise stores_lock in netfs cookie 2017-06-17 06:39:37 +02:00
fuse fuse: add missing FR_FORCE 2017-03-12 06:37:28 +01:00
gfs2 gfs2: Fix glock rhashtable rcu bug 2017-07-15 11:57:46 +02:00
hfs hfs: fix B-tree corruption after insertion at position 0 2015-09-10 13:29:01 -07:00
hfsplus posix_acl: Clear SGID bit when setting file permissions 2016-10-31 04:13:58 -06:00
hostfs hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() 2016-09-30 10:18:39 +02:00
hpfs hpfs: implement the show_options method 2016-06-01 12:15:54 -07:00
hugetlbfs mm: larger stack guard gap, between vmas 2017-06-26 07:13:11 +02:00
isofs isofs: Do not return EACCES for unknown filesystems 2016-10-28 03:01:34 -04:00
jbd2 jbd2: don't leak modified metadata buffers on an aborted journal 2017-03-12 06:37:26 +01:00
jffs2 posix_acl: Clear SGID bit when setting file permissions 2016-10-31 04:13:58 -06:00
jfs fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
kernfs kernfs: don't depend on d_find_any_alias() when generating notifications 2016-09-24 10:07:36 +02:00
lockd Mainly smaller bugfixes and cleanup. We're still finding some bugs from 2015-11-11 20:11:28 -08:00
logfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
minix
ncpfs ncpfs: fix a braino in OOM handling in ncp_fill_cache() 2016-03-16 08:42:59 -07:00
nfs NFSv4: fix a reference leak caused WARNING messages 2017-07-05 14:37:15 +02:00
nfs_common
nfsd fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
nilfs2 fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
nls
notify fanotify: fix list corruption in fanotify_get_response() 2016-09-30 10:18:37 +02:00
ntfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
ocfs2 ocfs2: o2hb: revert hb threshold to keep compatible 2017-07-05 14:37:22 +02:00
omfs
openpromfs
overlayfs ovl: fsync after copy-up 2016-11-10 16:36:34 +01:00
proc mm: larger stack guard gap, between vmas 2017-06-26 07:13:11 +02:00
pstore pstore/ram: Use memcpy_fromio() to save old buffer 2016-10-28 03:01:27 -04:00
qnx4
qnx6
quota quota: Fix possible GPF due to uninitialised pointers 2016-04-12 09:08:56 -07:00
ramfs mm, fs: obey gfp_mapping for add_to_page_cache() 2015-10-16 11:42:28 -07:00
reiserfs fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
romfs romfs: use different way to generate fsid for BLOCK or MTD 2017-06-17 06:39:38 +02:00
squashfs squashfs: xattr simplifications 2015-11-13 20:34:33 -05:00
sysfs sysfs: be careful of error returns from ops->show() 2017-04-12 12:38:33 +02:00
sysv fix sysvfs symlinks 2015-11-23 21:11:08 -05:00
tracefs tracefs: Fix refcount imbalance in start_creating() 2015-11-04 22:13:45 -05:00
ubifs ubifs: Fix journal replay wrt. xattr nodes 2017-01-26 08:23:48 +01:00
udf fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
ufs ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path 2017-06-14 13:16:24 +02:00
xfs Make __xfs_xattr_put_listen preperly report errors. 2017-06-14 13:16:27 +02:00
aio.c aio: mark AIO pseudo-fs noexec 2016-10-07 15:23:47 +02:00
anon_inodes.c
attr.c vfs: move permission checking into notify_change() for utimes(NULL) 2016-10-22 12:26:56 +02:00
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c libnvdimm for 4.4: 2015-11-10 12:07:22 -08:00
binfmt_elf.c binfmt_elf: use ELF_ET_DYN_BASE only for PIE 2017-07-21 07:44:57 +02:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c fs/block_dev: always invalidate cleancache in invalidate_bdev() 2017-05-20 14:27:01 +02:00
buffer.c fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c i2c-dev: Fix typo in ioctl name reference 2015-10-23 23:26:43 +02:00
compat.c
coredump.c coredump: Ensure proper size of sparse core files 2017-07-05 14:37:20 +02:00
dax.c dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
dcache.c fs/dcache.c: fix spin lockup issue on nlru->lock 2017-07-21 07:44:56 +02:00
dcookies.c
direct-io.c fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
drop_caches.c
eventfd.c
eventpoll.c
exec.c exec: Limit arg stack to at most 75% of _STK_LIM 2017-07-21 07:44:57 +02:00
fcntl.c fs: add a VALID_OPEN_FLAGS 2017-07-15 11:57:44 +02:00
fhandle.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
file_table.c
file.c vfs: clear remainder of 'full_fds_bits' in dup_fd() 2015-11-05 23:05:32 -08:00
filesystems.c
fs_pin.c
fs_struct.c
fs-writeback.c writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode 2016-04-12 09:09:04 -07:00
inode.c vfs: fix deadlock in file_remove_privs() on overlayfs 2016-08-10 11:49:30 +02:00
internal.h
ioctl.c
Kconfig dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
Kconfig.binfmt
libfs.c
locks.c locks: use file_inode() 2016-08-10 11:49:27 +02:00
Makefile ext4: promote ext4 over ext2 in the default probe order 2015-10-15 10:33:21 -04:00
mbcache.c
mount.h mnt: In propgate_umount handle visiting mounts in any order 2017-07-21 07:44:57 +02:00
mpage.c fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
namei.c fs: Check for invalid i_uid in may_follow_link() 2016-09-15 08:27:49 +02:00
namespace.c mnt: In propgate_umount handle visiting mounts in any order 2017-07-21 07:44:57 +02:00
no-block.c
nsfs.c fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void 2015-09-11 15:21:34 -07:00
open.c fs: completely ignore unknown open flags 2017-07-15 11:57:44 +02:00
pipe.c pipe: limit the per-user amount of pages allocated in pipes 2016-06-07 18:14:35 -07:00
pnode.c mnt: Make propagate_umount less slow for overlapping mount propagation trees 2017-07-21 07:44:58 +02:00
pnode.h mnt: Add a per mount namespace limit on the number of mounts 2017-04-30 05:49:28 +02:00
posix_acl.c tmpfs: clear S_ISGID when setting posix ACLs 2017-01-26 08:23:47 +01:00
proc_namespace.c vfs: show_vfsstat: do not ignore errors from show_devname method 2016-04-12 09:08:55 -07:00
read_write.c
readdir.c
select.c
seq_file.c fs/seq_file: fix out-of-bounds read 2016-09-07 08:32:43 +02:00
signalfd.c
splice.c vfs: fix uninitialized flags in splice_to_pipe() 2017-02-23 17:43:09 +01:00
stack.c
stat.c ufs: restore maintaining ->i_blocks 2017-06-14 13:16:24 +02:00
statfs.c
super.c fs/super.c: fix race between freeze_super() and thaw_super() 2016-10-28 03:01:32 -04:00
sync.c fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writeback 2015-11-06 17:50:42 -08:00
timerfd.c timerfd: Protect the might cancel mechanism proper 2017-05-08 07:46:01 +02:00
userfaultfd.c userfaultfd: don't block on the last VM updates at exit time 2016-03-16 08:43:01 -07:00
utimes.c vfs: move permission checking into notify_change() for utimes(NULL) 2016-10-22 12:26:56 +02:00
xattr.c fs/xattr.c: zero out memory copied to userspace in getxattr 2017-05-20 14:27:01 +02:00