linux/fs
Chunguang Xu f9ed0ea0a9 ext4: fix a possible ABBA deadlock due to busy PA
commit 8c80fb312d upstream.

We found on an older kernel (3.10) that, when disk space is
insufficient, the system may hit an ABBA deadlock. The problem appears
to still exist in the latest kernel, so try to fix it here. The
deadlock arises as follows: task A holds the PA (preallocation) and
waits for the jbd2 transaction to finish, the jbd2 transaction waits
for the completion of task B's IO (plug_list), and task B waits for
task A to release the PA so that it can finish the discard, which
indirectly forms an ABBA deadlock. The related call trace is as follows:

    Task A
    vfs_write
    ext4_mb_new_blocks()
    ext4_mb_mark_diskspace_used()       JBD2
    jbd2_journal_get_write_access()  -> jbd2_journal_commit_transaction()
  ->schedule()                          filemap_fdatawait()
 |                                              |
 | Task B                                       |
 | do_unlinkat()                                |
 | ext4_evict_inode()                           |
 | jbd2_journal_begin_ordered_truncate()        |
 | filemap_fdatawrite_range()                   |
 | ext4_mb_new_blocks()                         |
  -ext4_mb_discard_group_preallocations() <-----
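
The circular wait in the trace above can be modelled as a small wait-for
graph. The following is only an illustrative userspace sketch, not kernel
code: the enum, the `waits_on` array and `has_wait_cycle()` are hypothetical
names invented here, and each party is assumed to wait on at most one other
party.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three parties in the call trace. */
enum waiter { TASK_A, JBD2_COMMIT, TASK_B, NR_WAITERS };

/*
 * waits_on[x] is the party that x is blocked on, or -1 if x can run.
 * Follow the chain from `start`; if it leads back to `start`, the
 * parties on the chain are deadlocked.
 */
static bool has_wait_cycle(const int waits_on[], int n, int start)
{
    int cur = start;

    assert(start >= 0 && start < n);
    for (int steps = 0; steps < n; steps++) {
        cur = waits_on[cur];
        if (cur < 0)
            return false;   /* chain ends at a runnable party */
        if (cur == start)
            return true;    /* chain closed on itself */
    }
    return false;           /* loop elsewhere, not through start */
}
```

With the edges A -> JBD2 -> B -> A the chain closes on itself. If task B
does not wait for the busy PA (its entry is -1), the chain ends and no cycle
is reported.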

To fix this, remove the internal retry on a busy PA from
ext4_mb_discard_group_preallocations() and instead do a limited
number of retries inside ext4_mb_discard_preallocations(). This
circumvents the deadlock above and also has some advantages:

1. Since the PA is in a busy state, if other groups have free PAs,
   keeping the current PA may help to reduce fragmentation.
2. Continue to traverse forward instead of waiting for the current
   group PA to be released. In most scenarios, the PA discard time
   can be reduced.

However, when free space is low and only a few groups have space,
traversing the groups multiple times may increase CPU overhead. On
balance, I feel that the overall benefit outweighs the cost.
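
The retry scheme described above can be sketched in userspace C. This is a
simplified model under stated assumptions, not the patched kernel code:
`struct group_sim`, `discard_group()`, `discard_preallocations()` and the
bound `MAX_DISCARD_RETRIES` are hypothetical names and values chosen for
illustration. A busy PA is modelled as staying busy for a fixed number of
passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DISCARD_RETRIES 3   /* assumed bound for this sketch */

/* Hypothetical model of one block group's preallocations. */
struct group_sim {
    int idle_blocks;   /* blocks in PAs that can be discarded now */
    int busy_blocks;   /* blocks in a PA that is currently in use */
    int busy_rounds;   /* passes for which that PA stays busy */
};

/* One pass over one group: free idle PAs, never wait for a busy one. */
static int discard_group(struct group_sim *g, bool *busy)
{
    int freed = g->idle_blocks;

    g->idle_blocks = 0;
    if (g->busy_blocks > 0) {
        if (g->busy_rounds > 0) {
            g->busy_rounds--;
            *busy = true;           /* report it, move on to next group */
        } else {
            freed += g->busy_blocks;
            g->busy_blocks = 0;
        }
    }
    return freed;
}

/* Traverse all groups; repeat a bounded number of times if any PA was busy. */
static int discard_preallocations(struct group_sim *groups, size_t ngroups,
                                  int needed)
{
    int freed = 0;
    int retries = 0;
    bool busy;

    assert(groups != NULL || ngroups == 0);
    do {
        busy = false;
        for (size_t i = 0; i < ngroups && freed < needed; i++)
            freed += discard_group(&groups[i], &busy);
    } while (freed < needed && busy && ++retries < MAX_DISCARD_RETRIES);

    return freed;
}
```

The point of the design is that the caller always returns after a bounded
number of passes, even if a PA never becomes idle, so it cannot block
forever behind the task that holds that PA.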

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/1637630277-23496-1-git-send-email-brookxu.cn@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27 10:54:27 +01:00
9p 9p: only copy valid iattrs in 9P2000.L setattr implementation 2022-01-20 09:17:50 +01:00
adfs
affs
afs afs: Fix updating of i_blocks on file/dir extension 2021-09-30 10:11:01 +02:00
autofs
befs
bfs
btrfs btrfs: respect the max size in the header when activating swap file 2022-01-27 10:54:27 +01:00
cachefiles
ceph ceph: fix up non-directory creation in SGID directories 2021-12-29 12:26:05 +01:00
cifs smb3: do not error on fsync when readonly 2021-12-01 09:19:08 +01:00
coda
configfs
cramfs
crypto fscrypt: allow 256-bit master keys with AES-256-XTS 2021-11-18 14:03:54 +01:00
debugfs debugfs: lockdown: Allow reading debugfs files that are not world readable 2022-01-27 10:54:02 +01:00
devpts
dlm fs: dlm: filter user dlm messages for kernel locks 2022-01-27 10:54:10 +01:00
ecryptfs
efivarfs
efs
erofs erofs: fix deadlock when shrink erofs slab 2021-12-01 09:19:05 +01:00
exfat exfat: fix incorrect loading of i_blocks for large files 2021-11-18 14:03:37 +01:00
exportfs
ext2 ext2: fix sleeping in atomic bugs on error 2021-10-09 14:40:56 +02:00
ext4 ext4: fix a possible ABBA deadlock due to busy PA 2022-01-27 10:54:27 +01:00
f2fs f2fs: fix to do sanity check in is_alive() 2022-01-27 10:53:41 +01:00
fat
freevxfs
fscache fscache: Fix cookie key hashing 2021-09-18 13:40:15 +02:00
fuse fuse: Pass correct lend value to filemap_write_and_wait_range() 2022-01-27 10:54:25 +01:00
gfs2 gfs2: Fix length of holes reported at end-of-file 2021-12-08 09:03:18 +01:00
hfs
hfsplus
hostfs
hpfs
hugetlbfs
iomap treewide: Change list_sort to use const pointers 2021-09-30 10:11:04 +02:00
isofs isofs: Fix out of bound access for corrupted isofs image 2021-11-12 14:58:33 +01:00
jbd2
jffs2 jffs2: GC deadlock reading a page that is used in jffs2_write_begin() 2022-01-27 10:54:18 +01:00
jfs JFS: fix memleak in jfs_mount 2021-11-18 14:04:15 +01:00
kernfs
lockd lockd: lockd server-side shouldn't set fl_ops 2021-09-18 13:40:30 +02:00
minix
nfs NFSv42: Fix pagecache invalidation after COPY/CLONE 2021-12-08 09:03:17 +01:00
nfs_common
nfsd nfsd: Fix nsfd startup race (again) 2021-12-14 11:32:39 +01:00
nilfs2 nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group 2021-09-26 14:09:01 +02:00
nls
notify fanotify: limit number of event merge attempts 2021-09-18 13:40:38 +02:00
ntfs
ocfs2 ocfs2: fix data corruption on truncate 2021-11-18 14:03:37 +01:00
omfs
openpromfs
orangefs orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc() 2022-01-20 09:17:50 +01:00
overlayfs ovl: fix warning in ovl_create_real() 2021-12-22 09:30:58 +01:00
proc proc/vmcore: fix clearing user buffer by properly using clear_user() 2021-12-01 09:19:02 +01:00
pstore
qnx4 qnx4: work around gcc false positive warning bug 2021-09-30 10:11:08 +02:00
qnx6
quota quota: correct error number in free_dqentry() 2021-11-18 14:03:51 +01:00
ramfs
reiserfs
romfs
squashfs
sysfs
sysv
tracefs tracefs: Set all files to the same group ownership as the mount option 2021-12-14 11:32:40 +01:00
ubifs ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers 2022-01-27 10:54:24 +01:00
udf udf: Fix error handling in udf_new_inode() 2022-01-27 10:54:23 +01:00
ufs
unicode
vboxsf vboxfs: fix broken legacy mount signature checking 2021-10-17 10:43:33 +02:00
verity fs-verity: fix signed integer overflow with i_size near S64_MAX 2021-10-06 15:55:46 +02:00
xfs xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate 2022-01-11 15:25:01 +01:00
zonefs zonefs: add MODULE_ALIAS_FS 2021-12-22 09:30:57 +01:00
aio.c aio: fix use-after-free due to missing POLLFREE handling 2021-12-14 11:32:40 +01:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings 2021-10-06 15:55:59 +02:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c
buffer.c
char_dev.c
compat_binfmt_elf.c
coredump.c coredump: fix memleak in dump_vma_snapshot() 2021-09-26 14:08:56 +02:00
d_path.c
dax.c
dcache.c
dcookies.c
direct-io.c
drop_caches.c
eventfd.c
eventpoll.c
exec.c
fcntl.c
fhandle.c
file_table.c
file.c fget: check that the fd still exists after getting a ref to it 2021-12-08 09:03:21 +01:00
filesystems.c
fs_context.c vfs: fs_context: fix up param length parsing in legacy_parse_param 2022-01-20 09:17:50 +01:00
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c
fsopen.c
init.c
inode.c fs: export an inode_update_time helper 2021-11-26 10:39:22 +01:00
internal.h
io_uring.c Revert "io_uring: reinforce cancel on flush during exit" 2021-11-06 14:10:08 +01:00
io-wq.c io-wq: fix wakeup race when adding new work 2021-09-18 13:40:06 +02:00
io-wq.h
ioctl.c
Kconfig
Kconfig.binfmt
kernel_read_file.c vfs: check fd has read access in kernel_read_file_from_fd() 2021-10-27 09:56:51 +02:00
libfs.c
locks.c
Makefile
mbcache.c
mount.h
mpage.c
namei.c
namespace.c
no-block.c
nsfs.c
open.c
pipe.c
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c
readdir.c
remap_range.c
select.c
seq_file.c
signalfd.c signalfd: use wake_up_pollfree() 2021-12-14 11:32:40 +01:00
splice.c
stack.c
stat.c
statfs.c
super.c devtmpfs regression fix: reconfigure on each mount 2022-01-20 09:17:49 +01:00
sync.c
timerfd.c
userfaultfd.c userfaultfd: fix a race between writeprotect and exit_mmap() 2021-10-27 09:56:51 +02:00
utimes.c
xattr.c