linux/fs
Ryusuke Konishi f37b5d42b6 nilfs2: fix deadlock of segment constructor during recovery
commit 283ee1482f upstream.

According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck
during recovery at mount time.  The code path that caused the deadlock was
as follows:

  nilfs_fill_super()
    load_nilfs()
      nilfs_salvage_orphan_logs()
        * Do roll-forwarding, attach segment constructor for recovery,
          and kick it.

        nilfs_segctor_thread()
          nilfs_segctor_thread_construct()
           * A lock is held with nilfs_transaction_lock()
             nilfs_segctor_do_construct()
               nilfs_segctor_drop_written_files()
                 iput()
                   iput_final()
                     write_inode_now()
                       writeback_single_inode()
                         __writeback_single_inode()
                           do_writepages()
                             nilfs_writepage()
                               nilfs_construct_dsync_segment()
                                 nilfs_transaction_lock() --> deadlock

This can happen if commit 7ef3ff2fea ("nilfs2: fix deadlock of segment
constructor over I_SYNC flag") is applied and roll-forward recovery was
performed at mount time.  The roll-forward recovery can happen if datasync
write is done and the file system crashes immediately after that.  For
instance, we can reproduce the issue with the following steps:

 < nilfs2 is mounted on /nilfs (device: /dev/sdb1) >
 # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync
 # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k
 count=1 && reboot -nfh
 < the system will immediately reboot >
 # mount -t nilfs2 /dev/sdb1 /nilfs

The deadlock occurs because iput() can run segment constructor through
writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags.  The
above commit changed segment constructor so that it calls iput()
asynchronously for inodes with i_nlink == 0, but that change was
imperfect.

This fixes the another deadlock by deferring iput() in segment constructor
even for the case that mount is not finished, that is, for the case that
MS_ACTIVE flag is not set.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Yuxuan Shui <yshuiv7@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-26 15:00:59 +01:00
..
9p aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
adfs
affs
afs aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
autofs4 autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation 2015-03-18 13:22:32 +01:00
befs befs_readdir(): do not increment ->f_pos if filldir tells us to stop 2013-05-31 15:17:56 -04:00
bfs
btrfs Btrfs:__add_inode_ref: out of bounds memory read when looking for extended ref. 2015-03-18 13:22:29 +01:00
cachefiles lift sb_start_write() out of ->write() 2013-04-09 14:12:56 -04:00
ceph ceph: allow sync_read/write return partial successed size of read/write. 2014-01-09 12:24:25 -08:00
cifs CIFS: Fix SMB2 readdir error handling 2014-10-05 14:54:11 -07:00
coda lift sb_start_write() out of ->write() 2013-04-09 14:12:56 -04:00
configfs configfs: fix race between dentry put and lookup 2013-11-29 11:11:53 -08:00
cramfs
debugfs debugfs: leave freeing a symlink body until inode eviction 2015-03-18 13:22:32 +01:00
devpts devpts: plug the memory leak in kill_sb 2013-12-04 10:55:49 -08:00
dlm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-05-01 14:08:52 -07:00
ecryptfs eCryptfs: Remove buggy and unnecessary write in file name decode routine 2015-01-08 09:58:17 -08:00
efivarfs efivarfs: Never return ENOENT from firmware again 2013-05-13 20:12:10 +01:00
efs
exofs ore: Fix wrong math in allocation of per device BIO 2014-02-13 13:48:00 -08:00
exportfs
ext2 ext2: Fix oops in ext2_get_block() called from ext2_quota_write() 2014-12-16 09:09:43 -08:00
ext3 ext3: Don't check quota format when there are no quota files 2014-11-14 08:48:00 -08:00
ext4 ext4: prevent bugon on race between write/fcntl 2015-02-11 14:48:17 +08:00
f2fs f2fs updates for v3.10 2013-05-08 15:11:48 -07:00
fat fat: fix possible overflow for fat_clusters 2013-05-24 16:22:50 -07:00
freevxfs
fscache fs/fscache/stats.c: fix memory leak 2013-04-29 15:54:27 -07:00
fuse fuse: notify: don't move pages 2015-03-26 15:00:57 +01:00
gfs2 GFS2: Increase i_writecount during gfs2_setattr_chown 2014-01-25 08:27:11 -08:00
hfs hfs: avoid crash in hfs_bnode_create 2013-05-24 16:22:51 -07:00
hfsplus aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
hostfs hostfs: use kmalloc instead of kzalloc 2013-05-04 15:48:45 -04:00
hpfs hpfs: better test for errors 2013-07-13 11:42:26 -07:00
hppfs hppfs: get rid of ->fsync() 2013-04-29 15:41:42 -04:00
hugetlbfs cope with potentially long ->d_dname() output for shmem/hugetlb 2013-10-18 07:45:45 -07:00
isofs isofs: Fix unchecked printing of ER records 2015-01-08 09:58:15 -08:00
jbd Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2013-05-03 09:56:25 -07:00
jbd2 jbd2: free bh when descriptor block checksum fails 2014-11-14 08:47:57 -08:00
jffs2 jffs2: fix handling of corrupted summary length 2015-03-06 14:40:53 -08:00
jfs jfs: fix error path in ialloc 2013-11-13 12:05:31 +09:00
lockd LOCKD: Fix a race when initialising nlmsvc_timeout 2015-01-27 07:52:33 -08:00
logfs
minix
ncpfs ncpfs: return proper error from NCP_IOC_SETROOT ioctl 2015-01-08 09:58:17 -08:00
nfs NFSv4.1: Fix a kfree() of uninitialised pointers in decode_cb_sequence_args 2015-03-06 14:40:50 -08:00
nfs_common
nfsd nfsd4: fix xdr4 inclusion of escaped char 2015-01-16 06:59:02 -08:00
nilfs2 nilfs2: fix deadlock of segment constructor during recovery 2015-03-26 15:00:59 +01:00
nls
notify fsnotify: next_i is freed during fsnotify_unmount_inodes. 2015-01-27 07:52:33 -08:00
ntfs aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
ocfs2 ocfs2: fix journal commit deadlock 2015-01-16 06:59:00 -08:00
omfs
openpromfs
proc procfs: fix race between symlink removals and traversals 2015-03-18 13:22:32 +01:00
pstore pstore/ram: avoid atomic accesses for ioremapped regions 2015-02-05 22:35:40 -08:00
qnx4
qnx6 qnx6: qnx6_readdir() has a braino in pos calculation 2013-05-31 15:17:31 -04:00
quota quota: provide interface for readding allocated space into reserved space 2015-01-29 17:40:57 -08:00
ramfs
reiserfs reiserfs: call truncate_setsize under tailpack mutex 2014-07-06 18:54:15 -07:00
romfs romfs: fix nommu map length to keep inside filesystem 2013-04-29 09:17:57 +10:00
squashfs
sysfs sysfs: check if one entry has been removed before freeing 2013-04-05 15:35:52 -07:00
sysv sysv: Add forgotten superblock lock init for v7 fs 2013-10-05 07:13:09 -07:00
ubifs UBIFS: fix free log space calculation 2014-11-14 08:47:54 -08:00
udf udf: Verify symlink size before loading it 2015-01-08 09:58:17 -08:00
ufs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-04-30 09:36:50 -07:00
xfs xfs: set superblock buffer type correctly 2015-03-06 14:40:47 -08:00
aio.c aio: fix kernel memory disclosure in io_getevents() introduced in v3.10 2014-06-30 20:09:45 -07:00
anon_inodes.c
attr.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-16 13:42:52 -07:00
bad_inode.c
binfmt_aout.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
binfmt_elf_fdpic.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2013-05-02 10:16:16 -07:00
binfmt_elf.c x86, mm/ASLR: Fix stack randomization on 64-bit systems 2015-03-06 14:40:54 -08:00
binfmt_em86.c
binfmt_flat.c new helper: read_code() 2013-04-29 15:40:23 -04:00
binfmt_misc.c binfmt_misc: reuse string_unescape_inplace() 2013-04-30 17:04:03 -07:00
binfmt_script.c
binfmt_som.c
bio-integrity.c bio-integrity: Fix bio_integrity_verify segment start bug 2014-03-23 21:38:21 -07:00
bio.c block: Fix bio_copy_data() 2013-10-05 07:13:09 -07:00
block_dev.c writeback: Fix periodic writeback after fs mount 2013-07-28 16:29:40 -07:00
buffer.c vfs: fix data corruption when blocksize < pagesize for mmaped data 2014-11-14 08:47:54 -08:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c Removed unused typedef to avoid "unused local typedef" warnings. 2013-05-04 15:03:05 -04:00
compat.c aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
coredump.c coredump: fix the setting of PF_DUMPCORE 2014-07-31 12:53:50 -07:00
coredump.h
dcache.c vfs: fix bad hashing of dentries 2014-09-17 09:04:02 -07:00
dcookies.c fs/compat: fix lookup_dcookie() parameter handling 2014-02-13 13:48:00 -08:00
direct-io.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
drop_caches.c
eventfd.c
eventpoll.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-05-01 07:21:43 -07:00
exec.c metag: Reduce maximum stack size to 256MB 2014-06-07 13:25:38 -07:00
fcntl.c
fhandle.c
file_table.c don't bother with {get,put}_write_access() on non-regular files 2014-05-30 21:52:12 -07:00
file.c fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem 2014-02-22 12:41:25 -08:00
filesystems.c
fs_struct.c
fs-writeback.c writeback: fix a subtle race condition in I_DIRTY clearing 2015-01-16 06:59:02 -08:00
generic_acl.c
inode.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-16 13:42:52 -07:00
internal.h splice: don't pass the address of ->f_pos to methods 2013-06-20 19:02:45 +04:00
ioctl.c
ioprio.c block: Fix computation of merged request priority 2014-11-21 09:22:53 -08:00
Kconfig efivarfs: Move to fs/efivarfs 2013-04-17 13:25:09 +01:00
Kconfig.binfmt fs: make binfmt support for #! scripts modular and removable 2013-04-30 17:04:04 -07:00
libfs.c
locks.c locks: allow __break_lease to sleep even when break_time is 0 2014-05-13 13:59:44 +02:00
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
mbcache.c
mount.h vfs: Is mounted should be testing mnt_ns for NULL or error. 2014-02-06 11:08:16 -08:00
mpage.c
namei.c vfs: fix bad hashing of dentries 2014-09-17 09:04:02 -07:00
namespace.c umount: Disallow unprivileged mount force 2015-01-08 09:58:16 -08:00
no-block.c
open.c don't bother with {get,put}_write_access() on non-regular files 2014-05-30 21:52:12 -07:00
pipe.c vfs: fix subtle use-after-free of pipe_inode_info 2013-12-11 22:36:26 -08:00
pnode.c vfs: Fix invalid ida_remove() call 2013-05-31 15:16:33 -04:00
pnode.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
posix_acl.c posix_acl: handle NULL ACL in posix_acl_equiv_mode 2014-06-07 13:25:33 -07:00
proc_namespace.c
read_write.c fs/compat: fix parameter handling for compat readv/writev syscalls 2014-02-13 13:48:00 -08:00
readdir.c
select.c
seq_file.c seq_file: always update file->f_pos in seq_lseek() 2013-11-13 12:05:34 +09:00
signalfd.c
splice.c fuse: fix pipe_buf_operations 2014-02-13 13:47:59 -08:00
stack.c
stat.c quota: provide interface for readding allocated space into reserved space 2015-01-29 17:40:57 -08:00
statfs.c vfs: allow O_PATH file descriptors for fstatfs() 2013-10-18 07:45:44 -07:00
super.c fs: Fix theoretical division by 0 in super_cache_scan(). 2014-11-14 08:47:54 -08:00
sync.c
timerfd.c
utimes.c
xattr_acl.c
xattr.c