linux/mm
Naohiro Aota 66f10f5d58 mm/swapfile.c: move inode_lock out of claim_swapfile
claim_swapfile() currently keeps the inode locked when it is successful,
or the file is already swapfile (with -ebusy).  and, on the other error
cases, it does not lock the inode.

this inconsistency of the lock state and return value is quite confusing
and actually causing a bad unlock balance as below in the "bad_swap"
section of __do_sys_swapon().

this commit fixes this issue by moving the inode_lock() and is_swapfile
check out of claim_swapfile().  the inode is unlocked in
"bad_swap_unlock_inode" section, so that the inode is ensured to be
unlocked at "bad_swap".  thus, error handling codes after the locking now
jumps to "bad_swap_unlock_inode" instead of "bad_swap".

    =====================================
    warning: bad unlock balance detected!
    5.5.0-rc7+ #176 not tainted
    -------------------------------------
    swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
    [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
    but there are no more locks to release!

    other info that might help us debug this:
    no locks held by swapon/4294.

    stack backtrace:
    cpu: 5 pid: 4294 comm: swapon not tainted 5.5.0-rc7-btrfs-zns+ #176
    hardware name: asus all series/h87-pro, bios 2102 07/29/2014
    call trace:
     dump_stack+0xa1/0xea
     ? __do_sys_swapon+0x94b/0x3550
     print_unlock_imbalance_bug.cold+0x114/0x123
     ? __do_sys_swapon+0x94b/0x3550
     lock_release+0x562/0xed0
     ? kvfree+0x31/0x40
     ? lock_downgrade+0x770/0x770
     ? kvfree+0x31/0x40
     ? rcu_read_lock_sched_held+0xa1/0xd0
     ? rcu_read_lock_bh_held+0xb0/0xb0
     up_write+0x2d/0x490
     ? kfree+0x293/0x2f0
     __do_sys_swapon+0x94b/0x3550
     ? putname+0xb0/0xf0
     ? kmem_cache_free+0x2e7/0x370
     ? do_sys_open+0x184/0x3e0
     ? generic_max_swapfile_size+0x40/0x40
     ? do_syscall_64+0x27/0x4b0
     ? entry_syscall_64_after_hwframe+0x49/0xbe
     ? lockdep_hardirqs_on+0x38c/0x590
     __x64_sys_swapon+0x54/0x80
     do_syscall_64+0xa4/0x4b0
     entry_syscall_64_after_hwframe+0x49/0xbe
    rip: 0033:0x7f15da0a0dc7

link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com
fixes: 1638045c36 ("mm: set s_swapfile on blockdev swap devices")
signed-off-by: naohiro aota <naohiro.aota@wdc.com>
reviewed-by: andrew morton <akpm@linux-foundation.org>
reviewed-by: darrick j. wong <darrick.wong@oracle.com>
tested-by: qais youef <qais.yousef@arm.com>
cc: christoph hellwig <hch@infradead.org>
cc: <stable@vger.kernel.org>
2020-03-28 07:17:56 -07:00
..
kasan kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN 2018-08-17 16:20:30 -07:00
backing-dev.c blkcg: delay blkg destruction until after writeback has finished 2018-08-31 14:48:56 -06:00
balloon_compaction.c
bootmem.c docs/mm: bootmem: add overview documentation 2018-08-02 12:17:27 -06:00
cleancache.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
cma_debug.c mm/cma: remove unsupported gfp_mask parameter from cma_alloc() 2018-08-17 16:20:32 -07:00
cma.c mm/cma: remove unsupported gfp_mask parameter from cma_alloc() 2018-08-17 16:20:32 -07:00
cma.h
compaction.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
debug_page_ref.c
debug.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
dmapool.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
early_ioremap.c
fadvise.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
failslab.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
filemap.c vfs: don't allow writes to swap files 2019-12-02 15:03:46 -08:00
frame_vector.c
frontswap.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
gup_benchmark.c mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl 2018-10-05 16:32:04 -07:00
gup.c mm: Change return type int to vm_fault_t for fault handlers 2018-08-23 18:48:44 -07:00
highmem.c
hmm.c libnvdimm-for-4.19_dax-memory-failure 2018-08-25 18:43:59 -07:00
huge_memory.c mremap: properly flush TLB before releasing the page 2018-10-18 11:30:52 +02:00
hugetlb_cgroup.c mm: rename page_counter's count/limit into usage/max 2018-06-07 17:34:35 -07:00
hugetlb.c hugetlb: take PMD sharing into account when flushing tlb/caches 2018-10-05 16:32:04 -07:00
hwpoison-inject.c
init-mm.c mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids 2018-07-17 09:35:30 +02:00
internal.h mm: Change return type int to vm_fault_t for fault handlers 2018-08-23 18:48:44 -07:00
interval_tree.c
Kconfig mm: disable deferred struct page for 32-bit arches 2018-09-20 22:01:11 +02:00
Kconfig.debug mm: clarify CONFIG_PAGE_POISONING and usage 2018-08-22 10:52:44 -07:00
khugepaged.c mm: Change return type int to vm_fault_t for fault handlers 2018-08-23 18:48:44 -07:00
kmemleak-test.c
kmemleak.c kmemleak: always register debugfs file 2018-09-04 16:45:02 -07:00
ksm.c include/linux/compiler*.h: make compiler-*.h mutually exclusive 2018-08-22 17:31:34 -07:00
list_lru.c mm/list_lru: introduce list_lru_shrink_walk_irq() 2018-08-17 16:20:32 -07:00
maccess.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
madvise.c mm: madvise(MADV_DODUMP): allow hugetlbfs pages 2018-10-05 16:32:05 -07:00
Makefile vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
memblock.c mm/memblock.c: replace u64 with phys_addr_t where appropriate 2018-08-17 16:20:30 -07:00
memcontrol.c mm: memcontrol: print proper OOM header when no eligible victim left 2018-09-04 16:45:02 -07:00
memfd.c alloc_file(): switch to passing O_... flags instead of FMODE_... mode 2018-07-12 10:02:57 -04:00
memory_hotplug.c mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported. 2018-09-04 16:45:02 -07:00
memory-failure.c libnvdimm-for-4.19_dax-memory-failure 2018-08-25 18:43:59 -07:00
memory.c vfs: don't allow writes to swap files 2019-12-02 15:03:46 -08:00
mempolicy.c mm: access zone->node via zone_to_nid() and zone_set_nid() 2018-08-22 10:52:45 -07:00
mempool.c mm/mempool.c: add missing parameter description 2018-08-22 10:52:44 -07:00
memtest.c
migrate.c Merge branch 'akpm' 2018-10-05 16:33:03 -07:00
mincore.c
mlock.c dax: remove VM_MIXEDMAP for fsdax and device dax 2018-08-17 16:20:27 -07:00
mm_init.c mm: access zone->node via zone_to_nid() and zone_set_nid() 2018-08-22 10:52:45 -07:00
mmap.c vfs: don't allow writes to swap files 2019-12-02 15:03:46 -08:00
mmu_context.c
mmu_notifier.c mm, oom: distinguish blockable mode for mmu notifiers 2018-08-22 10:52:44 -07:00
mmzone.c
mprotect.c x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings 2018-06-20 19:10:01 +02:00
mremap.c mremap: properly flush TLB before releasing the page 2018-10-18 11:30:52 +02:00
msync.c
nobootmem.c mm/memblock: add a name for memblock flags enumeration 2018-08-02 12:17:27 -06:00
nommu.c mm: provide a fallback for PAGE_KERNEL_EXEC for architectures 2018-08-17 16:20:29 -07:00
oom_kill.c mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm(). 2018-09-04 16:45:02 -07:00
page_alloc.c mm, sched/numa: Remove remaining traces of NUMA rate-limiting 2018-10-09 08:30:51 +02:00
page_counter.c memcg: introduce memory.min 2018-06-07 17:34:36 -07:00
page_ext.c mm/page_ext.c: constify lookup_page_ext() argument 2018-08-17 16:20:28 -07:00
page_idle.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
page_io.c swap,blkcg: issue swap io with the appropriate context 2018-07-09 09:07:54 -06:00
page_isolation.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
page_owner.c mm: use octal not symbolic permissions 2018-06-15 07:55:25 +09:00
page_poison.c mm/page_poison.c: make early_page_poison_param() __init 2018-04-05 21:36:26 -07:00
page_vma_mapped.c
page-writeback.c notifier: Remove notifier header file wherever not used 2018-08-30 12:56:40 +02:00
pagewalk.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
percpu-internal.h
percpu-km.c percpu: allow select gfp to be passed to underlying allocators 2018-02-18 05:33:01 -08:00
percpu-stats.c treewide: Use array_size() in vmalloc() 2018-06-12 16:19:22 -07:00
percpu-vm.c percpu: allow select gfp to be passed to underlying allocators 2018-02-18 05:33:01 -08:00
percpu.c percpu: stop leaking bitmap metadata blocks 2018-10-07 14:50:12 -07:00
pgtable-generic.c
process_vm_access.c mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors 2018-02-06 18:32:48 -08:00
quicklist.c
readahead.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
rmap.c mm: migration: fix migration of huge PMD shared pages 2018-10-05 16:32:04 -07:00
rodata_test.c
shmem.c mm: shmem.c: Correctly annotate new inodes for lockdep 2018-09-20 22:01:11 +02:00
slab_common.c mm: introduce CONFIG_MEMCG_KMEM as combination of CONFIG_MEMCG && !CONFIG_SLOB 2018-08-17 16:20:30 -07:00
slab.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
slab.h mm: introduce CONFIG_MEMCG_KMEM as combination of CONFIG_MEMCG && !CONFIG_SLOB 2018-08-17 16:20:30 -07:00
slob.c slab: __GFP_ZERO is incompatible with a constructor 2018-06-07 17:34:34 -07:00
slub.c notifier: Remove notifier header file wherever not used 2018-08-30 12:56:40 +02:00
sparse-vmemmap.c mm/sparse: delete old sparse_init and enable new one 2018-08-17 16:20:32 -07:00
sparse.c mm/sparse: delete old sparse_init and enable new one 2018-08-17 16:20:32 -07:00
swap_cgroup.c
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c treewide: kvzalloc() -> kvcalloc() 2018-06-12 16:19:22 -07:00
swap.c mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS 2018-05-22 06:59:39 -07:00
swapfile.c mm/swapfile.c: move inode_lock out of claim_swapfile 2020-03-28 07:17:56 -07:00
truncate.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
usercopy.c usercopy: Allow boot cmdline disabling of hardening 2018-07-04 08:04:52 -07:00
userfaultfd.c userfaultfd: prevent non-cooperative events vs mcopy_atomic races 2018-06-07 17:34:38 -07:00
util.c mm/util.c: improve kvfree() kerneldoc 2018-09-04 16:45:02 -07:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c mm: provide a fallback for PAGE_KERNEL_EXEC for architectures 2018-08-17 16:20:29 -07:00
vmpressure.c mm/vmpressure.c: convert to use match_string() helper 2018-06-07 17:34:36 -07:00
vmscan.c mm/vmscan.c: fix int overflow in callers of do_shrink_slab() 2018-10-05 16:32:05 -07:00
vmstat.c mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly 2018-10-05 16:32:05 -07:00
workingset.c mm/list_lru: introduce list_lru_shrink_walk_irq() 2018-08-17 16:20:32 -07:00
z3fold.c z3fold: fix reclaim lock-ups 2018-05-11 17:28:45 -07:00
zbud.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
zpool.c mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc 2018-02-21 15:35:43 -08:00
zsmalloc.c mm/zsmalloc.c: make several functions and a struct static 2018-08-17 16:20:30 -07:00
zswap.c zswap: re-check zswap_is_full() after do zswap_shrink() 2018-07-26 19:38:03 -07:00