linux/mm
Mike Kravetz a3ccc156f3 hugetlb: use same fault hash key for shared and private mappings
commit 1b426bac66 upstream.

hugetlb uses a fault mutex hash table to prevent page faults of the
same pages concurrently.  The key for shared and private mappings is
different.  Shared keys off address_space and file index.  Private keys
off mm and virtual address.  Consider a private mappings of a populated
hugetlbfs file.  A fault will map the page from the file and if needed
do a COW to map a writable page.

Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
pages.  It uses the address_space file index key.  However, private
mappings will use a different key and could race with this code to map
the file page.  This causes problems (BUG) for the page cache remove
code as it expects the page to be unmapped.  A sample stack is:

page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
kernel BUG at mm/filemap.c:169!
...
RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
...
Call Trace:
__delete_from_page_cache+0x39/0x220
delete_from_page_cache+0x45/0x70
remove_inode_hugepages+0x13c/0x380
? __add_to_page_cache_locked+0x162/0x380
hugetlbfs_fallocate+0x403/0x540
? _cond_resched+0x15/0x30
? __inode_security_revalidate+0x5d/0x70
? selinux_file_permission+0x100/0x130
vfs_fallocate+0x13f/0x270
ksys_fallocate+0x3c/0x80
__x64_sys_fallocate+0x1a/0x20
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9

There seems to be another potential COW issue/race with this approach
of different private and shared keys as noted in commit 8382d914eb
("mm, hugetlb: improve page-fault scalability").

Since every hugetlb mapping (even anon and private) is actually a file
mapping, just use the address_space index key for all mappings.  This
results in potentially more hash collisions.  However, this should not
be the common case.

Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
Fixes: b5cec28d36 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22 07:37:40 +02:00
..
kasan
backing-dev.c writeback: synchronize sync(2) against cgroup writeback membership switches 2019-03-05 17:58:50 +01:00
balloon_compaction.c
bootmem.c
cleancache.c
cma_debug.c
cma.c mm/cma.c: cma_declare_contiguous: correct err handling 2019-04-05 22:32:58 +02:00
cma.h
compaction.c
debug_page_ref.c
debug.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
dmapool.c
early_ioremap.c
fadvise.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
failslab.c
filemap.c
frame_vector.c
frontswap.c
gup_benchmark.c mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl 2018-10-05 16:32:04 -07:00
gup.c mm: prevent get_user_pages() from overflowing page refcount 2019-05-04 09:20:11 +02:00
highmem.c
hmm.c mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL 2019-01-13 09:51:04 +01:00
huge_memory.c mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned addresses 2019-05-22 07:37:40 +02:00
hugetlb_cgroup.c
hugetlb.c hugetlb: use same fault hash key for shared and private mappings 2019-05-22 07:37:40 +02:00
hwpoison-inject.c
init-mm.c
internal.h mm: Change return type int to vm_fault_t for fault handlers 2018-08-23 18:48:44 -07:00
interval_tree.c
Kconfig mm: disable deferred struct page for 32-bit arches 2018-09-20 22:01:11 +02:00
Kconfig.debug mm: clarify CONFIG_PAGE_POISONING and usage 2018-08-22 10:52:44 -07:00
khugepaged.c mm/khugepaged: collapse_shmem() do not crash on Compound 2018-12-05 19:31:57 +01:00
kmemleak-test.c
kmemleak.c mm/kmemleak.c: fix unused-function warning 2019-05-08 07:21:55 +02:00
ksm.c include/linux/compiler*.h: make compiler-*.h mutually exclusive 2018-08-22 17:31:34 -07:00
list_lru.c
maccess.c
madvise.c mm: madvise(MADV_DODUMP): allow hugetlbfs pages 2018-10-05 16:32:05 -07:00
Makefile vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
memblock.c
memcontrol.c mm: writeback: use exact memcg dirty counts 2019-04-17 08:38:51 +02:00
memfd.c
memory_hotplug.c mm/memory_hotplug.c: drop memory device reference after find_memory_block() 2019-05-16 19:41:25 +02:00
memory-failure.c mm: hwpoison: fix thp split handing in soft_offline_in_use_page() 2019-03-23 20:10:04 +01:00
memory.c mm/memory.c: fix modifying of page protection by insert_pfn() 2019-05-16 19:41:26 +02:00
mempolicy.c mm, mempolicy: fix uninit memory access 2019-04-05 22:32:58 +02:00
mempool.c mm/mempool.c: add missing parameter description 2018-08-22 10:52:44 -07:00
memtest.c
migrate.c mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate 2019-04-03 06:26:28 +02:00
mincore.c mm/mincore.c: make mincore() more conservative 2019-05-22 07:37:40 +02:00
mlock.c
mm_init.c mm: access zone->node via zone_to_nid() and zone_set_nid() 2018-08-22 10:52:45 -07:00
mmap.c coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping 2019-04-27 09:36:37 +02:00
mmu_context.c
mmu_notifier.c mm, oom: distinguish blockable mode for mmu notifiers 2018-08-22 10:52:44 -07:00
mmzone.c
mprotect.c
mremap.c mremap: properly flush TLB before releasing the page 2018-10-18 11:30:52 +02:00
msync.c
nobootmem.c
nommu.c
oom_kill.c mm,oom: don't kill global init via memory.oom.group 2019-04-05 22:32:58 +02:00
page_alloc.c page_poison: play nicely with KASAN 2019-04-05 22:32:59 +02:00
page_counter.c
page_ext.c mm/page_ext.c: fix an imbalance with kmemleak 2019-04-05 22:32:58 +02:00
page_idle.c
page_io.c
page_isolation.c
page_owner.c
page_poison.c page_poison: play nicely with KASAN 2019-04-05 22:32:59 +02:00
page_vma_mapped.c mm/rmap: map_pte() was not handling private ZONE_DEVICE page properly 2018-11-13 11:08:46 -08:00
page-writeback.c mm/page-writeback.c: don't break integrity writeback on ->writepage() error 2019-01-26 09:32:43 +01:00
pagewalk.c
percpu-internal.h
percpu-km.c percpu: convert spin_lock_irq to spin_lock_irqsave. 2019-02-12 19:47:12 +01:00
percpu-stats.c
percpu-vm.c
percpu.c percpu: stop printing kernel addresses 2019-04-27 09:36:40 +02:00
pgtable-generic.c
process_vm_access.c
quicklist.c
readahead.c vfs: implement readahead(2) using POSIX_FADV_WILLNEED 2018-08-30 20:01:32 +02:00
rmap.c mm/huge_memory: rename freeze_page() to unmap_page() 2018-12-05 19:31:57 +01:00
rodata_test.c
shmem.c tmpfs: fix uninitialized return value in shmem_link 2019-03-23 20:09:52 +01:00
slab_common.c mm: add support for kmem caches in DMA32 zone 2019-04-03 06:26:28 +02:00
slab.c slab: fix a crash by reading /proc/slab_allocators 2019-05-10 17:54:08 +02:00
slab.h mm: add support for kmem caches in DMA32 zone 2019-04-03 06:26:28 +02:00
slob.c
slub.c mm: add support for kmem caches in DMA32 zone 2019-04-03 06:26:28 +02:00
sparse-vmemmap.c
sparse.c mm/sparse: fix a bad comparison 2019-04-05 22:32:58 +02:00
swap_cgroup.c
swap_slots.c mm, swap, get_swap_pages: use entry_size instead of cluster in parameter 2018-08-22 10:52:44 -07:00
swap_state.c
swap.c mm: handle lru_add_drain_all for UP properly 2019-03-23 20:09:50 +01:00
swapfile.c mm, swap: bounds check swap_info array accesses to avoid NULL derefs 2019-04-05 22:32:58 +02:00
truncate.c mm: cleancache: fix corruption on missed inode invalidation 2018-12-05 19:32:13 +01:00
usercopy.c mm/usercopy.c: no check page span for stack objects 2019-01-16 22:04:33 +01:00
userfaultfd.c hugetlb: use same fault hash key for shared and private mappings 2019-05-22 07:37:40 +02:00
util.c mm: page_mapped: don't assume compound page is huge or THP 2019-01-16 22:04:36 +01:00
vmacache.c mm: get rid of vmacache_flush_all() entirely 2018-09-13 15:18:04 -10:00
vmalloc.c mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512! 2019-04-05 22:32:58 +02:00
vmpressure.c
vmscan.c mm: fix inactive list balancing between NUMA nodes and cgroups 2019-05-16 19:41:23 +02:00
vmstat.c mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n 2019-04-27 09:36:40 +02:00
workingset.c
z3fold.c z3fold: fix possible reclaim races 2018-12-01 09:37:33 +01:00
zbud.c
zpool.c
zsmalloc.c
zswap.c