linux/mm
Yang Shi 377c60dd32 mm: gup: fix the fast GUP race against THP collapse
commit 70cbc3cc78 upstream.

Since general RCU GUP fast was introduced in commit 2667f50e8b ("mm:
introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer
sufficient to handle concurrent GUP-fast in all cases, it only handles
traditional IPI-based GUP-fast correctly.  On architectures that send an
IPI broadcast on TLB flush, it works as expected.  But on the
architectures that do not use IPI to broadcast TLB flush, it may have the
below race:

   CPU A                                          CPU B
THP collapse                                     fast GUP
                                              gup_pmd_range() <-- see valid pmd
                                                  gup_pte_range() <-- work on pte
pmdp_collapse_flush() <-- clear pmd and flush
__collapse_huge_page_isolate()
    check page pinned <-- before GUP bump refcount
                                                      pin the page
                                                      check PTE <-- no change
__collapse_huge_page_copy()
    copy data to huge page
    ptep_clear()
install huge pmd for the huge page
                                                      return the stale page
discard the stale page

The race can be fixed by checking whether PMD is changed or not after
taking the page pin in fast GUP, just like what it does for PTE.  If the
PMD is changed it means there may be parallel THP collapse, so GUP should
back off.

Also update the stale comment about serializing against fast GUP in
khugepaged.

Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com
Fixes: 2667f50e8b ("mm: introduce a general RCU get_user_pages_fast()")
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-15 07:55:51 +02:00
..
kasan kasan: prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time 2022-05-09 09:05:07 +02:00
backing-dev.c mm: bdi: initialize bdi_min_ratio when bdi is unregistered 2021-12-14 11:32:37 +01:00
balloon_compaction.c
cleancache.c
cma_debug.c
cma.c
cma.h
compaction.c mm, compaction: fast_find_migrateblock() should return pfn in the target zone 2022-06-09 10:21:23 +02:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: remove pte entry from the page table 2022-02-08 18:30:35 +01:00
debug.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked() 2021-02-10 09:29:21 +01:00
frame_vector.c
frontswap.c
gup_benchmark.c mm/gup_benchmark: take the mmap lock around GUP 2020-10-18 09:27:09 -07:00
gup.c mm: gup: fix the fast GUP race against THP collapse 2022-10-15 07:55:51 +02:00
highmem.c mm/highmem.c: clean up endif comments 2020-10-16 11:11:18 -07:00
hmm.c mm/hmm.c: allow VM_MIXEDMAP to work with hmm_range_fault 2022-01-27 10:54:36 +01:00
huge_memory.c mm/huge_memory.c: use helper function migration_entry_to_page() 2022-08-31 17:15:16 +02:00
hugetlb_cgroup.c hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings 2021-03-30 14:31:54 +02:00
hugetlb.c hugetlb: fix huge_pmd_unshare address update 2022-06-09 10:21:27 +02:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c mm/gup: prevent gup_fast from racing with COW during fork 2020-12-30 11:53:54 +01:00
internal.h mm/thp: fix vma_address() if virtual address below file offset 2021-06-30 08:47:27 -04:00
interval_tree.c
ioremap.c
Kconfig mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING 2020-12-06 10:19:07 -08:00
Kconfig.debug
khugepaged.c mm: gup: fix the fast GUP race against THP collapse 2022-10-15 07:55:51 +02:00
kmemleak.c Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" 2022-09-15 11:32:02 +02:00
ksm.c ksm: fix potential missing rmap_item for stable_node 2021-05-19 10:13:07 +02:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-06 10:19:07 -08:00
maccess.c
madvise.c mm: fix madivse_pageout mishandling on non-LRU page 2022-10-05 10:38:40 +02:00
Makefile
mapping_dirty_helpers.c
memblock.c memblock: use kfree() to release kmalloced memblock regions 2022-03-02 11:42:57 +01:00
memcontrol.c mm/memcontrol: return 1 from cgroup.memory __setup() handler 2022-04-08 14:40:43 +02:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-08 19:09:36 +01:00
memory_hotplug.c mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() 2021-09-22 12:27:59 +02:00
memory-failure.c mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page() 2021-12-29 12:26:05 +01:00
memory.c mm: split huge PUD on wp_huge_pud fallback 2022-07-21 21:19:59 +02:00
mempolicy.c mm/mempolicy: fix uninit-value in mpol_rebind_policy() 2022-07-29 17:19:23 +02:00
mempool.c
memremap.c mm: fix memory_failure() handling of dax-namespace metadata 2021-03-04 11:38:21 +01:00
memtest.c
migrate.c mm/migrate_device.c: flush TLB while holding PTL 2022-10-05 10:38:40 +02:00
mincore.c
mlock.c
mm_init.c
mmap.c mm: Fix TLB flush for not-first PFNMAP mappings in unmap_region() 2022-09-20 12:38:32 +02:00
mmu_gather.c
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-27 13:53:54 +02:00
mmzone.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2022-05-15 20:00:09 +02:00
mprotect.c mm: don't try to NUMA-migrate COW pages that have other uses 2022-02-23 12:00:57 +01:00
mremap.c mm/mremap: hold the rmap lock in write mode when moving page table entries. 2022-08-21 15:15:21 +02:00
msync.c
nommu.c mm: remove alloc_vm_area 2020-10-18 09:27:10 -07:00
oom_kill.c oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup 2022-04-27 13:53:54 +02:00
page_alloc.c mm: prevent page_frag_alloc() from corrupting the memory 2022-10-05 10:38:39 +02:00
page_counter.c
page_ext.c
page_idle.c
page_io.c mm: fix unexpected zeroed page mapping with zram swap 2022-04-20 09:23:25 +02:00
page_isolation.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_owner.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_poison.c mm/page_poison.c: replace bool variable with static key 2020-10-16 11:11:17 -07:00
page_reporting.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_reporting.h
page_vma_mapped.c mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() 2021-06-30 08:47:29 -04:00
page-writeback.c mm: make wait_on_page_writeback() wait for multiple pending writebacks 2021-01-12 20:18:22 +01:00
pagewalk.c mm: pagewalk: Fix race between unmap and page walker 2022-09-08 11:11:38 +02:00
percpu-internal.h percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-km.c
percpu-stats.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-vm.c
percpu.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
pgalloc-track.h
pgtable-generic.c mm/thp: fix __split_huge_pmd_locked() on shmem migration entry 2021-06-30 08:47:26 -04:00
process_vm_access.c mm/process_vm_access.c: include compat.h 2021-01-19 18:27:21 +01:00
ptdump.c mm: pagewalk: Fix race between unmap and page walker 2022-09-08 11:11:38 +02:00
readahead.c mm: use limited read-ahead to satisfy read 2020-10-17 13:49:08 -06:00
rmap.c mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse 2022-09-05 10:28:56 +02:00
rodata_test.c
shmem.c shmem: fix a race between shmem_unused_huge_shrink and shmem_evict_inode 2022-01-27 10:53:44 +01:00
shuffle.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
shuffle.h
slab_common.c mm/slub: fix redzoning for small allocations 2021-06-23 14:42:54 +02:00
slab.c mm/sl?b.c: remove ctor argument from kmem_cache_flags 2021-05-14 09:50:45 +02:00
slab.h mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag 2021-11-26 10:39:19 +01:00
slob.c
slub.c mm/slub: fix to return errno if kmalloc() fails 2022-09-28 11:10:28 +02:00
sparse-vmemmap.c
sparse.c mm/sparse: add the missing sparse_buffer_fini() in error branch 2021-05-14 09:50:45 +02:00
swap_cgroup.c
swap_slots.c
swap_state.c mm: swap: get rid of livelock in swapin readahead 2022-03-23 09:13:27 +01:00
swap.c
swapfile.c mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare 2021-06-23 14:42:53 +02:00
truncate.c mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 2021-06-30 08:47:27 -04:00
usercopy.c mm/usercopy: return 1 from hardened_usercopy __setup() handler 2022-04-08 14:40:43 +02:00
userfaultfd.c mm: userfaultfd: fix missing cache flush in mcopy_atomic_pte() and __mcopy_atomic() 2022-05-15 20:00:09 +02:00
util.c mm: Add kvrealloc() 2022-08-21 15:15:21 +02:00
vmacache.c
vmalloc.c mm/vmalloc.c: fix potential memory leak 2021-01-19 18:27:21 +01:00
vmpressure.c
vmscan.c mm,vmscan: fix divide by zero in get_scan_count 2021-09-18 13:40:36 +02:00
vmstat.c arm: remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL 2022-05-15 20:00:09 +02:00
workingset.c XArray updates for 5.9 2020-10-20 14:39:37 -07:00
z3fold.c mm/z3fold: use release_z3fold_page_locked() to release locked z3fold page 2021-07-14 16:56:51 +02:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: fix races between asynchronous zspage free and page migration 2022-06-06 08:42:43 +02:00
zswap.c