linux/mm
Aneesh Kumar K.V e37cc8a09f UPSTREAM: mm/mremap: hold the rmap lock in write mode when moving page table entries.
To avoid a race between rmap walk and mremap, mremap calls
take_rmap_locks().  The lock is taken to ensure that the rmap walk doesn't
miss a page table entry due to PTE moves via move_page_tables().  The kernel
further optimizes this locking: if the rmap walk is going to find the newly
added vma after the old vma, the rmap lock is not taken.  This is because
the rmap walk visits the vmas in the same order, so if it doesn't find the
page table entry attached to the older vma, it finds it with the new vma,
which it iterates later.

As explained in commit eb66ae0308 ("mremap: properly flush TLB before
releasing the page") mremap is special in that it doesn't take ownership
of the page.  The optimized version for PUD/PMD aligned mremap also
doesn't hold the ptl lock.  This can result in stale TLB entries as shown
below.

This patch updates the rmap locking requirement in mremap to handle the race condition
explained below with optimized mremap::

Optimized PMD move

    CPU 1                           CPU 2                                   CPU 3

    mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one

    mmap_write_lock_killable()

                                    addr = old_addr
                                    lock(pte_ptl)
    lock(pmd_ptl)
    pmd = *old_pmd
    pmd_clear(old_pmd)
    flush_tlb_range(old_addr)

    *new_pmd = pmd
                                                                            *new_addr = 10; and fills
                                                                            TLB with new addr
                                                                            and old pfn

    unlock(pmd_ptl)
                                    ptep_clear_flush()
                                    old pfn is free.
                                                                            Stale TLB entry

Optimized PUD move also suffers from a similar race.  Both race conditions
can be fixed if we force the mremap path to take the rmap lock.
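
For orientation, the userspace side of the CPU 1 and CPU 3 columns in the
diagram above can be sketched as follows.  This is only an illustration
under stated assumptions, not part of the patch: the helper names
(reserve_aligned, pmd_aligned_mremap_demo) and PMD_SIZE_ASSUMED are
hypothetical, and the 2 MiB PMD size is an assumption that holds on x86-64
with 4 KiB pages.  Whether the kernel actually takes the optimized PMD move
depends on the architecture and configuration (CONFIG_HAVE_MOVE_PMD), and
the program can neither observe that choice nor reproduce the race.  It
only shows the shape of a call that is eligible for the optimized path.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for mremap() and MREMAP_* */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: one PMD entry covers 2 MiB (x86-64 with 4 KiB pages). */
#define PMD_SIZE_ASSUMED (2UL << 20)

/*
 * Hypothetical helper: map `len` bytes of anonymous memory at an address
 * aligned to `align` by over-allocating and trimming the head and tail.
 * Returns MAP_FAILED on error.
 */
static void *reserve_aligned(size_t len, size_t align, int prot)
{
	size_t span = len + align;
	unsigned char *raw;
	uintptr_t start;
	size_t head, tail;

	assert((align & (align - 1)) == 0);	/* power of two */

	raw = mmap(NULL, span, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return MAP_FAILED;

	start = ((uintptr_t)raw + align - 1) & ~(uintptr_t)(align - 1);
	head = start - (uintptr_t)raw;
	tail = span - head - len;

	if (head)
		munmap(raw, head);
	if (tail)
		munmap((unsigned char *)start + len, tail);
	return (void *)start;
}

/*
 * Hypothetical demo: move one PMD-sized, PMD-aligned region with mremap()
 * and then store through the new address.  Returns 0 on success.
 */
int pmd_aligned_mremap_demo(void)
{
	size_t len = PMD_SIZE_ASSUMED;
	unsigned char *old_addr, *target, *new_addr;
	int ok;

	old_addr = reserve_aligned(len, PMD_SIZE_ASSUMED,
				   PROT_READ | PROT_WRITE);
	if (old_addr == MAP_FAILED)
		return -1;

	/* Placeholder mapping that MREMAP_FIXED will replace. */
	target = reserve_aligned(len, PMD_SIZE_ASSUMED, PROT_NONE);
	if (target == MAP_FAILED) {
		munmap(old_addr, len);
		return -1;
	}

	/* Touch every page so the old range has populated page tables. */
	memset(old_addr, 0x5a, len);

	/* CPU 1 in the diagram: mremap(old_addr, new_addr). */
	new_addr = mremap(old_addr, len, len,
			  MREMAP_MAYMOVE | MREMAP_FIXED, target);
	if (new_addr == MAP_FAILED) {
		munmap(old_addr, len);
		munmap(target, len);
		return -1;
	}

	ok = new_addr == target &&
	     new_addr[0] == 0x5a && new_addr[len - 1] == 0x5a;

	/* CPU 3 in the diagram: *new_addr = 10. */
	new_addr[0] = 10;
	ok = ok && new_addr[0] == 10;

	munmap(new_addr, len);
	return ok ? 0 : -1;
}
```

Calling pmd_aligned_mremap_demo() from a small main() and checking for a
zero return is enough to exercise it.  Both addresses are aligned to the
assumed PMD size and the length is a full PMD, which is the precondition
the commit message describes for the optimized move.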

Link: https://lkml.kernel.org/r/20210616045239.370802-7-aneesh.kumar@linux.ibm.com
Fixes: 2c91bd4a4e ("mm: speed up mremap by 20x on large regions")
Fixes: c49dd34018 ("mm: speedup mremap on 1GB or larger regions")
Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 97113eb39f)

Bug: 151772539
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I5b7235e982ea2efdc155018271fbaf2711fac4c1
2021-07-15 18:39:14 +00:00
..
kasan BACKPORT: FROMLIST: kasan: add memzero int for unaligned size at DEBUG 2021-07-15 16:49:20 +00:00
kfence FROMGIT: kfence: unconditionally use unbound work queue 2021-06-03 20:52:39 +00:00
backing-dev.c
balloon_compaction.c
cleancache.c
cma_debug.c FROMLIST: mm: cma: introduce gfp flag in cma_alloc instead of no_warn 2021-01-25 12:21:02 -08:00
cma_sysfs.c ANDROID: make cma_sysfs experimental 2021-03-25 19:20:18 +00:00
cma.c ANDROID: mm: cma do not sleep for __GFP_NORETRY 2021-07-14 11:54:49 -07:00
cma.h ANDROID: GKI: add OEM data in cma struct 2021-06-04 11:15:16 -07:00
compaction.c FROMLIST: mm: compaction: fix wakeup logic of proactive compaction 2021-06-17 14:16:29 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests() 2021-06-10 13:39:26 +02:00
debug.c ANDROID: mm: introduce page_pinner 2021-04-30 09:13:34 -07:00
dmapool.c mm/dmapool.c: replace hard coded function name with __func__ 2020-10-13 18:38:32 -07:00
early_ioremap.c
fadvise.c mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED 2020-10-13 18:38:29 -07:00
failslab.c
filemap.c ANDROID: mm: Add hooks to filemap_fault for oem's optimization 2021-06-17 14:16:47 -07:00
frame_vector.c
frontswap.c
gup_benchmark.c mm/gup_benchmark: take the mmap lock around GUP 2020-10-18 09:27:09 -07:00
gup.c Merge 5.10.38 into android12-5.10 2021-05-20 15:35:25 +02:00
highmem.c mm/highmem.c: clean up endif comments 2020-10-16 11:11:18 -07:00
hmm.c
huge_memory.c Merge 5.10.27 into android12-5.10 2021-04-02 15:25:50 +02:00
hugetlb_cgroup.c hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings 2021-03-30 14:31:54 +02:00
hugetlb.c Merge 5.10.43 into android12-5.10 2021-06-12 14:48:14 +02:00
hwpoison-inject.c mm,hwpoison-inject: don't pin for hwpoison_filter 2020-10-16 11:11:16 -07:00
init-mm.c FROMLIST: mm: protect mm_rb tree with a rwlock 2021-01-22 18:00:57 +00:00
internal.h FROMLIST: mm: provide speculative fault infrastructure 2021-01-22 18:01:16 +00:00
interval_tree.c
ioremap.c
Kconfig FROMLIST: mm: cma: support sysfs 2021-03-25 19:20:09 +00:00
Kconfig.debug ANDROID: mm: introduce page_pinner 2021-04-30 09:13:34 -07:00
khugepaged.c Merge 5.10.38 into android12-5.10 2021-05-20 15:35:25 +02:00
kmemleak.c UPSTREAM: kfence: make compatible with kmemleak 2021-04-29 08:13:57 +02:00
ksm.c Merge 5.10.38 into android12-5.10 2021-05-20 15:35:25 +02:00
list_lru.c mm: list_lru: set shrinker map bit when child nr_items is not zero 2020-12-06 10:19:07 -08:00
maccess.c
madvise.c This is the 5.10.24 stable release 2021-03-19 09:42:56 +01:00
Makefile ANDROID: mm: introduce page_pinner 2021-04-30 09:13:34 -07:00
mapping_dirty_helpers.c
memblock.c UPSTREAM: mm: memblock: add more debug logs 2021-05-21 09:08:08 +05:30
memcontrol.c FROMLIST: mm, memcg: inline swap-related functions to improve disabled memcg config 2021-07-12 18:34:30 -07:00
memfd.c
memory_hotplug.c ANDROID: mm: cma: skip problematic pageblock 2021-07-14 11:54:49 -07:00
memory-failure.c mm/memory-failure: unnecessary amount of unmapping 2021-05-14 09:50:45 +02:00
memory.c ANDROID: mm: bail out tlb free batching on page zapping when cma is going on 2021-07-14 11:54:49 -07:00
mempolicy.c FROMLIST: mm: replace migrate_[prep|finish] with lru_cache_[disable|enable] 2021-03-23 04:05:24 +00:00
mempool.c FROMGIT: kasan: use separate (un)poison implementation for integrated init 2021-06-17 14:39:37 -07:00
memremap.c mm: fix memory_failure() handling of dax-namespace metadata 2021-03-04 11:38:21 +01:00
memtest.c
migrate.c Merge 5.10.38 into android12-5.10 2021-05-20 15:35:25 +02:00
mincore.c mm: factor find_get_incore_page out of mincore_page 2020-10-13 18:38:29 -07:00
mlock.c ANDROID: mm: page_pinner: unattribute follow_page in munlock_vma_pages_range 2021-04-30 09:13:35 -07:00
mm_init.c
mmap.c ANDROID: android: export kernel function vm_unmapped_area 2021-07-08 22:12:00 +00:00
mmu_gather.c
mmu_notifier.c mm/mmu_notifiers: ensure range_end() is paired with range_start() 2021-03-30 14:32:06 +02:00
mmzone.c ANDROID: mm: export zone_watermark_ok 2021-02-25 19:36:38 +00:00
mprotect.c FROMGIT: mm: improve mprotect(R|W) efficiency on pages referenced once 2021-06-15 19:33:15 +00:00
mremap.c UPSTREAM: mm/mremap: hold the rmap lock in write mode when moving page table entries. 2021-07-15 18:39:14 +00:00
msync.c
nommu.c ANDROID: mm: allow vmas with vm_ops to be speculatively handled 2021-04-23 18:42:39 -07:00
oom_kill.c ANDROID: signal: Add vendor hook for memory reaping 2021-06-03 20:59:15 +00:00
OWNERS ANDROID: Add OWNERS files referring to the respective android-mainline OWNERS 2021-04-03 14:11:30 +00:00
page_alloc.c ANDROID: mm: cma: skip problematic pageblock 2021-07-14 11:54:49 -07:00
page_counter.c mm/page_counter: correct the obsolete func name in the comment of page_counter_try_charge() 2020-10-13 18:38:30 -07:00
page_ext.c ANDROID: mm: introduce page_pinner 2021-04-30 09:13:34 -07:00
page_idle.c
page_io.c UPSTREAM: mm/page_io: use pr_alert_ratelimited for swap read/write errors 2021-03-30 18:44:11 +00:00
page_isolation.c ANDROID: mm: cma: skip problematic pageblock 2021-07-14 11:54:49 -07:00
page_owner.c ANDROID: mm: Make page_owner_enabled global 2021-04-01 00:09:00 +00:00
page_pinner.c ANDROID: mm: page_pinner: use EXPORT_SYMBOL_GPL 2021-07-14 03:38:32 +00:00
page_poison.c FROMGIT: mm, page_poison: remove CONFIG_PAGE_POISONING_NO_SANITY 2021-03-24 15:09:17 -07:00
page_reporting.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
page_reporting.h
page_vma_mapped.c
page-writeback.c ANDROID: vendor_hooks: add hook to balance_dirty_pages() 2021-05-20 19:38:42 +00:00
pagewalk.c
percpu-internal.h percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-km.c
percpu-stats.c percpu: make pcpu_nr_empty_pop_pages per chunk type 2021-04-14 08:42:03 +02:00
percpu-vm.c
percpu.c Merge 5.10.30 into android12-5.10 2021-04-15 14:23:41 +02:00
pgalloc-track.h
pgtable-generic.c
process_vm_access.c mm/process_vm_access.c: include compat.h 2021-01-19 18:27:21 +01:00
ptdump.c This is the 5.10.32 stable release 2021-04-22 11:12:08 +02:00
readahead.c ANDROID: mm: Create vendor hooks to control ZONE_MOVABLE allocations 2020-12-01 18:07:54 +00:00
rmap.c FROMLIST: mm: introduce __page_add_new_anon_rmap() 2021-01-22 18:00:48 +00:00
rodata_test.c
shmem.c ANDROID: mm: provision to add shmem pages to inactive file lru head 2021-07-14 20:52:01 -07:00
shuffle.c mm: rename page_order() to buddy_order() 2020-10-16 11:11:19 -07:00
shuffle.h
slab_common.c FROMGIT: mm: slub: move sysfs slab alloc/free interfaces to debugfs 2021-06-15 18:11:57 +00:00
slab.c Merge 5.10.37 into android12-5.10 2021-05-15 09:28:55 +02:00
slab.h BACKPORT: FROMLIST: mm: move helper to check slub_debug_enabled 2021-07-15 16:49:09 +00:00
slob.c
slub.c ANDROID: Fix lost track action type in save_track_hash 2021-07-01 00:52:06 +00:00
sparse-vmemmap.c
sparse.c mm/sparse: add the missing sparse_buffer_fini() in error branch 2021-05-14 09:50:45 +02:00
swap_cgroup.c
swap_slots.c mm/swap_slots.c: remove always zero and unused return value of enable_swap_slots_cache() 2020-10-13 18:38:30 -07:00
swap_state.c FROMLIST: mm: protect VMA modifications using VMA sequence count 2021-01-22 17:59:47 +00:00
swap.c ANDROID: mm: provision to add shmem pages to inactive file lru head 2021-07-14 20:52:01 -07:00
swapfile.c FROMLIST: mm, memcg: inline swap-related functions to improve disabled memcg config 2021-07-12 18:34:30 -07:00
truncate.c mm/truncate.c: make __invalidate_mapping_pages() static 2020-11-02 12:14:19 -08:00
usercopy.c
userfaultfd.c FROMGIT: userfaultfd/shmem: modify shmem_mfill_atomic_pte to use install_pte() 2021-06-04 19:13:10 +00:00
util.c ANDROID: android: export kernel function arch_mmap_rnd 2021-07-09 20:51:14 +00:00
vmacache.c
vmalloc.c ANDROID: vendor_hooks: add hooks for slab memory leak debugging 2021-05-21 13:17:08 -07:00
vmpressure.c FROMLIST: mm, memcg: add mem_cgroup_disabled checks in vmpressure and swap-related functions 2021-07-12 18:26:15 -07:00
vmscan.c ANDROID: Allow vendor module to reclaim a memcg 2021-07-12 18:54:56 +00:00
vmstat.c ANDROID: mm: allow vmas with vm_ops to be speculatively handled 2021-04-23 18:42:39 -07:00
workingset.c XArray updates for 5.9 2020-10-20 14:39:37 -07:00
z3fold.c z3fold: prevent reclaim/free race for headless pages 2021-03-30 14:31:54 +02:00
zbud.c mm/zbud: remove redundant initialization 2020-10-13 18:38:34 -07:00
zpool.c
zsmalloc.c This is the 5.10.21 stable release 2021-03-07 12:53:30 +01:00
zswap.c