linux/arch
Aneesh Kumar K.V 75358ea359 powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race
MADV_DONTNEED holds mmap_sem in read mode and that implies a
parallel page fault is possible and the kernel can end up with a level 1 PTE
entry (THP entry) converted to a level 0 PTE entry without flushing
the THP TLB entry.

Most architectures including POWER have issues with kernel instantiating a level
0 PTE entry while holding level 1 TLB entries.

The code sequence I am looking at is

down_read(mmap_sem)                         down_read(mmap_sem)

zap_pmd_range()
 zap_huge_pmd()
  pmd lock held
  pmd_cleared
  table details added to mmu_gather
  pmd_unlock()
                                         insert a level 0 PTE entry()

tlb_finish_mmu().

Fix this by forcing a tlb flush before releasing pmd lock if this is
not a fullmm invalidate. We can safely skip this invalidate for
task exit case (fullmm invalidate) because in that case we are sure
there can be no parallel fault handlers.

This do change the Qemu guest RAM del/unplug time as below

128 core, 496GB guest:

Without patch:
munmap start: timer = 196449 ms, PID=6681
munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms

With patch:
munmap start: timer = 196345 ms, PID=6879
munmap finish: timer = 196714 ms, PID=6879 - delta = 369ms

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-23-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:16 +10:00
..
alpha mm/special: create generic fallbacks for pte_special() and pte_mkspecial() 2020-04-10 15:36:21 -07:00
arc mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS 2020-04-10 15:36:21 -07:00
arm xen: branch for v5.7-rc2 2020-04-17 10:35:17 -07:00
arm64 arm64: Delete the space separator in __emit_inst 2020-04-15 13:07:12 +01:00
c6x mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS 2020-04-10 15:36:21 -07:00
csky mm/special: create generic fallbacks for pte_special() and pte_mkspecial() 2020-04-10 15:36:21 -07:00
h8300 Kbuild updates for v5.7 (2nd) 2020-04-11 09:46:12 -07:00
hexagon mm/special: create generic fallbacks for pte_special() and pte_mkspecial() 2020-04-10 15:36:21 -07:00
ia64 mm/memory_hotplug: add pgprot_t to mhp_params 2020-04-10 15:36:21 -07:00
m68k m68k: Drop redundant generic-y += hardirq.h 2020-04-13 11:08:52 -07:00
microblaze mm/special: create generic fallbacks for pte_special() and pte_mkspecial() 2020-04-10 15:36:21 -07:00
mips Kbuild updates for v5.7 (2nd) 2020-04-11 09:46:12 -07:00
nds32 mm/special: create generic fallbacks for pte_special() and pte_mkspecial() 2020-04-10 15:36:21 -07:00
nios2 nios2 update for v5.7-rc1 2020-04-11 11:38:44 -07:00
openrisc mm/special: create generic fallbacks for pte_special() and pte_mkspecial() 2020-04-10 15:36:21 -07:00
parisc mm/special: create generic fallbacks for pte_special() and pte_mkspecial() 2020-04-10 15:36:21 -07:00
powerpc powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race 2020-05-05 21:20:16 +10:00
riscv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-04-16 14:52:29 -07:00
s390 mm: change pmdp_huge_get_and_clear_full take vm_area_struct as arg 2020-05-05 21:20:13 +10:00
sh Kbuild updates for v5.7 (2nd) 2020-04-11 09:46:12 -07:00
sparc mm/special: create generic fallbacks for pte_special() and pte_mkspecial() 2020-04-10 15:36:21 -07:00
um mm/special: create generic fallbacks for pte_special() and pte_mkspecial() 2020-04-10 15:36:21 -07:00
unicore32 mm/special: create generic fallbacks for pte_special() and pte_mkspecial() 2020-04-10 15:36:21 -07:00
x86 A set of fixes for x86 and objtool: 2020-04-19 11:58:32 -07:00
xtensa Merge branch 'akpm' (patches from Andrew) 2020-04-10 17:57:48 -07:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
Kconfig dma-mapping updates for 5.7 2020-04-04 10:12:47 -07:00