linux/include
Mel Gorman 3ea277194d mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
Nadav Amit identified a theoritical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.

He described the race as follows:

        CPU0                            CPU1
        ----                            ----
                                        user accesses memory using RW PTE
                                        [PTE now cached in TLB]
        try_to_unmap_one()
        ==> ptep_get_and_clear()
        ==> set_tlb_ubc_flush_pending()
                                        mprotect(addr, PROT_READ)
                                        ==> change_pte_range()
                                        ==> [ PTE non-present - no flush ]

                                        user writes using cached RW PTE
        ...

        try_to_unmap_flush()

The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such
as munmap, mremap and madvise.

For some operations like mprotect, it's not necessarily a data integrity
issue but it is a correctness issue as there is a window where an
mprotect that limits access still allows access.  For munmap, it's
potentially a data integrity issue although the race is massive as an
munmap, mmap and return to userspace must all complete between the
window when reclaim drops the PTL and flushes the TLB.  However, it's
theoritically possible so handle this issue by flushing the mm if
reclaim is potentially currently batching TLB flushes.

Other instances where a flush is required for a present pte should be ok
as either the page lock is held preventing parallel reclaim or a page
reference count is elevated preventing a parallel free leading to
corruption.  In the case of page_mkclean there isn't an obvious path
that userspace could take advantage of without using the operations that
are guarded by this patch.  Other users such as gup as a race with
reclaim looks just at PTEs.  huge page variants should be ok as they
don't race with reclaim.  mincore only looks at PTEs.  userfault also
should be ok as if a parallel reclaim takes place, it will either fault
the page back in or read some of the data before the flush occurs
triggering a fault.

Note that a variant of this patch was acked by Andy Lutomirski but this
was for the x86 parts on top of his PCID work which didn't make the 4.13
merge window as expected.  His ack is dropped from this version and
there will be a follow-on patch on top of PCID that will include his
ack.

[akpm@linux-foundation.org: tweak comments]
[akpm@linux-foundation.org: fix spello]
Link: http://lkml.kernel.org/r/20170717155523.emckq2esjro6hf3z@suse.de
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org>	[v4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:46 -07:00
..
acpi ACPI: NUMA: add missing include in acpi_numa.h 2017-07-24 22:27:43 +02:00
asm-generic Merge branch 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 11:17:52 -07:00
clocksource
crypto crypto: engine - replace pr_xxx by dev_xxx 2017-06-19 14:19:54 +08:00
drm i915, amd and some core fixes + mediatek color support 2017-07-13 11:26:18 -07:00
dt-bindings Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-07-15 10:59:54 -07:00
keys
kvm KVM: arm64: vgic-v3: Add hook to handle guest GICv3 sysreg accesses at EL2 2017-06-15 09:44:59 +01:00
linux mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries 2017-08-02 16:34:46 -07:00
math-emu
media main drm pull for v4.13 2017-07-09 18:48:37 -07:00
memory
misc cxl: Export library to support IBM XSL 2017-07-03 23:07:03 +10:00
net udp6: fix socket leak on early demux 2017-07-29 14:19:03 -07:00
pcmcia
ras trace, ras: add ARM processor error trace event 2017-06-22 18:22:05 +01:00
rdma IB/cma: Fix reference count leak when no ipv4 addresses are set 2017-07-20 11:24:13 -04:00
rxrpc rxrpc: Implement service upgrade 2017-06-05 14:30:49 +01:00
scsi Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2017-07-13 14:27:32 -07:00
soc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-07-05 12:31:59 -07:00
sound ASoC: Updates for v4.13 2017-07-03 19:51:42 +02:00
target iscsi-target: Add login_keys_workaround attribute for non RFC initiators 2017-07-11 10:56:39 -07:00
trace mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic 2017-07-12 16:26:03 -07:00
uapi TTY/Serial fixes for 4.13-rc2 2017-07-22 09:00:24 -07:00
video imx-drm: cleanups and YUV 4:2:0 memory read/write reduction support 2017-06-16 10:05:38 +10:00
xen xen/balloon: don't online new memory initially 2017-07-23 08:13:18 +02:00