Linux kernel source tree
Go to file
Yang Shi 377c60dd32 mm: gup: fix the fast GUP race against THP collapse
commit 70cbc3cc78 upstream.

Since general RCU GUP fast was introduced in commit 2667f50e8b ("mm:
introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer
sufficient to handle concurrent GUP-fast in all cases, it only handles
traditional IPI-based GUP-fast correctly.  On architectures that send an
IPI broadcast on TLB flush, it works as expected.  But on the
architectures that do not use IPI to broadcast TLB flush, it may have the
below race:

   CPU A                                          CPU B
THP collapse                                     fast GUP
                                              gup_pmd_range() <-- see valid pmd
                                                  gup_pte_range() <-- work on pte
pmdp_collapse_flush() <-- clear pmd and flush
__collapse_huge_page_isolate()
    check page pinned <-- before GUP bump refcount
                                                      pin the page
                                                      check PTE <-- no change
__collapse_huge_page_copy()
    copy data to huge page
    ptep_clear()
install huge pmd for the huge page
                                                      return the stale page
discard the stale page

The race can be fixed by checking whether PMD is changed or not after
taking the page pin in fast GUP, just like what it does for PTE.  If the
PMD is changed it means there may be parallel THP collapse, so GUP should
back off.

Also update the stale comment about serializing against fast GUP in
khugepaged.

Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com
Fixes: 2667f50e8b ("mm: introduce a general RCU get_user_pages_fast()")
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-15 07:55:51 +02:00
arch x86/alternative: Fix race in try_get_desc() 2022-10-05 10:38:43 +02:00
block blk-mq: fix io hung due to missing commit_rqs 2022-08-31 17:15:24 +02:00
certs certs/blacklist_hashes.c: fix const confusion in certs blacklist 2022-06-22 14:13:17 +02:00
crypto KEYS: asymmetric: enforce SM2 signature use pkey algo 2022-08-21 15:16:22 +02:00
Documentation docs: update mediator information in CoC docs 2022-10-15 07:55:50 +02:00
drivers clk: iproc: Do not rely on node name for correct PLL setup 2022-10-05 10:38:43 +02:00
fs ceph: don't truncate file in atomic_open 2022-10-15 07:55:50 +02:00
include xsk: Inherit need_wakeup flag for shared sockets 2022-10-15 07:55:51 +02:00
init Kconfig: Add option for asm goto w/ tied outputs to workaround clang-13 bug 2022-06-09 10:21:25 +02:00
ipc ipc/mqueue: use get_tree_nodev() in mqueue_get_tree() 2022-06-09 10:21:17 +02:00
kernel swiotlb: max mapping size takes min align mask into account 2022-10-05 10:38:40 +02:00
lib lib/vdso: Mark do_hres_timens() and do_coarse_timens() __always_inline() 2022-09-05 10:28:58 +02:00
LICENSES LICENSES/deprecated: add Zlib license text 2020-09-16 14:33:49 +02:00
mm mm: gup: fix the fast GUP race against THP collapse 2022-10-15 07:55:51 +02:00
net xsk: Inherit need_wakeup flag for shared sockets 2022-10-15 07:55:51 +02:00
samples x86: Prepare inline-asm for straight-line-speculation 2022-07-25 11:26:29 +02:00
scripts Makefile.extrawarn: Move -Wcast-function-type-strict to W=1 2022-10-15 07:55:50 +02:00
security apparmor: Fix memleak in aa_simple_write_to_buffer() 2022-08-25 11:37:53 +02:00
sound ALSA: pcm: oss: Fix race at SNDCTL_DSP_SYNC 2022-10-15 07:55:51 +02:00
tools perf tools: Fixup get_current_dir_name() compilation 2022-10-15 07:55:50 +02:00
usr usr/include/Makefile: add linux/nfc.h to the compile-test coverage 2022-02-01 17:25:48 +01:00
virt KVM: SEV: add cache flush to solve SEV cache incoherency issues 2022-09-28 11:10:28 +02:00
.clang-format RDMA 5.10 pull request 2020-10-17 11:18:18 -07:00
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore kbuild: generate Module.symvers only when vmlinux exists 2021-05-19 10:12:59 +02:00
.mailmap mailmap: add two more addresses of Uwe Kleine-König 2020-12-06 10:19:07 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Jason Cooper to CREDITS 2020-11-30 10:20:34 +01:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: add Amir as xfs maintainer for 5.10.y 2022-07-02 16:39:22 +02:00
Makefile Linux 5.10.147 2022-10-05 10:38:43 +02:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.