linux/include
Nico Pache 41ba681c63 oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
commit e4a38402c3 upstream.

The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which
can be targeted by the oom reaper.  This mapping is used to store the
futex robust list head; the kernel does not keep a copy of the robust
list and instead references a userspace address to maintain the
robustness during a process death.

A race can occur between exit_mm and the oom reaper that allows the oom
reaper to free the memory of the futex robust list before the exit path
has handled the futex death:

    CPU1                               CPU2
    --------------------------------------------------------------------
    page_fault
    do_exit "signal"
    wake_oom_reaper
                                        oom_reaper
                                        oom_reap_task_mm (invalidates mm)
    exit_mm
    exit_mm_release
    futex_exit_release
    futex_cleanup
    exit_robust_list
    get_user (EFAULT- can't access memory)

If the get_user EFAULT's, the kernel will be unable to recover the
waiters on the robust_list, leaving userspace mutexes hung indefinitely.

Delay the OOM reaper, allowing more time for the exit path to perform
the futex cleanup.

Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer

Based on a patch by Michal Hocko.

Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 [1]
Link: https://lkml.kernel.org/r/20220414144042.677008-1-npache@redhat.com
Fixes: 2129258024 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Nico Pache <npache@redhat.com>
Co-developed-by: Joel Savitz <jsavitz@redhat.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Herton R. Krzesinski <herton@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Savitz <jsavitz@redhat.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27 14:38:58 +02:00
..
acpi ACPICA: actypes.h: Expand the ACPI_ACCESS_ definitions 2022-01-27 11:04:49 +01:00
asm-generic tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entry 2022-04-20 09:34:16 +02:00
clocksource
crypto
drm drm/connector: Fix typo in documentation 2022-04-08 14:24:12 +02:00
dt-bindings linux-watchdog 5.15-rc1 tag 2021-09-07 13:52:46 -07:00
keys
kunit kunit: fix kernel-doc warnings due to mismatched arg names 2021-10-06 17:54:07 -06:00
kvm KVM: arm64: Fix PMU probe ordering 2021-09-20 12:43:34 +01:00
linux oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup 2022-04-27 14:38:58 +02:00
math-emu
media media: cec: fix a deadlock situation 2022-01-27 11:02:53 +01:00
memory memory: renesas-rpc-if: Correct QSPI data transfer in Manual mode 2021-11-18 19:16:01 +01:00
misc
net ipv6: make ip6_rt_gc_expire an atomic_t 2022-04-27 14:38:54 +02:00
pcmcia
ras
rdma RDMA/netlink: Add __maybe_unused to static inline in C file 2021-11-25 09:49:07 +01:00
scsi scsi: iscsi: Fix NOP handling during conn recovery 2022-04-27 14:38:56 +02:00
soc net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib 2021-10-12 17:35:18 -07:00
sound ALSA: core: Add snd_card_free_on_error() helper 2022-04-20 09:34:05 +02:00
target scsi: target: Fix ordered tag handling 2021-11-25 09:48:29 +01:00
trace SUNRPC: Fix the svc_deferred_event trace class 2022-04-20 09:34:09 +02:00
uapi bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide 2022-04-13 20:59:25 +02:00
vdso
video
xen xen/gnttab: fix gnttab_end_foreign_access() without page specified 2022-03-11 12:22:37 +01:00