Linux kernel source tree
Go to file
Thomas Gleixner 7874eee013 futex: Prevent exit livelock
commit 3ef240eaff upstream

Oleg provided the following test case:

int main(void)
{
	struct sched_param sp = {};

	sp.sched_priority = 2;
	assert(sched_setscheduler(0, SCHED_FIFO, &sp) == 0);

	int lock = vfork();
	if (!lock) {
		sp.sched_priority = 1;
		assert(sched_setscheduler(0, SCHED_FIFO, &sp) == 0);
		_exit(0);
	}

	syscall(__NR_futex, &lock, FUTEX_LOCK_PI, 0,0,0);
	return 0;
}

This creates an unkillable RT process spinning in futex_lock_pi() on a UP
machine or if the process is affine to a single CPU. The reason is:

 parent	    	    			child

  set FIFO prio 2

  vfork()			->	set FIFO prio 1
   implies wait_for_child()	 	sched_setscheduler(...)
 			   		exit()
					do_exit()
 					....
					mm_release()
					  tsk->futex_state = FUTEX_STATE_EXITING;
					  exit_futex(); (NOOP in this case)
					  complete() --> wakes parent
  sys_futex()
    loop infinite because
    tsk->futex_state == FUTEX_STATE_EXITING

The same problem can happen just by regular preemption as well:

  task holds futex
  ...
  do_exit()
    tsk->futex_state = FUTEX_STATE_EXITING;

  --> preemption (unrelated wakeup of some other higher prio task, e.g. timer)

  switch_to(other_task)

  return to user
  sys_futex()
	loop infinite as above

Just for the fun of it the futex exit cleanup could trigger the wakeup
itself before the task sets its futex state to DEAD.

To cure this, the handling of the exiting owner is changed so:

   - A refcount is held on the task

   - The task pointer is stored in a caller visible location

   - The caller drops all locks (hash bucket, mmap_sem) and blocks
     on task::futex_exit_mutex. When the mutex is acquired then
     the exiting task has completed the cleanup and the state
     is consistent and can be reevaluated.

This is not a pretty solution, but there is no choice other than returning
an error code to user space, which would break the state consistency
guarantee and open another can of problems including regressions.

For stable backports the preparatory commits ac31c7ff86 .. ba31c1a485
are required as well, but for anything older than 5.3.y the backports are
going to be provided when this hits mainline as the other dependencies for
those kernels are definitely not stable material.

Fixes: 778e9a9c3e ("pi-futex: fix exit races and locking problems")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stable Team <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191106224557.041676471@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-30 13:32:12 +01:00
arch sh: dma: fix kconfig dependency for G2_DMA 2021-01-27 11:05:42 +01:00
block bfq: Fix computation of shallow depth 2021-01-19 18:22:36 +01:00
certs export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR() 2018-08-22 23:21:44 +09:00
crypto crypto: ecdh - avoid buffer overflow in ecdh_set_secret() 2021-01-12 20:10:20 +01:00
Documentation USB: UAS: introduce a quirk to set no_write_same 2020-12-30 11:25:42 +01:00
drivers gpio: mvebu: fix pwm .get_state period calculation 2021-01-30 13:32:11 +01:00
firmware Fix built-in early-load Intel microcode alignment 2020-01-23 08:21:29 +01:00
fs exit/exec: Seperate mm_release() 2021-01-30 13:32:11 +01:00
include futex: Add mutex around futex exit 2021-01-30 13:32:12 +01:00
init printk: reduce LOG_BUF_SHIFT range for H8300 2020-11-05 11:08:41 +01:00
ipc ipc/util.c: sysvipc_find_ipc() incorrectly updates position index 2020-05-20 08:18:40 +02:00
kernel futex: Prevent exit livelock 2021-01-30 13:32:12 +01:00
lib lib/genalloc: fix the overflow when size is too big 2021-01-12 20:10:16 +01:00
LICENSES LICENSES: Remove CC-BY-SA-4.0 license text 2018-10-18 11:28:50 +02:00
mm Revert "mm/slub: fix a memory leak in sysfs_slab_add()" 2021-01-30 13:32:11 +01:00
net net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled 2021-01-27 11:05:44 +01:00
samples samples: bpf: Fix lwt_len_hist reusing previous BPF map 2020-12-30 11:25:57 +01:00
scripts depmod: handle the case of /sbin/depmod without /sbin in PATH 2021-01-12 20:10:16 +01:00
security dump_common_audit_data(): fix racy accesses to ->d_name 2021-01-19 18:22:37 +01:00
sound ASoC: Intel: haswell: Add missing pm_ops 2021-01-27 11:05:36 +01:00
tools selftests: net: fib_tests: remove duplicate log test 2021-01-27 11:05:39 +01:00
usr initramfs: restore default compression behavior 2020-04-13 10:44:59 +02:00
virt KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace 2020-12-02 08:48:07 +01:00
.clang-format clang-format: Set IndentWrappedFunctionNames false 2018-08-01 18:38:51 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap libnvdimm-for-4.19_misc 2018-08-25 18:13:10 -07:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS 9p: remove Ron Minnich from MAINTAINERS 2018-08-17 16:20:26 -07:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS Documentation/llvm: add documentation on building w/ Clang/LLVM 2020-09-26 18:01:31 +02:00
Makefile Linux 4.19.171 2021-01-27 11:05:44 +01:00
README Docs: Added a pointer to the formatted docs to README 2018-03-21 09:02:53 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.