Linux kernel source tree
Go to file
Linus Torvalds 3b2018f9c9 pipe: do FASYNC notifications for every pipe IO, not just state changes
commit fe67f4dd8d upstream.

It turns out that the SIGIO/FASYNC situation is almost exactly the same
as the EPOLLET case was: user space really wants to be notified after
every operation.

Now, in a perfect world it should be sufficient to only notify user
space on "state transitions" when the IO state changes (ie when a pipe
goes from unreadable to readable, or from unwritable to writable).  User
space should then do as much as possible - fully emptying the buffer or
what not - and we'll notify it again the next time the state changes.

But as with EPOLLET, we have at least one case (stress-ng) where the
kernel sent SIGIO due to the pipe being marked for asynchronous
notification, but the user space signal handler then didn't actually
necessarily read it all before returning (it read more than what was
written, but since there could be multiple writes, it could leave data
pending).

The user space code then expected to get another SIGIO for subsequent
writes - even though the pipe had been readable the whole time - and
would only then read more.

This is arguably a user space bug - and Colin King already fixed the
stress-ng code in question - but the kernel regression rules are clear:
it doesn't matter if kernel people think that user space did something
silly and wrong.  What matters is that it used to work.

So if user space depends on specific historical kernel behavior, it's a
regression when that behavior changes.  It's on us: we were silly to
have that non-optimal historical behavior, and our old kernel behavior
was what user space was tested against.

Because of how the FASYNC notification was tied to wakeup behavior, this
was first broken by commits f467a6a664 and 1b6b26ae70 ("pipe: fix
and clarify pipe read/write wakeup logic"), but at the time it seems
nobody noticed.  Probably because the stress-ng problem case ends up
being timing-dependent too.

It was then unwittingly fixed by commit 3a34b13a88 ("pipe: make pipe
writes always wake up readers") only to be broken again when by commit
3b844826b6 ("pipe: avoid unnecessary EPOLLET wakeups under normal
loads").

And at that point the kernel test robot noticed the performance
refression in the stress-ng.sigio.ops_per_sec case.  So the "Fixes" tag
below is somewhat ad hoc, but it matches when the issue was noticed.

Fix it for good (knock wood) by simply making the kill_fasync() case
separate from the wakeup case.  FASYNC is quite rare, and we clearly
shouldn't even try to use the "avoid unnecessary wakeups" logic for it.

Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/
Fixes: 3b844826b6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-03 10:09:28 +02:00
arch perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32 2021-09-03 10:09:26 +02:00
block blk-mq: don't grab rq's refcount in blk_mq_check_expired() 2021-09-03 10:09:27 +02:00
certs certs: add 'x509_revocation_list' to gitignore 2021-07-20 16:05:35 +02:00
crypto crypto: sm2 - fix a memory leak in sm2 2021-07-14 16:56:06 +02:00
Documentation dt-bindings: sifive-l2-cache: Fix 'select' matching 2021-09-03 10:09:26 +02:00
drivers drm/nouveau/kms/nv50: workaround EFI GOP window channel format differences 2021-09-03 10:09:28 +02:00
fs pipe: do FASYNC notifications for every pipe IO, not just state changes 2021-09-03 10:09:28 +02:00
include pipe: avoid unnecessary EPOLLET wakeups under normal loads 2021-09-03 10:09:28 +02:00
init sched/core: Initialize the idle task with preemption disabled 2021-07-14 16:55:50 +02:00
ipc ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry 2021-05-26 12:06:54 +02:00
kernel ucounts: Increase ucounts reference counter before the security hook 2021-09-03 10:09:24 +02:00
lib once: Fix panic when module unload 2021-09-03 10:09:21 +02:00
LICENSES LICENSES/deprecated: add Zlib license text 2020-09-16 14:33:49 +02:00
mm mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim 2021-08-26 08:35:57 -04:00
net net/rds: dma_map_sg is entitled to merge entries 2021-09-03 10:09:28 +02:00
samples samples/bpf: Fix the error return code of xdp_redirect's main() 2021-07-14 16:56:23 +02:00
scripts scripts/tracing: fix the bug that can't parse raw_trace_func 2021-08-12 13:22:12 +02:00
security bpf: Add lockdown check for probe_write_user helper 2021-08-15 14:00:25 +02:00
sound ASoC: component: Remove misplaced prefix handling in pin control functions 2021-09-03 10:09:21 +02:00
tools tools/virtio: fix build 2021-09-03 10:09:27 +02:00
usr Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
virt KVM: Do not leak memory for duplicate debugfs directories 2021-08-12 13:22:17 +02:00
.clang-format RDMA 5.10 pull request 2020-10-17 11:18:18 -07:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore kbuild: generate Module.symvers only when vmlinux exists 2021-05-19 10:12:59 +02:00
.mailmap mailmap: add two more addresses of Uwe Kleine-König 2020-12-06 10:19:07 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Jason Cooper to CREDITS 2020-11-30 10:20:34 +01:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS f2fs: move ioctl interface definitions to separated file 2021-05-19 10:13:00 +02:00
Makefile Linux 5.10.61 2021-08-26 08:51:21 -04:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.