Linux kernel source tree
Go to file
Vasily Averin f1e83db27a memcg: prohibit unconditional exceeding the limit of dying tasks
commit a4ebf1b6ca upstream.

Memory cgroup charging allows killed or exiting tasks to exceed the hard
limit.  It is assumed that the amount of the memory charged by those
tasks is bound and most of the memory will get released while the task
is exiting.  This is resembling a heuristic for the global OOM situation
when tasks get access to memory reserves.  There is no global memory
shortage at the memcg level so the memcg heuristic is more relieved.

The above assumption is overly optimistic though.  E.g.  vmalloc can
scale to really large requests and the heuristic would allow that.  We
used to have an early break in the vmalloc allocator for killed tasks
but this has been reverted by commit b8c8a338f7 ("Revert "vmalloc:
back off when the current task is killed"").  There are likely other
similar code paths which do not check for fatal signals in an
allocation&charge loop.  Also there are some kernel objects charged to a
memcg which are not bound to a process life time.

It has been observed that it is not really hard to trigger these
bypasses and cause global OOM situation.

One potential way to address these runaways would be to limit the amount
of excess (similar to the global OOM with limited oom reserves).  This
is certainly possible but it is not really clear how much of an excess
is desirable and still protects from global OOMs as that would have to
consider the overall memcg configuration.

This patch is addressing the problem by removing the heuristic
altogether.  Bypass is only allowed for requests which either cannot
fail or where the failure is not desirable while excess should be still
limited (e.g.  atomic requests).  Implementation wise a killed or dying
task fails to charge if it has passed the OOM killer stage.  That should
give all forms of reclaim chance to restore the limit before the failure
(ENOMEM) and tell the caller to back off.

In addition, this patch renames should_force_charge() helper to
task_is_dying() because now its use is not associated witch forced
charging.

This patch depends on pagefault_out_of_memory() to not trigger
out_of_memory(), because then a memcg failure can unwind to VM_FAULT_OOM
and cause a global OOM killer.

Link: https://lkml.kernel.org/r/8f5cebbb-06da-4902-91f0-6566fc4b4203@virtuozzo.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18 19:17:16 +01:00
arch KVM: x86: move guest_pv_has out of user_access section 2021-11-18 19:17:14 +01:00
block block: Hold invalidate_lock in BLKRESETZONE ioctl 2021-11-18 19:17:15 +01:00
certs certs: Add support for using elliptic curve keys for signing modules 2021-08-23 19:55:42 +03:00
crypto crypto: pcrypt - Delay write to padata->info 2021-11-18 19:16:44 +01:00
Documentation fscrypt: allow 256-bit master keys with AES-256-XTS 2021-11-18 19:16:11 +01:00
drivers dmaengine: bestcomm: fix system boot lockups 2021-11-18 19:17:16 +01:00
fs ksmbd: don't need 8byte alignment for request length in ksmbd_check_message 2021-11-18 19:17:15 +01:00
include net, neigh: Enable state migration between NUD_PERMANENT and NTF_USE 2021-11-18 19:17:16 +01:00
init init: make unknown command line param message clearer 2021-11-18 19:17:11 +01:00
ipc ipc: remove memcg accounting for sops objects in do_semtimedop() 2021-09-14 10:22:11 -07:00
kernel posix-cpu-timers: Clear task::posix_cputimers_work in copy_process() 2021-11-18 19:17:14 +01:00
lib dyndbg: make dyndbg a known cli param 2021-11-18 19:16:52 +01:00
LICENSES LICENSES/dual/CC-BY-4.0: Git rid of "smart quotes" 2021-07-15 06:31:24 -06:00
mm memcg: prohibit unconditional exceeding the limit of dying tasks 2021-11-18 19:17:16 +01:00
net 9p/net: fix missing error check in p9_check_errors 2021-11-18 19:17:16 +01:00
samples samples/kretprobes: Fix return value if register_kretprobe() failed 2021-11-18 19:16:42 +01:00
scripts leaking_addresses: Always print a trailing newline 2021-11-18 19:16:16 +01:00
security apparmor: fix error check 2021-11-18 19:16:58 +01:00
sound ALSA: memalloc: Catch call with NULL snd_dma_buffer pointer 2021-11-18 19:17:08 +01:00
tools selftests/net: udpgso_bench_rx: fix port argument 2021-11-18 19:17:13 +01:00
usr .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
virt KVM: Remove tlbs_dirty 2021-09-23 11:01:12 -04:00
.clang-format clang-format: Update with the latest for_each macro list 2021-05-12 23:32:39 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap mailmap: add Andrej Shadura 2021-10-18 20:22:03 -10:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Daniel Drake to credits 2021-09-21 08:34:58 +03:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS drm fixes for 5.15 final 2021-10-28 12:17:01 -07:00
Makefile Linux 5.15.2 2021-11-12 15:05:52 +01:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.