Linux kernel source tree
Go to file
Ilya Lipnitskiy ec3e06e06f mm: fix race by making init_zero_pfn() early_initcall
commit e720e7d0e9 upstream.

There are code paths that rely on zero_pfn to be fully initialized
before core_initcall.  For example, wq_sysfs_init() is a core_initcall
function that eventually results in a call to kernel_execve, which
causes a page fault with a subsequent mmput.  If zero_pfn is not
initialized by then it may not get cleaned up properly and result in an
error:

  BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1

Here is an analysis of the race as seen on a MIPS device. On this
particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until
initialized, at which point it becomes PFN 5120:

  1. wq_sysfs_init calls into kobject_uevent_env at core_initcall:
       kobject_uevent_env+0x7e4/0x7ec
       kset_register+0x68/0x88
       bus_register+0xdc/0x34c
       subsys_virtual_register+0x34/0x78
       wq_sysfs_init+0x1c/0x4c
       do_one_initcall+0x50/0x1a8
       kernel_init_freeable+0x230/0x2c8
       kernel_init+0x10/0x100
       ret_from_kernel_thread+0x14/0x1c

  2. kobject_uevent_env() calls call_usermodehelper_exec() which executes
     kernel_execve asynchronously.

  3. Memory allocations in kernel_execve cause a page fault, bumping the
     MM reference counter:
       add_mm_counter_fast+0xb4/0xc0
       handle_mm_fault+0x6e4/0xea0
       __get_user_pages.part.78+0x190/0x37c
       __get_user_pages_remote+0x128/0x360
       get_arg_page+0x34/0xa0
       copy_string_kernel+0x194/0x2a4
       kernel_execve+0x11c/0x298
       call_usermodehelper_exec_async+0x114/0x194

  4. In case zero_pfn has not been initialized yet, zap_pte_range does
     not decrement the MM_ANONPAGES RSS counter and the BUG message is
     triggered shortly afterwards when __mmdrop checks the ref counters:
       __mmdrop+0x98/0x1d0
       free_bprm+0x44/0x118
       kernel_execve+0x160/0x1d8
       call_usermodehelper_exec_async+0x114/0x194
       ret_from_kernel_thread+0x14/0x1c

To avoid races such as described above, initialize init_zero_pfn at
early_initcall level.  Depending on the architecture, ZERO_PAGE is
either constant or gets initialized even earlier, at paging_init, so
there is no issue with initializing zero_pfn earlier.

Link: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com
Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-07 15:00:10 +02:00
arch s390/vdso: fix tod_steering_delta type 2021-04-07 15:00:10 +02:00
block block: recalculate segment count for multi-segment discards correctly 2021-03-30 14:32:06 +02:00
certs certs: Fix blacklist flag type confusion 2021-03-04 11:37:59 +01:00
crypto crypto: mips/poly1305 - enable for all MIPS processors 2021-03-17 17:06:10 +01:00
Documentation KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish 2021-03-30 14:31:53 +02:00
drivers PM: runtime: Fix ordering in pm_runtime_get_suppliers() 2021-04-07 15:00:10 +02:00
fs io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL 2021-04-07 15:00:06 +02:00
include ACPI: tables: x86: Reserve memory occupied by ACPI tables 2021-04-07 15:00:08 +02:00
init kgdb: fix to kill breakpoints on initmem after boot 2021-03-04 11:38:46 +01:00
ipc ipc: adjust proc_ipc_sem_dointvec definition to match prototype 2020-09-05 12:14:29 -07:00
kernel tracing: Fix stack trace event size 2021-04-07 15:00:10 +02:00
lib kasan: fix memory corruption in kasan_bitops_tags test 2021-03-17 17:06:25 +01:00
LICENSES LICENSES/deprecated: add Zlib license text 2020-09-16 14:33:49 +02:00
mm mm: fix race by making init_zero_pfn() early_initcall 2021-04-07 15:00:10 +02:00
net bpf: Remove MTU check in __bpf_skb_max_len 2021-04-07 15:00:08 +02:00
samples samples, bpf: Add missing munmap in xdpsock 2021-03-17 17:06:12 +01:00
scripts kbuild: dummy-tools: fix inverted tests for gcc 2021-03-30 14:31:50 +02:00
security integrity: double check iint_cache was initialized 2021-03-30 14:31:55 +02:00
sound ALSA: hda/realtek: fix mute/micmute LEDs for HP 640 G8 2021-04-07 15:00:09 +02:00
tools flow_dissector: fix TTL and TOS dissection on IPv4 fragments 2021-04-07 15:00:07 +02:00
usr Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
virt KVM: Use kvm_pfn_t for local PFN variable in hva_to_pfn_remapped() 2021-02-26 10:13:01 +01:00
.clang-format RDMA 5.10 pull request 2020-10-17 11:18:18 -07:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: docs: ignore sphinx_*/ directories 2020-09-10 10:44:31 -06:00
.mailmap mailmap: add two more addresses of Uwe Kleine-König 2020-12-06 10:19:07 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Jason Cooper to CREDITS 2020-11-30 10:20:34 +01:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: move the staging subsystem to lists.linux.dev 2021-03-25 09:04:18 +01:00
Makefile Linux 5.10.27 2021-03-30 14:32:09 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.