Linux kernel source tree
Go to file
Baolin Wang 43e027e414 mm: memory: extend finish_fault() to support large folio
Patch series "add mTHP support for anonymous shmem", v5.

Anonymous pages have already been supported for multi-size (mTHP)
allocation through commit 19eaf44954, that can allow THP to be
configured through the sysfs interface located at
'/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.

However, the anonymous shmem will ignore the anonymous mTHP rule
configured through the sysfs interface, and can only use the PMD-mapped
THP, that is not reasonable.  Many implement anonymous page sharing
through mmap(MAP_SHARED | MAP_ANONYMOUS), especially in database usage
scenarios, therefore, users expect to apply an unified mTHP strategy for
anonymous pages, also including the anonymous shared pages, in order to
enjoy the benefits of mTHP.  For example, lower latency than PMD-mapped
THP, smaller memory bloat than PMD-mapped THP, contiguous PTEs on ARM
architecture to reduce TLB miss etc.

As discussed in the bi-weekly MM meeting[1], the mTHP controls should
control all of shmem, not only anonymous shmem, but support will be added
iteratively.  Therefore, this patch set starts with support for anonymous
shmem.

The primary strategy is similar to supporting anonymous mTHP.  Introduce a
new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
which can have almost the same values as the top-level
'/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
additional "inherit" option and dropping the testing options 'force' and
'deny'.  By default all sizes will be set to "never" except PMD size,
which is set to "inherit".  This ensures backward compatibility with the
anonymous shmem enabled of the top level, meanwhile also allows
independent control of anonymous shmem enabled for each mTHP.

Use the page fault latency tool to measure the performance of 1G anonymous shmem
with 32 threads on my machine environment with: ARM64 Architecture, 32 cores,
125G memory:
base: mm-unstable
user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
0.04s        3.10s         83516.416                  2669684.890

mm-unstable + patchset, anon shmem mTHP disabled
user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
0.02s        3.14s         82936.359                  2630746.027

mm-unstable + patchset, anon shmem 64K mTHP enabled
user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
0.08s        0.31s         678630.231                 17082522.495

From the data above, it is observed that the patchset has a minimal impact
when mTHP is not enabled (some fluctuations observed during testing). 
When enabling 64K mTHP, there is a significant improvement of the page
fault latency.

[1] https://lore.kernel.org/all/f1783ff0-65bd-4b2b-8952-52b6822a0835@redhat.com/


This patch (of 6):

Add large folio mapping establishment support for finish_fault() as a
preparation, to support multi-size THP allocation of anonymous shmem pages
in the following patches.

Keep the same behavior (per-page fault) for non-anon shmem to avoid
inflating the RSS unintentionally, and we can discuss what size of mapping
to build when extending mTHP to control non-anon shmem in the future.

[baolin.wang@linux.alibaba.com: avoid going beyond the PMD pagetable size]
  Link: https://lkml.kernel.org/r/b0e6a8b1-a32c-459e-ae67-fde5d28773e6@linux.alibaba.com
[baolin.wang@linux.alibaba.com: use 'PTRS_PER_PTE' instead of 'PTRS_PER_PTE - 1']
  Link: https://lkml.kernel.org/r/e1f5767a-2c9b-4e37-afe6-1de26fe54e41@linux.alibaba.com
Link: https://lkml.kernel.org/r/cover.1718090413.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/3a190892355989d42f59cf9f2f98b94694b0d24d.1718090413.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03 19:30:03 -07:00
arch arch/x86: do not explicitly clear Reserved flag in free_pagetable 2024-07-03 19:30:02 -07:00
block block: unmap and free user mapped integrity via submitter 2024-06-12 11:00:50 -06:00
certs kbuild: use $(src) instead of $(srctree)/$(src) for source directory 2024-05-10 04:34:52 +09:00
crypto This push fixes a bug in the new ecc P521 code as well as a buggy 2024-05-20 08:47:54 -07:00
Documentation mm: drop leftover comment references to pxx_huge() 2024-07-03 19:30:02 -07:00
drivers ata fixes for 6.10-rc6 2024-06-30 14:32:24 -07:00
fs mm: remove MIGRATE_SYNC_NO_COPY mode 2024-07-03 19:30:00 -07:00
include percpu: add __this_cpu_try_cmpxchg() 2024-07-03 19:30:02 -07:00
init gcc: disable '-Warray-bounds' for gcc-9 2024-06-15 10:43:04 -07:00
io_uring io_uring: signal SQPOLL task_work with TWA_SIGNAL_NO_IPI 2024-06-24 19:46:15 -06:00
ipc Mainly singleton patches, documented in their respective changelogs. 2024-05-19 14:02:03 -07:00
kernel mm: remove the implementation of swap_free() and always use swap_free_nr() 2024-07-03 19:30:01 -07:00
lib hardening fixes for v6.10-rc6 2024-06-28 16:11:02 -07:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm mm: memory: extend finish_fault() to support large folio 2024-07-03 19:30:03 -07:00
net NFS client bugfixes for Linux 6.10 2024-06-29 13:48:24 -07:00
rust rust: avoid unused import warning in rusttest 2024-06-11 23:33:28 +02:00
samples tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
scripts kbuild: scripts/gdb: bring the "abspath" back 2024-06-27 04:20:32 +09:00
security lsm/stable-6.10 PR 20240617 2024-06-17 18:35:12 -07:00
sound ASoC: Fixes for v6.10 2024-06-26 22:02:55 +02:00
tools selftest: mm: Test if hugepage does not get leaked during __bio_release_pages() 2024-07-03 19:29:59 -07:00
usr kbuild: use $(src) instead of $(srctree)/$(src) for source directory 2024-05-10 04:34:52 +09:00
virt KVM fixes for 6.10 2024-06-21 08:03:55 -04:00
.clang-format clang-format: Update with v6.7-rc4's for_each macro list 2023-12-08 23:54:38 +01:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore Add Jeff Kirsher to .get_maintainer.ignore 2024-03-08 11:36:54 +00:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: create a list of all built DTB files 2024-02-19 18:20:39 +09:00
.mailmap bpf-for-netdev 2024-06-14 17:57:10 -07:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING
CREDITS MAINTAINERS: Drop Gustavo Pimentel as PCI DWC Maintainer 2024-03-27 13:41:02 -05:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig
MAINTAINERS IOMMU Fixes for Linux v6.10-rc5 2024-06-28 09:18:01 -07:00
Makefile Linux 6.10-rc6 2024-06-30 14:40:44 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.