linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-12 16:18:45 +02:00

Author	SHA1	Message	Date
Johannes Weiner	2f5bd89ba9	mm: zpdesc: minor naming and comment corrections zpdesc is the page descriptor used by the zsmalloc backend allocator, which in turn is used by zswap and zram. The zpool layer is gone. Link: https://lkml.kernel.org/r/20250829162212.208258-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: SeongJae Park <sj@kernel.org> Cc: Vitaly Wool <vitaly.wool@konsulko.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-21 14:21:59 -07:00
Johannes Weiner	2ccd9fecd9	mm: remove unused zpool layer With zswap using zsmalloc directly, there are no more in-tree users of this code. Remove it. With zpool gone, zsmalloc is now always a simple dependency and no longer something the user needs to configure. Hide CONFIG_ZSMALLOC from the user and have zswap and zram pull it in as needed. Link: https://lkml.kernel.org/r/20250829162212.208258-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Vitaly Wool <vitaly.wool@konsulko.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-21 14:21:59 -07:00
Johannes Weiner	5c3f8be0c6	mm: zswap: interact directly with zsmalloc Patch series "mm: remove zpool". zpool is an indirection layer for zswap to switch between multiple allocator backends at runtime. Since 6.15, zsmalloc is the only allocator left in-tree, so there is no point in keeping zpool around. This patch (of 3): zswap goes through the zpool layer to enable runtime-switching of allocator backends for compressed data. However, since zbud and z3fold were removed in 6.15, zsmalloc has been the only option available. As such, the zpool indirection is unnecessary. Make zswap deal with zsmalloc directly. This is comparable to zram, which also directly interacts with zsmalloc and has never supported a different backend. Note that this does not preclude future improvements and experiments with different allocation strategies. Should it become necessary, it's possible to provide an alternate implementation for the zsmalloc API, selectable at compile time. However, zsmalloc is also rather mature and feature rich, with years of widespread production exposure; it's encouraged to make incremental improvements rather than fork it. In any case, the complexity of runtime pluggability seems excessive and unjustified at this time. Switch zswap to zsmalloc to remove the last user of the zpool API. [hannes@cmpxchg.org: fix default compressr test] Link: https://lkml.kernel.org/r/20250915153640.GA828739@cmpxchg.org Link: https://lkml.kernel.org/r/20250829162212.208258-1-hannes@cmpxchg.org Link: https://lkml.kernel.org/r/20250829162212.208258-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Nacked-by: Vitaly Wool <vitaly.wool@konsulko.se> Acked-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-21 14:21:58 -07:00
Sabyrzhan Tasbolatov	e45085f267	kasan: call kasan_init_generic in kasan_init Call kasan_init_generic() which handles Generic KASAN initialization. For architectures that do not select ARCH_DEFER_KASAN, this will be a no-op for the runtime flag but will print the initialization banner. For SW_TAGS and HW_TAGS modes, their respective init functions will handle the flag enabling, if they are enabled/implemented. Link: https://lkml.kernel.org/r/20250810125746.1105476-3-snovitoll@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217049 Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> [riscv] Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Alexander Potapenko <glider@google.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: David Gow <davidgow@google.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Marco Elver <elver@google.com> Cc: Qing Zhang <zhangqing@loongson.cn> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-21 14:21:58 -07:00
Sabyrzhan Tasbolatov	1e338f4d99	kasan: introduce ARCH_DEFER_KASAN and unify static key across modes Patch series "kasan: unify kasan_enabled() and remove arch-specific implementations", v6. This patch series addresses the fragmentation in KASAN initialization across architectures by introducing a unified approach that eliminates duplicate static keys and arch-specific kasan_arch_is_ready() implementations. The core issue is that different architectures have inconsistent approaches to KASAN readiness tracking: - PowerPC, LoongArch, and UML arch, each implement own kasan_arch_is_ready() - Only HW_TAGS mode had a unified static key (kasan_flag_enabled) - Generic and SW_TAGS modes relied on arch-specific solutions or always-on behavior This patch (of 2): Introduce CONFIG_ARCH_DEFER_KASAN to identify architectures [1] that need to defer KASAN initialization until shadow memory is properly set up, and unify the static key infrastructure across all KASAN modes. [1] PowerPC, UML, LoongArch selects ARCH_DEFER_KASAN. The core issue is that different architectures haveinconsistent approaches to KASAN readiness tracking: - PowerPC, LoongArch, and UML arch, each implement own kasan_arch_is_ready() - Only HW_TAGS mode had a unified static key (kasan_flag_enabled) - Generic and SW_TAGS modes relied on arch-specific solutions or always-on behavior This patch addresses the fragmentation in KASAN initialization across architectures by introducing a unified approach that eliminates duplicate static keys and arch-specific kasan_arch_is_ready() implementations. Let's replace kasan_arch_is_ready() with existing kasan_enabled() check, which examines the static key being enabled if arch selects ARCH_DEFER_KASAN or has HW_TAGS mode support. For other arch, kasan_enabled() checks the enablement during compile time. Now KASAN users can use a single kasan_enabled() check everywhere. Link: https://lkml.kernel.org/r/20250810125746.1105476-1-snovitoll@gmail.com Link: https://lkml.kernel.org/r/20250810125746.1105476-2-snovitoll@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217049 Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> #powerpc Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: David Gow <davidgow@google.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Marco Elver <elver@google.com> Cc: Qing Zhang <zhangqing@loongson.cn> Cc: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-21 14:21:58 -07:00
Andrew Morton	bc9950b56f	Merge branch 'mm-hotfixes-stable' into mm-stable in order to pick up changes required by mm-stable material: hugetlb and damon.	2025-09-21 14:19:36 -07:00
Sergey Senozhatsky	ce4be9e430	zram: fix slot write race condition Parallel concurrent writes to the same zram index result in leaked zsmalloc handles. Schematically we can have something like this: CPU0 CPU1 zram_slot_lock() zs_free(handle) zram_slot_lock() zram_slot_lock() zs_free(handle) zram_slot_lock() compress compress handle = zs_malloc() handle = zs_malloc() zram_slot_lock zram_set_handle(handle) zram_slot_lock zram_slot_lock zram_set_handle(handle) zram_slot_lock Either CPU0 or CPU1 zsmalloc handle will leak because zs_free() is done too early. In fact, we need to reset zram entry right before we set its new handle, all under the same slot lock scope. Link: https://lkml.kernel.org/r/20250909045150.635345-1-senozhatsky@chromium.org Fixes: `71268035f5` ("zram: free slot memory early during write") Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reported-by: Changhui Zhong <czhong@redhat.com> Closes: https://lore.kernel.org/all/CAGVVp+UtpGoW5WEdEU7uVTtsSCjPN=ksN6EcvyypAtFDOUf30A@mail.gmail.com/ Tested-by: Changhui Zhong <czhong@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-15 20:01:45 -07:00
Liam R. Howlett	103e90626d	maple_tree: testing fix for spanning store on 32b 32 bit nodes have a larger branching factor. This affects the required value to cause a height change. Update the spanning store height test to work for both 64 and 32 bit nodes. Link: https://lkml.kernel.org/r/20250828003023.418966-3-Liam.Howlett@oracle.com Fixes: `f9d3a963fe` ("maple_tree: use height and depth consistently") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:26 -07:00
Liam R. Howlett	82b5fe3059	maple_tree: fix testing for 32 bit builds Patch series "maple_tree: Fix testing for 32bit compiles". The maple tree test suite supports 32bit builds which causes 32bit nodes and index/last values. Some tests have too large values and must be skipped while others depend on certain actions causing the tree to be altered in another measurable way (such as the height decreasing or increasing). Two tests were added that broke 32bit testing, either by compile warnings or failures. These fixes restore the tests to a working order. Building 32bit version can be done on a 32bit platform, or by using a command like: BUILD=32 make clean maple This patch (of 2): Some tests are invalid on 32bit due to the size of the index and last. Making those tests depend on the correct build flags stops compile complaints. Link: https://lkml.kernel.org/r/20250828003023.418966-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20250828003023.418966-2-Liam.Howlett@oracle.com Fixes: `5d659bbb52` ("maple_tree: introduce mas_wr_store_type()") Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:26 -07:00
Max Kellermann	39b44c8c73	huge_mm.h: disallow is_huge_zero_folio(NULL) Calling is_huge_zero_folio(NULL) should not be legal - it makes no sense, and a different (theoretical) implementation may dereference the pointer. But currently, lacking any explicit documentation, this call is possible. But if somebody really passes NULL, the function should not return true - this isn't the huge zero folio after all! However, if the `huge_zero_folio` hasn't been allocated yet, it's NULL, and is_huge_zero_folio(NULL) just happens to return true, which is a lie. This weird side effect prevented me from reproducing a kernel crash that occurred when the elements of a folio_batch were NULL - since folios_put_refs() skips huge zero folios, this sometimes causes a crash, but sometimes does not. For debugging, it is better to reveal such bugs reliably and not hide them behind random preconditions like "has the huge zero folio already been created?" To improve detection of such bugs, David Hildenbrand suggested adding a VM_WARN_ON_ONCE(). Link: https://lkml.kernel.org/r/20250828084820.570118-1-max.kellermann@ionos.com Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:26 -07:00
Wei Yang	204dfefe03	mm/page_alloc: find_large_buddy() from start_pfn aligned order We iterate pfn from order 0 to MAX_PAGE_ORDER aligned to find large buddy. While if the order is less than start_pfn aligned order, we would get the same pfn and do the same check again. Iterate from start_pfn aligned order to reduce duplicated work. [richard.weiyang@gmail.com: add comment on assignment of order] Link: https://lkml.kernel.org/r/20250828091618.7869-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20250902025807.11467-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20250828091618.7869-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20250902025807.11467-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:25 -07:00
Brendan Jackman	c66ae64401	tools: testing: use existing atomic.h for vma/maple tests The shared userspace logic used for unit-testing maple tree and VMA code currently has its own replacements for atomics helpers. This is not needed as the necessary APIs already have userspace implementations in the tools tree. Switching over to that allows deleting a bit of code. Note that the implementation is different; while the version being deleted here is implemented using liburcu, the existing version in tools uses either x86 asm or compiler builtins. It's assumed that both are equally likely to be correct. The tools tree's version of atomic_t is a struct type while the version being deleted was just a typedef of an integer. This means it's no longer valid to call __sync_bool_compare_and_swap() directly on it. One option would be to just peek into the struct and call it on the field, but it seems a little cleaner to just use the corresponding atomic.h API whic has been added recently. Now the fake mapping_map_writable() is copied from the real one. Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-4-02d146a58ed2@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:25 -07:00
Brendan Jackman	953dad21bb	tools: testing: support EXTRA_CFLAGS in shared.mk This allows the user to set cflags when building tests that use this shared build infrastructure. For example, it enables building with -Werror so that patch-check scripts will fail: make -C tools/testing/vma -j EXTRA_CFLAGS=-Werror Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-3-02d146a58ed2@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Pedro Falcato <pfalcato@suse.de> Cc: Jann Horn <jannh@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:25 -07:00
Brendan Jackman	d794cd23dc	tools: testing: allow importing arch headers in shared.mk There is an arch/ tree under tools. This contains some useful stuff, to make that available, add it to the -I flags. This requires $(SRCARCH), which is provided by Makefile.arch, so include that.. There still aren't that many headers so also just smush all of them into SHARED_DEPS instead of starting to do any header dependency hocus pocus. Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-2-02d146a58ed2@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Pedro Falcato <pfalcato@suse.de> Cc: Jann Horn <jannh@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:25 -07:00
Brendan Jackman	ff0db419b2	tools/include: implement a couple of atomic_t ops Patch series "tools: testing: Use existing atomic.h for vma/maple tests", v2. De-duplicating this lets us delete a bit of code. Ulterior motive: I'm working on a new set of the userspace-based unit tests, which will need the atomics API too. That would involve even more duplication, so while the win in this patchset alone is very minimal, it looks a lot more significant with my other WIP patchset. I've tested these commands: make -C tools/testing/vma -j tools/testing/vma/vma make -C tools/testing/radix-tree -j tools/testing/radix-tree/maple Note the EXTRA_CFLAGS patch is actually orthogonal, let me know if you'd prefer I send it separately. This patch (of 4): The VMA tests need an operation equivalent to atomic_inc_unless_negative() to implement a fake mapping_map_writable(). Adding it will enable them to switch to the shared atomic headers and simplify that fake implementation. In order to add that, also add atomic_try_cmpxchg() which can be used to implement it. This is copied from Documentation/atomic_t.txt. Then, implement atomic_inc_unless_negative() itself based on the raw_atomic_dec_unless_positive() in include/linux/atomic/atomic-arch-fallback.h. There's no present need for a highly-optimised version of this (nor any reason to think this implementation is sub-optimal on x86) so just implement this with generic C, no x86-specifics. Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-0-02d146a58ed2@google.com Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-1-02d146a58ed2@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:24 -07:00
Max Kellermann	1e332f303a	pagevec.h: add `const` to pointer parameters of getter functions For improved const-correctness. Link: https://lkml.kernel.org/r/20250828130311.772993-1-max.kellermann@ionos.com Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:24 -07:00
Quanmin Yan	d8f867fa08	mm/damon: add damon_ctx->min_sz_region Adopting addr_unit would make DAMON_MINREGION 'addr_unit * 4096' bytes and cause data alignment issues[1]. Add damon_ctx->min_sz_region to change DAMON_MIN_REGION from a global macro value to per-context variable. Link: https://lkml.kernel.org/r/20250828171242.59810-12-sj@kernel.org Link: https://lore.kernel.org/all/527714dd-0e33-43ab-bbbd-d89670ba79e7@huawei.com [1] Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:24 -07:00
SeongJae Park	56cd19404a	Docs/ABI/damon: document addr_unit file Document addr_unit DAMON sysfs file on DAMON ABI document. Link: https://lkml.kernel.org/r/20250828171242.59810-11-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:24 -07:00
SeongJae Park	e0c725455f	Docs/admin-guide/mm/damon/usage: document addr_unit file Document addr_unit DAMON sysfs file on DAMON usage document. Link: https://lkml.kernel.org/r/20250828171242.59810-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:23 -07:00
SeongJae Park	7b06c471af	Docs/mm/damon/design: document 'address unit' parameter Add 'addr_unit' parameter description on DAMON design document. Link: https://lkml.kernel.org/r/20250828171242.59810-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:23 -07:00
SeongJae Park	540a2aebc6	mm/damon/sysfs: implement addr_unit file under context dir Only DAMON kernel API callers can use addr_unit parameter. Implement a sysfs file to let DAMON sysfs ABI users use it. Additionally, addr_unit must be set to a non-zero value. Link: https://lkml.kernel.org/r/20250828171242.59810-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:23 -07:00
SeongJae Park	01e7ee33a0	mm/damon/paddr: support addr_unit for DAMOS_STAT Add support of addr_unit for DAMOS_STAT action handling from the DAMOS operation implementation for the physical address space. Link: https://lkml.kernel.org/r/20250828171242.59810-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:23 -07:00
SeongJae Park	ec1d5bab06	mm/damon/paddr: support addr_unit for MIGRATE_{HOT,COLD} Add support of addr_unit for DAMOS_MIGRATE_HOT and DAMOS_MIGRATE_COLD action handling from the DAMOS operation implementation for the physical address space. Link: https://lkml.kernel.org/r/20250828171242.59810-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:22 -07:00
SeongJae Park	51a1ebd3a2	mm/damon/paddr: support addr_unit for DAMOS_LRU_[DE]PRIO Add support of addr_unit for DAMOS_LRU_PRIO and DAMOS_LRU_DEPRIO action handling from the DAMOS operation implementation for the physical address space. Link: https://lkml.kernel.org/r/20250828171242.59810-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:22 -07:00
SeongJae Park	85246435b2	mm/damon/paddr: support addr_unit for DAMOS_PAGEOUT Add support of addr_unit for DAMOS_PAGEOUT action handling from the DAMOS operation implementation for the physical address space. Link: https://lkml.kernel.org/r/20250828171242.59810-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:22 -07:00
SeongJae Park	d8096848e7	mm/damon/paddr: support addr_unit for access monitoring Add support of addr_unit paramer for access monitoing operations of paddr. Link: https://lkml.kernel.org/r/20250828171242.59810-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:22 -07:00
SeongJae Park	09a616cbb3	mm/damon/core: add damon_ctx->addr_unit Patch series "mm/damon: support ARM32 with LPAE", v3. Previously, DAMON's physical address space monitoring only supported memory ranges below 4GB on LPAE-enabled systems. This was due to the use of 'unsigned long' in 'struct damon_addr_range', which is 32-bit on ARM32 even with LPAE enabled[1]. To add DAMON support for ARM32 with LPAE enabled, a new core layer parameter called 'addr_unit' was introduced[2]. Operations set layer can translate a core layer address to the real address by multiplying the parameter value to the core layer address. Support of the parameter is up to each operations layer implementation, though. For example, operations set implementations for virtual address space can simply ignore the parameter. Add the support on paddr, which is the DAMON operations set implementation for the physical address space, as we have a clear use case for that. This patch (of 11): In some cases, some of the real address that handled by the underlying operations set cannot be handled by DAMON since it uses only 'unsinged long' as the address type. Using DAMON for physical address space monitoring of 32 bit ARM devices with large physical address extension (LPAE) is one example[1]. Add a parameter name 'addr_unit' to core layer to help such cases. DAMON core API callers can set it as the scale factor that will be used by the operations set for translating the core layer's addresses to the real address by multiplying the parameter value to the core layer address. Support of the parameter is up to each operations set layer. The support from the physical address space operations set (paddr) will be added with following commits. Link: https://lkml.kernel.org/r/20250828171242.59810-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250828171242.59810-2-sj@kernel.org Link: https://lore.kernel.org/20250408075553.959388-1-zuoze1@huawei.com [1] Link: https://lore.kernel.org/all/20250416042551.158131-1-sj@kernel.org/ [2] Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: ze zuo <zuoze1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:21 -07:00
Wei Yang	98c94f1035	mm/pageblock-flags: remove PB_migratetype_bits/PB_migrate_end enum pageblock_bits defines the meaning of pageblock bits. Currently PB_migratetype_bits says the lowest 3 bits represents migratetype and PB_migrate_end/MIGRATETYPE_MASK's definition rely on it with magical computation. Remove the definition of PB_migratetype_bits/PB_migrate_end. Use PB_migrate_[0\|1\|2] to represent lowest bits for migratetype. Then we can simplify related definition. Also, MIGRATETYPE_AND_ISO_MASK is MIGRATETYPE_MASK add isolation bit. Use MIGRATETYPE_MASK in the definition of MIGRATETYPE_AND_ISO_MASK looks cleaner. No functional change intended. Link: https://lkml.kernel.org/r/20250827070105.16864-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:21 -07:00
Wei Yang	dd3b304b94	mm/page_alloc: use xxx_pageblock_isolate() for better reading Patch series "mm/pageblock: improve readability of some pageblock handling", v3. During code reading, found two possible points to improve the readability of pageblock handling. Patch 1: isolate bit is standalone and there are dedicated helpers. Instead of check the bit directly, we could use the helper to do it. Patch 2: remove PB_migratetype_bits and PB_migrate_end to reduce magical computation. This patch (of 2): Since commit `e904bce2d9` ("mm/page_isolation: make page isolation a standalone bit"), it provides dedicated helper to handle isolation. Change to use these helpers to be better reading. No functional change intended. Link: https://lkml.kernel.org/r/20250827070105.16864-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20250827070105.16864-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:21 -07:00
Boris Burkov	b55102826d	btrfs: set AS_KERNEL_FILE on the btree_inode extent_buffers are global and shared so their pages should not belong to any particular cgroup (currently whichever cgroups happens to allocate the extent_buffer). Btrfs tree operations should not arbitrarily block on cgroup reclaim or have the shared extent_buffer pages on a cgroup's reclaim lists. Link: https://lkml.kernel.org/r/2ee99832619a3fdfe80bf4dc9760278662d2d746.1755812945.git.boris@bur.io Signed-off-by: Boris Burkov <boris@bur.io> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Tested-by: syzbot@syzkaller.appspotmail.com Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qu Wenruo <wqu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:20 -07:00
Boris Burkov	e3a9ac4e86	mm: add vmstat for kernel_file pages Kernel file pages are tricky to track because they are indistinguishable from files whose usage is accounted to the root cgroup. To maintain good accounting, introduce a vmstat counter tracking kernel file pages. Confirmed that these work as expected at a high level by mounting a btrfs using AS_KERNEL_FILE for metadata pages, and seeing the counter rise with fs usage then go back to a minimal level after drop_caches and finally down to 0 after unmounting the fs. Link: https://lkml.kernel.org/r/08ff633e3a005ed5f7691bfd9f58a5df8e474339.1755812945.git.boris@bur.io Signed-off-by: Boris Burkov <boris@bur.io> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Tested-by: syzbot@syzkaller.appspotmail.com Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qu Wenruo <wqu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:20 -07:00
Boris Burkov	cf1dec76ba	mm/filemap: add AS_KERNEL_FILE Patch series "introduce kernel file mapped folios", v4. Btrfs currently tracks its metadata pages in the page cache, using a fake inode (fs_info->btree_inode) with offsets corresponding to where the metadata is stored in the filesystem's full logical address space. A consequence of this is that when btrfs uses filemap_add_folio(), this usage is charged to the cgroup of whichever task happens to be running at the time. These folios don't belong to any particular user cgroup, so I don't think it makes much sense for them to be charged in that way. Some negative consequences as a result: - A task can be holding some important btrfs locks, then need to lookup some metadata and go into reclaim, extending the duration it holds that lock for, and unfairly pushing its own reclaim pain onto other cgroups. - If that cgroup goes into reclaim, it might reclaim these folios a different non-reclaiming cgroup might need soon. This is naturally offset by LRU reclaim, but still. We have two options for how to manage such file pages: 1. charge them to the root cgroup. 2. don't charge them to any cgroup at all. 2. breaks the invariant that every mapped page has a cgroup. This is workable, but unnecessarily risky. Therefore, go with 1. A very similar proposal to use the root cgroup was previously made by Qu, where he eventually proposed the idea of setting it per address_space. This makes good sense for the btrfs use case, as the behavior should apply to all use of the address_space, not select allocations. I.e., if someone adds another filemap_add_folio() call using btrfs's btree_inode, we would almost certainly want to account that to the root cgroup as well. This patch (of 3): Add the flag AS_KERNEL_FILE to the address_space to indicate that this mapping's memory is exempt from the usual memcg accounting. [boris@bur.io: fix CONFIG_MEMCG build for AS_KERNEL_FILE] Link: https://lkml.kernel.org/r/6de59ddeec81b5c294d337c001ba0061631d4ec6.1755816635.git.boris@bur.io Link: https://lore.kernel.org/linux-mm/b5fef5372ae454a7b6da4f2f75c427aeab6a07d6.1727498749.git.wqu@suse.com/ Link: https://lkml.kernel.org/r/f09c4e2c90351d4cb30a1969f7a863b9238bd291.1755812945.git.boris@bur.io Signed-off-by: Boris Burkov <boris@bur.io> Suggested-by: Qu Wenruo <wqu@suse.com> Suggested-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:20 -07:00
Miaohe Lin	c090868f59	Revert "hugetlb: make hugetlb depends on SYSFS or SYSCTL" Commit `f8142cf94d` ("hugetlb: make hugetlb depends on SYSFS or SYSCTL") added dependency on SYSFS or SYSCTL but hugetlb can be used without SYSFS or SYSCTL. So this dependency is wrong and should be removed. For users with CONFIG_SYSFS or CONFIG_SYSCTL on, there should be no difference. For users have CONFIG_SYSFS and CONFIG_SYSCTL both undefined, hugetlbfs can still works perfectly well through cmdline except a possible kismet warning[1] when select CONFIG_HUGETLBFS. IMHO, it might not worth a backport. This reverts commit `f8142cf94d`. It overlooked the scenario of using hugetlb through boot parameters when it was submitted. Link: https://lkml.kernel.org/r/20250826030955.2898709-1-linmiaohe@huawei.com Link: https://lore.kernel.org/all/5c99458f-4a91-485f-8a35-3618a992e2e4@csgroup.eu/ [1] Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202508222032.bwJsQPZ1-lkp@intel.com/ Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:20 -07:00
Dev Jain	1580cd50b6	selftests/mm/uffd-stress: stricten constraint on free hugepages needed before the test The test requires at least 2 * (bytes/page_size) hugetlb memory, since we require identical number of hugepages for src and dst location. Fix this. Along with the above, as explained in patch "selftests/mm/uffd-stress: Make test operate on less hugetlb memory", the racy nature of the test requires that we have some extra number of hugepages left beyond what is required. Therefore, stricten this constraint. Link: https://lkml.kernel.org/r/20250909061531.57272-3-dev.jain@arm.com Fixes: `5a6aa60d18` ("selftests/mm: skip uffd hugetlb tests with insufficient hugepages") Signed-off-by: Dev Jain <dev.jain@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:19 -07:00
Dev Jain	060b6c72ce	selftests/mm/uffd-stress: make test operate on less hugetlb memory Patch series "selftests/mm: uffd-stress fixes", v2. This patchset ensures that the number of hugepages is correctly set in the system so that the uffd-stress test does not fail due to the racy nature of the test. Patch 1 changes the hugepage constraint in the run_vmtests.sh script, whereas patch 2 changes the constraint in the test itself. This patch (of 2): We observed uffd-stress selftest failure on arm64 and intermittent failures on x86 too: running ./uffd-stress hugetlb-private 128 32 bounces: 17, mode: rnd read, ERROR: UFFDIO_COPY error: -12 (errno=12, @uffd-common.c:617) [FAIL] not ok 18 uffd-stress hugetlb-private 128 32 # exit=1 For this particular case, the number of free hugepages from run_vmtests.sh will be 128, and the test will allocate 64 hugepages in the source location. The stress() function will start spawning threads which will operate on the destination location, triggering uffd-operations like UFFDIO_COPY from src to dst, which means that we will require 64 more hugepages for the dst location. Let us observe the locking_thread() function. It will lock the mutex kept at dst, triggering uffd-copy. Suppose that 127 (64 for src and 63 for dst) hugepages have been reserved. In case of BOUNCE_RANDOM, it may happen that two threads trying to lock the mutex at dst, try to do so at the same hugepage number. If one thread succeeds in reserving the last hugepage, then the other thread may fail in alloc_hugetlb_folio(), returning -ENOMEM. I can confirm that this is indeed the case by this hacky patch: :--- a/mm/hugetlb.c ; +++ b/mm/hugetlb.c ; @@ -6929,6 +6929,11 @@ int hugetlb_mfill_atomic_pte(pte_t dst_pte, ; ; folio = alloc_hugetlb_folio(dst_vma, dst_addr, false); ; if (IS_ERR(folio)) { ; + pte_t actual_pte = hugetlb_walk(dst_vma, dst_addr, PMD_SIZE); ; + if (actual_pte) { ; + ret = -EEXIST; ; + goto out; ; + } ; ret = -ENOMEM; ; goto out; ; } This code path gets triggered indicating that the PMD at which one thread is trying to map a hugepage, gets filled by a racing thread. Therefore, instead of using freepgs to compute the amount of memory, use freepgs - (min(32, nr_cpus) - 1), so that the test still has some extra hugepages to use. The adjustment is a function of min(32, nr_cpus) - the value of nr_parallel in the test - because in the worst case, nr_parallel number of threads will try to map a hugepage on the same PMD, one will win the allocation race, and the other nr_parallel - 1 threads will fail, so we need extra nr_parallel - 1 hugepages to satisfy this request. Note that, in case the adjusted value underflows, there is a check for the number of free hugepages in the test itself, which will fail: get_free_hugepages() < bytes / page_size A negative value will be passed on to bytes which is of type size_t, thus the RHS will become a large value and the check will fail, so we are safe. Link: https://lkml.kernel.org/r/20250909061531.57272-1-dev.jain@arm.com Link: https://lkml.kernel.org/r/20250909061531.57272-2-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:19 -07:00
Baolin Wang	6d11dec130	mm: shmem: drop the unnecessary folio_nr_pages() We've got the number of pages in the folio earlier, thus remove the redundant folio_nr_pages() call. Link: https://lkml.kernel.org/r/67c80182ebd949e3894908e01e224697c143aabb.1756200587.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:19 -07:00
Baolin Wang	ab1c34c834	mm: shmem: use 'folio' for shmem_partial_swap_usage() It is more straightforward to use the term `folio'. No functional changes. Link: https://lkml.kernel.org/r/a2d39608d99cba1130cacd9cffbafc6949193c08.1756200587.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:19 -07:00
Brendan Jackman	6c3826173e	mm/page_alloc: harmonize should_compact_retry() type Currently order is signed in one version of the function and unsigned in the other. Tidy that up. In page_alloc.c, order is unsigned in the vast majority of cases. But, there is a cluster of exceptions in compaction-related code (probably stemming from the fact that compact_control.order is signed). So, prefer local consistency and make this one signed too. Link: https://lkml.kernel.org/r/20250826-cleanup-should_compact_retry-v1-1-d2ca89727fcf@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:18 -07:00
Sidhartha Kumar	ef49b7b39d	maple_tree: fix MAPLE_PARENT_RANGE32 and parent pointer docs MAPLE_PARENT_RANGE32 should be 0x02 as a 32 bit node is indicated by the bit pattern 0b010 which is the hex value 0x02. There are no users currently, so there is no associated bug with this wrong value. Fix typo Note -> Node and replace x with b to indicate binary values. Link: https://lkml.kernel.org/r/20250826151344.403286-1-sidhartha.kumar@oracle.com Fixes: `54a611b605` ("Maple Tree: add new data structure") Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:18 -07:00
Pratyush Yadav	e76e09bdf9	kho: make sure kho_scratch argument is fully consumed When specifying fixed sized scratch areas, the parser only parses the three scratch sizes and ignores the rest of the argument. This means the argument can have any bogus trailing characters. For example, "kho_scratch=256M,512M,512Mfoobar" results in successful parsing: [ 0.000000] KHO: scratch areas: lowmem: 256MiB global: 512MiB pernode: 512MiB It is generally a good idea to parse arguments as strictly as possible. In addition, if bogus trailing characters are allowed in the kho_scratch argument, it is possible that some people might end up using them and later extensions to the argument format will cause unexpected breakages. Make sure the argument is fully consumed after all three scratch sizes are parsed. With this change, the bogus argument "kho_scratch=256M,512M,512Mfoobar" results in: [ 0.000000] Malformed early option 'kho_scratch' Link: https://lkml.kernel.org/r/20250826123817.64681-1-pratyush@kernel.org Signed-off-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Pratyush Yadav <pratyush@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:18 -07:00
Wander Lairson Costa	dfd04add59	kmem/tracing: add kmem name to kmem_cache_alloc tracepoint The kmem_cache_free tracepoint includes a "name" field, which allows for easy identification and filtering of specific kmem's. However, the kmem_cache_alloc tracepoint lacks this field, making it difficult to pair corresponding alloc and free events for analysis. Add the "name" field to kmem_cache_alloc to enable consistent tracking and correlation of kmem alloc and free events. Link: https://lkml.kernel.org/r/20250825125927.59816-1-wander@redhat.com Signed-off-by: Wander Lairson Costa <wander@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Martin Liu <liumartin@google.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:18 -07:00
Kairui Song	46afff4599	mm/page-writeback: drop usage of folio_index folio_index is only needed for mixed usage of page cache and swap cache. The remaining three caller in page-writeback are for page cache tag marking. Swap cache space doesn't use tag (explicitly sets mapping_set_no_writeback_tags), so use folio->index here directly. Link: https://lkml.kernel.org/r/20250825163721.17734-1-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:17 -07:00
I Viswanath	79dfed0976	selftests/mm: use calloc instead of malloc in pagemap_ioctl.c As per Documentation/process/deprecated.rst, dynamic size calculations should not be performed in memory allocator arguments due to possible overflows. Replace malloc with calloc to avoid open-ended arithmetic and prevent possible overflows. Link: https://lkml.kernel.org/r/20250825170643.63174-1-viswanathiyyappan@gmail.com Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed by: Donet Tom <donettom@linux.ibm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:17 -07:00
Donet Tom	786eb990cf	drivers/base/node: handle error properly in register_one_node() If register_node() returns an error, it is not handled correctly. The function will proceed further and try to register CPUs under the node, which is not correct. So, in this patch, if register_node() returns an error, we return immediately from the function. Link: https://lkml.kernel.org/r/20250822084845.19219-1-donettom@linux.ibm.com Fixes: `76b67ed9dc` ("[PATCH] node hotplug: register cpu: remove node struct") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:17 -07:00
Wei Yang	3615e106e0	mm/khugepaged: use list_xxx() helper to improve readability In general, khugepaged_scan_mm_slot() iterates khugepaged_scan.mm_head list to get a mm_struct for collapse memory. Use list_xxx() helper would be more obvious to the list iteration operation. No functional change. Link: https://lkml.kernel.org/r/20250822025732.9025-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:17 -07:00
Bala-Vignesh-Reddy	a7498388b0	selftests: centralise maybe-unused definition in kselftest.h Several selftests subdirectories duplicated the define __maybe_unused, leading to redundant code. Move to kselftest.h header and remove other definitions. This addresses the duplication noted in the proc-pid-vm warning fix Link: https://lkml.kernel.org/r/20250821101159.2238-1-reddybalavignesh9979@gmail.com Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Link:https://lore.kernel.org/lkml/20250820143954.33d95635e504e94df01930d0@linux-foundation.org/ Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Mickal Salan <mic@digikod.net> [landlock] Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:16 -07:00
ally heev	940b1be225	kselftest: mm: fix typos in test_vmalloc.sh Fix simple typos in function name and console message. Link: https://lkml.kernel.org/r/20250823170208.184149-1-allyheev@gmail.com Signed-off-by: ally heev <allyheev@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:16 -07:00
Usama Arif	32960f7503	mm/huge_memory: remove enforce_sysfs from __thp_vma_allowable_orders Using forced_collapse directly is clearer and enforce_sysfs is not really needed. Link: https://lkml.kernel.org/r/20250821150038.2025521-1-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:16 -07:00
Brendan Jackman	ce32123b9b	mm: remove is_migrate_highatomic() There are 3 potential reasons for is_migrate_() helpers: 1. They represent higher-level attributes of migratetypes, like is_migrate_movable() 2. They are ifdef'd, like is_migrate_isolate(). 3. For consistency with an is_migrate__page() helper, also like is_migrate_isolate(). It looks like is_migrate_highatomic() was for case 3, but that was removed in commit `e0932b6c1f` ("mm: page_alloc: consolidate free page accounting"). So remove the indirection and go back to a simple comparison. Link: https://lkml.kernel.org/r/20250821-is-migrate-highatomic-v1-1-ddb6e5d7c566@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:16 -07:00
Shankari Anand	9907e1df31	rust: mm: update ARef and AlwaysRefCounted imports from sync::aref Update call sites in the mm subsystem to import `ARef` and `AlwaysRefCounted` from `sync::aref` instead of `types`. This aligns with the ongoing effort to move `ARef` and `AlwaysRefCounted` to sync. Link: https://lkml.kernel.org/r/20250716091158.812860-1-shankari.ak0208@gmail.com Signed-off-by: Shankari Anand <shankari.ak0208@gmail.com> Suggested-by: Benno Lossin <lossin@kernel.org> Link: https://github.com/Rust-for-Linux/linux/issues/1173 Acked-by: Alice Ryhl <aliceryhl@google.com> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Trevor Gross <tmgross@umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-09-13 16:55:15 -07:00

1 2 3 4 5 ...

1382922 Commits