linux

mirror of https://github.com/torvalds/linux.git synced 2026-07-27 09:36:22 +02:00

Author	SHA1	Message	Date
Jann Horn	425224c2d7	proc: Fix broken error paths for namespace links Don't return the return value of down_read_killable() (0) when a ptrace access check fails, return -EACCES as intended. Reported-by: Magnus Lindholm <linmag7@gmail.com> Closes: https://lore.kernel.org/r/20260706170735.2941493-1-linmag7@gmail.com Fixes: `6650527444` ("proc: protect ptrace_may_access() with exec_update_lock (part 1)") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20260706-procfs-ns-eacces-fix-v1-1-a69ab14c02e6@google.com Tested-by: Magnus Lindholm <linmag7@gmail.com> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>	2026-07-22 15:37:24 +02:00
Linus Torvalds	0e35b9b6ec	20 hotfixes. 17 are for MM. 12 are cc:stable and the remaining 8 address post-7.1 issues or aren't considered suitable for backporting. A two patch series from SJ addresses a couple of quite old DAMON issues. A two patch series from Yichong Chen fixes tools/virtio build issues. The remaining patches are singletons. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCakxNHwAKCRDdBJ7gKXxA jv5kAPwLjxLYxVcsfDSN3CCvvgzixxb7ici7fTgx6+nDgZ11fAEAwb9S2/BFlPIE 04R1tgAWpfuA4vlRHDznJ5v7UqQfkQA= =hpWC -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2026-07-06-17-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "20 hotfixes. 17 are for MM. 12 are cc:stable and the remaining 8 address post-7.1 issues or aren't considered suitable for backporting. Two patches from SJ addresses a couple of quite old DAMON issues. And two patches from Yichong Chen fixes tools/virtio build issues. The remaining patches are singletons" * tag 'mm-hotfixes-stable-2026-07-06-17-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: tools/include: include stdint.h for SIZE_MAX in overflow.h tools/virtio: add missing compat definitions for vhost_net_test mm: do file ownership checks with the proper mount idmap samples/damon/mtier: fail early if address range parameters are invalid mm: a second pagecache maintainer mm/damon: add a kernel-doc comment for damon_ctx->rnd_state mm/damon: add a kernel-doc comment for damon_ctx->probes mailmap: add entries for Radu Rendec selftests/mm: hmm-tests: include linux/mman.h to access MADV_COLLAPSE selftests/mm: pagemap_ioctl: use the correct page size for transact_test() fs/proc: fix KPF_KSM reported for all anonymous pages mm: page_ext: add count limit to page_ext_iter_next to prevent invalid PFN access mm/damon/ops-common: handle extreme intervals in damon_hot_score() MAINTAINERS: add Lance as an rmap reviewer mm/compaction: handle free_pages_prepare() properly in compaction_free() mm/damon/sysfs-schemes: put stats for scheme_add_dirs() internal error mm/damon/sysfs-schemes: fix dir put orders in access_pattern_add_dirs() mm: shrinker: fix NULL pointer dereference in debugfs mm: shrinker: fix shrinker_info teardown race with expansion selftests/mm: fix ksft_process_madv.sh test category	2026-07-06 18:51:36 -07:00
Jinjiang Tu	81401cebfc	fs/proc: fix KPF_KSM reported for all anonymous pages Reading /proc/kpageflags for any anonymous page returns KPF_KSM set, even when KSM is not in use. As a result, tools misclassify all anonymous pages as KSM merged. In stable_page_flags(), if the page is anonymous, then use (mapping & FOLIO_MAPPING_KSM) check to identify if the anonymous page is KSM page. However, FOLIO_MAPPING_KSM is FOLIO_MAPPING_ANON \| FOLIO_MAPPING_ANON_KSM, (mapping & FOLIO_MAPPING_KSM) check returns true for all anonymous pages. To fix it, use FOLIO_MAPPING_ANON_KSM instead. Link: https://lore.kernel.org/20260629033122.774318-1-tujinjiang@huawei.com Link: https://lore.kernel.org/20260626013252.2846774-1-tujinjiang@huawei.com Fixes: `dee3d0bef2` ("proc: rewrite stable_page_flags()") Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Xu Xin <xu.xin16@zte.com.cn> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Luiz Capitulino <luizcap@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Svetly Todorov <svetly.todorov@memverge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-07-01 19:02:54 -07:00
Krzysztof Wilczyński	16b02eb4b9	proc: only bump parent nlink when registering directories proc_register() increments the parent directory's link count for every entry it registers, while remove_proc_entry() and remove_proc_subtree() decrement it only when the removed entry is a directory. Regular files thus inflate the parent's count while they exist, and leak one link permanently on every create and remove cycle. For example, /proc/bus/pci/00 with twenty-two device files and no subdirectories reports nlink 24 instead of 2, and SR-IOV VF enable and disable cycles, each creating and removing the VF config space entries under /proc/bus/pci/<bus>, inflate the link count of that directory without bound. Before commit `e06689bf57` ("proc: change ->nlink under proc_subdir_lock"), the increment lived in proc_mkdir_data() and proc_create_mount_point(), and was therefore applied only to directories. Moving it into proc_register() to bring it under proc_subdir_lock dropped the S_ISDIR check. Thus, move the nlink accounting into pde_subdir_insert() and pde_erase(), only updating it for directories in both, so the link count is always changed together with the directory entry itself. Fixes: `e06689bf57` ("proc: change ->nlink under proc_subdir_lock") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://patch.msgid.link/20260613211005.921692-1-kwilczynski@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>	2026-07-01 15:26:24 +02:00
Linus Torvalds	2e05544060	mm.git review status for mm-hotfixes-stable..mm-nonmm-stable Everything: Total patches: 108 Reviews/patch: 0.84 Reviewed rate: 75% Patch series in this merge: - The 2 patch series "taskstats: fix TGID dead-thread stat retention" from Yiyang Chen fixes a taskstats TGID aggregation bug where fields added in the TGID query path were not preserved after thread exit, and adds a kselftest covering the regression. - The 2 patch series "lib/tests: string_helpers: Slight improvements" from Andy Shevchenko improves lib/tests/string_helpers_kunit.c a little. - The 2 patch series "lib/base64: decode fixes" from Josh Law addreesses minor issues in lib/base64.c. - The 3 patch series "selftests/filelock: Make output more kselftestish" from Mark Brown makes the output from the ofdlocks test a bit easier for tooling to work with, and also ignores the generated file. - The 3 patch series "uaccess: unify inline vs outline copy_{from,to}_user() selection" from Yury Norov simplifies the usercopy code by removing the selectability of inlining copy_{from,to}_user(). - The 5 patch series "ocfs2: validate inline xattr header consumers" from ZhengYuan Huang fixes a number of possible issues in the ocfs2 xattr code. - The 8 patch series "lib and lib/cmdline enhancements" from Dmitry Antipov provides additional robustness checking in the cmdline handling code and its in-kernel testing and selftests. - The 18 patch series "cleanup the RAID6 P/Q library" from Christoph Hellwig cleans up the RAID6 P/Q library to match the recent updates to the RAID 5 XOR library and other CRC/crypto libraries. - The 3 patch series "ocfs2: harden inode validators against forged metadata" from Michael Bommarito adds three structural checks to OCFS2 dinode validation so malformed on-disk fields are rejected before ocfs2_populate_inode() copies them into the in-core inode. - The 2 patch series "lib/raid: replace __get_free_pages() call with kmalloc()" from Mike Rapoport cleans up the lib/raid code by using kmalloc() in more places. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCajgeIQAKCRDdBJ7gKXxA jmlWAQCLJVDZNJMFaXy4a+YHdu3tfemLpSy83A0Le61tOZUdBQD/Sf/7rhgeaM74 32yp53TZLA8xHImCGEin/1ddPJ8DbgY= =GW2I -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2026-06-21-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "taskstats: fix TGID dead-thread stat retention" (Yiyang Chen) Fix a taskstats TGID aggregation bug where fields added in the TGID query path were not preserved after thread exit, and adds a kselftest covering the regression. - "lib/tests: string_helpers: Slight improvements" (Andy Shevchenko) Improve lib/tests/string_helpers_kunit.c a little - "lib/base64: decode fixes" (Josh Law) Address minor issues in lib/base64.c - "selftests/filelock: Make output more kselftestish" (Mark Brown) Make the output from the ofdlocks test a bit easier for tooling to work with. Also ignore the generated file - "uaccess: unify inline vs outline copy_{from,to}_user() selection" (Yury Norov) Simplify the usercopy code by removing the selectability of inlining copy_{from,to}_user(). - "ocfs2: validate inline xattr header consumers" (ZhengYuan Huang) Fix a number of possible issues in the ocfs2 xattr code - "lib and lib/cmdline enhancements" (Dmitry Antipov) Provide additional robustness checking in the cmdline handling code and its in-kernel testing and selftests - "cleanup the RAID6 P/Q library" (Christoph Hellwig) Clean up the RAID6 P/Q library to match the recent updates to the RAID 5 XOR library and other CRC/crypto libraries - "ocfs2: harden inode validators against forged metadata" (Michael Bommarito) Add three structural checks to OCFS2 dinode validation so malformed on-disk fields are rejected before ocfs2_populate_inode() copies them into the in-core inode - "lib/raid: replace __get_free_pages() call with kmalloc()" (Mike Rapoport) Clean up the lib/raid code by using kmalloc() in more places * tag 'mm-nonmm-stable-2026-06-21-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (108 commits) ocfs2: fix circular locking dependency in ocfs2_dio_end_io_write ocfs2: fix NULL h_transaction deref in ocfs2_assure_trans_credits lib: interval_tree_test: validate benchmark parameters ocfs2: avoid moving extents to occupied clusters treewide: fix transposed "sign" typos and update spelling.txt ocfs2: fix UBSAN array-index-out-of-bounds in ocfs2_sum_rightmost_rec fat: reject BPB volumes whose data area starts beyond total sectors selftests/uevent: increase __UEVENT_BUFFER_SIZE to avoid ENOBUFS on busy systems lib/test_firmware: allocate the configured into_buf size fs: efs: remove unneeded debug prints checkpatch: cuppress warnings when Reported-by: is followed by Link: MAINTAINERS: add Alexander as a kcov reviewer mailmap: update Alexander Sverdlin's Email addresses fs: fat: inode: replace sprintf() with scnprintf() ocfs2: fix out-of-bounds write in ocfs2_remove_refcount_extent ocfs2: fix race between ocfs2_control_install_private() and ocfs2_control_release() ocfs2/dlm: require a ref for locking_state debugfs open ocfs2: reject FITRIM ranges shorter than a cluster ocfs2: validate fast symlink target during inode read ocfs2: add journal NULL check in ocfs2_checkpoint_inode() ...	2026-06-21 13:20:19 -07:00
Linus Torvalds	a552c81ff4	mm.git review status for mm-hotfixes-stable..mm-stable Everything: Total patches: 321 Reviews/patch: 1.52 Reviewed rate: 68% Excluding DAMON: Total patches: 227 Reviews/patch: 2.07 Reviewed rate: 90% Excluding DAMON and selftests: Total patches: 203 Reviews/patch: 2.14 Reviewed rate: 91% Patchsets in this merge: - The 2 patch series "selftests/mm: clean up build output and verbosity" from Li Wang removes some noise from the MM selftests build out. - The 3 patch series "mm: Free contiguous order-0 pages efficiently" from Ryan Roberts speeds up the freeing of a batch of 0-order pages by first scanning them for coalescing opportunities. This is applicable to vfree() and to the releasing of frozen pages. - The 11 patch series "mm/damon: introduce DAMOS failed region quota charge ratio" from SeongJae Park addresses a DAMOS usability issue: The DAMOS quota often exhausts prematurely because it charges for all memory attempted, causing slow and inconsistent performance when actions fail on unreclaimable memory. To fix this, a new feature lets users set a smaller, flexible quota charge ratio (via a numerator and denominator) for failed regions. Since failed actions cause less overhead, reducing their quota cost ensures more predictable and efficient DAMOS processing - The 8 patch series "selftests/cgroup: improve zswap tests robustness and support large page sizes" from Li Wang fixes various spurious failures and improves the overall robustness of the cgroup zswap selftests. - The 3 patch series "fix MAP_DROPPABLE not supported errno" from Anthony Yznaga fixes an issue in the mlock selftests on arm32. - The 2 patch series "mm: huge_memory: clean up defrag sysfs with shared" from Breno Leitao does some maintenance work in the huge_memory code. - The 3 patch series "treewide: fixup gfp_t printks" from Brendan Jackman uses the special vprintf() gfp_t conversion in various places. - The 6 patch series "mm: Fix vmemmap optimization accounting and initialization" from Muchun Song fixes several bugs in the vmemmap optimization, mainly around incorrect page accounting and memmap initialization in the DAX and memory hotplug paths. It also fixes pageblock migratetype initialization and struct page initialization for ZONE_DEVICE compound pages. - The 4 patch series "mm/damon: repost non-hotfix reviewed patches in damon/next tree" from is a sprinkle of unrelated minor bugfixes for DAMON. - The 3 patch series "mm: remove page_mapped()" from David Hildenbrand remove this function from the tree, replacing it with folio_mapped(). - The 10 patch series "mm/damon: let DAMON be paused and resumed" from SeongJae Park permits DAMON to be paused and resumed without losing its current state. - The 3 patch series "kasan: hw_tags: Disable tagging for stack and page-tables" from Muhammad Usama Anjum simplifies and speeds up kasan by removing its ineffective tagging of stacks and page tables. - The 7 patch series "mm/damon/reclaim,lru_sort: monitor all system rams by default" from SeongJae Park simplifies deployment on diverse hardware like NUMA systems by updating DAMON_RECLAIM and DAMON_LRU_SORT to automatically monitor the physical address range covering all System RAM areas by default, replacing the overly restrictive behavior that only targeted the single largest memory block to save on negligible overhead. - The 2 patch series "mm/damon/sysfs: document filters/ directory as deprecated" from SeongJae Park updates some DAMON docs. - The 8 patch series "mm: use spinlock guards for zone lock" from Dmitry Ilvokhin switches zone->lock handling over to using the guard() mechanisms. - The 2 patch series "mm/filemap: tighten mmap_miss hit accounting" from fujunjie fixes a flaw where the mmap_miss counter over-credited page cache hits during fault-arounds and page-fault retries. This results in significant reduction of redundant synchronous mmap readahead I/O, drastically cutting down execution time and gigabytes read for sparse random or strided memory access workloads. - The 2 patch series "selftests/cgroup: Fix false positive failures in test_percpu_basic" from Li Wang fixes a couple of false-positives in the cgroup kmem selftests. - The 2 patch series "mm/damon/reclaim: support monitoring intervals auto-tuning" from SeongJae Park adds a new parameter to DAMON permitting DAMON_RECLAIM to automatically tune DAMON's sampling and aggregation intervals. - The 2 patch series "mm/damon/stat: add kdamond_pid parameter" from SeongJae Park chnges DAMON_STAT to provide the pid of its kdamond. - The 2 patch series "mm/kmemleak: dedupe verbose scan output" from Breno Leitao removes large amounts of duplicated backtraces from the verbose-mode kmemleak output. - The 8 patch series "mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE (Part 1)" from David Hildenbrand reduces our use of CONFIG_HAVE_BOOTMEM_INFO_NODE, with a view to removing it entirely in a later series. - The 2 patch series "mm/damon: validate min_region_size to be power of 2" from Liew Rui Yan prevents users from passing a non-power-of-2 value of `addr_unit', as this later results in undesirable behavior. - The 2 patch series "mm: document read_pages and simplify usage" from Frederick Mayle does as claimed. - The 3 patch series "tools/mm/page-types: Fix misc bugs" from Ye Liu fixes three issues in tools/mm/page-types.c. - The 4 patch series "mm: misc cleanups from __GFP_UNMAPPED series" from Brendan Jackman implements several cleanups in the page allocator and related code. - The 12 patch series "mm, swap: swap table phase IV: unify allocation" from Kairui Song unifies the allocation and charging of anon and shmem swap in folios, provides better synchronization, consolidates the metadata management, hence dropping the static array and map, and improves performance. - The 28 patch series "mm/damon: introduce data attributes monitoring" from SeongJae Park extends DAMON for monitoring general data attributes other than accesses. - The 5 patch series "mm/vmalloc: free unused pages on vrealloc() shrink" from Shivam Kalra implements the TODO in vrealloc() to unmap and free unused pages when shrinking across a page boundary. - The 3 patch series "mm/damon: documentation and comment fixes" from niecheng does as advertised. - The 3 patch series "remove mmap_action success, error hooks" from Lorenzo Stoakes eliminates custom hooks from mmap_action by removing the problematic success_hook which allowed drivers to improperly access uninitialized VMAs. It replaces the error_hook with a simple error-code field and updates the memory char driver accordingly. - The 14 patch series "mm/damon: minor improvements for code readability and tests" from SeongJae Park implements minor improvements in code readability and tests for DAMON. - The 2 patch series "mm/damon: fix macro arguments and clarify quota goals doc" from Maksym Shcherba does those things. - The 2 patch series "userfaultfd: merge fs/userfaultfd.c into mm/userfaultfd.c" from Mike Rapoport performs that code movement. - The 15 patch series "mm/mglru: improve reclaim loop and dirty folio" from Kairui Song and others cleans up and slightly improves MGLRU's reclaim loop and dirty writeback handling. Large performance improvements are measured. - The 3 patch series "use vma locks for proc/pid/{smaps\|numa_maps} reads" from Suren Baghdasaryan uses per-vma locks when reading /proc/pid/smaps and /proc/pid/numa_maps similar to reduce contention on central mmap_lock. - The 2 patch series "refactors thpsize_shmem_enabled_store() and thpsize_shmem_enabled_show()" from Ran Xiaokai provides some cleanup work in the THP code. - The 2 patch series "selftests/memfd: fix compilation warnings" from Konstantin Khorenko fixes a few build glitches in the memfd selftest code. - The 4 patch series "memcg: shrink obj_stock_pcp and cache multiple objcgs" from Shakeel Butt resolves a 68% performance regression caused by NUMA-node cache thrashing around struct obj_stock_pcp by shrinking its existing fields and expanding it into a multi-slot array that caches up to five obj_cgroup pointers per CPU, allowing per-node variants of the same memcg to coexist within a single 64-byte cache line. - The 2 patch series "zram: writeback fixes" from Sergey Senozhatsky addresses a couple of unrelated zram writeback issues. - The 9 patch series "mm: switch THP shrinker to list_lru" from Johannes Weiner resolves NUMA-awareness issues and streamlines callsite interaction by refactoring and extending the list_lru API to completely replace the complex, open-coded deferred split queue for Transparent Huge Pages (THPs). - The 2 patch series "mm: improve large folio readahead for exec memory" from Usama Arif improves large-folio readahead on systems like 64K-page arm64 by preventing the mmap_miss check from permanently disabling target-oriented VM_EXEC readahead, and by generalizing the force_thp_readahead gate to support mappings with any usefully large maximum folio order under the cache cap. - The 6 patch series "userfaultfd/pagemap: pre-existing fixes" from Kiryl Shutsemau fixes a bunch of minor issues in the userfaultfd/pagemap, all of which were flagged by Sashiko review of proposed new material. - The 5 patch series "mm/sparse-vmemmap: Provide generic vmemmap_set_pmd() and vmemmap_check_pmd()" from Muchun Song provides generic versions of these two functions so the four arch-specific implementations can be removed. - The 2 patch series "mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device" from Youngjun Park addresses a uswsusp-vs-swapoff race and reduces the swap device reference taking/releasing frequency. - The 2 patch series "mm/hmm: A fix and a selftest" from Dev Jain does as claimed. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCajQcWQAKCRDdBJ7gKXxA jkjFAQD14ZCAS7c/lsNHPRpOARR2zrCJiIp+vhZDEx5V+d8u0QD+I4ca11CnQ+OA dKCeabWWYuTR1GDtTHbLVCRfpGexBwM= =cWPO -----END PGP SIGNATURE----- Merge tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "selftests/mm: clean up build output and verbosity" (Li Wang) Remove some noise from the MM selftests build - "mm: Free contiguous order-0 pages efficiently" (Ryan Roberts) Speed up the freeing of a batch of 0-order pages by first scanning them for coalescing opportunities. This is applicable to vfree() and to the releasing of frozen pages - "mm/damon: introduce DAMOS failed region quota charge ratio" (SeongJae Park) Address a DAMOS usability issue: The DAMOS quota often exhausts prematurely because it charges for all memory attempted, causing slow and inconsistent performance when actions fail on unreclaimable memory. To fix this, a new feature lets users set a smaller, flexible quota charge ratio (via a numerator and denominator) for failed regions. Since failed actions cause less overhead, reducing their quota cost ensures more predictable and efficient DAMOS processing - "selftests/cgroup: improve zswap tests robustness and support large page sizes" (Li Wang) Fix various spurious failures and improves the overall robustness of the cgroup zswap selftests - "fix MAP_DROPPABLE not supported errno" (Anthony Yznaga) Fix an issue in the mlock selftests on arm32 - "mm: huge_memory: clean up defrag sysfs with shared" (Breno Leitao) Some maintenance work in the huge_memory code - "treewide: fixup gfp_t printks" (Brendan Jackman) Use the special vprintf() gfp_t conversion in various places - "mm: Fix vmemmap optimization accounting and initialization" (Muchun Song) Fix several bugs in the vmemmap optimization, mainly around incorrect page accounting and memmap initialization in the DAX and memory hotplug paths. It also fixes pageblock migratetype initialization and struct page initialization for ZONE_DEVICE compound pages - "mm/damon: repost non-hotfix reviewed patches in damon/next tree" A sprinkle of unrelated minor bugfixes for DAMON - "mm: remove page_mapped()" (David Hildenbrand) Remove this function from the tree, replacing it with folio_mapped() - "mm/damon: let DAMON be paused and resumed" (SeongJae Park) Allow DAMON to be paused and resumed without losing its current state - "kasan: hw_tags: Disable tagging for stack and page-tables" (Muhammad Usama Anjum) Simplify and speed up kasan by removing its ineffective tagging of stacks and page tables - "mm/damon/reclaim,lru_sort: monitor all system rams by default" (SeongJae Park) Simplify deployment on diverse hardware like NUMA systems by updating DAMON_RECLAIM and DAMON_LRU_SORT to automatically monitor the physical address range covering all System RAM areas by default, replacing the overly restrictive behavior that only targeted the single largest memory block to save on negligible overhead - "mm/damon/sysfs: document filters/ directory as deprecated" (SeongJae Park) Update some DAMON docs - "mm: use spinlock guards for zone lock" (Dmitry Ilvokhin) Switch zone->lock handling over to using the guard() mechanisms - "mm/filemap: tighten mmap_miss hit accounting" (fujunjie) Fix a flaw where the mmap_miss counter over-credited page cache hits during fault-arounds and page-fault retries. This results in significant reduction of redundant synchronous mmap readahead I/O, drastically cutting down execution time and gigabytes read for sparse random or strided memory access workloads - "selftests/cgroup: Fix false positive failures in test_percpu_basic" (Li Wang) Fix a couple of false-positives in the cgroup kmem selftests - "mm/damon/reclaim: support monitoring intervals auto-tuning" (SeongJae Park) Add a new parameter to DAMON permitting DAMON_RECLAIM to automatically tune DAMON's sampling and aggregation intervals - "mm/damon/stat: add kdamond_pid parameter" (SeongJae Park) Change DAMON_STAT to provide the pid of its kdamond - "mm/kmemleak: dedupe verbose scan output" (Breno Leitao) Remove large amounts of duplicated backtraces from the verbose-mode kmemleak output - "mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE (Part 1)" (David Hildenbrand) Reduce our use of CONFIG_HAVE_BOOTMEM_INFO_NODE, with a view to removing it entirely in a later series - "mm/damon: validate min_region_size to be power of 2" (Liew Rui Yan) Prevent users from passing a non-power-of-2 value of `addr_unit', as this later results in undesirable behavior - "mm: document read_pages and simplify usage" (Frederick Mayle) - "tools/mm/page-types: Fix misc bugs" (Ye Liu) Fix three issues in tools/mm/page-types.c - "mm: misc cleanups from __GFP_UNMAPPED series" (Brendan Jackman) Implement several cleanups in the page allocator and related code - "mm, swap: swap table phase IV: unify allocation" (Kairui Song) Unify the allocation and charging of anon and shmem swap in folios, provides better synchronization, consolidates the metadata management, hence dropping the static array and map, and improves performance - "mm/damon: introduce data attributes monitoring" (SeongJae Park( Extend DAMON to monitor general data attributes other than accesses - "mm/vmalloc: free unused pages on vrealloc() shrink" (Shivam Kalra) Implement the TODO in vrealloc() to unmap and free unused pages when shrinking across a page boundary - "mm/damon: documentation and comment fixes" (niecheng) - "remove mmap_action success, error hooks" (Lorenzo Stoakes) Eliminate custom hooks from mmap_action by removing the problematic success_hook which allowed drivers to improperly access uninitialized VMAs. It replaces the error_hook with a simple error-code field and updates the memory char driver accordingly - "mm/damon: minor improvements for code readability and tests" (SeongJae Park) - "mm/damon: fix macro arguments and clarify quota goals doc" (Maksym Shcherba) - "userfaultfd: merge fs/userfaultfd.c into mm/userfaultfd.c" (Mike Rapoport) - "mm/mglru: improve reclaim loop and dirty folio" (Kairui Song and others) Clean up and slightly improves MGLRU's reclaim loop and dirty writeback handling. Large performance improvements are measured - "use vma locks for proc/pid/{smaps\|numa_maps} reads" (Suren Baghdasaryan) Use per-vma locks when reading /proc/pid/smaps and numa_maps similar to reduce contention on central mmap_lock - "refactors thpsize_shmem_enabled_store() and thpsize_shmem_enabled_show()" (Ran Xiaokai) Some cleanup work in the THP code - "selftests/memfd: fix compilation warnings" (Konstantin Khorenko) Fix a few build glitches in the memfd selftest code. - "memcg: shrink obj_stock_pcp and cache multiple objcgs" (Shakeel Butt) Resolve a 68% performance regression caused by NUMA-node cache thrashing around struct obj_stock_pcp by shrinking its existing fields and expanding it into a multi-slot array that caches up to five obj_cgroup pointers per CPU, allowing per-node variants of the same memcg to coexist within a single 64-byte cache line. - "zram: writeback fixes" (Sergey Senozhatsky) address a couple of unrelated zram writeback issues - "mm: switch THP shrinker to list_lru" (Johannes Weiner) Resolve NUMA-awareness issues and streamlines callsite interaction by refactoring and extending the list_lru API to completely replace the complex, open-coded deferred split queue for Transparent Huge Pages - "mm: improve large folio readahead for exec memory" (Usama Arif) Improve large-folio readahead on systems like 64K-page arm64 by preventing the mmap_miss check from permanently disabling target-oriented VM_EXEC readahead, and by generalizing the force_thp_readahead gate to support mappings with any usefully large maximum folio order under the cache cap. - "userfaultfd/pagemap: pre-existing fixes" (Kiryl Shutsemau) Fix a bunch of minor issues in the userfaultfd/pagemap, all of which were flagged by Sashiko review of proposed new material - "mm/sparse-vmemmap: Provide generic vmemmap_set_pmd() and vmemmap_check_pmd()" (Muchun Song) Provide generic versions of these two functions so the four arch-specific implementations can be removed. - "mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device" (Youngjun Park) Address a uswsusp-vs-swapoff race and reduces the swap device reference taking/releasing frequency. - "mm/hmm: A fix and a selftest" (Dev Jain) * tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) selftests/mm/hmm-tests: test pagemap reads of PMD device-private entries fs/proc/task_mmu: do not warn on seeing non-migration pmd entry lib/test_hmm: check alloc_page_vma() return value and handle OOM mm/compaction: cap compact_gap() at COMPACT_CLUSTER_MAX mm/swap: remove redundant swap device reference in alloc/free mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device mm/filemap: use folio_next_index() for start vmalloc: fix NULL pointer dereference in is_vm_area_hugepages() sparc/mm: drop vmemmap_check_pmd helper and use generic code loongarch/mm: drop vmemmap_check_pmd helper and use generic code riscv/mm: drop vmemmap_pmd helpers and use generic code arm64/mm: drop vmemmap_pmd helpers and use generic code mm/sparse-vmemmap: provide generic vmemmap_set_pmd() and vmemmap_check_pmd() rust: page: mark Page::nid as inline userfaultfd: build __VMA_UFFD_FLAGS from config-gated masks userfaultfd: gate must_wait writability check on pte_present() mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade fs/proc/task_mmu: fix hugetlb self-deadlock in pagemap_scan_pte_hole() fs/proc/task_mmu: use huge_page_size() in pagemap_scan_hugetlb_entry() fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race ...	2026-06-19 10:14:34 -07:00
Linus Torvalds	a53fcff8fc	Updates for the NOHZ subsystem: - Fix a long standing TOCTOU in get_cpu_sleep_time_us() - Make the CPU offline NOHZ handling more robust by disabling NOHZ on the outgoing CPU early instead of creating unneeded state which needs to be undone. - Unify idle CPU time accounting instead of having two different accounting mechanisms. These two different mechanisms are not really independent, but the different properties can in the worst case cause that gloabl idle time can be observed going backwards. - Consolidate the idle/iowait time retrieval interfaces instead of converting back and forth between them. - Make idle interrupt time accounting more robust. The original code assumes that interrupt time accouting is enabled and therefore stops elapsing idle time while an interrupt is handled in NOHZ dyntick state. That assumption is not correct as interrupt time accounting can be disabled at compile and runtime. - Fix an accounting error between dyntick idle time and dyntick idle steal time. The stolen time is not accounted and therefore idle time becomes inaccurate. The stolen time is now accounted after the fact as there is no way to predict the steal time upfront. -----BEGIN PGP SIGNATURE----- iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmotm7YQHHRnbHhAa2Vy bmVsLm9yZwAKCRCmGPVMDXSYoWEOEADG/RzW9l7fwUcsnubRyHER7GoIEZGP/VO/ nLQnI+M1LwdVyK8Oq8WntSnqOGevVdUKQlLFYMXdwovS8TWCi7PWcQeMIsfJnwT/ 6ugwo4E3mSBcseMjN8eHkEYH+1YmN6JSYQ+5eXT0JUXUxJlMFgtv3ZnTbQOF6Y3k Nkz1THojzdgTMn6WU/01AOXDR8Nhb2gOOQLDF/1ItZWnsDhbrDE2l99OaAELfeIo 8BZFQBPYRfuR8HaGgd8m2OPGnbw+cXMHUrTscMMQvmYv2wJVcNMB4AmVaUmGMRTR c3xs8QG2nUmIm72ENfe2pPDPyxy9JUJ54ro6/rLtcNQZ2wVCznhNahm6wRUa8f/0 7gnQE5nS2SNtdpL5StDVzk2+AZl6SrU8+ss51f1owPcNwKs+tt9XE+GVyL5wZpC8 IeD2SzVQLIIc0+ZvIkt9n76lLJkiLZkmt6UAnvhiz5NNn8XQlgC/4Uadk9CwC5vx t6swpE06D7vx7lm+ycfYUl6VPje5g/YvkkDFJqJulIlEDNpvb85+W6lMTyO7rwVl ln5TzOGPp7lJDwu6dAfg0QtHcUmUIFq05FuOIRNC4ZNxkxL2XDMzoe5IHxcWzU9O CHFGBUNZf+30vxvKeh+rkjZalt7/vgQaxb5somGmqv8g/6SJtJyB2qlq+gqH/+lQ MeG8tFGtVQ== =+cVw -----END PGP SIGNATURE----- Merge tag 'timers-nohz-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip Pull NOHZ updates from Thomas Gleixner: - Fix a long standing TOCTOU in get_cpu_sleep_time_us() - Make the CPU offline NOHZ handling more robust by disabling NOHZ on the outgoing CPU early instead of creating unneeded state which needs to be undone. - Unify idle CPU time accounting instead of having two different accounting mechanisms. These two different mechanisms are not really independent, but the different properties can in the worst case cause that gloabl idle time can be observed going backwards. - Consolidate the idle/iowait time retrieval interfaces instead of converting back and forth between them. - Make idle interrupt time accounting more robust. The original code assumes that interrupt time accouting is enabled and therefore stops elapsing idle time while an interrupt is handled in NOHZ dyntick state. That assumption is not correct as interrupt time accounting can be disabled at compile and runtime. - Fix an accounting error between dyntick idle time and dyntick idle steal time. The stolen time is not accounted and therefore idle time becomes inaccurate. The stolen time is now accounted after the fact as there is no way to predict the steal time upfront. * tag 'timers-nohz-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: sched/cputime: Handle dyntick-idle steal time correctly sched/cputime: Handle idle irqtime gracefully sched/cputime: Provide get_cpu_[idle\|iowait]_time_us() off-case tick/sched: Consolidate idle time fetching APIs tick/sched: Account tickless idle cputime only when tick is stopped tick/sched: Remove unused fields tick/sched: Move dyntick-idle cputime accounting to cputime code tick/sched: Remove nohz disabled special case in cputime fetch tick/sched: Unify idle cputime accounting s390/time: Prepare to stop elapsing in dynticks-idle powerpc/time: Prepare to stop elapsing in dynticks-idle sched/cputime: Correctly support generic vtime idle time sched/cputime: Remove superfluous and error prone kcpustat_field() parameter sched/idle: Handle offlining first in idle loop tick/sched: Fix TOCTOU in nohz idle time fetch	2026-06-15 13:48:52 +05:30
Linus Torvalds	13e1a6d6a1	Interrupt core code changes: - Rework of /proc/interrupt handling: /proc/interrupts was subject to micro optimizations for a long time, but most of the low hanging fruit was left on the table. This rework addresses the major time consuming issues: - Printing a long series of zeros one by one via a format string instead of counting subsequent zeros and emitting a string constant. - Simplify and cache the conditions whether interrupts should be printed - Use a proper iteration over the interrupt descriptor xarray instead of walking and testing one by one. - Provide helper functions for the architecture code to emit the architecture specific counters - Convert the counter structure in x86 to an array, which simplifies the output and add mechanisms to suppress unused architecture interrupts, which just occupy space for nothing. Adopt the new core mechanisms. This adjusts the gdb scripts related to interrupt counter statitics to work with the new mechanisms. - Prevent a string overflow in the /proc/irq/$N/ directory name creation code. -----BEGIN PGP SIGNATURE----- iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmoths8QHHRnbHhAa2Vy bmVsLm9yZwAKCRCmGPVMDXSYoSoMEACRODwHjNfULjgD2heHbiPKsmPMRZvwO1Ud xu5XAoNT1gwxnLo4D+KrGCZeyxka+byRpby6eNg7HdRJuu3DUf8umwt/Q472I9a9 ck8OGFp8ntbxnueISKfzxY/O2eXHYxSKmmfZMv3wdOKbvn5OUlFT6eHPjb8PzVUM 7DiXsBL8s3MNHwdJ3grG5lBh60pt5fujzURwYAqvh/i8jlDHxsFRTMGuhR710knr YZrgZ4/7ffnEbDsn98xezPewRomIbhhEijgfjkkbnYYUub6Y2RHJqOzZhlp6zNgi vTsU/suW3ryVuzG34rL2uHvsxOcJY1HNA+ING7fkRmPuKxRGKOMBQfPmLQcWqP69 GxwGIlBvNbAEYievgTCS7GNHTy3t0JbxTGhHcBvX3oMtnnOSTttqH9XzvrTwGxjj fMUykfvB+40Fp47D+t0JDhgyNNEkixSBjW8/gogZFQ0OdMFX6BQZNT/DLhMMC0LR JbqMpfsffp5+gYam/wixv3sPlxajMpQ2w8ocgyUHVAeFMo1LOY1spUuO3+Tq7nSj xt95xVg6HQDr+L+8QmZmnRq27uG276CxPpLotbPMsrn0Ax5PL+fymfmVsFmJFjAR ZHKK3tSD6M94GtklfKlB/yBJGNRafH4MVZbMa0iUxGI6UyAFr/Yror3mfDK9NsIA WTwwaqI8qw== =z6vj -----END PGP SIGNATURE----- Merge tag 'irq-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip Pull interrupt core updates from Thomas Gleixner: - Rework of /proc/interrupt handling: /proc/interrupts was subject to micro optimizations for a long time, but most of the low hanging fruit was left on the table. This rework addresses the major time consuming issues: - Printing a long series of zeros one by one via a format string instead of counting subsequent zeros and emitting a string constant. - Simplify and cache the conditions whether interrupts should be printed - Use a proper iteration over the interrupt descriptor xarray instead of walking and testing one by one. - Provide helper functions for the architecture code to emit the architecture specific counters - Convert the counter structure in x86 to an array, which simplifies the output and add mechanisms to suppress unused architecture interrupts, which just occupy space for nothing. Adopt the new core mechanisms. This adjusts the gdb scripts related to interrupt counter statistics to work with the new mechanisms. - Prevent a string overflow in the /proc/irq/$N/ directory name creation code. * tag 'irq-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: x86/irq: Add missing 's' back to thermal event printout genirq/proc: Speed up /proc/interrupts iteration genirq/proc: Runtime size the chip name genirq: Expose irq_find_desc_at_or_after() in core code genirq: Add rcuref count to struct irq_desc genirq/proc: Increase default interrupt number precision to four genirq: Calculate precision only when required genirq: Cache the condition for /proc/interrupts exposure genirq/manage: Make NMI cleanup RT safe genirq: Expose nr_irqs in core code scripts/gdb: Update x86 interrupts to the array based storage x86/irq: Move IOAPIC misrouted and PIC/APIC error counts into irq_stats x86/irq: Suppress unlikely interrupt stats by default x86/irq: Make irqstats array based genirq/proc: Utilize irq_desc::tot_count to avoid evaluation genirq/proc: Avoid formatting zero counts in /proc/interrupts x86/irq: Optimize interrupts decimals printing genirq/proc: Size interrupt directory names for 10-digit interrupt numbers	2026-06-15 13:19:41 +05:30
Linus Torvalds	e8a56d6fc8	dentry memory safety stuff * d_alloc_parallel() API change (Neil's with my changes). * NORCU fixes. * Reorganization and simplification of dentry eviction logics. * Simplifying rcu_read_lock() scopes in fs/dcache.c. * Secondary roots work - getting rid of NFS fake root dentries and dealing with remaining shrink_dcache_for_umount()/shrink_dentry_list() races. * making cursors NORCU (surprisingly easy) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCai8ppAAKCRBZ7Krx/gZQ 67xXAP48lSBS6WyiTztk/ZD/0WFO+EemnxEDXZ0s55ejQ1/uswEApWNMaxBYkzxB 0lgAjKPojjtqYovttP/AHklomSLzMA4= =llw0 -----END PGP SIGNATURE----- Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: - d_alloc_parallel() API change (Neil's with my changes) - NORCU fixes - Reorganization and simplification of dentry eviction logic - Simplifying rcu_read_lock() scopes in fs/dcache.c - Secondary roots work - getting rid of NFS fake root dentries and dealing with remaining shrink_dcache_for_umount() and shrink_dentry_list() races - making cursors NORCU (surprisingly easy) * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (22 commits) make cursors NORCU nfs: get rid of fake root dentries wind ->s_roots via ->d_sib instead of ->d_hash shrink_dentry_tree(): unify the calls of shrink_dentry_list() shrinking rcu_read_lock() scope in d_alloc_parallel() d_walk(): shrink rcu_read_lock() scope document dentry_kill() adjust calling conventions of lock_for_kill(), fold __dentry_kill() into dentry_kill() Document rcu_read_lock() use in select_collect2() Shift rcu_read_{,un}lock() inside fast_dput() simplify safety for lock_for_kill() slowpath fold lock_for_kill() and __dentry_kill() into common helper fold lock_for_kill() into shrink_kill() shrink_dentry_list(): start with removing from shrink list d_prune_aliases(): make sure to skip NORCU aliases kill d_dispose_if_unused() make to_shrink_list() return whether it has moved dentry to list select_collect(): ignore dentries on shrink lists if they have positive refcounts find_acceptable_alias(): skip NORCU aliases with zero refcount fix a race between d_find_any_alias() and final dput() of NORCU dentries ...	2026-06-15 04:15:31 +05:30
Linus Torvalds	79169a1624	vfs-7.2-rc1.procfs Please consider pulling these changes from the signed vfs-7.2-rc1.procfs tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaiwLKgAKCRCRxhvAZXjc otp1AQD6ONFiqQZJhnHbVg1MJlA8iP80x0qlil0OKGUMOby4NAD+MZymcLjtP+4O T1b9Uca2/qz+L1F5o7vJAD3OAd4BzQA= =/KGF -----END PGP SIGNATURE----- Merge tag 'vfs-7.2-rc1.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull procfs updates from Christian Brauner: - Revamp fs/filesystems.c The file was a mess with a hand-rolled linked list in desperate need of a cleanup. The filesystems list is now RCU-ified, /proc files can be marked permanent from outside fs/proc/, and the string emitted when reading /proc/filesystems is pre-generated and cached instead of pointer-chasing and printfing entry by entry on every read. The file is read frequently because libselinux reads it and is linked into numerous frequently used programs (even ones you would not suspect, like sed!). Scalability also improves since reference maintenance on open/close is bypassed. open+read+close cycle single-threaded (ops/s): before: 442732 after: 1063462 (+140%) open+read+close cycle with 20 processes (ops/s): before: 606177 after: `3300576` (+444%) A follow-up patch adds missing unlocks in some corner cases and tidies things up. - Relax the mount visibility check for subset=pid mounts When procfs is mounted with subset=pid, all static files become unavailable and only the dynamic pid information is accessible. In that case there is no point in imposing the full mount visibility restrictions on the mounter - everything that can be hidden in procfs is already inaccessible. These restrictions prevented procfs from being mounted inside rootless containers since almost all container implementations overmount parts of procfs to hide certain directories. As part of this /proc/self/net is only shown in subset=pid mounts for CAP_NET_ADMIN, reconfiguring subset=pid is rejected, the SB_I_USERNS_VISIBLE superblock flag is replaced with an FS_USERNS_MOUNT_RESTRICTED filesystem flag, fully visible mounts are recorded in a list, and the mount restrictions are finally documented. - Protect ptrace_may_access() with exec_update_lock in procfs Most uses of ptrace_may_access() in procfs should hold exec_update_lock to avoid TOCTOU issues with concurrent privileged execve() (like setuid binary execution). This fixes the easy cases - the owner and visibility checks and the FD link permission checks - with the gnarlier ones to follow later. * tag 'vfs-7.2-rc1.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: fix ups and tidy ups to /proc/filesystems caching proc: protect ptrace_may_access() with exec_update_lock (FD links) proc: protect ptrace_may_access() with exec_update_lock (part 1) docs: proc: add documentation about mount restrictions proc: handle subset=pid separately in userns visibility checks proc: prevent reconfiguring subset=pid proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN fs: cache the string generated by reading /proc/filesystems sysfs: remove trivial sysfs_get_tree() wrapper fs: RCU-ify filesystems list fs: move SB_I_USERNS_VISIBLE to FS_USERNS_MOUNT_RESTRICTED proc: allow to mark /proc files permanent outside of fs/proc/ namespace: record fully visible mounts in list	2026-06-15 04:07:58 +05:30
Linus Torvalds	7e0e7bd60d	vfs-7.2-rc1.misc Please consider pulling these changes from the signed vfs-7.2-rc1.misc tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaiwLKgAKCRCRxhvAZXjc ou/zAP9SOUE6n58i0BdhLYw0RA9Ge5tz42e4inSFi4tkgfCrDwEAlxbRHDcMyhWB dHmx8OW6b5riMrW+lGPMH58RMoTkdQ4= =3vuc -----END PGP SIGNATURE----- Merge tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Reduce pipe->mutex contention by pre-allocating pages outside the lock in anon_pipe_write(). anon_pipe_write() called alloc_page() once per page while holding pipe->mutex. The allocation can sleep doing direct reclaim and runs memcg charging, which extends the critical section and stalls any concurrent reader on the same mutex. Now up to 8 pages are pre-allocated before the mutex is taken, leftovers are recycled into the per-pipe tmp_page[] cache before unlock, and any remainder is released after unlock, keeping the allocator out of the critical section on both sides. On a writers x readers sweep with 64KB writes against a 1 MB pipe throughput improves 6-28% and average write latency drops 5-22%; under memory pressure - when the cost of holding the mutex across reclaim is highest - throughput improves 21-48% and latency drops 17-33%. The microbenchmark is added to selftests. - uaccess/sockptr: fix the ignored_trailing logic in copy_struct_to_user() to behave as documented and the usize check in copy_struct_from_sockptr() for user pointers, and add copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr() helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC). - bpf: add a sleepable bpf_real_inode() kfunc that resolves the real inode backing a dentry via d_real_inode(). On overlayfs the inode attached to the dentry doesn't carry the underlying device information; this is used by the filesystem restriction BPF program that was merged into systemd. - docs: add guidelines for submitting new filesystems, motivated by the maintenance burden abandoned and untestable filesystems impose on VFS developers, blocking infrastructure work like folio conversions and iomap migration. Fixes: - libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo() and drop the now-redundant assignments in callers. This began as a one-line dma-buf fix for a path_noexec() warning; a pseudo filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo() callers were audited: the only visible effect is on dma-buf where SB_I_NOEXEC silences the warning. - Handle set_blocksize() failures in legacy filesystems (bfs, hpfs, qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a device with a sector size > PAGE_SIZE crashed roughly half of them; the rest had the same missing error handling pattern. Plus a follow-up releasing the superblock buffer_head when setting the minix v3 block size fails. - mount: honour SB_NOUSER in the new mount API. - fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by switching the process-group paths of send_sigio() and send_sigurg() from read_lock(&tasklist_lock) to RCU, matching the single-PID path. - vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing delegated NFS mounts (fsopen() in a container with the mount performed by a privileged daemon) that broke when non-init s_user_ns was tied to FS_USERNS_MOUNT. - selftests/namespaces: fix a hang in nsid_test where an unreaped grandchild kept the TAP pipe write-end open, a waitpid(-1) race in listns_efault_test, and a false FAIL on kernels without listns() where the tests should SKIP. - filelock: fix the break_lease() stub signature for CONFIG_FILE_LOCKING=n. - init/initramfs_test: wait for the async initramfs unpacking before running; the test and do_populate_rootfs() share the parser state. - fs/coredump: reduce redundant log noise in validate_coredump_safety(). - iomap: pass the correct length to fserror_report_io() in __iomap_write_begin(). - backing-file: fix the backing_file_open() kerneldoc. Cleanups: - initramfs: refactor the cpio hex header parsing to use hex2bin() instead of the hand-rolled simple_strntoul() which is reverted, and extend the initramfs KUnit tests to cover header fields with 0x prefixes. - Replace __get_free_pages() and friends with kmalloc()/kzalloc() across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2, isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the do_mounts init code - part of the larger work of replacing page allocator calls with kmalloc(). - Use clear_and_wake_up_bit() in unlock_buffer() and journal_end_buffer_io_sync() instead of open-coding the sequence. - Drop unused VFS exports: unexport drop_super_exclusive(), remove start_removing_user_path_at(), and fold __start_removing_path() into start_removing_path(). - fs/read_write: narrow the __kernel_write() export with EXPORT_SYMBOL_FOR_MODULES(). - vfs: uapi: retire octal and hex constants in favor of (1 << n) for the O_ flags. Finding a free bit for a new flag across the architectures was needlessly hard with the mixed bases. - dcache: add extra sanity checks of dead dentries in dentry_free() via a new DENTRY_WARN_ONCE() that also prints d_flags. - iov_iter: use kmemdup_array() in dup_iter() to harden the allocation against multiplication overflow. - fs/pipe: write to ->poll_usage only once. - vfs: remove an always-taken if-branch in find_next_fd(). - dcache: use kmalloc_flex() for struct external_name in __d_alloc(). - namei: use QSTR() instead of QSTR_INIT() in path_pts(). - sync_file_range: delete dead S_ISLNK code. - Comment fixes: retire a stale comment in fget_task_next() and fix assorted spelling mistakes" * tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits) backing-file: fix backing_file_open() kerneldoc parameter iomap: pass the correct len to fserror_report_io in __iomap_write_begin vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags bpf: add bpf_real_inode() kfunc fs/read_write: Do not export __kernel_write() to the entire world libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo() mount: honour SB_NOUSER in the new mount API fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling selftests/pipe: add pipe_bench microbenchmark fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write fs: retire stale comment in fget_task_next() fs: fix spelling mistakes in comment bfs: replace get_zeroed_page() with kzalloc() binfmt_misc: replace __get_free_page() with kmalloc() configfs: replace __get_free_pages() with kzalloc() fs/namespace: use __getname() to allocate mntpath buffer fs/select: replace __get_free_page() with kmalloc() ...	2026-06-15 03:59:45 +05:30
Dev Jain	cd1fc0e3c1	fs/proc/task_mmu: do not warn on seeing non-migration pmd entry Patch series "mm/hmm: A fix and a selftest", v3. Patch 1 fixes a stale warning present from the time when only migration softleaf entries were supported at the PMD level. Patch 2 adds some code into hmm-tests.c which exercises the pagemap path for PMD device-private entries. This patch (of 2): pagemap_pmd_range_thp() warns if a non-present PMD is not a migration entry. This became false once device-private entries at the PMD level were added. Therefore, remove the stale migration-only assertion. Link: https://lore.kernel.org/20260604055308.1947679-1-dev.jain@arm.com Link: https://lore.kernel.org/20260604055308.1947679-2-dev.jain@arm.com Fixes: `a30b48bf1b` ("mm/migrate_device: implement THP migration of zone device pages") Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Tested-by: Lorenzo Stoakes <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Oscar Salvador (SUSE) <osalvador@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-06-08 18:21:32 -07:00
Kiryl Shutsemau (Meta)	e92d92bbaf	fs/proc/task_mmu: fix hugetlb self-deadlock in pagemap_scan_pte_hole() A PAGEMAP_SCAN ioctl requesting PM_SCAN_WP_MATCHING on a hugetlb VMA hangs the calling thread, unkillably, as soon as the scan reaches an unpopulated part of the range: do_pagemap_scan() walk_page_range() walk_hugetlb_range() hugetlb_vma_lock_read() # take the vma lock for read ... pagemap_scan_pte_hole() # ... ->pte_hole() for a hole uffd_wp_range() change_protection() hugetlb_change_protection() hugetlb_vma_lock_write() # ... and block taking it for write walk_hugetlb_range() holds the hugetlb vma lock for read across the whole walk. A present entry goes to ->hugetlb_entry(); an unpopulated one goes to ->pte_hole(), i.e. pagemap_scan_pte_hole(). To write-protect the hole that handler calls uffd_wp_range(), which on a hugetlb VMA reaches hugetlb_change_protection() and takes the same vma lock for write. The thread then blocks in down_write() waiting for the read lock it is itself holding. The populated path avoids this: pagemap_scan_hugetlb_entry() write-protects the entry inline under the page-table lock and never enters hugetlb_change_protection(). Do the same for holes. Fault in the page table and install the uffd-wp marker directly with make_uffd_wp_huge_pte() under the page-table lock, rather than routing through uffd_wp_range(). That is the same sequence hugetlb_change_protection() runs for an unpopulated entry, minus the vma write lock -- which is safe to skip because PMD sharing is disabled on uffd-wp VMAs (hugetlb_unshare_all_pmds() runs at registration), leaving nothing for that lock to serialise against. Link: https://lore.kernel.org/20260529172331.356655-4-kas@kernel.org Fixes: `52526ca7fd` ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs") Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reported-by: Sashiko AI review <sashiko-bot@kernel.org> Assisted-by: Claude:claude-opus-4-8 Cc: David Hildenbrand <david@kernel.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Balbir Singh <balbirs@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-06-08 18:21:28 -07:00
Kiryl Shutsemau (Meta)	1b074e3270	fs/proc/task_mmu: use huge_page_size() in pagemap_scan_hugetlb_entry() The partial-page check compares against HPAGE_SIZE (PMD_SIZE), which is wrong for gigantic hugetlb hstates (e.g. 1G). The walker hands the callback a huge_page_size()-sized range, never start + HPAGE_SIZE, so the comparison always declares it partial and aborts the WP. Compare against the actual hstate's page size. Link: https://lore.kernel.org/20260529172331.356655-3-kas@kernel.org Fixes: `52526ca7fd` ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs") Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reported-by: Sashiko AI review <sashiko-bot@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: David Hildenbrand <david@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Balbir Singh <balbirs@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-06-08 18:21:28 -07:00
Kiryl Shutsemau (Meta)	04718f7c92	fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race Patch series "userfaultfd/pagemap: pre-existing fixes". These are pre-existing bug fixes that were carried at the front of the userfaultfd RWP working-set-tracking series up to v5 [1]. Per review feedback that fixes should not sit in the middle of a feature series, they are split out and sent on their own; the RWP series is reposted rebased on top of this. All six were flagged by the Sashiko AI review of the RWP series and carry Reported-by: Sashiko AI review <sashiko-bot@kernel.org>. They are independent of RWP, apply to mm-new directly, and carry Cc: stable@. 1: fs/proc/task_mmu: a missing huge_ptep_modify_prot_start() in make_uffd_wp_huge_pte() can lose hardware Dirty/Accessed updates when PAGEMAP_SCAN write-protects a hugetlb PTE. 2: fs/proc/task_mmu: pagemap_scan_hugetlb_entry() compares the range against HPAGE_SIZE rather than the hstate page size, so it never write-protects gigantic hugetlb pages. 3: fs/proc/task_mmu: PAGEMAP_SCAN with PM_SCAN_WP_MATCHING over an unpopulated hugetlb range self-deadlocks -- pagemap_scan_pte_hole() calls uffd_wp_range() while walk_hugetlb_range() holds the hugetlb vma lock for read, and hugetlb_change_protection() then takes it for write. Install the marker inline instead. 4: mm/huge_memory: change_non_present_huge_pmd() drops pmd_swp_uffd_wp on a device-private PMD permission downgrade, silently losing the uffd-wp marker. 5: userfaultfd: must_wait() applies pte_write() to a locklessly read PTE without checking pte_present(), so swap/migration entries decode random offset bits and a thread can stay parked on a stale fault. 6: userfaultfd: __VMA_UFFD_FLAGS feeds VMA_UFFD_MINOR_BIT (41) to mk_vma_flags() unconditionally, an out-of-bounds write into the single-word vma_flags_t on 32-bit. Build the mask from config-gated per-mode masks so an unavailable bit is never materialised. This patch (of 6): make_uffd_wp_huge_pte() arms the UFFD_WP bit on a present HugeTLB PTE by calling huge_ptep_modify_prot_commit() with a ptent snapshot that was fetched without the corresponding huge_ptep_modify_prot_start(). The start helper is what atomically clears the entry so the kernel-owned snapshot stays consistent until the commit; without it, the hardware may set Dirty or Accessed in the live PTE between the original read and the commit, and huge_ptep_modify_prot_commit() (whose generic implementation just calls set_huge_pte_at()) then writes the stale snapshot back over the live hardware bits, losing the update. The non-hugetlb sibling make_uffd_wp_pte() does this correctly via ptep_modify_prot_start() / ptep_modify_prot_commit(). Mirror that pattern for the present-PTE branch. The migration case stays as-is -- migration entries are non-present, so there's no hardware update to race against. Link: https://lore.kernel.org/20260529172331.356655-1-kas@kernel.org Link: https://lore.kernel.org/20260529172331.356655-2-kas@kernel.org Link: https://lore.kernel.org/all/20260526130509.2748441-1-kirill@shutemov.name/ [1] Fixes: `52526ca7fd` ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs") Signed-off-by: Kiryl Shutsemau <kas@kernel.org> Reported-by: Sashiko AI review <sashiko-bot@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: David Hildenbrand <david@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Balbir Singh <balbirs@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-06-08 18:21:28 -07:00
Jann Horn	6255da28d4	proc: protect ptrace_may_access() with exec_update_lock (FD links) proc_pid_get_link() and proc_pid_readlink() currently look up the task from the pid once, then do the ptrace access check on that task, then look up the task from the pid a second time to do the actual access. That's racy in several ways. To fix it, pass the task to the ->proc_get_link() handler, and instead of proc_fd_access_allowed(), introduce a new helper call_proc_get_link() that looks up and locks the task, does the access check, and calls ->proc_get_link(). Fixes: `778c114477` ("[PATCH] proc: Use sane permission checks on the /proc/<pid>/fd/ symlinks") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20260518-procfs-lockfix-part1-v1-2-5c3d20e0ac33@google.com Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>	2026-06-05 10:00:55 +02:00
Jann Horn	6650527444	proc: protect ptrace_may_access() with exec_update_lock (part 1) Fix the easy cases where procfs currently calls ptrace_may_access() without exec_update_lock protection, where the fix is to simply add the extra lock or use mm_access(): - do_task_stat(): grab exec_update_lock - proc_pid_wchan(): grab exec_update_lock - proc_map_files_lookup(): use mm_access() instead of get_task_mm() - proc_map_files_readdir(): use mm_access() instead of get_task_mm() - proc_ns_get_link(): grab exec_update_lock - proc_ns_readlink(): grab exec_update_lock Fixes: `f83ce3e6b0` ("proc: avoid information leaks to non-privileged processes") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20260518-procfs-lockfix-part1-v1-1-5c3d20e0ac33@google.com Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>	2026-06-05 10:00:55 +02:00
NeilBrown	0e0e490f5d	VFS: use wait_var_event for waiting in d_alloc_parallel() Parallel lookup starts with a call of d_alloc_parallel(). That primitive either returns a matching hashed dentry or allocates a new one in the in-lookup state and returns it to the caller. Once the caller is done with lookup, it indicates so either by call of d_{splice_alias,add}() or by call of d_done_lookup(); at that point dentry leaves the in-lookup state. If d_alloc_parallel() finds a matching in-lookup dentry, it must wait for that dentry to leave the in-lookup state, one way or another. Currently by supplying wait_queue_head to d_alloc_parallel(). If d_alloc_parallel() creates a new in-lookup dentry, the address of that wait_queue_head is stored in ->d_wait of new dentry and stays there while it's in the in-lookup; subsequent d_alloc_parallel() will wait on the queue found in the matching in-lookup dentry. Transition out of in-lookup state wakes waiters on that queue (if any). That works, but the calling conventions are inconvenient - the caller must supply wait_queue_head and make sure that it survives at least until the new in-lookup dentry leaves the in-lookup state. That amounts to boilerplate in the d_alloc_parallel() callers that are followed by a call of d_lookup_done() in the same function; in cases like nfs asynchronous unlink it gets worse than that. This patch changes d_alloc_parallel() to use wake_up_var_locked() to wake up waiters, and wait_var_event_spinlock() to wait. dentry->d_lock is used for synchronisation as it is already held and the relevant times. That eliminates the need of caller-supplied wait_queue_head, simplifying the calling conventions. Better yet, we only need one bit of information stored in dentry itself: whether there are any waiters to be woken up, and that can be easily stored in ->d_flags; ->d_wait goes away. The reason we need that bit (DCACHE_LOOKUP_WAITERS) is that with wait_var machinery the queues are shared with all kinds of stuff and there's no way tell if any of the waiters have anything to do with our dentry; most of the time none of them will be relevant, so we need to avoid the pointless wakeups. Another benefit of the new scheme comes from the fact that wakeups have to be done outside of write-side critical areas of ->i_dir_seq; with the old scheme we need to carry the value picked from ->d_wait from __d_lookup_unhash() to the place where we actually wake the waiters up. Now we can just leave DCACHE_LOOKUP_WAITERS in ->d_flags until we get to doing wakeups - that's done within the same ->d_lock scope, so we are fine; new bit is accessed only under ->d_lock and it's seen only on dentries with DCACHE_PAR_LOOKUP in ->d_flags. __d_lookup_unhash() no longer needs to re-init ->d_lru. That was previously shared (in a union) with ->d_wait but ->d_wait is now gone so it no longer corrupts ->d_lru. Co-developed-by: Al Viro <viro@zeniv.linux.org.uk> # saner handling of flags Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2026-06-05 00:34:54 -04:00
Suren Baghdasaryan	ea3085dd7a	fs/proc/task_mmu: read proc/pid/{smaps\|numa_maps} under per-vma lock Patch series "use vma locks for proc/pid/{smaps\|numa_maps} reads", v2. Use per-vma locks when reading /proc/pid/smaps and /proc/pid/numa_maps similar to /proc/pid/maps to reduce contention on central mmap_lock. One major difference between maps and smaps/numa_maps reading is that the latter executes page table walk which can't be done under RCU due to a possibility of sleeping. Therefore we drop RCU read lock before this walk while keeping the VMA locked. After the walk we retake RCU read lock, reset VMA iterator and proceed with the next VMA. The last two patches extend /proc/pid/maps test to cover /proc/pid/smaps reading during concurrent address space modification. This patch (of 3): proc/pid/{smaps\|numa_maps} can be read using the combination of RCU and VMA read locks, similar to proc/pid/maps. RCU is required to safely traverse the VMA tree and VMA lock stabilizes the VMA being processed and the pagetable walk. Link: https://lore.kernel.org/20260426062718.1238437-1-surenb@google.com Link: https://lore.kernel.org/20260426062718.1238437-2-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Liam R. Howlett <liam@infradead.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-06-04 14:45:05 -07:00
Frederic Weisbecker	127b2eb44f	tick/sched: Consolidate idle time fetching APIs Fetching the idle cputime is available through a variety of accessors all over the place depending on the different accounting flavours and needs: - idle vtime generic accounting can be accessed by kcpustat_field(), kcpustat_cpu_fetch(), get_idle/iowait_time() and get_cpu_idle/iowait_time_us() - dynticks-idle accounting can only be accessed by get_idle/iowait_time() or get_cpu_idle/iowait_time_us() - CONFIG_NO_HZ_COMMON=n idle accounting can be accessed by kcpustat_field() kcpustat_cpu_fetch(), or get_idle/iowait_time() but not by get_cpu_idle/iowait_time_us() Moreover get_idle/iowait_time() relies on get_cpu_idle/iowait_time_us() with a non-sensical conversion to microseconds and back to nanoseconds on the way. Start consolidating the APIs with removing get_idle/iowait_time() and make kcpustat_field() and kcpustat_cpu_fetch() work for all cases. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20260508131647.43868-13-frederic@kernel.org	2026-06-02 21:27:26 +02:00
Thorsten Blum	de5d0652d0	proc: use strnlen() for name validation in __proc_create Replace strlen(fn) with strnlen(fn, NAME_MAX + 1) when validating the final path component in __proc_create(). This preserves the existing name limit while bounding the length scan to one byte past the maximum name length. Handle empty names separately, and treat names longer than NAME_MAX as too long. Link: https://lore.kernel.org/20260421122648.56723-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Thorsten Blum <thorsten.blum@linux.dev> Cc: wangzijie <wangzijie1@honor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-28 21:24:39 -07:00
Alexey Dobriyan	f7027ed76d	proc: rewrite next_tgid() * deduplicate "iter.tgid += 1" line, Right now it is done once inside next_tgid() itself and second time inside "for" loop. * deduplicate next_tgid() call itself with different loop style: auto it = make_tgid_iter(); while (next_tgid(&it)) { ... } gcc seems to inline it twice: $ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux add/remove: 0/1 grow/shrink: 1/0 up/down: 100/-245 (-145) Function old new delta proc_pid_readdir 531 631 +100 next_tgid 245 - -245 * make tgid_iter.pid_ns const it never changes during readdir anyway [akpm@linux-foundation.org: remove newline] Link: https://lore.kernel.org/20260422191745.435556-2-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-28 21:24:39 -07:00
Alexey Dobriyan	9794de4500	proc: add tgid_iter.pid_ns member next_tgid() accepts pid namespace as an argument, but it never changes during readdir (which would be unthinkable thing to do anyway). Move it inside iterator type and hide from direct usage. Link: https://lore.kernel.org/20260422191745.435556-1-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-05-28 21:24:38 -07:00
Mike Rapoport (Microsoft)	cb7907e36c	proc: replace __get_free_page() with kmalloc() A few functions in fs/proc/base.c use __get_free_page() to allocate a temporary buffer. kmalloc() is a better API for such use and it also provides better scalability and more debugging possibilities. Replace use of __get_free_page() with kmalloc(). Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260523-b4-fs-v1-2-275e36a83f0e@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>	2026-05-27 15:12:22 +02:00
Thomas Gleixner	171cc0d9ee	genirq/proc: Speed up /proc/interrupts iteration Reading /proc/interrupts iterates over the interrupt number space one by one and looks up the descriptors one by one. That's just a waste of time. When CONFIG_GENERIC_IRQ_SHOW is enabled this can utilize the maple tree and cache the descriptor pointer efficiently for the sequence file operations. Implement a CONFIG_GENERIC_IRQ_SHOW specific version in the core code and leave the fs/proc/ variant for the legacy architectures which ignore generic code. This reduces the time wasted for looking up the next record significantly. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Dmitry Ilvokhin <d@ilvokhin.com> Link: https://patch.msgid.link/20260517194932.165280601@kernel.org	2026-05-26 16:21:15 +02:00
Thomas Gleixner	d6b70b16b4	x86/irq: Move IOAPIC misrouted and PIC/APIC error counts into irq_stats The special treatment of these counts is just adding extra code for no real value. The irq_stats mechanism allows to suppress output of counters, which should never happen by default and provides a mechanism to enable them for the rare case that they occur. Move the IOAPIC misrouted and the PIC/APIC error counts into irq_stats, mark them suppressed by default and update the sites which increment them. This changes the output format of 'ERR' and 'MIS' in case there are events to the regular per CPU display format and otherwise suppresses them completely. As a side effect this removes the arch_cpu_stat() mechanism from proc/stat which was only there to account for the error interrupts on x86 and missed to take the misrouted ones into account. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Radu Rendec <radu@rendec.net> Link: https://patch.msgid.link/20260517194931.361942103@kernel.org	2026-05-26 16:21:12 +02:00
Christian Brauner (Amutable)	6b1c66c9cc	exec_state: relocate dumpable information The dumpable flag captured at execve() is consulted by __ptrace_may_access() and several /proc owner / visibility checks. It lives on mm_struct today, which exit_mm() clears from the task long before the task itself is reaped. exec_state is anchored to the execve() that established the current privilege domain. CLONE_VM siblings refcount-share the parent's exec_state via copy_exec_state(); non-CLONE_VM clones allocate a fresh exec_state inheriting the parent's dumpable mode and user_ns reference via task_exec_state_copy(). execve() allocates a fresh instance (via alloc_task_exec_state() in begin_new_exec()) and installs it under task_lock + exec_update_lock with task_exec_state_replace(). init_task uses a static instance. The dumpable mode now lives on task->exec_state->dumpable. task->mm->flags no longer carries dumpability; MMF_DUMPABLE_MASK is removed, but MMF_DUMPABLE_BITS is reserved so MMF_DUMP_FILTER_* bit positions remain stable for the /proc/<pid>/coredump_filter ABI. The task->user_dumpable cache bit and its assignment in exit_mm() are removed; readers go through get_dumpable(task) directly. coredump_params gains a snapshot field cprm.dumpable, populated from get_dumpable(current) at vfs_coredump() entry, replacing the previous __get_dumpable(cprm->mm_flags) consumers in fs/coredump.c and fs/pidfs.c. The user namespace recorded at execve() is consulted by __ptrace_may_access() and by /proc/PID/* owner derivation. Move the captured user_ns onto task_exec_state, which stays attached to the task past exit_mm() and across exit_files(). bprm grows a user_ns field staged in bprm_mm_init() with the caller's user_ns, narrowed by would_dump() to the closest privileged ancestor, and consumed by exec_mmap() via alloc_task_exec_state(bprm->user_ns). free_bprm() releases the staging reference. mm_struct loses ->user_ns entirely. Initializers in init-mm, efi_mm, and the implicit one in mm_init()/dup_mm()/mm_alloc() are removed; __mmdrop() drops the matching put_user_ns(). The kthread_use_mm() WARN_ON_ONCE(!mm->user_ns) is no longer meaningful and goes too. Reviewed-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20260520-work-task_exec_state-v3-4-69f895bc1385@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>	2026-05-26 11:02:01 +02:00
Christian Brauner (Amutable)	4f365e7a5d	sched/coredump: introduce enum task_dumpable Replace the SUID_DUMP_DISABLE/USER/ROOT preprocessor constants with enum task_dumpable. Numeric values are preserved (kernel.suid_dumpable sysctl and prctl(PR_SET_DUMPABLE) ABI), so this is a pure rename with no behavioral change. Subsequent commits relocate dumpability onto a per-task structure where the enum type will allow stronger type-checking on the new API. Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: David Hildenbrand (arm) <david@kernel.org> Link: https://patch.msgid.link/20260520-work-task_exec_state-v3-1-69f895bc1385@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>	2026-05-26 11:02:01 +02:00
Alexey Gladkov	05dab768fc	proc: handle subset=pid separately in userns visibility checks When procfs is mounted with subset=pid, only the dynamic process-related part of the filesystem remains visible. That part cannot be hidden by overmounts, so checking whether an existing procfs mount is fully visible does not make sense for this mode. At the same time, a subset=pid procfs mount must not be used as evidence that a later procfs mount would not reveal additional information. It provides a restricted view of procfs, not the full filesystem view. Mark subset=pid procfs instances as restricted variants. Ignore restricted variants when looking for an already-visible mount, and allow new restricted variants without consulting mnt_already_visible(). Signed-off-by: Alexey Gladkov <legion@kernel.org> Link: https://patch.msgid.link/4d5e760c3d534dd2e05578d119cc408450053a98.1777278334.git.legion@kernel.org Reviewed-by: Aleksa Sarai <aleksa@amutable.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-11 23:13:02 +02:00
Alexey Gladkov	1991a8f693	proc: prevent reconfiguring subset=pid Changing subset=pid on an existing procfs instance is not safe. If a full procfs mount has entries hidden by overmounts, switching it to subset=pid would hide the top-level procfs entries from lookup and readdir while leaving the existing overmounts reachable. Reject attempts to change the subset=pid state during reconfigure before applying any other procfs mount options, so a failed reconfigure cannot partially update the instance. Signed-off-by: Alexey Gladkov <legion@kernel.org> Link: https://patch.msgid.link/13295f40f642af5d6e7038d681d43132ad80f7b2.1777278334.git.legion@kernel.org Reviewed-by: Aleksa Sarai <aleksa@amutable.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-11 23:13:02 +02:00
Alexey Gladkov	a2a5eb6323	proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN Cache the mounters credentials and allow access to the net directories contingent of the permissions of the mounter of proc. Do not show /proc/self/net when proc is mounted with subset=pid option and the mounter does not have CAP_NET_ADMIN. To avoid inadvertently allowing access to /proc/<pid>/net, updating mounter credentials is not supported. Signed-off-by: Alexey Gladkov <legion@kernel.org> Link: https://patch.msgid.link/d2466fe9085367f1e24693c437ecb8cff2789660.1777278334.git.legion@kernel.org Reviewed-by: Aleksa Sarai <aleksa@amutable.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-11 23:13:01 +02:00
Christian Brauner	dc4edae7f4	fs: move SB_I_USERNS_VISIBLE to FS_USERNS_MOUNT_RESTRICTED Whether a filesystem's mounts need to undergo a visibility check in user namespaces is a static property of the filesystem type, not a runtime property of each superblock instance. Both proc and sysfs always set SB_I_USERNS_VISIBLE on their superblocks unconditionally (sysfs does so on first creation, and subsequent mounts reuse the same superblock). Move this flag from sb->s_iflags (SB_I_USERNS_VISIBLE) to file_system_type->fs_flags (FS_USERNS_MOUNT_RESTRICTED) so the intent is expressed at the filesystem type level where it belongs. All check sites are updated to test sb->s_type->fs_flags instead of sb->s_iflags. The SB_I_NOEXEC and SB_I_NODEV flags remain on the superblock as they are runtime properties set during fill_super. Link: https://patch.msgid.link/72887c5b6204dc3adf5a53104f0be6bd8bc4f6cd.1777278334.git.legion@kernel.org Reviewed-by: Aleksa Sarai <aleksa@amutable.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-11 23:13:01 +02:00
Alexey Dobriyan	912289ee83	proc: allow to mark /proc files permanent outside of fs/proc/ Add proc_make_permanent() function to mark PDE as permanent to speed up open/read/close (one alloc/free and lock/unlock less). Enable it for built-in code and for compiled-in modules. This function becomes nop magically in modular code. Note, note, note! If built-in code creates and deletes PDEs dynamically (not in init hook), then proc_make_permanent() must not be used. It is intended for simple code: static int __init xxx_module_init(void) { g_pde = proc_create_single(); proc_make_permanent(g_pde); return 0; } static void __exit xxx_module_exit(void) { remove_proc_entry(g_pde); } If module is built-in then exit hook never executed and PDE is permanent so it is OK to mark it as such. If module is module then rmmod will yank PDE, but proc_make_permanent() is nop and core /proc code will do everything right. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Link: https://patch.msgid.link/20260425220844.1763933-2-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-05-11 23:13:01 +02:00
Linus Torvalds	440d6635b2	mm.git review status for linus..mm-nonmm-stable Total patches: 126 Reviews/patch: 0.92 Reviewed rate: 76% - The 2 patch series "pid: make sub-init creation retryable" from Oleg Nesterov increases the robustness of our creation of init in a new namespace. By clearing away some historical cruft which is no longer needed. Also some documentation fixups are provided. - The 2 patch series "selftests/fchmodat2: Error handling and general" from Mark Brown has a fixup and a cleanup for the fchmodat2() syscall selftest. - The 3 patch series "lib: polynomial: Move to math/ and clean up" from Andy Shevchenko does as advertised. - The 3 patch series "hung_task: Provide runtime reset interface for hung task detector" from Aaron Tomlin gives administrators the ability to zero out /proc/sys/kernel/hung_task_detect_count. - The 2 patch series "tools/getdelays: use the static UAPI headers from tools/include/uapi" from Thomas Weißschuh teaches getdelays to use the in-kernel UAPI headers rather than the system-provided ones. - The 5 patch series "watchdog/hardlockup: Improvements to hardlockup" from Mayank Rungta provides several cleanups and fixups to the hardlockup detector code and its documentation. - The 2 patch series "lib/bch: fix undefined behavior from signed left-shifts" from Josh Law provides a couple of small/theoretical fixes in the bch code. - The 2 patch series "ocfs2/dlm: fix two bugs in dlm_match_regions()" from Junrui Luo does what is claims. - The 27 patch series "cleanup the RAID5 XOR library" from Christoph Hellwig is a quite far-reaching cleanup to this code. I can't do better than to quote Christoph: The XOR library used for the RAID5 parity is a bit of a mess right now. The main file sits in crypto/ despite not being cryptography and not using the crypto API, with the generic implementations sitting in include/asm-generic and the arch implementations sitting in an asm/ header in theory. The latter doesn't work for many cases, so architectures often build the code directly into the core kernel, or create another module for the architecture code. Change this to a single module in lib/ that also contains the architecture optimizations, similar to the library work Eric Biggers has done for the CRC and crypto libraries later. After that it changes to better calling conventions that allow for smarter architecture implementations (although none is contained here yet), and uses static_call to avoid indirection function call overhead. - The 2 patch series "lib/list_sort: Clean up list_sort() scheduling workarounds" from Kuan-Wei Chiu cleans up this library code by removing a hacky thing which was added for UBIFS, which UBIFS doesn't actually need. - The 5 patch series "Fix bugs in extract_iter_to_sg()" from Christian Ehrhardt fixes a few bugs in the scatterlist code, adds in-kernel tests for the now-fixed bugs and fixes a leak in the test itself. - The 3 patch series "kdump: Enable LUKS-encrypted dump target support in ARM64 and PowerPC" from Coiby Xu eenables support of the LUKS-encrypted device dump target on arm64 and powerpc. - The 4 patch series "ocfs2: consolidate extent list validation into block read callbacks" from Joseph Qi addresses ocfs2's validation of extent list fields - cleanup, simplification, robustness. (Kernel test robot loves mounting corrupted fs images!) -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad90rQAKCRDdBJ7gKXxA jl7rAQD4/Rq7ZSSnEv6FS4gOwc3MgTdWcZZaXkqL1KiWyYhRwAEA+cVCO344+AKb znBOjet/hUr+/kBwyViifiC8LHzchwM= =Nfnf -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "pid: make sub-init creation retryable" (Oleg Nesterov) Make creation of init in a new namespace more robust by clearing away some historical cruft which is no longer needed. Also some documentation fixups - "selftests/fchmodat2: Error handling and general" (Mark Brown) Fix and a cleanup for the fchmodat2() syscall selftest - "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko) - "hung_task: Provide runtime reset interface for hung task detector" (Aaron Tomlin) Give administrators the ability to zero out /proc/sys/kernel/hung_task_detect_count - "tools/getdelays: use the static UAPI headers from tools/include/uapi" (Thomas Weißschuh) Teach getdelays to use the in-kernel UAPI headers rather than the system-provided ones - "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta) Several cleanups and fixups to the hardlockup detector code and its documentation - "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law) A couple of small/theoretical fixes in the bch code - "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo) - "cleanup the RAID5 XOR library" (Christoph Hellwig) A quite far-reaching cleanup to this code. I can't do better than to quote Christoph: "The XOR library used for the RAID5 parity is a bit of a mess right now. The main file sits in crypto/ despite not being cryptography and not using the crypto API, with the generic implementations sitting in include/asm-generic and the arch implementations sitting in an asm/ header in theory. The latter doesn't work for many cases, so architectures often build the code directly into the core kernel, or create another module for the architecture code. Change this to a single module in lib/ that also contains the architecture optimizations, similar to the library work Eric Biggers has done for the CRC and crypto libraries later. After that it changes to better calling conventions that allow for smarter architecture implementations (although none is contained here yet), and uses static_call to avoid indirection function call overhead" - "lib/list_sort: Clean up list_sort() scheduling workarounds" (Kuan-Wei Chiu) Clean up this library code by removing a hacky thing which was added for UBIFS, which UBIFS doesn't actually need - "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt) Fix a few bugs in the scatterlist code, add in-kernel tests for the now-fixed bugs and fix a leak in the test itself - "kdump: Enable LUKS-encrypted dump target support in ARM64 and PowerPC" (Coiby Xu) Enable support of the LUKS-encrypted device dump target on arm64 and powerpc - "ocfs2: consolidate extent list validation into block read callbacks" (Joseph Qi) Cleanup, simplify, and make more robust ocfs2's validation of extent list fields (Kernel test robot loves mounting corrupted fs images!) * tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits) ocfs2: validate group add input before caching ocfs2: validate bg_bits during freefrag scan ocfs2: fix listxattr handling when the buffer is full doc: watchdog: fix typos etc update Sean's email address ocfs2: use get_random_u32() where appropriate ocfs2: split transactions in dio completion to avoid credit exhaustion ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path() ocfs2: validate extent block list fields during block read ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec() ocfs2: validate dx_root extent list fields during block read ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY ocfs2: handle invalid dinode in ocfs2_group_extend .get_maintainer.ignore: add Askar ocfs2: validate bg_list extent bounds in discontig groups checkpatch: exclude forward declarations of const structs tools/accounting: handle truncated taskstats netlink messages taskstats: set version in TGID exit notifications ocfs2/heartbeat: fix slot mapping rollback leaks on error paths arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel ...	2026-04-16 20:11:56 -07:00
Linus Torvalds	334fbe734e	mm.git review status for linus..mm-stable Everything: Total patches: 368 Reviews/patch: 1.56 Reviewed rate: 74% Excluding DAMON: Total patches: 316 Reviews/patch: 1.77 Reviewed rate: 81% Excluding DAMON and zram: Total patches: 306 Reviews/patch: 1.81 Reviewed rate: 82% Excluding DAMON, zram and maple_tree: Total patches: 276 Reviews/patch: 2.01 Reviewed rate: 91% Significant patch series in this merge: - The 30 patch series "maple_tree: Replace big node with maple copy" from Liam Howlett is mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - The 12 patch series "mm, swap: swap table phase III: remove swap_map" from Kairui Song offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - The 2 patch series "mm: memfd_luo: preserve file seals" from Pratyush Yadav adds file seal preservation to LUO's memfd code. - The 2 patch series "mm: zswap: add per-memcg stat for incompressible pages" from Jiayuan Chen adds additional userspace stats reportng to zswap. - The 4 patch series "arch, mm: consolidate empty_zero_page" from Mike Rapoport implements some cleanups for our handling of ZERO_PAGE() and zero_pfn. - The 2 patch series "mm/kmemleak: Improve scan_should_stop() implementation" from Zhongqiu Han provides an robustness improvement and some cleanups in the kmemleak code. - The 4 patch series "Improve khugepaged scan logic" from Vernon Yang "improves the khugepaged scan logic and reduces CPU consumption by prioritizing scanning tasks that access memory frequently". - The 2 patch series "Make KHO Stateless" from Jason Miu simplifies Kexec Handover by "transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel" - The 3 patch series "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" from Thomas Ballasi and Steven Rostedt enhances vmscan's tracepointing. - The 5 patch series "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" from Catalin Marinas is a cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation. - The 2 patch series "Fix KASAN support for KHO restored vmalloc regions" from Pasha Tatashin fixes a WARN() which can be emitted the KHO restores a vmalloc area. - The 4 patch series "mm: Remove stray references to pagevec" from Tal Zussman provides several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago. - The 17 patch series "mm: Eliminate fake head pages from vmemmap optimization" from Kiryl Shutsemau simplifies the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page. - The 2 patch series "mm/damon/core: improve DAMOS quota efficiency for core layer filters" from SeongJae Park improves two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used. - The 3 patch series "mm/damon: strictly respect min_nr_regions" from SeongJae Park improves DAMON usability by extending the treatment of the min_nr_regions user-settable parameter. - The 3 patch series "mm/page_alloc: pcp locking cleanup" from Vlastimil Babka is a proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ennsed. - The 16 patch series "mm: cleanups around unmapping / zapping" from David Hildenbrand implements "a bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions". - The 6 patch series "support batched checking of the young flag for MGLRU" from Baolin Wang supports batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64. - The 5 patch series "memcg: obj stock and slab stat caching cleanups" from Johannes Weiner provides memcg cleanup and robustness improvements. - The 5 patch series "Allow order zero pages in page reporting" from Yuvraj Sakshith enhances page_reporting's free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - The 6 patch series "mm: vma flag tweaks" from Lorenzo Stoakes is cleanup work following from the recent conversion of the VMA flags to a bitmap. - The 10 patch series "mm/damon: add optional debugging-purpose sanity checks" from SeongJae Park adds some more developer-facing debug checks into DAMON core. - The 2 patch series "mm/damon: test and document power-of-2 min_region_sz requirement" from SeongJae Park adds an additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling. - The 3 patch series "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" from SeongJae Park fixes a hard-to-hit time overflow issue in DAMON core. - The 7 patch series "mm/damon: improve/fixup/update ratio calculation, test and documentation" from SeongJae Park is a "batch of misc/minor improvements and fixups" for DAMON. - The 4 patch series "mm: move vma_(kernel\|mmu)_pagesize() out of hugetlb.c" from David Hildenbrand fixes a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - The 6 patch series "zram: recompression cleanups and tweaks" from Sergey Senozhatsky provides "a somewhat random mix of fixups, recompression cleanups and improvements" in the zram code. - The 11 patch series "mm/damon: support multiple goal-based quota tuning algorithms" from SeongJae Park extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select. - The 4 patch series "mm: thp: reduce unnecessary start_stop_khugepaged()" from Breno Leitao fixes the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged. - The 3 patch series "mm: improve map count checks" from Lorenzo Stoakes provides some cleanups and slight fixes in the mremap, mmap and vma code. - The 5 patch series "mm/damon: support addr_unit on default monitoring targets for modules" from SeongJae Park extends the use of DAMON core's addr_unit tunable. - The 5 patch series "mm: khugepaged cleanups and mTHP prerequisites" from Nico Pache provides cleanups in the khugepaged and is a base for Nico's planned khugepaged mTHP support. - The 15 patch series "mm: memory hot(un)plug and SPARSEMEM cleanups" from David Hildenbrand implements code movement and cleanups in the memhotplug and sparsemem code. - The 2 patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" from David Hildenbrand rationalizes some memhotplug Kconfig support. - The 6 patch series "change young flag check functions to return bool" from Baolin Wang is "a cleanup patchset to change all young flag check functions to return bool". - The 3 patch series "mm/damon/sysfs: fix memory leak and NULL dereference issues" from Josh Law and SeongJae Park fixes a few potential DAMON bugs. - The 25 patch series "mm/vma: convert vm_flags_t to vma_flags_t in vma code" from "converts a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it". Mainly in the vma code. - The 21 patch series "mm: expand mmap_prepare functionality and usage" from Lorenzo Stoakes "expands the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time". Cleanups, documentation, extension of mmap_prepare into filesystem drivers. - The 13 patch series "mm/huge_memory: refactor zap_huge_pmd()" from Lorenzo Stoakes simplifies and cleans up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad3HDQAKCRDdBJ7gKXxA jrUQAPwNhPk5nPSxnyxjAeQtOBHqgCdnICeEismLajPKd9aYRgEA0s2XAu3tSUYi GrBnWImHG3s4ePQxVcPCegWTsOUrXgQ= =1Q7o -----END PGP SIGNATURE----- Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel\|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...	2026-04-15 12:59:16 -07:00
Linus Torvalds	4a57e0913e	drm for v7.1-rc1 mm: - two pass MMU interval notifiers - add gpu active/reclaim per-node stat counters math: - provide __KERNEL_DIV_ROUND_CLOSEST() in UAPI - implement DIV_ROUND_CLOSEST() with __KERNEL_DIV_ROUND_CLOSEST() rust: - shared tag with driver-core: register macro and io infra - core: rework DMA coherent API - core: add interop::list to interop with C linked lists - core: add more num::Bounded operations - core: enable generic_arg_infer and add EMSGSIZE - workqueue: add ARef<T> support for work and delayed work - add GPU buddy allocator abstraction - add DRM shmem GEM helper abstraction - allow drm:::Device to dispatch work and delayed work items to driver private data - add dma_resv_lock helper and raw accessors core: - introduce DRM RAS infrastructure over netlink - add connector panel_type property - fourcc: add ARM interleaved 64k modifier - colorop: add destroy helper - suballoc: split into alloc and init helpers - mode: provide DRM_ARGB_GET() macros for reading color components edid: - provide drm_output_color_Format dma-buf: - provide revoke mechanism for shared buffers - rename move_notify to invalidate_mappings - always enable move_notify - protect dma_fence_ops with RCU and improve locking - clean pages with helpers atomic: - allocate drm_private_state via callback - helper: use system_percpu_wq buddy: - make buddy allocator available to gpu level - add kernel-doc for buddy allocator - improve aligned allocation ttm: - fix fence signalling - improve tests and docs - improve handling of gfp_retry_mayfail - use per-node stat counters to track memory allocations - port pool to use list_lru - drop NUMA specific pools - make pool shrinker numa aware - track allocated pages per numa node coreboot: - cleanup coreboot framebuffer support sched: - fix race condition in drm_sched_fini pagemap: - enable THP support - pass pagemap_addr by reference gem-shmem: - Track page accessed/dirty status across mmap/vmap gpusvm: - reenable device to device migration - fix unbalanced unclock bridge: - anx7625: Support USB-C plus DT bindings - connector: Fix EDID detection - dw-hdmi-qp: Support Vendor-Specfic and SDP Infoframes; improve others - fsl-ldb: Fix visual artifacts plus related DT property 'enable-termination-resistor' - imx8qxp-pixel-link: Improve bridge reference handling - lt9611: Support Port-B-only input plus DT bindings - tda998x: Support DRM_BRIDGE_ATTACH_NO_CONNECTOR; Clean up - Support TH1520 HDMI plus DT bindings - waveshare-dsi: Fix register and attach; Support 1..4 DSI lanes plus DT bindings - anx7625: Fix USB Type-C handling - cdns-mhdp8546-core: Handle HDCP state in bridge atomic_check - Support Lontium LT8713SX DP MST bridge plus DT bindings - analogix_dp: Use DP helpers for link training panel: - panel-jdi-lt070me05000: Use mipi-dsi multi functions - panel-edp: Support Add AUO B116XAT04.1 (HW: 1A); Support CMN N116BCL-EAK (C2); Support FriendlyELEC plus DT changes - panel-edp: Fix timings for BOE NV140WUM-N64 - ilitek-ili9882t: Allow GPIO calls to sleep - jadard: Support TAIGUAN XTI05101-01A - lxd: Support LXD M9189A plus DT bindings - mantix: Fix pixel clock; Clean up - motorola: Support Motorola Atrix 4G and Droid X2 plus DT bindings - novatek: Support Novatek/Tianma NT37700F plus DT bindings - simple: Support EDT ET057023UDBA plus DT bindings; Support Powertip PH800480T032-ZHC19 plus DT bindings; Support Waveshare 13.3" - novatek-nt36672a: Use mipi_dsi__multi() functions - panel-edp: Support BOE NV153WUM-N42, CMN N153JCA-ELK, CSW MNF307QS3-2 - support Himax HX83121A plus DT bindings - support JuTouch JT070TM041 plus DT bindings - support Samsung S6E8FC0 plus DT bindings - himax-hx83102c: support Samsung S6E8FC0 plus DT bindings; support backlight - ili9806e: support Rocktech RK050HR345-CT106A plus DT bindings - simple: support Tianma TM050RDH03 plus DT bindings amdgpu: - enable DC by default on CIK APUs - userq fence ioctl param size fixes - set panel_type to OLED for eDP - refactor DC i2c code - FAMS2 update - rework ttm handling to allow multiple engines - DC DCE 6.x cleanup - DC support for NUTMEG/TRAVIS DP bridge - DCN 4.2 support - GC12 idle power fix for compute - use struct drm_edid in non-DC code - enable NV12/P010 support on primary planes - support newer IP discovery tables - VCN/JPEG 5.0.2 support - GC/MES 12.1 updates - USERQ fixes - add DC idle state manager - eDP DSC seamless boot amdkfd: - GC 12.1 updates - non 4K page fixes xe: - basic Xe3p_LPG and NVL-P enabling patches - allow VM_BIND decompress support - add purgeable buffer object support - add xe_vm_get_property_ioctl - restrict multi-lrc to VCS/VECS engines - allow disabling VM overcommit in fault mode - dGPU memory optimizations - Workaround cleanups and simplification - Allow VFs VRAM quote changes using sysfs - convert GT stats to per-cpu counters - pagefault refactors - enable multi-queue on xe3p_xpc - disable DCC on PTL - make MMIO communication more robust - disable D3Cold for BMG on specific platforms - vfio: improve FLR sync for Xe VFIO i915/display: - C10/C20/LT PHY PLL divider verification - use trans push mechanism to generate PSR frame change on LNL+ - refactor DP DSC slice config - VGA decode refactoring - refactor DPT, gen2-4 overlay, masked field register macro helpers - refactor stolen memory allocation decisions - prepare for UHBR DP tunnels - refactor LT PHY PLL to use DPLL framework - implement register polling/waiting in display code - add shared stepping header between i915 and display i915: - fix potential overflow of shmem scatterlist length nouveau: - provide Z cull info to userspace - initial GA100 support - shutdown on PCI device shutdown nova-core: - harden GSP command queue - add support for large RPCs - simplify GSP sequencer and message handling - refactor falcon firmware handling - convert to new register macro - conver to new DMA coherent API - use checked arithmetic - add debugfs support for gsp-rm log buffers - fix aux device registration for multi-GPU msm: - CI: - Uprev mesa - Restore CI jobs for Qualcomm APQ8016 and APQ8096 devices - Core: - Switched to of_get_available_child_by_name() - DPU: - Fixes for DSC panels - Fixed brownout because of the frequency / OPP mismatch - Quad pipe preparation (not enabled yet) - Switched to virtual planes by default - Dropped VBIF_NRT support - Added support for Eliza platform - Reworked alpha handling - Switched to correct CWB definitions on Eliza - Dropped dummy INTF_0 on MSM8953 - Corrected INTFs related to DP-MST - DP: - Removed debug prints looking into PHY internals - DSI: - Fixes for DSC panels - RGB101010 support - Support for SC8280XP - Moved PHY bindings from display/ to phy/ - GPU: - Preemption support for x2-85 and a840 - IFPC support for a840 - SKU detection support for x2-85 and a840 - Expose AQE support (VK ray-pipeline) - Avoid locking in VM_BIND fence signaling path - Fix to avoid reclaim in GPU snapshot path - Disallow foreign mapping of _NO_SHARE BOs - HDMI: - Fixed infoframes programming - MDP5: - Dropped support for MSM8974v1 - Dropped now unused code for MSM8974 v1 and SDM660 / MSM8998 panthor: - add tracepoints for power and IRQs - fix fence handling - extend timestamp query with flags - support various sources for timestamp queries tyr: - fix names and model/versions rockchip: - vop2: use drm logging function - rk3576 displayport support - support CRTC background color atmel-hlcdc: - support sana5d65 LCD controller tilcdc: - use DT bindings schema - use managed DRM interfaces - support DRM_BRIDGE_ATTACH_NO_CONNECTOR verisilicon: - support DC8200 + DT bindings virtgpu: - support PRIME import with 3D enabled komeda: - fix integer overflow in AFBC checks mcde: - improve bridge handling gma500: - use drm client buffer for fbdev framebuffer amdxdna: - add sensors ioctls - provide NPU power estimate - support column utilization sensor - allow forcing DMA through IOMMU IOVA - support per-BO mem usage queries - refactor GEM implementation ivpu: - update boot API to v3.29.4 - limit per-user number of doorbells/contexts - perform engine reset on TDR error loongson: - replace custom code with drm_gem_ttm_dumb_map_offset() imx: - support planes behind the primary plane - fix bus-format selection vkms: - support CRTC background color v3d: - improve handling of struct v3d_stats komeda: - support Arm China Linlon D6 plus DT bindings imagination: - improve power-off sequence - support context-reset notification from firmware mediatek: - mtk_dsi: enable hs clock during pre-enable - Remove all conflicting aperture devices during probe - Add support for mt8167 display blocks -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmnfMHMACgkQDHTzWXnE hr4gEg/+Oaf6KBcvqNKPLwDlNeOvHap1n8oiy7SXvOKN2/KEAu/zGpEciJ7GsSge qdqY4xhEfp0JZLrTZiIIzFr38uzkanfOLdF2AQCVrfCRhlO7QLiUDxAAdDZUyINe kKLvNunxMwhzwsmRHEDL85cgPkhsxt2ux+tUOYZrEQ/ZbdupNrFw9q5ewmuYzGng HY8bsnB0jVwQ9IU/X6h+Xzr/19623/CZyUWJSuY1foKMhHMceyrCmpAFEqjFWn71 7zNYFlPEQtqa6qtIZXVbJB4mhd7NbmMW6s367xx+Sx+UJDDNfS6ku+hpISwxNuVX 7fOoEkhQ+ynIcxGkfOi5Q9j2/mV/WL/GEA/IUWfmX8l219WOrKY4w0NtCE4C78r7 QFGUR6w8Vi97FCP8NuA7Kix4J9eSr/FAzqoG0snAOQbVdaTSBr1hL0PeewD8BRry PUkCCh6J7jKA6POt4JZeU6mbJ3AMoOwS9BICi10R1R6EnIKNpKGVpAuYHk4B5+u3 X5vd1ds+8dJN/etaFYgIbirUocKx6zt9rT5i4/wPZIDPoCgZNofePtPCiJoTcnNN PUZUngcWLpftwW+kCUdc4lF1Q7nguQpXVpX0WJiSfqejshUTPXHPlmJV81GoNSHo fQMUXIjO5cAX0FKPBakSxxwFnOQFq4aZb6kRBt4lYgt+RJfzo3s= =GX7Q -----END PGP SIGNATURE----- Merge tag 'drm-next-2026-04-15' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "Highlights: - new DRM RAS infrastructure using netlink - amdgpu: enable DC on CIK APUs, and more IP enablement, and more user queue work - xe: purgeable BO support, and new hw enablement - dma-buf : add revocable operations Full summary: mm: - two-pass MMU interval notifiers - add gpu active/reclaim per-node stat counters math: - provide __KERNEL_DIV_ROUND_CLOSEST() in UAPI - implement DIV_ROUND_CLOSEST() with __KERNEL_DIV_ROUND_CLOSEST() rust: - shared tag with driver-core: register macro and io infra - core: rework DMA coherent API - core: add interop::list to interop with C linked lists - core: add more num::Bounded operations - core: enable generic_arg_infer and add EMSGSIZE - workqueue: add ARef<T> support for work and delayed work - add GPU buddy allocator abstraction - add DRM shmem GEM helper abstraction - allow drm:::Device to dispatch work and delayed work items to driver private data - add dma_resv_lock helper and raw accessors core: - introduce DRM RAS infrastructure over netlink - add connector panel_type property - fourcc: add ARM interleaved 64k modifier - colorop: add destroy helper - suballoc: split into alloc and init helpers - mode: provide DRM_ARGB_GET() macros for reading color components edid: - provide drm_output_color_Format dma-buf: - provide revoke mechanism for shared buffers - rename move_notify to invalidate_mappings - always enable move_notify - protect dma_fence_ops with RCU and improve locking - clean pages with helpers atomic: - allocate drm_private_state via callback - helper: use system_percpu_wq buddy: - make buddy allocator available to gpu level - add kernel-doc for buddy allocator - improve aligned allocation ttm: - fix fence signalling - improve tests and docs - improve handling of gfp_retry_mayfail - use per-node stat counters to track memory allocations - port pool to use list_lru - drop NUMA specific pools - make pool shrinker numa aware - track allocated pages per numa node coreboot: - cleanup coreboot framebuffer support sched: - fix race condition in drm_sched_fini pagemap: - enable THP support - pass pagemap_addr by reference gem-shmem: - Track page accessed/dirty status across mmap/vmap gpusvm: - reenable device to device migration - fix unbalanced unclock bridge: - anx7625: Support USB-C plus DT bindings - connector: Fix EDID detection - dw-hdmi-qp: Support Vendor-Specfic and SDP Infoframes; improve others - fsl-ldb: Fix visual artifacts plus related DT property 'enable-termination-resistor' - imx8qxp-pixel-link: Improve bridge reference handling - lt9611: Support Port-B-only input plus DT bindings - tda998x: Support DRM_BRIDGE_ATTACH_NO_CONNECTOR; Clean up - Support TH1520 HDMI plus DT bindings - waveshare-dsi: Fix register and attach; Support 1..4 DSI lanes plus DT bindings - anx7625: Fix USB Type-C handling - cdns-mhdp8546-core: Handle HDCP state in bridge atomic_check - Support Lontium LT8713SX DP MST bridge plus DT bindings - analogix_dp: Use DP helpers for link training panel: - panel-jdi-lt070me05000: Use mipi-dsi multi functions - panel-edp: Support Add AUO B116XAT04.1 (HW: 1A); Support CMN N116BCL-EAK (C2); Support FriendlyELEC plus DT changes - panel-edp: Fix timings for BOE NV140WUM-N64 - ilitek-ili9882t: Allow GPIO calls to sleep - jadard: Support TAIGUAN XTI05101-01A - lxd: Support LXD M9189A plus DT bindings - mantix: Fix pixel clock; Clean up - motorola: Support Motorola Atrix 4G and Droid X2 plus DT bindings - novatek: Support Novatek/Tianma NT37700F plus DT bindings - simple: Support EDT ET057023UDBA plus DT bindings; Support Powertip PH800480T032-ZHC19 plus DT bindings; Support Waveshare 13.3" - novatek-nt36672a: Use mipi_dsi__multi() functions - panel-edp: Support BOE NV153WUM-N42, CMN N153JCA-ELK, CSW MNF307QS3-2 - support Himax HX83121A plus DT bindings - support JuTouch JT070TM041 plus DT bindings - support Samsung S6E8FC0 plus DT bindings - himax-hx83102c: support Samsung S6E8FC0 plus DT bindings; support backlight - ili9806e: support Rocktech RK050HR345-CT106A plus DT bindings - simple: support Tianma TM050RDH03 plus DT bindings amdgpu: - enable DC by default on CIK APUs - userq fence ioctl param size fixes - set panel_type to OLED for eDP - refactor DC i2c code - FAMS2 update - rework ttm handling to allow multiple engines - DC DCE 6.x cleanup - DC support for NUTMEG/TRAVIS DP bridge - DCN 4.2 support - GC12 idle power fix for compute - use struct drm_edid in non-DC code - enable NV12/P010 support on primary planes - support newer IP discovery tables - VCN/JPEG 5.0.2 support - GC/MES 12.1 updates - USERQ fixes - add DC idle state manager - eDP DSC seamless boot amdkfd: - GC 12.1 updates - non 4K page fixes xe: - basic Xe3p_LPG and NVL-P enabling patches - allow VM_BIND decompress support - add purgeable buffer object support - add xe_vm_get_property_ioctl - restrict multi-lrc to VCS/VECS engines - allow disabling VM overcommit in fault mode - dGPU memory optimizations - Workaround cleanups and simplification - Allow VFs VRAM quote changes using sysfs - convert GT stats to per-cpu counters - pagefault refactors - enable multi-queue on xe3p_xpc - disable DCC on PTL - make MMIO communication more robust - disable D3Cold for BMG on specific platforms - vfio: improve FLR sync for Xe VFIO i915/display: - C10/C20/LT PHY PLL divider verification - use trans push mechanism to generate PSR frame change on LNL+ - refactor DP DSC slice config - VGA decode refactoring - refactor DPT, gen2-4 overlay, masked field register macro helpers - refactor stolen memory allocation decisions - prepare for UHBR DP tunnels - refactor LT PHY PLL to use DPLL framework - implement register polling/waiting in display code - add shared stepping header between i915 and display i915: - fix potential overflow of shmem scatterlist length nouveau: - provide Z cull info to userspace - initial GA100 support - shutdown on PCI device shutdown nova-core: - harden GSP command queue - add support for large RPCs - simplify GSP sequencer and message handling - refactor falcon firmware handling - convert to new register macro - conver to new DMA coherent API - use checked arithmetic - add debugfs support for gsp-rm log buffers - fix aux device registration for multi-GPU msm: - CI: - Uprev mesa - Restore CI jobs for Qualcomm APQ8016 and APQ8096 devices - Core: - Switched to of_get_available_child_by_name() - DPU: - Fixes for DSC panels - Fixed brownout because of the frequency / OPP mismatch - Quad pipe preparation (not enabled yet) - Switched to virtual planes by default - Dropped VBIF_NRT support - Added support for Eliza platform - Reworked alpha handling - Switched to correct CWB definitions on Eliza - Dropped dummy INTF_0 on MSM8953 - Corrected INTFs related to DP-MST - DP: - Removed debug prints looking into PHY internals - DSI: - Fixes for DSC panels - RGB101010 support - Support for SC8280XP - Moved PHY bindings from display/ to phy/ - GPU: - Preemption support for x2-85 and a840 - IFPC support for a840 - SKU detection support for x2-85 and a840 - Expose AQE support (VK ray-pipeline) - Avoid locking in VM_BIND fence signaling path - Fix to avoid reclaim in GPU snapshot path - Disallow foreign mapping of _NO_SHARE BOs - HDMI: - Fixed infoframes programming - MDP5: - Dropped support for MSM8974v1 - Dropped now unused code for MSM8974 v1 and SDM660 / MSM8998 panthor: - add tracepoints for power and IRQs - fix fence handling - extend timestamp query with flags - support various sources for timestamp queries tyr: - fix names and model/versions rockchip: - vop2: use drm logging function - rk3576 displayport support - support CRTC background color atmel-hlcdc: - support sana5d65 LCD controller tilcdc: - use DT bindings schema - use managed DRM interfaces - support DRM_BRIDGE_ATTACH_NO_CONNECTOR verisilicon: - support DC8200 + DT bindings virtgpu: - support PRIME import with 3D enabled komeda: - fix integer overflow in AFBC checks mcde: - improve bridge handling gma500: - use drm client buffer for fbdev framebuffer amdxdna: - add sensors ioctls - provide NPU power estimate - support column utilization sensor - allow forcing DMA through IOMMU IOVA - support per-BO mem usage queries - refactor GEM implementation ivpu: - update boot API to v3.29.4 - limit per-user number of doorbells/contexts - perform engine reset on TDR error loongson: - replace custom code with drm_gem_ttm_dumb_map_offset() imx: - support planes behind the primary plane - fix bus-format selection vkms: - support CRTC background color v3d: - improve handling of struct v3d_stats komeda: - support Arm China Linlon D6 plus DT bindings imagination: - improve power-off sequence - support context-reset notification from firmware mediatek: - mtk_dsi: enable hs clock during pre-enable - Remove all conflicting aperture devices during probe - Add support for mt8167 display blocks" * tag 'drm-next-2026-04-15' of https://gitlab.freedesktop.org/drm/kernel: (1735 commits) drm/ttm/tests: Remove checks from ttm_pool_free_no_dma_alloc drm/ttm/tests: fix lru_count ASSERT drm/vram: remove DRM_VRAM_MM_FILE_OPERATIONS from docs drm/fb-helper: Fix a locking bug in an error path dma-fence: correct kernel-doc function parameter @flags ttm/pool: track allocated_pages per numa node. ttm/pool: make pool shrinker NUMA aware (v2) ttm/pool: drop numa specific pools ttm/pool: port to list_lru. (v2) drm/ttm: use gpu mm stats to track gpu memory allocations. (v4) mm: add gpu active/reclaim per-node stat counters (v2) gpu: nova-core: fix missing colon in SEC2 boot debug message gpu: nova-core: vbios: use from_le_bytes() for PCI ROM header parsing gpu: nova-core: bitfield: fix broken Default implementation gpu: nova-core: falcon: pad firmware DMA object size to required block alignment gpu: nova-core: gsp: fix undefined behavior in command queue code drm/shmem_helper: Make sure PMD entries get the writeable upgrade accel/ivpu: Trigger recovery on TDR with OS scheduling drm/msm: Use of_get_available_child_by_name() dt-bindings: display/msm: move DSI PHY bindings to phy/ subdir ...	2026-04-15 08:45:00 -07:00
Linus Torvalds	ef3da345cc	vfs-7.1-rc1.misc Please consider pulling these changes from the signed vfs-7.1-rc1.misc tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCwAKCRCRxhvAZXjc ohhBAQCAmQMlMRAXAgUZFYMTZpeQlcujP5rv+/vT2Tf/xS76YwD/dRDaw1FH294+ qtk/Z1NjleNixzE2sld1K9J32NxeyAc= =+g9q -----END PGP SIGNATURE----- Merge tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - coredump: add tracepoint for coredump events - fs: hide file and bfile caches behind runtime const machinery Fixes: - fix architecture-specific compat_ftruncate64 implementations - dcache: Limit the minimal number of bucket to two - fs/omfs: reject s_sys_blocksize smaller than OMFS_DIR_START - fs/mbcache: cancel shrink work before destroying the cache - dcache: permit dynamic_dname()s up to NAME_MAX Cleanups: - remove or unexport unused fs_context infrastructure - trivial ->setattr cleanups - selftests/filesystems: Assume that TIOCGPTPEER is defined - writeback: fix kernel-doc function name mismatch for wb_put_many() - autofs: replace manual symlink buffer allocation in autofs_dir_symlink - init/initramfs.c: trivial fix: FSM -> Finite-state machine - fs: remove stale and duplicate forward declarations - readdir: Introduce dirent_size() - fs: Replace user_access_{begin/end} by scoped user access - kernel: acct: fix duplicate word in comment - fs: write a better comment in step_into() concerning .mnt assignment - fs: attr: fix comment formatting and spelling issues" * tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) dcache: permit dynamic_dname()s up to NAME_MAX fs: attr: fix comment formatting and spelling issues fs: hide file and bfile caches behind runtime const machinery fs: write a better comment in step_into() concerning .mnt assignment proc: rename proc_notify_change to proc_setattr proc: rename proc_setattr to proc_nochmod_setattr affs: rename affs_notify_change to affs_setattr adfs: rename adfs_notify_change to adfs_setattr hfs: update comments on hfs_inode_setattr kernel: acct: fix duplicate word in comment fs: Replace user_access_{begin/end} by scoped user access readdir: Introduce dirent_size() coredump: add tracepoint for coredump events fs: remove do_sys_truncate fs: pass on FTRUNCATE_* flags to do_truncate fs: fix archiecture-specific compat_ftruncate64 fs: remove stale and duplicate forward declarations init/initramfs.c: trivial fix: FSM -> Finite-state machine autofs: replace manual symlink buffer allocation in autofs_dir_symlink fs/mbcache: cancel shrink work before destroying the cache ...	2026-04-13 14:20:11 -07:00
Linus Torvalds	b7d74ea0fd	vfs-7.1-rc1.kino Please consider pulling these changes from the signed vfs-7.1-rc1.kino tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCgAKCRCRxhvAZXjc otmnAP4sbsxZQdz2TG2hJuOwnEZOkkxZQOUMc3ERVyZaWXIeTAEA7e5M+8FpoG9n 8ipO76UoaXdGLESrqVdp9EOhLqOW7QY= =uMeJ -----END PGP SIGNATURE----- Merge tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs i_ino updates from Christian Brauner: "For historical reasons, the inode->i_ino field is an unsigned long, which means that it's 32 bits on 32 bit architectures. This has caused a number of filesystems to implement hacks to hash a 64-bit identifier into a 32-bit field, and deprives us of a universal identifier field for an inode. This changes the inode->i_ino field from an unsigned long to a u64. This shouldn't make any material difference on 64-bit hosts, but 32-bit hosts will see struct inode grow by at least 4 bytes. This could have effects on slabcache sizes and field alignment. The bulk of the changes are to format strings and tracepoints, since the kernel itself doesn't care that much about the i_ino field. The first patch changes some vfs function arguments, so check that one out carefully. With this change, we may be able to shrink some inode structures. For instance, struct nfs_inode has a fileid field that holds the 64-bit inode number. With this set of changes, that field could be eliminated. I'd rather leave that sort of cleanups for later just to keep this simple" * tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group() EVM: add comment describing why ino field is still unsigned long vfs: remove externs from fs.h on functions modified by i_ino widening treewide: fix missed i_ino format specifier conversions ext4: fix signed format specifier in ext4_load_inode trace event treewide: change inode->i_ino from unsigned long to u64 nilfs2: widen trace event i_ino fields to u64 f2fs: widen trace event i_ino fields to u64 ext4: widen trace event i_ino fields to u64 zonefs: widen trace event i_ino fields to u64 hugetlbfs: widen trace event i_ino fields to u64 ext2: widen trace event i_ino fields to u64 cachefiles: widen trace event i_ino fields to u64 vfs: widen trace event i_ino fields to u64 net: change sock.sk_ino and sock_i_ino() to u64 audit: widen ino fields to u64 vfs: widen inode hash/lookup functions to u64	2026-04-13 12:19:01 -07:00
Linus Torvalds	3383589700	vfs-7.1-rc1.directory Please consider pulling these changes from the signed vfs-7.1-rc1.directory tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCgAKCRCRxhvAZXjc oj2yAQDHLWMfxDr8INpBuGc09tqX04qyb7td3U+zfM5c3bvLLAD/eDFSSUvzLtPD u540EWvTaJBV5ALUx3vQK96PPvS6Vg4= =20jQ -----END PGP SIGNATURE----- Merge tag 'vfs-7.1-rc1.directory' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs directory updates from Christian Brauner: "Recently 'start_creating', 'start_removing', 'start_renaming' and related interfaces were added which combine the locking and the lookup. At that time many callers were changed to use the new interfaces. However there are still an assortment of places out side of the core vfs where the directory is locked explictly, whether with inode_lock() or lock_rename() or similar. These were missed in the first pass for an assortment of uninteresting reasons. This addresses the remaining places where explicit locking is used, and changes them to use the new interfaces, or otherwise removes the explicit locking. The biggest changes are in overlayfs. The other changes are quite simple, though maybe the cachefiles changes is the least simple of those" * tag 'vfs-7.1-rc1.directory' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: VFS: unexport lock_rename(), lock_rename_child(), unlock_rename() ovl: remove ovl_lock_rename_workdir() ovl: use is_subdir() for testing if one thing is a subdir of another ovl: change ovl_create_real() to get a new lock when re-opening created file. ovl: pass name buffer to ovl_start_creating_temp() cachefiles: change cachefiles_bury_object to use start_renaming_dentry() ovl: Simplify ovl_lookup_real_one() VFS: make lookup_one_qstr_excl() static. nfsd: switch purge_old() to use start_removing_noperm() selinux: Use simple_start_creating() / simple_done_creating() Apparmor: Use simple_start_creating() / simple_done_creating() libfs: change simple_done_creating() to use end_creating() VFS: move the start_dirop() kerndoc comment to before start_dirop() fs/proc: Don't lock root inode when creating "self" and "thread-self" VFS: note error returns in documentation for various lookup functions	2026-04-13 10:24:33 -07:00
Dave Airlie	2232ba9c79	mm: add gpu active/reclaim per-node stat counters (v2) While discussing memcg intergration with gpu memory allocations, it was pointed out that there was no numa/system counters for GPU memory allocations. With more integrated memory GPU server systems turning up, and more requirements for memory tracking it seems we should start closing the gap. Add two counters to track GPU per-node system memory allocations. The first is currently allocated to GPU objects, and the second is for memory that is stored in GPU page pools that can be reclaimed, by the shrinker. Cc: Christian Koenig <christian.koenig@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2026-04-08 06:52:47 +10:00
Johannes Weiner	b9ec0ed907	mm: vmalloc: streamline vmalloc memory accounting Use a vmstat counter instead of a custom, open-coded atomic. This has the added benefit of making the data available per-node, and prepares for cleaning up the memcg accounting as well. Link: https://lkml.kernel.org/r/20260223160147.3792777-1-hannes@cmpxchg.org Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:53:04 -07:00
Mike Rapoport (Microsoft)	9a1d0c738b	mm: rename my_zero_pfn() to zero_pfn() my_zero_pfn() is a silly name. Rename zero_pfn variable to zero_page_pfn and my_zero_pfn() function to zero_pfn(). While on it, move extern declarations of zero_page_pfn outside the functions that use it and add a comment about what ZERO_PAGE is. Link: https://lkml.kernel.org/r/20260211103141.3215197-3-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-04-05 13:53:01 -07:00
Jaime Saguillo Revilla	38d581ba76	proc: array: drop stale FIXME about RCU in task_sig() task_sig() already wraps the SigQ rlimit read in an explicit RCU read-side critical section. Drop the stale FIXME comment and keep using task_ucounts() for the ucounts access. No functional change. Link: https://lkml.kernel.org/r/20260215124511.14227-1-jaime.saguillo@gmail.com Signed-off-by: Jaime Saguillo Revilla <jaime.saguillo@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-03-27 21:19:32 -07:00
Christoph Hellwig	2ecd46d161	proc: rename proc_notify_change to proc_setattr Make the function name match the method that it implements. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260325063711.3298685-6-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-03-26 15:16:54 +01:00
Christoph Hellwig	690005b0b1	proc: rename proc_setattr to proc_nochmod_setattr What is currently proc_setattr is a special version added after the more general procfs ->seattr in commit `6d76fa58b0` ("Don't allow chmod() on the /proc/<pid>/ files"). Give it a name that reflects that to free the proc_setattr name and better describe what is doing. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260325063711.3298685-5-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-03-26 15:16:54 +01:00
Jeff Layton	0b2600f81c	treewide: change inode->i_ino from unsigned long to u64 On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-03-06 14:31:28 +01:00
NeilBrown	9ab6838984	fs/proc: Don't lock root inode when creating "self" and "thread-self" proc_setup_self() and proc_setup_thread_self() are only called from proc_fill_super() which is before the filesystem is "live". So there is no need to lock the root directory when adding "self" and "thread-self". This is clear from simple_fill_super() which provides similar functionality for other filesystems and does not lock anything. The locking is not harmful, except that it may be confusing to a reader. As part of an effort to centralise all locking for directories for name-based operations (prior to changing some locking rules), it is simplest to remove the locking here. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20260224222542.3458677-3-neilb@ownmail.net Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-03-06 10:24:11 +01:00
Linus Torvalds	0e335a7745	vfs-7.0-rc2.fixes Please consider pulling these changes from the signed vfs-7.0-rc2.fixes tag. Thanks! Christian -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaZ7xWAAKCRCRxhvAZXjc onpeAP4qOrTURIAX9M/NGCHywvjI91ZJt20J6vm0X6KbVV/ebQD/eoJ21xzPhG9M gN7oRcZ9SW3e/AdtdnlqB0PEP+cyGwM= =9Ji+ -----END PGP SIGNATURE----- Merge tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix an uninitialized variable in file_getattr(). The flags_valid field wasn't initialized before calling vfs_fileattr_get(), triggering KMSAN uninit-value reports in fuse - Fix writeback wakeup and logging timeouts when DETECT_HUNG_TASK is not enabled. sysctl_hung_task_timeout_secs is 0 in that case causing spurious "waiting for writeback completion for more than 1 seconds" warnings - Fix a null-ptr-deref in do_statmount() when the mount is internal - Add missing kernel-doc description for the @private parameter in iomap_readahead() - Fix mount namespace creation to hold namespace_sem across the mount copy in create_new_namespace(). The previous drop-and-reacquire pattern was fragile and failed to clean up mount propagation links if the real rootfs was a shared or dependent mount - Fix /proc mount iteration where m->index wasn't updated when m->show() overflows, causing a restart to repeatedly show the same mount entry in a rapidly expanding mount table - Return EFSCORRUPTED instead of ENOSPC in minix_new_inode() when the inode number is out of range - Fix unshare(2) when CLONE_NEWNS is set and current->fs isn't shared. copy_mnt_ns() received the live fs_struct so if a subsequent namespace creation failed the rollback would leave pwd and root pointing to detached mounts. Always allocate a new fs_struct when CLONE_NEWNS is requested - fserror bug fixes: - Remove the unused fsnotify_sb_error() helper now that all callers have been converted to fserror_report_metadata - Fix a lockdep splat in fserror_report() where igrab() takes inode::i_lock which can be held in IRQ context. Replace igrab() with a direct i_count bump since filesystems should not report inodes that are about to be freed or not yet exposed - Handle error pointer in procfs for try_lookup_noperm() - Fix an integer overflow in ep_loop_check_proc() where recursive calls returning INT_MAX would overflow when +1 is added, breaking the recursion depth check - Fix a misleading break in pidfs * tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: pidfs: avoid misleading break eventpoll: Fix integer overflow in ep_loop_check_proc() proc: Fix pointer error dereference fserror: fix lockdep complaint when igrabbing inode fsnotify: drop unused helper unshare: fix unshare_fs() handling minix: Correct errno in minix_new_inode namespace: fix proc mount iteration mount: hold namespace_sem across copy in create_new_namespace() iomap: Describe @private in iomap_readahead() statmount: Fix the null-ptr-deref in do_statmount() writeback: Fix wakeup and logging timeouts for !DETECT_HUNG_TASK fs: init flags_valid before calling vfs_fileattr_get	2026-02-25 10:34:23 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00

1 2 3 4 5 ...

2793 Commits