Commit Graph

7706 Commits

Author SHA1 Message Date
Linus Torvalds
2a4c0c11c0 s390 updates for 7.1 merge window
- Add support for CONFIG_PAGE_TABLE_CHECK and enable it in
   debug_defconfig. s390 can only tell user from kernel PTEs via the mm,
   so mm_struct is now passed into pxx_user_accessible_page() callbacks
 
 - Expose the PCI function UID as an arch-specific slot attribute in
   sysfs so a function can be identified by its user-defined id while
   still in standby. Introduces a generic ARCH_PCI_SLOT_GROUPS hook in
   drivers/pci/slot.c
 
 - Refresh s390 PCI documentation to reflect current behavior and cover
   previously undocumented sysfs attributes
 
 - zcrypt device driver cleanup series: consistent field types, clearer
   variable naming, a kernel-doc warning fix, and a comment explaining
   the intentional synchronize_rcu() in pkey_handler_register()
 
 - Provide an s390 arch_raw_cpu_ptr() that avoids the detour via
   get_lowcore() using alternatives, shrinking defconfig by ~27 kB
 
 - Guard identity-base randomization with kaslr_enabled() so nokaslr
   keeps the identity mapping at 0 even with
   CONFIG_RANDOMIZE_IDENTITY_BASE=y
 
 - Build S390_MODULES_SANITY_TEST as a module only by requiring
   KUNIT && m, since built-in would not exercise module loading
 
 - Remove the permanently commented-out HMCDRV_DEV_CLASS create_class()
   code in the hmcdrv driver
 
 - Drop stale ident_map_size extern conflicting with asm/page.h
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmno78kACgkQjYWKoQLX
 FBiHbggAmW5hPIDf4F8HLomMREaaQb7QAyYwfeefwhcFUXSMu8td8S68aN4UkOnS
 DGSFjb+V6Nqd+ewrF7IS9pRU9YFsmBqo3MnLdcJ/ojZFz8BlwoAi+E4AD1a38hY2
 9zh2siPBMjydqBRUn6zjsK8auk4e8r44iS5MNNMXDF2ePE/PnPKTm93GhbtnnM6r
 a7mQkiPbi6j0sN/UU+pQkhS4fm2XNaGpCGGX0W0v2RdLIYZ9zQQdg4TaEsjQ5wZA
 OC3P8LG3OyJjnxsY2J8PIKK0VM0JP67KUGnQOi1y8HbN1LkFfAWF6CK7tsyUE/JM
 TYg7ENs2mUMmaa8niOGkiXzjjAxD0g==
 =NpmP
 -----END PGP SIGNATURE-----

Merge tag 's390-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Vasily Gorbik:

 - Add support for CONFIG_PAGE_TABLE_CHECK and enable it in
   debug_defconfig. s390 can only tell user from kernel PTEs via the mm,
   so mm_struct is now passed into pxx_user_accessible_page() callbacks

 - Expose the PCI function UID as an arch-specific slot attribute in
   sysfs so a function can be identified by its user-defined id while
   still in standby. Introduces a generic ARCH_PCI_SLOT_GROUPS hook in
   drivers/pci/slot.c

 - Refresh s390 PCI documentation to reflect current behavior and cover
   previously undocumented sysfs attributes

 - zcrypt device driver cleanup series: consistent field types, clearer
   variable naming, a kernel-doc warning fix, and a comment explaining
   the intentional synchronize_rcu() in pkey_handler_register()

 - Provide an s390 arch_raw_cpu_ptr() that avoids the detour via
   get_lowcore() using alternatives, shrinking defconfig by ~27 kB

 - Guard identity-base randomization with kaslr_enabled() so nokaslr
   keeps the identity mapping at 0 even with RANDOMIZE_IDENTITY_BASE=y

 - Build S390_MODULES_SANITY_TEST as a module only by requiring KUNIT &&
   m, since built-in would not exercise module loading

 - Remove the permanently commented-out HMCDRV_DEV_CLASS create_class()
   code in the hmcdrv driver

 - Drop stale ident_map_size extern conflicting with asm/page.h

* tag 's390-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: Fix warning about wrong kernel doc comment
  PCI: s390: Expose the UID as an arch specific PCI slot attribute
  docs: s390/pci: Improve and update PCI documentation
  s390/pkey: Add comment about synchronize_rcu() to pkey base
  s390/hmcdrv: Remove commented out code
  s390/zcrypt: Slight rework on the agent_id field
  s390/zcrypt: Explicitly use a card variable in _zcrypt_send_cprb
  s390/zcrypt: Rework MKVP fields and handling
  s390/zcrypt: Make apfs a real unsigned int field
  s390/zcrypt: Rework domain processing within zcrypt device driver
  s390/zcrypt: Move inline function rng_type6cprb_msgx from header to code
  s390/percpu: Provide arch_raw_cpu_ptr()
  s390: Enable page table check for debug_defconfig
  s390/pgtable: Add s390 support for page table check
  s390/pgtable: Use set_pmd_bit() to invalidate PMD entry
  mm/page_table_check: Pass mm_struct to pxx_user_accessible_page()
  s390/boot: Respect kaslr_enabled() for identity randomization
  s390/Kconfig: Make modules sanity test a module-only option
  s390/setup: Drop stale ident_map_size declaration
2026-04-22 11:13:45 -07:00
Linus Torvalds
440d6635b2 mm.git review status for linus..mm-nonmm-stable
Total patches:       126
 Reviews/patch:       0.92
 Reviewed rate:       76%
 
 - The 2 patch series "pid: make sub-init creation retryable" from Oleg
   Nesterov increases the robustness of our creation of init in a new
   namespace.  By clearing away some historical cruft which is no longer
   needed.  Also some documentation fixups are provided.
 
 - The 2 patch series "selftests/fchmodat2: Error handling and general"
   from Mark Brown has a fixup and a cleanup for the fchmodat2() syscall
   selftest.
 
 - The 3 patch series "lib: polynomial: Move to math/ and clean up" from
   Andy Shevchenko does as advertised.
 
 - The 3 patch series "hung_task: Provide runtime reset interface for
   hung task detector" from Aaron Tomlin gives administrators the ability
   to zero out /proc/sys/kernel/hung_task_detect_count.
 
 - The 2 patch series "tools/getdelays: use the static UAPI headers from
   tools/include/uapi" from Thomas Weißschuh teaches getdelays to use the
   in-kernel UAPI headers rather than the system-provided ones.
 
 - The 5 patch series "watchdog/hardlockup: Improvements to hardlockup"
   from Mayank Rungta provides several cleanups and fixups to the
   hardlockup detector code and its documentation.
 
 - The 2 patch series "lib/bch: fix undefined behavior from signed
   left-shifts" from Josh Law provides a couple of small/theoretical fixes
   in the bch code.
 
 - The 2 patch series "ocfs2/dlm: fix two bugs in dlm_match_regions()"
   from Junrui Luo does what is claims.
 
 - The 27 patch series "cleanup the RAID5 XOR library" from Christoph
   Hellwig is a quite far-reaching cleanup to this code.  I can't do better
   than to quote Christoph:
 
     The XOR library used for the RAID5 parity is a bit of a mess right
     now.  The main file sits in crypto/ despite not being cryptography and
     not using the crypto API, with the generic implementations sitting in
     include/asm-generic and the arch implementations sitting in an asm/
     header in theory.  The latter doesn't work for many cases, so
     architectures often build the code directly into the core kernel, or
     create another module for the architecture code.
 
     Change this to a single module in lib/ that also contains the
     architecture optimizations, similar to the library work Eric Biggers
     has done for the CRC and crypto libraries later.  After that it
     changes to better calling conventions that allow for smarter
     architecture implementations (although none is contained here yet),
     and uses static_call to avoid indirection function call overhead.
 
 - The 2 patch series "lib/list_sort: Clean up list_sort() scheduling
   workarounds" from Kuan-Wei Chiu cleans up this library code by removing
   a hacky thing which was added for UBIFS, which UBIFS doesn't actually
   need.
 
 - The 5 patch series "Fix bugs in extract_iter_to_sg()" from Christian
   Ehrhardt fixes a few bugs in the scatterlist code, adds in-kernel tests
   for the now-fixed bugs and fixes a leak in the test itself.
 
 - The 3 patch series "kdump: Enable LUKS-encrypted dump target support
   in ARM64 and PowerPC" from Coiby Xu eenables support of the
   LUKS-encrypted device dump target on arm64 and powerpc.
 
 - The 4 patch series "ocfs2: consolidate extent list validation into
   block read callbacks" from Joseph Qi addresses ocfs2's validation of
   extent list fields - cleanup, simplification, robustness.  (Kernel test
   robot loves mounting corrupted fs images!)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad90rQAKCRDdBJ7gKXxA
 jl7rAQD4/Rq7ZSSnEv6FS4gOwc3MgTdWcZZaXkqL1KiWyYhRwAEA+cVCO344+AKb
 znBOjet/hUr+/kBwyViifiC8LHzchwM=
 =Nfnf
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "pid: make sub-init creation retryable" (Oleg Nesterov)

   Make creation of init in a new namespace more robust by clearing away
   some historical cruft which is no longer needed. Also some
   documentation fixups

 - "selftests/fchmodat2: Error handling and general" (Mark Brown)

   Fix and a cleanup for the fchmodat2() syscall selftest

 - "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko)

 - "hung_task: Provide runtime reset interface for hung task detector"
   (Aaron Tomlin)

   Give administrators the ability to zero out
   /proc/sys/kernel/hung_task_detect_count

 - "tools/getdelays: use the static UAPI headers from
   tools/include/uapi" (Thomas Weißschuh)

   Teach getdelays to use the in-kernel UAPI headers rather than the
   system-provided ones

 - "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta)

   Several cleanups and fixups to the hardlockup detector code and its
   documentation

 - "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law)

   A couple of small/theoretical fixes in the bch code

 - "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo)

 - "cleanup the RAID5 XOR library" (Christoph Hellwig)

   A quite far-reaching cleanup to this code. I can't do better than to
   quote Christoph:

     "The XOR library used for the RAID5 parity is a bit of a mess right
      now. The main file sits in crypto/ despite not being cryptography
      and not using the crypto API, with the generic implementations
      sitting in include/asm-generic and the arch implementations
      sitting in an asm/ header in theory. The latter doesn't work for
      many cases, so architectures often build the code directly into
      the core kernel, or create another module for the architecture
      code.

      Change this to a single module in lib/ that also contains the
      architecture optimizations, similar to the library work Eric
      Biggers has done for the CRC and crypto libraries later. After
      that it changes to better calling conventions that allow for
      smarter architecture implementations (although none is contained
      here yet), and uses static_call to avoid indirection function call
      overhead"

 - "lib/list_sort: Clean up list_sort() scheduling workarounds"
   (Kuan-Wei Chiu)

   Clean up this library code by removing a hacky thing which was added
   for UBIFS, which UBIFS doesn't actually need

 - "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt)

   Fix a few bugs in the scatterlist code, add in-kernel tests for the
   now-fixed bugs and fix a leak in the test itself

 - "kdump: Enable LUKS-encrypted dump target support in ARM64 and
   PowerPC" (Coiby Xu)

   Enable support of the LUKS-encrypted device dump target on arm64 and
   powerpc

 - "ocfs2: consolidate extent list validation into block read callbacks"
   (Joseph Qi)

   Cleanup, simplify, and make more robust ocfs2's validation of extent
   list fields (Kernel test robot loves mounting corrupted fs images!)

* tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits)
  ocfs2: validate group add input before caching
  ocfs2: validate bg_bits during freefrag scan
  ocfs2: fix listxattr handling when the buffer is full
  doc: watchdog: fix typos etc
  update Sean's email address
  ocfs2: use get_random_u32() where appropriate
  ocfs2: split transactions in dio completion to avoid credit exhaustion
  ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path()
  ocfs2: validate extent block list fields during block read
  ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec()
  ocfs2: validate dx_root extent list fields during block read
  ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY
  ocfs2: handle invalid dinode in ocfs2_group_extend
  .get_maintainer.ignore: add Askar
  ocfs2: validate bg_list extent bounds in discontig groups
  checkpatch: exclude forward declarations of const structs
  tools/accounting: handle truncated taskstats netlink messages
  taskstats: set version in TGID exit notifications
  ocfs2/heartbeat: fix slot mapping rollback leaks on error paths
  arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel
  ...
2026-04-16 20:11:56 -07:00
Linus Torvalds
334fbe734e mm.git review status for linus..mm-stable
Everything:
 
 Total patches:       368
 Reviews/patch:       1.56
 Reviewed rate:       74%
 
 Excluding DAMON:
 
 Total patches:       316
 Reviews/patch:       1.77
 Reviewed rate:       81%
 
 Excluding DAMON and zram:
 
 Total patches:       306
 Reviews/patch:       1.81
 Reviewed rate:       82%
 
 Excluding DAMON, zram and maple_tree:
 
 Total patches:       276
 Reviews/patch:       2.01
 Reviewed rate:       91%
 
 Significant patch series in this merge:
 
 - The 30 patch series "maple_tree: Replace big node with maple copy"
   from Liam Howlett is mainly prepararatory work for ongoing development
   but it does reduce stack usage and is an improvement.
 
 - The 12 patch series "mm, swap: swap table phase III: remove swap_map"
   from Kairui Song offers memory savings by removing the static swap_map.
   It also yields some CPU savings and implements several cleanups.
 
 - The 2 patch series "mm: memfd_luo: preserve file seals" from Pratyush
   Yadav adds file seal preservation to LUO's memfd code.
 
 - The 2 patch series "mm: zswap: add per-memcg stat for incompressible
   pages" from Jiayuan Chen adds additional userspace stats reportng to
   zswap.
 
 - The 4 patch series "arch, mm: consolidate empty_zero_page" from Mike
   Rapoport implements some cleanups for our handling of ZERO_PAGE() and
   zero_pfn.
 
 - The 2 patch series "mm/kmemleak: Improve scan_should_stop()
   implementation" from Zhongqiu Han provides an robustness improvement and
   some cleanups in the kmemleak code.
 
 - The 4 patch series "Improve khugepaged scan logic" from Vernon Yang
   "improves the khugepaged scan logic and reduces CPU consumption by
   prioritizing scanning tasks that access memory frequently".
 
 - The 2 patch series "Make KHO Stateless" from Jason Miu simplifies
   Kexec Handover by "transitioning KHO from an xarray-based metadata
   tracking system with serialization to a radix tree data structure that
   can be passed directly to the next kernel"
 
 - The 3 patch series "mm: vmscan: add PID and cgroup ID to vmscan
   tracepoints" from Thomas Ballasi and Steven Rostedt enhances vmscan's
   tracepointing.
 
 - The 5 patch series "mm: arch/shstk: Common shadow stack mapping helper
   and VM_NOHUGEPAGE" from Catalin Marinas is a cleanup for the shadow
   stack code: remove per-arch code in favour of a generic implementation.
 
 - The 2 patch series "Fix KASAN support for KHO restored vmalloc
   regions" from Pasha Tatashin fixes a WARN() which can be emitted the KHO
   restores a vmalloc area.
 
 - The 4 patch series "mm: Remove stray references to pagevec" from Tal
   Zussman provides several cleanups, mainly udpating references to "struct
   pagevec", which became folio_batch three years ago.
 
 - The 17 patch series "mm: Eliminate fake head pages from vmemmap
   optimization" from Kiryl Shutsemau simplifies the HugeTLB vmemmap
   optimization (HVO) by changing how tail pages encode their relationship
   to the head page.
 
 - The 2 patch series "mm/damon/core: improve DAMOS quota efficiency for
   core layer filters" from SeongJae Park improves two problematic
   behaviors of DAMOS that makes it less efficient when core layer filters
   are used.
 
 - The 3 patch series "mm/damon: strictly respect min_nr_regions" from
   SeongJae Park improves DAMON usability by extending the treatment of the
   min_nr_regions user-settable parameter.
 
 - The 3 patch series "mm/page_alloc: pcp locking cleanup" from Vlastimil
   Babka is a proper fix for a previously hotfixed SMP=n issue.  Code
   simplifications and cleanups ennsed.
 
 - The 16 patch series "mm: cleanups around unmapping / zapping" from
   David Hildenbrand implements "a bunch of cleanups around unmapping and
   zapping.  Mostly simplifications, code movements, documentation and
   renaming of zapping functions".
 
 - The 6 patch series "support batched checking of the young flag for
   MGLRU" from Baolin Wang supports batched checking of the young flag for
   MGLRU.  It's part cleanups; one benchmark shows large performance
   benefits for arm64.
 
 - The 5 patch series "memcg: obj stock and slab stat caching cleanups"
   from Johannes Weiner provides memcg cleanup and robustness improvements.
 
 - The 5 patch series "Allow order zero pages in page reporting" from
   Yuvraj Sakshith enhances page_reporting's free page reporting - it is
   presently and undesirably order-0 pages when reporting free memory.
 
 - The 6 patch series "mm: vma flag tweaks" from Lorenzo Stoakes is
   cleanup work following from the recent conversion of the VMA flags to a
   bitmap.
 
 - The 10 patch series "mm/damon: add optional debugging-purpose sanity
   checks" from SeongJae Park adds some more developer-facing debug checks
   into DAMON core.
 
 - The 2 patch series "mm/damon: test and document power-of-2
   min_region_sz requirement" from SeongJae Park adds an additional DAMON
   kunit test and makes some adjustments to the addr_unit parameter
   handling.
 
 - The 3 patch series "mm/damon/core: make passed_sample_intervals
   comparisons overflow-safe" from SeongJae Park fixes a hard-to-hit time
   overflow issue in DAMON core.
 
 - The 7 patch series "mm/damon: improve/fixup/update ratio calculation,
   test and documentation" from SeongJae Park is a "batch of misc/minor
   improvements and fixups" for DAMON.
 
 - The 4 patch series "mm: move vma_(kernel|mmu)_pagesize() out of
   hugetlb.c" from David Hildenbrand fixes a possible issue with dax-device
   when CONFIG_HUGETLB=n.  Some code movement was required.
 
 - The 6 patch series "zram: recompression cleanups and tweaks" from
   Sergey Senozhatsky provides "a somewhat random mix of fixups,
   recompression cleanups and improvements" in the zram code.
 
 - The 11 patch series "mm/damon: support multiple goal-based quota
   tuning algorithms" from SeongJae Park extend DAMOS quotas goal
   auto-tuning to support multiple tuning algorithms that users can select.
 
 - The 4 patch series "mm: thp: reduce unnecessary
   start_stop_khugepaged()" from Breno Leitao fixes the khugpaged sysfs
   handling so we no longer spam the logs with reams of junk when
   starting/stopping khugepaged.
 
 - The 3 patch series "mm: improve map count checks" from Lorenzo Stoakes
   provides some cleanups and slight fixes in the mremap, mmap and vma
   code.
 
 - The 5 patch series "mm/damon: support addr_unit on default monitoring
   targets for modules" from SeongJae Park extends the use of DAMON core's
   addr_unit tunable.
 
 - The 5 patch series "mm: khugepaged cleanups and mTHP prerequisites"
   from Nico Pache provides cleanups in the khugepaged and is a base for
   Nico's planned khugepaged mTHP support.
 
 - The 15 patch series "mm: memory hot(un)plug and SPARSEMEM cleanups"
   from David Hildenbrand implements code movement and cleanups in the
   memhotplug and sparsemem code.
 
 - The 2 patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and
   cleanup CONFIG_MIGRATION" from David Hildenbrand rationalizes some
   memhotplug Kconfig support.
 
 - The 6 patch series "change young flag check functions to return bool"
   from Baolin Wang is "a cleanup patchset to change all young flag check
   functions to return bool".
 
 - The 3 patch series "mm/damon/sysfs: fix memory leak and NULL
   dereference issues" from Josh Law and SeongJae Park fixes a few
   potential DAMON bugs.
 
 - The 25 patch series "mm/vma: convert vm_flags_t to vma_flags_t in vma
   code" from "converts a lot of the existing use of the legacy vm_flags_t
   data type to the new vma_flags_t type which replaces it".  Mainly in the
   vma code.
 
 - The 21 patch series "mm: expand mmap_prepare functionality and usage"
   from Lorenzo Stoakes "expands the mmap_prepare functionality, which is
   intended to replace the deprecated f_op->mmap hook which has been the
   source of bugs and security issues for some time".  Cleanups,
   documentation, extension of mmap_prepare into filesystem drivers.
 
 - The 13 patch series "mm/huge_memory: refactor zap_huge_pmd()" from
   Lorenzo Stoakes simplifies and cleans up zap_huge_pmd().  Additional
   cleanups around vm_normal_folio_pmd() and the softleaf functionality are
   performed.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad3HDQAKCRDdBJ7gKXxA
 jrUQAPwNhPk5nPSxnyxjAeQtOBHqgCdnICeEismLajPKd9aYRgEA0s2XAu3tSUYi
 GrBnWImHG3s4ePQxVcPCegWTsOUrXgQ=
 =1Q7o
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "maple_tree: Replace big node with maple copy" (Liam Howlett)

   Mainly prepararatory work for ongoing development but it does reduce
   stack usage and is an improvement.

 - "mm, swap: swap table phase III: remove swap_map" (Kairui Song)

   Offers memory savings by removing the static swap_map. It also yields
   some CPU savings and implements several cleanups.

 - "mm: memfd_luo: preserve file seals" (Pratyush Yadav)

   File seal preservation to LUO's memfd code

 - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan
   Chen)

   Additional userspace stats reportng to zswap

 - "arch, mm: consolidate empty_zero_page" (Mike Rapoport)

   Some cleanups for our handling of ZERO_PAGE() and zero_pfn

 - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu
   Han)

   A robustness improvement and some cleanups in the kmemleak code

 - "Improve khugepaged scan logic" (Vernon Yang)

   Improve khugepaged scan logic and reduce CPU consumption by
   prioritizing scanning tasks that access memory frequently

 - "Make KHO Stateless" (Jason Miu)

   Simplify Kexec Handover by transitioning KHO from an xarray-based
   metadata tracking system with serialization to a radix tree data
   structure that can be passed directly to the next kernel

 - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
   Ballasi and Steven Rostedt)

   Enhance vmscan's tracepointing

 - "mm: arch/shstk: Common shadow stack mapping helper and
   VM_NOHUGEPAGE" (Catalin Marinas)

   Cleanup for the shadow stack code: remove per-arch code in favour of
   a generic implementation

 - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)

   Fix a WARN() which can be emitted the KHO restores a vmalloc area

 - "mm: Remove stray references to pagevec" (Tal Zussman)

   Several cleanups, mainly udpating references to "struct pagevec",
   which became folio_batch three years ago

 - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
   Shutsemau)

   Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail
   pages encode their relationship to the head page

 - "mm/damon/core: improve DAMOS quota efficiency for core layer
   filters" (SeongJae Park)

   Improve two problematic behaviors of DAMOS that makes it less
   efficient when core layer filters are used

 - "mm/damon: strictly respect min_nr_regions" (SeongJae Park)

   Improve DAMON usability by extending the treatment of the
   min_nr_regions user-settable parameter

 - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)

   The proper fix for a previously hotfixed SMP=n issue. Code
   simplifications and cleanups ensued

 - "mm: cleanups around unmapping / zapping" (David Hildenbrand)

   A bunch of cleanups around unmapping and zapping. Mostly
   simplifications, code movements, documentation and renaming of
   zapping functions

 - "support batched checking of the young flag for MGLRU" (Baolin Wang)

   Batched checking of the young flag for MGLRU. It's part cleanups; one
   benchmark shows large performance benefits for arm64

 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)

   memcg cleanup and robustness improvements

 - "Allow order zero pages in page reporting" (Yuvraj Sakshith)

   Enhance free page reporting - it is presently and undesirably order-0
   pages when reporting free memory.

 - "mm: vma flag tweaks" (Lorenzo Stoakes)

   Cleanup work following from the recent conversion of the VMA flags to
   a bitmap

 - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae
   Park)

   Add some more developer-facing debug checks into DAMON core

 - "mm/damon: test and document power-of-2 min_region_sz requirement"
   (SeongJae Park)

   An additional DAMON kunit test and makes some adjustments to the
   addr_unit parameter handling

 - "mm/damon/core: make passed_sample_intervals comparisons
   overflow-safe" (SeongJae Park)

   Fix a hard-to-hit time overflow issue in DAMON core

 - "mm/damon: improve/fixup/update ratio calculation, test and
   documentation" (SeongJae Park)

   A batch of misc/minor improvements and fixups for DAMON

 - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
   Hildenbrand)

   Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code
   movement was required.

 - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)

   A somewhat random mix of fixups, recompression cleanups and
   improvements in the zram code

 - "mm/damon: support multiple goal-based quota tuning algorithms"
   (SeongJae Park)

   Extend DAMOS quotas goal auto-tuning to support multiple tuning
   algorithms that users can select

 - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)

   Fix the khugpaged sysfs handling so we no longer spam the logs with
   reams of junk when starting/stopping khugepaged

 - "mm: improve map count checks" (Lorenzo Stoakes)

   Provide some cleanups and slight fixes in the mremap, mmap and vma
   code

 - "mm/damon: support addr_unit on default monitoring targets for
   modules" (SeongJae Park)

   Extend the use of DAMON core's addr_unit tunable

 - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)

   Cleanups to khugepaged and is a base for Nico's planned khugepaged
   mTHP support

 - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)

   Code movement and cleanups in the memhotplug and sparsemem code

 - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
   CONFIG_MIGRATION" (David Hildenbrand)

   Rationalize some memhotplug Kconfig support

 - "change young flag check functions to return bool" (Baolin Wang)

   Cleanups to change all young flag check functions to return bool

 - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
   Law and SeongJae Park)

   Fix a few potential DAMON bugs

 - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
   Stoakes)

   Convert a lot of the existing use of the legacy vm_flags_t data type
   to the new vma_flags_t type which replaces it. Mainly in the vma
   code.

 - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)

   Expand the mmap_prepare functionality, which is intended to replace
   the deprecated f_op->mmap hook which has been the source of bugs and
   security issues for some time. Cleanups, documentation, extension of
   mmap_prepare into filesystem drivers

 - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)

   Simplify and clean up zap_huge_pmd(). Additional cleanups around
   vm_normal_folio_pmd() and the softleaf functionality are performed.

* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm: fix deferred split queue races during migration
  mm/khugepaged: fix issue with tracking lock
  mm/huge_memory: add and use has_deposited_pgtable()
  mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
  mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
  mm/huge_memory: separate out the folio part of zap_huge_pmd()
  mm/huge_memory: use mm instead of tlb->mm
  mm/huge_memory: remove unnecessary sanity checks
  mm/huge_memory: deduplicate zap deposited table call
  mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
  mm/huge_memory: add a common exit path to zap_huge_pmd()
  mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
  mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
  mm/huge: avoid big else branch in zap_huge_pmd()
  mm/huge_memory: simplify vma_is_specal_huge()
  mm: on remap assert that input range within the proposed VMA
  mm: add mmap_action_map_kernel_pages[_full]()
  uio: replace deprecated mmap hook with mmap_prepare in uio_info
  drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
  mm: allow handling of stacked mmap_prepare hooks in more drivers
  ...
2026-04-15 12:59:16 -07:00
Linus Torvalds
3203a08c12 powerpc updates for 7.1
- powerpc support for huge pfnmaps
 
  - Cleanups to use masked user access
 
  - Rework pnv_ioda_pick_m64_pe() to use better bitmap API
 
  - Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
 
  - Backup region offset update to eflcorehdr
 
  - Fixes for wii/ps3 platform
 
  - Implement JIT support for private stack in powerpc
 
  - Implement JIT support for fsession in powerpc64 trampoline
 
  - Add support for instruction array and indirect jump in powerpc
 
  - Misc selftest fixes and cleanups
 
 Thanks to:
 Abhishek Dubey, Aditya Gupta, Alex Williamson, Amit Machhiwal, Andrew
 Donnellan, Bartosz Golaszewski, Cédric Le Goater, Chen Ni, Christophe Leroy (CS
 GROUP), Hari Bathini, J. Neuschäfer, Mukesh Kumar Chaurasiya (IBM), Nam Cao,
 Nilay Shroff, Pavithra Prakash, Randy Dunlap, Ritesh Harjani (IBM), Shrikanth
 Hegde, Sourabh Jain, Vaibhav Jain, Venkat Rao Bagalkote, Yury Norov (NVIDIA)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmnc9bQACgkQpnEsdPSH
 ZJS0wg//TIma7bIPw0rFNnDWRHJq2YqXCSOKjU24gkYPD+T5lvuE4JtkYx1U7FOC
 yByjCwmWg0RMOlcDC0vtyqvcq8fZSc5VwrebjZaHtEcRFkcip5r57A1DkV1TLWF/
 3EXQzhOzh5Sf6+A7leg4hVhk0ZvCG/DFqKCW6s1uBBBALD2nv9oC9WLzdZeutiWm
 hC+f1pFbfr4ZAAcjeTTBLJPLa4ayWUbszseW2CX9B3Rf041fh4puAH709Wlf9gYz
 7Gh7ZS/XhLYi5urH9bd7+0CTgDxeYI9D/0hBh1szXFHYB2N+1OdQNmMz0TfKlzsB
 byHGd9OW5EnfwKOFb9/wYQI6vODoZRSVB74oSHlJ7GoLe8FnNnBhqUaYuorJIYJF
 j8wuhOz/fdC8EWpol2lTsnQ+Lvo7N/YG7BIa2SiSTwpnOoZ2iQR3GeZcp4TCW8A/
 ZUHEUuaM/zuNZxIZqxnM7wHBE+nGHjOcfRMZcZz1kwanw+Xyu0lDrTbn0O6h404R
 p0N/GG5Q8phVD/GQaXi3jt5/316D+SOTo4hhl7nyP1/UFvqH8YEEwXy2KnldBLKH
 C0S0pdmOx03fgjpG+Utkk2zYpz06KfYL58tiEaCFikoBRtQBfTFq7qyqNrgYRx1V
 7h4lnjRqTLjyurBUAUPNZPHC6BY1inv3wJj1x5ooRbaPSDXw5VU=
 =P25l
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Madhavan Srinivasan:

 - powerpc support for huge pfnmaps

 - Cleanups to use masked user access

 - Rework pnv_ioda_pick_m64_pe() to use better bitmap API

 - Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC

 - Backup region offset update to eflcorehdr

 - Fixes for wii/ps3 platform

 - Implement JIT support for private stack in powerpc

 - Implement JIT support for fsession in powerpc64 trampoline

 - Add support for instruction array and indirect jump in powerpc

 - Misc selftest fixes and cleanups

Thanks to Abhishek Dubey, Aditya Gupta, Alex Williamson, Amit Machhiwal,
Andrew Donnellan, Bartosz Golaszewski, Cédric Le Goater, Chen Ni,
Christophe Leroy (CS GROUP), Hari Bathini, J. Neuschäfer, Mukesh Kumar
Chaurasiya (IBM), Nam Cao, Nilay Shroff, Pavithra Prakash, Randy Dunlap,
Ritesh Harjani (IBM), Shrikanth Hegde, Sourabh Jain, Vaibhav Jain,
Venkat Rao Bagalkote, and Yury Norov (NVIDIA)

* tag 'powerpc-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (47 commits)
  mailmap: Add entry for Andrew Donnellan
  powerpc32/bpf: fix loading fsession func metadata using PPC_LI32
  selftest/bpf: Enable gotox tests for powerpc64
  powerpc64/bpf: Add support for indirect jump
  selftest/bpf: Enable instruction array test for powerpc
  powerpc/bpf: Add support for instruction array
  powerpc32/bpf: Add fsession support
  powerpc64/bpf: Implement fsession support
  selftests/bpf: Enable private stack tests for powerpc64
  powerpc64/bpf: Implement JIT support for private stack
  powerpc: pci-ioda: Optimize pnv_ioda_pick_m64_pe()
  powerpc: pci-ioda: use bitmap_alloc() in pnv_ioda_pick_m64_pe()
  powerpc/net: Inline checksum wrappers and convert to scoped user access
  powerpc/sstep: Convert to scoped user access
  powerpc/align: Convert emulate_spe() to scoped user access
  powerpc/ptrace: Convert gpr32_set_common_user() to scoped user access
  powerpc/futex: Use masked user access
  powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
  cpuidle: powerpc: avoid double clear when breaking snooze
  powerpc/ps3: spu.c: fix enum and Return kernel-doc warnings
  ...
2026-04-14 17:10:15 -07:00
Linus Torvalds
f21f7b5162 Update to the VDSO subsystem:
- Make the handling of compat functions consistent and more robust
 
      - Rework the underlying data store so that it is dynamically
        allocated, which allows the conversion of the last holdout SPARC64
        to the generic VDSO implementation
 
      - Rework the SPARC64 VDSO to utilize the generic implementation
 
      - Mop up the left overs of the non-generic VDSO support in the core
        code.
 
      - Expand the VDSO selftest and make them more robust
 
      - Allow time namespaces to be enabled independently of the generic
        VDSO support, which was not possible before due to SPARC64 not
        using it.
 
      - Various cleanups and improvements in the related code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmnb0v8QHHRnbHhAa2Vy
 bmVsLm9yZwAKCRCmGPVMDXSYocfqD/9ywgnvwRH6B612mY4PI3qCbLHs6n9f78aH
 YwyXmmfBZ5vt1ZtptHD+BAxiIMm9GC+/exdj5zhcOWucnBVhorcloE6evxhkJAMn
 RhTQFKkEmcA/UV2Yfct9r+33kgZRyu4IIul4J7hgn2o5T1BqwZbOil0W/O5adr5P
 MDLxjT1OLV80ZZWI9qbWcR/aR7W7sHcdwfVPPqjhombRY7f391Mo3dZeM5C2y55x
 8TXCEqVpN1RJzFinWEgQN7QpP4OmF0rRuXSrDQpkH6pk/+RSqNlT/QGG7MJtmCQR
 E6CeBjNRUn318KiroaGyTKlM9xsL3gNoiCY24ZTwzZxx3g5gSAR3KTCTJhQU0hpu
 Svxj+ksqEAyW7fAOIsbce6W8fUPKC2KM+juXgPKcqZ5hjE2fALD+eEYMlq00jSiu
 sj71007cM9tZKOXPdWs3Fv7AY2Yj7iiRiRz9gv1wqS1z7ybxiaFjxjLYYakej0tr
 rmwBDEGhNow7msZZttr01BRZk9hDUWfIiJtL+0BrgRLNzst2A7WoagtZ2s0Z7Psl
 RjtWgYNBDJ878xK0J+Djqb9TyLraGWZShIIna9uYCAJX9i954xfKJ//NOnUkZhcl
 jslDLHhdttyJ+TmgIsc1ntUGvYvHqH5ywQpyDfWepMKyIYdaJLHOr2K6bwFnGHdw
 uocXvLrkXw==
 =8ixX
 -----END PGP SIGNATURE-----

Merge tag 'timers-vdso-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull vdso updates from Thomas Gleixner:

 - Make the handling of compat functions consistent and more robust

 - Rework the underlying data store so that it is dynamically allocated,
   which allows the conversion of the last holdout SPARC64 to the
   generic VDSO implementation

 - Rework the SPARC64 VDSO to utilize the generic implementation

 - Mop up the left overs of the non-generic VDSO support in the core
   code

 - Expand the VDSO selftest and make them more robust

 - Allow time namespaces to be enabled independently of the generic VDSO
   support, which was not possible before due to SPARC64 not using it

 - Various cleanups and improvements in the related code

* tag 'timers-vdso-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  timens: Use task_lock guard in timens_get*()
  timens: Use mutex guard in proc_timens_set_offset()
  timens: Simplify some calls to put_time_ns()
  timens: Add a __free() wrapper for put_time_ns()
  timens: Remove dependency on the vDSO
  vdso/timens: Move functions to new file
  selftests: vDSO: vdso_test_correctness: Add a test for time()
  selftests: vDSO: vdso_test_correctness: Use facilities from parse_vdso.c
  selftests: vDSO: vdso_test_correctness: Handle different tv_usec types
  selftests: vDSO: vdso_test_correctness: Drop SYS_getcpu fallbacks
  selftests: vDSO: vdso_test_gettimeofday: Remove nolibc checks
  Revert "selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers"
  random: vDSO: Remove ifdeffery
  random: vDSO: Trim vDSO includes
  vdso/datapage: Trim down unnecessary includes
  vdso/datapage: Remove inclusion of gettimeofday.h
  vdso/helpers: Explicitly include vdso/processor.h
  vdso/gettimeofday: Add explicit includes
  random: vDSO: Add explicit includes
  MIPS: vdso: Explicitly include asm/vdso/vdso.h
  ...
2026-04-14 10:53:44 -07:00
Linus Torvalds
fdcbb1bc06 Merge branch 'nocache-cleanup'
This series cleans up some of the special user copy functions naming and
semantics.  In particular, get rid of the (very traditional) double
underscore names and behavior: the whole "optimize away the range check"
model has been largely excised from the other user accessors because
it's so subtle and can be unsafe, but also because it's just not a
relevant optimization any more.

To do that, a couple of drivers that misused the "user" copies as kernel
copies in order to get non-temporal stores had to be fixed up, but that
kind of code should never have been allowed anyway.

The x86-only "nocache" version was also renamed to more accurately
reflect what it actually does.

This was all done because I looked at this code due to a report by Jann
Horn, and I just couldn't stand the inconsistent naming, the horrible
semantics, and the random misuse of these functions.  This code should
probably be cleaned up further, but it's at least slightly closer to
normal semantics.

I had a more intrusive series that went even further in trying to
normalize the semantics, but that ended up hitting so many other
inconsistencies between different architectures in this area (eg
'size_t' vs 'unsigned long' vs 'int' as size arguments, and various
iovec check differences that Vasily Gorbik pointed out) that I ended up
with this more limited version that fixed the worst of the issues.

Reported-by: Jann Horn <jannh@google.com>
Tested-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/all/CAHk-=wgg1QVWNWG-UCFo1hx0zqrPnB3qhPzUTrWNft+MtXQXig@mail.gmail.com/

* nocache-cleanup:
  x86-64/arm64/powerpc: clean up and rename __copy_from_user_flushcache
  x86: rename and clean up __copy_from_user_inatomic_nocache()
  x86-64: rename misleadingly named '__copy_user_nocache()' function
2026-04-13 08:39:51 -07:00
Lorenzo Stoakes (Oracle)
3a6455d56b mm: convert do_brk_flags() to use vma_flags_t
In order to be able to do this, we need to change VM_DATA_DEFAULT_FLAGS
and friends and update the architecture-specific definitions also.

We then have to update some KSM logic to handle VMA flags, and introduce
VMA_STACK_FLAGS to define the vma_flags_t equivalent of VM_STACK_FLAGS.

We also introduce two helper functions for use during the time we are
converting legacy flags to vma_flags_t values - vma_flags_to_legacy() and
legacy_to_vma_flags().

This enables us to iteratively make changes to break these changes up into
separate parts.

We use these explicitly here to keep VM_STACK_FLAGS around for certain
users which need to maintain the legacy vm_flags_t values for the time
being.

We are no longer able to rely on the simple VM_xxx being set to zero if
the feature is not enabled, so in the case of VM_DROPPABLE we introduce
VMA_DROPPABLE as the vma_flags_t equivalent, which is set to
EMPTY_VMA_FLAGS if the droppable flag is not available.

While we're here, we make the description of do_brk_flags() into a kdoc
comment, as it almost was already.

We use vma_flags_to_legacy() to not need to update the vm_get_page_prot()
logic as this time.

Note that in create_init_stack_vma() we have to replace the BUILD_BUG_ON()
with a VM_WARN_ON_ONCE() as the tested values are no longer build time
available.

We also update mprotect_fixup() to use VMA flags where possible, though we
have to live with a little duplication between vm_flags_t and vma_flags_t
values for the time being until further conversions are made.

While we're here, update VM_SPECIAL to be defined in terms of
VMA_SPECIAL_FLAGS now we have vma_flags_to_legacy().

Finally, we update the VMA tests to reflect these changes.

Link: https://lkml.kernel.org/r/d02e3e45d9a33d7904b149f5604904089fd640ae.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>	[SELinux]
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:40 -07:00
Baolin Wang
fb87c88272 mm: change to return bool for pudp_test_and_clear_young()
The pudp_test_and_clear_young() is used to clear the young flag, returning
whether the young flag was set for this PUD entry.  Change the return type
to bool to make the intention clearer.

Link: https://lkml.kernel.org/r/2c56fe52c1bf9404145274d7e91d4a65060f6c7c.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:35 -07:00
Baolin Wang
42e26354c4 mm: change to return bool for pmdp_test_and_clear_young()
Callers use pmdp_test_and_clear_young() to clear the young flag and check
whether it was set for this PMD entry.  Change the return type to bool to
make the intention clearer.

Link: https://lkml.kernel.org/r/f1d31307a13365d3d0fed5809727dcc2dd59631b.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:35 -07:00
Baolin Wang
06c4dfa3ce mm: change to return bool for ptep_clear_flush_young()/clear_flush_young_ptes()
The ptep_clear_flush_young() and clear_flush_young_ptes() are used to
clear the young flag and flush the TLB, returning whether the young flag
was set.  Change the return type to bool to make the intention clearer.

Link: https://lkml.kernel.org/r/24af5144b96103631594501f77d4525f2475c1be.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:35 -07:00
Baolin Wang
a62ca3f40f mm: change to return bool for ptep_test_and_clear_young()
Patch series "change young flag check functions to return bool", v2.

This is a cleanup patchset to change all young flag check functions to
return bool, as discussed with David in the previous thread[1].  Since
callers only care about whether the young flag was set, returning bool
makes the intention clearer.  No functional changes intended.


This patch (of 6):

Callers use ptep_test_and_clear_young() to clear the young flag and check
whether it was set.  Change the return type to bool to make the intention
clearer.

Link: https://lkml.kernel.org/r/cover.1774075004.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/57e70efa9703d43959aa645246ea3cbdba14fa17.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:35 -07:00
Mike Rapoport (Microsoft)
6215d9f447 arch, mm: consolidate empty_zero_page
Reduce 22 declarations of empty_zero_page to 3 and 23 declarations of
ZERO_PAGE() to 4.

Every architecture defines empty_zero_page that way or another, but for the
most of them it is always a page aligned page in BSS and most definitions
of ZERO_PAGE do virt_to_page(empty_zero_page).

Move Linus vetted x86 definition of empty_zero_page and ZERO_PAGE() to the
core MM and drop these definitions in architectures that do not implement
colored zero page (MIPS and s390).

ZERO_PAGE() remains a macro because turning it to a wrapper for a static
inline causes severe pain in header dependencies.

For the most part the change is mechanical, with these being noteworthy:

* alpha: aliased empty_zero_page with ZERO_PGE that was also used for boot
  parameters. Switching to a generic empty_zero_page removes the aliasing
  and keeps ZERO_PGE for boot parameters only
* arm64: uses __pa_symbol() in ZERO_PAGE() so that definition of
  ZERO_PAGE() is kept intact.
* m68k/parisc/um: allocated empty_zero_page from memblock,
  although they do not support zero page coloring and having it in BSS
  will work fine.
* sparc64 can have empty_zero_page in BSS rather allocate it, but it
  can't use virt_to_page() for BSS. Keep it's definition of ZERO_PAGE()
  but instead of allocating it, make mem_map_zero point to
  empty_zero_page.
* sh: used empty_zero_page for boot parameters at the very early boot.
  Rename the parameters page to boot_params_page and let sh use the generic
  empty_zero_page.
* hexagon: had an amusing comment about empty_zero_page

	/* A handy thing to have if one has the RAM. Declared in head.S */

  that unfortunately had to go :)

Link: https://lkml.kernel.org/r/20260211103141.3215197-4-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Helge Deller <deller@gmx.de>		[parisc]
Tested-by: Helge Deller <deller@gmx.de>		[parisc]
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Magnus Lindholm <linmag7@gmail.com>	[alpha]
Acked-by: Dinh Nguyen <dinguyen@kernel.org>	[nios2]
Acked-by: Andreas Larsson <andreas@gaisler.com>	[sparc]
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:01 -07:00
Seongsu Park
3d443691ed mm/pkeys: remove unused tsk parameter from arch_set_user_pkey_access()
The tsk parameter in arch_set_user_pkey_access() is never used in the
function implementations across all architectures (arm64, powerpc, x86).

Link: https://lkml.kernel.org/r/20260219063506.545148-1-sgsu.park@samsung.com
Signed-off-by: Seongsu Park <sgsu.park@samsung.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:52:57 -07:00
Christoph Hellwig
e20043b476 xor: make xor.ko self-contained in lib/raid/
Move the asm/xor.h headers to lib/raid/xor/$(SRCARCH)/xor_arch.h and
include/linux/raid/xor_impl.h to lib/raid/xor/xor_impl.h so that the
xor.ko module implementation is self-contained in lib/raid/.

As this remove the asm-generic mechanism a new kconfig symbol is added to
indicate that a architecture-specific implementations exists, and
xor_arch.h should be included.

Link: https://lkml.kernel.org/r/20260327061704.3707577-22-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Sterba <dsterba@suse.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Li Nan <linan122@huawei.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-02 23:36:20 -07:00
Christoph Hellwig
3f276cece4 powerpc: move the XOR code to lib/raid/
Move the optimized XOR into lib/raid and include it it in xor.ko instead
of always building it into the main kernel image.

Link: https://lkml.kernel.org/r/20260327061704.3707577-16-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Sterba <dsterba@suse.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Li Nan <linan122@huawei.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-02 23:36:18 -07:00
Christoph Hellwig
35ebc4de10 xor: remove macro abuse for XOR implementation registrations
Drop the pretty confusing historic XOR_TRY_TEMPLATES and
XOR_SELECT_TEMPLATE, and instead let the architectures provide a
arch_xor_init that calls either xor_register to register candidates or
xor_force to force a specific implementation.

Link: https://lkml.kernel.org/r/20260327061704.3707577-10-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Sterba <dsterba@suse.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Li Nan <linan122@huawei.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-02 23:36:17 -07:00
Christophe Leroy (CS GROUP)
cae734710d powerpc/net: Inline checksum wrappers and convert to scoped user access
Commit 861574d51b ("powerpc/uaccess: Implement masked user access")
provides optimised user access by avoiding the cost of access_ok().

Convert csum_and_copy_to_user() and csum_and_copy_from_user() to
scoped user access to benefit from masked user access.

csum_and_copy_to_user() and csum_and_copy_from_user() are only
called respectively by csum_and_copy_to_iter() and
csum_and_copy_from_iter_full() and they are only called twice.

Those functions used to be large but they were first reduced by
commit c693cc4676 ("saner calling conventions for
csum_and_copy_..._user()") then commit 70d65cd555 ("ppc: propagate
the calling conventions change down to csum_partial_copy_generic()").
With the additional size reduction provided by conversion to scoped
user access they are not worth being kept out of line.

  $ ./scripts/bloat-o-meter vmlinux.0 vmlinux.1
  add/remove: 0/2 grow/shrink: 2/0 up/down: 136/-176 (-40)
  Function                                     old     new   delta
  csum_and_copy_to_iter                       2416    2488     +72
  csum_and_copy_from_iter_full                2272    2336     +64
  csum_and_copy_to_user                         88       -     -88
  csum_and_copy_from_user                       88       -     -88
  Total: Before=11514471, After=11514431, chg -0.00%

Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/f44e1b2760dbed35b237040001a91bc8304b726b.1773137098.git.chleroy@kernel.org
2026-04-01 09:21:07 +05:30
Christophe Leroy (CS GROUP)
40a1b9d044 powerpc/futex: Use masked user access
Commit 861574d51b ("powerpc/uaccess: Implement masked user access")
provides optimised user access by avoiding the cost of access_ok().

Use masked user access in arch_futex_atomic_op_inuser()

Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/e29f6a5c14e5938df68d94bfac6b2f762fb922aa.1773136636.git.chleroy@kernel.org
2026-04-01 09:21:06 +05:30
Christophe Leroy
f26ad12356 powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
Commit e65e1fc2d2 ("[PATCH] syscall class hookup for all normal
targets") added generic support for AUDIT but that didn't include
support for bi-arch like powerpc.

Commit 4b58841149 ("audit: Add generic compat syscall support")
added generic support for bi-arch.

Convert powerpc to that bi-arch generic audit support.

With this change generated text is similar.

Thomas has confirmed that the previously failing filter_exclude/test
is now successful both without and with this patch, see [1]

[1] https://lore.kernel.org/all/20260306115350-ef265661-6d6b-4043-9bd0-8e6b437d0d67@linutronix.de/

Link: https://github.com/linuxppc/issues/issues/412
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/261b1be5b8dc526b83d73e8281e682a73536ea28.1773155031.git.chleroy@kernel.org
2026-04-01 09:21:06 +05:30
Randy Dunlap
d1e6f90d6b powerpc/ps3: fix ps3.h kernel-doc warnings
Fix some kernel-doc warnings in ps3.h:

- add @dev to struct ps3_dma_region
- don't mark a function as "struct"
- add Returns: description for one function
- add a short description for ps3_system_bus_set_drvdata()
- correct an enum @name
- move intervening "struct ps3_system_bus_device;" from between
  kernel-doc for ps3_dma_region_init() and the function declaration

to eliminate these warnings:

Warning: arch/powerpc/include/asm/ps3.h:96 struct member 'dev' not
 described in 'ps3_dma_region'
Warning: arch/powerpc/include/asm/ps3.h:118 struct ps3_system_bus_device;
 error: Cannot parse struct or union!
Warning: arch/powerpc/include/asm/ps3.h:166 int
 ps3_mmio_region_init(struct ps3_system_bus_device *dev, struct
 ps3_mmio_region *r, unsigned long bus_addr, unsigned long len, enum
 ps3_mmio_page_size page_size); error: Cannot parse struct or union!
Warning: arch/powerpc/include/asm/ps3.h:167 No description found for
 return value of 'ps3_mmio_region_init'
Warning: arch/powerpc/include/asm/ps3.h:407 missing initial short
 description on line:
 * ps3_system_bus_set_drvdata -
Warning: arch/powerpc/include/asm/ps3.h:473 Enum value
 'PS3_LPM_TB_TYPE_INTERNAL' not described in enum 'ps3_lpm_tb_type'
Warning: arch/powerpc/include/asm/ps3.h:473 Excess enum value
 '@PS3_LPM_RIGHTS_USE_TB' description in 'ps3_lpm_tb_type'

This leaves struct members in several structs and function parameters in
one function still undescribed.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20251129183636.1893634-1-rdunlap@infradead.org
2026-04-01 09:21:06 +05:30
Sourabh Jain
f53b24d1fa powerpc/crash: Update backup region offset in elfcorehdr on memory hotplug
When elfcorehdr is prepared for kdump, the program header representing
the first 64 KB of memory is expected to have its offset point to the
backup region. This is required because purgatory copies the first 64 KB
of the crashed kernel memory to this backup region following a kernel
crash. This allows the capture kernel to use the first 64 KB of memory
to place the exception vectors and other required data.

When elfcorehdr is recreated due to memory hotplug, the offset of
the program header representing the first 64 KB is not updated.
As a result, the capture kernel exports the first 64 KB at offset
0, even though the data actually resides in the backup region.

Fix this by calling sync_backup_region_phdr() to update the program
header offset in the elfcorehdr created during memory hotplug.

sync_backup_region_phdr() works for images loaded via the
kexec_file_load syscall. However, it does not work for kexec_load,
because image->arch.backup_start is not initialized in that case.
So introduce machine_kexec_post_load() to process the elfcorehdr
prepared by kexec-tools and initialize image->arch.backup_start for
kdump images loaded via kexec_load syscall.

Rename update_backup_region_phdr() to sync_backup_region_phdr() and
extend it to synchronize the backup region offset between the kdump
image and the ELF core header. The helper now supports updating either
the kdump image from the ELF program header or updating the ELF program
header from the kdump image, avoiding code duplication.

Define ARCH_HAS_KIMAGE_ARCH and struct kimage_arch when
CONFIG_KEXEC_FILE or CONFIG_CRASH_DUMP is enabled so that
kimage->arch.backup_start is available with the kexec_load system call.

This patch depends on the patch titled
"powerpc/crash: fix backup region offset update to elfcorehdr".

Fixes: 849599b702 ("powerpc/crash: add crash memory hotplug support")
Reviewed-by: Aditya Gupta <adityag@linux.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260312083051.1935737-3-sourabhjain@linux.ibm.com
2026-04-01 09:21:04 +05:30
Linus Torvalds
809b997a5c x86-64/arm64/powerpc: clean up and rename __copy_from_user_flushcache
This finishes the work on these odd functions that were only implemented
by a handful of architectures.

The 'flushcache' function was only used from the iterator code, and
let's make it do the same thing that the nontemporal version does:
remove the two underscores and add the user address checking.

Yes, yes, the user address checking is also done at iovec import time,
but we have long since walked away from the old double-underscore thing
where we try to avoid address checking overhead at access time, and
these functions shouldn't be so special and old-fashioned.

The arm64 version already did the address check, in fact, so there it's
just a matter of renaming it.  For powerpc and x86-64 we now do the
proper user access boilerplate.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-03-30 15:05:57 -07:00
Ritesh Harjani (IBM)
d1503aa9ab powerpc/64s: Add support for huge pfnmaps
This uses _RPAGE_SW2 bit for the PMD and PUDs similar to PTEs.
This also adds support for {pte,pmd,pud}_pgprot helpers needed for
follow_pfnmap APIs.

This allows us to extend the PFN mappings, e.g. PCI MMIO bars where
it can grow as large as 8GB or even bigger, to map at PMD / PUD level.
VFIO PCI core driver already supports fault handling at PMD / PUD level
for more efficient BAR mappings.

Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/6fca726574236f556dd4e1e259692e82a4c29e85.1773058761.git.ritesh.list@gmail.com
2026-03-30 15:52:17 +05:30
Ritesh Harjani (IBM)
bf7c1497d2 powerpc/64s: Move serialize_against_pte_lookup() to hash_pgtable.c
Originally,
commit fa4531f753 ("powerpc/mm: Don't send IPI to all cpus on THP updates")
introduced serialize_against_pte_lookup() call for both Radix and Hash.

However below commit fixed the race with Radix
commit 70cbc3cc78 ("mm: gup: fix the fast GUP race against THP collapse")

And therefore following commit removed the
serialize_against_pte_lookup() call from radix_pgtable.c
commit bedf034169
("powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush")

Now since serialize_against_pte_lookup() only gets called from
hash__pmdp_collapse_flush(), thus move the related functions to
hash_pgtable.c

Hence this patch:
- moves serialize_against_pte_lookup() from radix_pgtable.c to hash_pgtable.c
- removes the radix specific calls from do_serialize()
- renames do_serialize() to do_nothing().

There should not be any functionality change in this patch.

Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/a73ebe800a9be257329507703779f822363f8b2f.1773078178.git.ritesh.list@gmail.com
2026-03-17 13:56:37 +05:30
Ritesh Harjani (IBM)
4a342f3e6f powerpc/64s/tlbflush-radix: Remove unused radix__flush_tlb_pwc()
Commit 52162ec784
("powerpc/mm/book3s64/radix: Use freed_tables instead of need_flush_all")
removed radix__flush_tlb_pwc() definition, but missed to remove the extern
declaration. This patch removes it.

Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/b79c8ce8f00aa3e96ab9b1c77bc004759c397d3f.1773078178.git.ritesh.list@gmail.com
2026-03-17 13:56:37 +05:30
Ritesh Harjani (IBM)
68b1fa0ed5 powerpc/64s: Fix _HPAGE_CHG_MASK to include _PAGE_SPECIAL bit
commit af38538801 ("mm/memory: factor out common code from vm_normal_page_*()"),
added a VM_WARN_ON_ONCE for huge zero pfn.

This can lead to the following call stack.

 ------------[ cut here ]------------
 WARNING: mm/memory.c:735 at vm_normal_page_pmd+0xf0/0x140, CPU#19: hmm-tests/3366
 NIP [c00000000078d0c0] vm_normal_page_pmd+0xf0/0x140
 LR [c00000000078d060] vm_normal_page_pmd+0x90/0x140
 Call Trace:
 [c00000016f56f850] [c00000000078d060] vm_normal_page_pmd+0x90/0x140 (unreliable)
 [c00000016f56f8a0] [c0000000008a9e30] change_huge_pmd+0x7c0/0x870
 [c00000016f56f930] [c0000000007b2bc4] change_protection+0x17a4/0x1e10
 [c00000016f56fba0] [c0000000007b3440] mprotect_fixup+0x210/0x4c0
 [c00000016f56fc30] [c0000000007b3c3c] do_mprotect_pkey+0x54c/0x780
 [c00000016f56fdb0] [c0000000007b3ed8] sys_mprotect+0x68/0x90
 [c00000016f56fdf0] [c00000000003ae40] system_call_exception+0x190/0x500
 [c00000016f56fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec

This happens when we call mprotect -> change_huge_pmd()
mprotect()
  change_pmd_range()
    pmd_modify(oldpmd, newprot) 	# this clears _PAGE_SPECIAL for zero huge pmd
	    pmdv = pmd_val(pmd);
	    pmdv &= _HPAGE_CHG_MASK;	# -> gets cleared here
	    return pmd_set_protbits(__pmd(pmdv), newprot);
    can_change_pmd_writable(vma, vmf->address, pmd)
      vm_normal_page_pmd(vma, addr, pmd)
        __vm_normal_page()
          VM_WARN_ON(is_zero_pfn(pfn) || is_huge_zero_pfn(pfn));  # this get hits as _PAGE_SPECIAL for zero huge pmd was cleared.

It can be easily reproduced with the following testcase:
	p = mmap(NULL, 2 * hpage_pmd_size, PROT_READ, MAP_PRIVATE |
		 MAP_ANONYMOUS, -1, 0);
	madvise((void *)p, 2 * hpage_pmd_size, MADV_HUGEPAGE);
	aligned = (char*)(((unsigned long)p + hpage_pmd_size - 1) &
				~(hpage_pmd_size - 1));
	(void)(*(volatile char*)aligned);  // read fault, installs huge zero PMD
	mprotect((void *)aligned, hpage_pmd_size, PROT_READ | PROT_WRITE);

This patch adds _PAGE_SPECIAL to _HPAGE_CHG_MASK similar to
_PAGE_CHG_MASK, as we don't want to clear this bit when calling
pmd_modify() while changing protection bits.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/7416f5cdbcfeaad947860fcac488b483f1287172.1773078178.git.ritesh.list@gmail.com
2026-03-17 13:56:37 +05:30
Ritesh Harjani (IBM)
bbcbf045d6 powerpc/64s: Fix unmap race with PMD migration entries
The following race is possible with migration swap entries or
device-private THP entries. e.g. when move_pages is called on a PMD THP
page, then there maybe an intermediate state, where PMD entry acts as
a migration swap entry (pmd_present() is true). Then if an munmap
happens at the same time, then this VM_BUG_ON() can happen in
pmdp_huge_get_and_clear_full().

This patch fixes that.

Thread A: move_pages() syscall
  add_folio_for_migration()
    mmap_read_lock(mm)
    folio_isolate_lru(folio)
    mmap_read_unlock(mm)

  do_move_pages_to_node()
    migrate_pages()
      try_to_migrate_one()
        spin_lock(ptl)
        set_pmd_migration_entry()
          pmdp_invalidate()     # PMD: _PAGE_INVALID | _PAGE_PTE | pfn
          set_pmd_at()          # PMD: migration swap entry (pmd_present=0)
        spin_unlock(ptl)
        [page copy phase]       # <--- RACE WINDOW -->

Thread B: munmap()
  mmap_write_downgrade(mm)
  unmap_vmas() -> zap_pmd_range()
    zap_huge_pmd()
      __pmd_trans_huge_lock()
        pmd_is_huge():          # !pmd_present && !pmd_none -> TRUE (swap entry)
        pmd_lock() -> 		# spin_lock(ptl), waits for Thread A to release ptl
      pmdp_huge_get_and_clear_full()
        VM_BUG_ON(!pmd_present(*pmdp))  # HITS!

[  287.738700][ T1867] ------------[ cut here ]------------
[  287.743843][ T1867] kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:187!
cpu 0x0: Vector: 700 (Program Check) at [c00000044037f4f0]
    pc: c000000000094ca4: pmdp_huge_get_and_clear_full+0x6c/0x23c
    lr: c000000000645dec: zap_huge_pmd+0xb0/0x868
    sp: c00000044037f790
   msr: 800000000282b033
  current = 0xc0000004032c1a00
  paca    = 0xc000000004fe0000   irqmask: 0x03   irq_happened: 0x09
    pid   = 1867, comm = a.out
kernel BUG at :187!
Linux version 6.19.0-12136-g14360d4f917c-dirty (powerpc64le-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #27 SMP PREEMPT Sun Feb 22 10:38:56 IST 2026
enter ? for help
[link register   ] c000000000645dec zap_huge_pmd+0xb0/0x868
[c00000044037f790] c00000044037f7d0 (unreliable)
[c00000044037f7d0] c000000000645dcc zap_huge_pmd+0x90/0x868
[c00000044037f840] c0000000005724cc unmap_page_range+0x176c/0x1f40
[c00000044037fa00] c000000000572ea0 unmap_vmas+0xb0/0x1d8
[c00000044037fa90] c0000000005af254 unmap_region+0xb4/0x128
[c00000044037fb50] c0000000005af400 vms_complete_munmap_vmas+0x138/0x310
[c00000044037fbe0] c0000000005b0f1c do_vmi_align_munmap+0x1ec/0x238
[c00000044037fd30] c0000000005b3688 __vm_munmap+0x170/0x1f8
[c00000044037fdf0] c000000000587f74 sys_munmap+0x2c/0x40
[c00000044037fe10] c000000000032668 system_call_exception+0x128/0x350
[c00000044037fe50] c00000000000d05c system_call_vectored_common+0x15c/0x2ec
---- Exception: 3000 (System Call Vectored) at 0000000010064a2c
SP (7fff9b1ee9c0) is in userspace
0:mon> zh

commit a30b48bf1b ("mm/migrate_device: implement THP migration of zone device pages"),
enabled migration for device-private PMD entries. Hence this is one
other path where this warning could get trigger from.

 ------------[ cut here ]------------
 WARNING: arch/powerpc/mm/book3s64/hash_pgtable.c:199 at hash__pmd_hugepage_update+0x48/0x284, CPU#3: hmm-tests/1905
 Modules linked in: test_hmm
 CPU: 3 UID: 0 PID: 1905 Comm: hmm-tests Tainted: G    B   W    L   N  7.0.0-rc1-01438-g7e2f0ee7581c #21 PREEMPT
 Tainted: [B]=BAD_PAGE, [W]=WARN, [L]=SOFTLOCKUP, [N]=TEST
 Hardware name: IBM pSeries (emulated by qemu) POWER10 (architected) 0x801200 0xf000006 of:SLOF,git-ee03ae pSeries
 NIP [c000000000096b70] hash__pmd_hugepage_update+0x48/0x284
 LR [c000000000096e7c] hash__pmdp_huge_get_and_clear+0xd0/0xd4
 Call Trace:
 [c000000604707670] [c000000004e102b8] 0xc000000004e102b8 (unreliable)
 [c000000604707700] [c00000000064ec3c] set_pmd_migration_entry+0x414/0x498
 [c000000604707760] [c00000000063e5a4] migrate_vma_collect_pmd+0x12e8/0x16c4
 [c000000604707890] [c00000000059282c] walk_pgd_range+0x7fc/0xd2c
 [c000000604707990] [c000000000592e40] __walk_page_range+0xe4/0x2ac
 [c000000604707a10] [c000000000593534] walk_page_range_mm_unsafe+0x204/0x2a4
 [c000000604707ab0] [c00000000063af10] migrate_vma_setup+0x1dc/0x2e8
 [c000000604707b10] [c008000006a21838] dmirror_migrate_to_system.constprop.0+0x210/0x4b0 [test_hmm]
 [c000000604707c30] [c008000006a245b0] dmirror_fops_unlocked_ioctl+0x454/0xa5c [test_hmm]
 [c000000604707d20] [c0000000006aab84] sys_ioctl+0x4ec/0x1178
 [c000000604707e10] [c0000000000326a8] system_call_exception+0x128/0x350
 [c000000604707e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
 ---- interrupt: 3000 at 0x7fffbe44f50c

Fixes: 75358ea359 ("powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race")
Fixes: a30b48bf1b ("mm/migrate_device: implement THP migration of zone device pages")
Reported-by: Pavithra Prakash <pavrampu@linux.vnet.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/9437e5ef28d1e2f5cbdd7f8286350ce93c1d43c5.1773078178.git.ritesh.list@gmail.com
2026-03-17 13:56:37 +05:30
Tobias Huschle
cf8771ca4c mm/page_table_check: Pass mm_struct to pxx_user_accessible_page()
Unlike other architectures, s390 does not have means to
distinguish kernel vs user page table entries - neither
an entry itself, nor the address could be used for that.
It is only the mm_struct that indicates whether an entry
in question is mapped to a user space. So pass mm_struct
to pxx_user_accessible_page() callbacks.

[agordeev@linux.ibm.com: rephrased commit message, removed braces]

Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> #powerpc
Signed-off-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/ca77f3489453c2fe01b25e50e53b778929e0dfc5.1772812343.git.agordeev@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-13 00:07:47 +01:00
Sayali Patil
6bc9c0a905 powerpc: fix KUAP warning in VMX usercopy path
On powerpc with PREEMPT_FULL or PREEMPT_LAZY and function tracing enabled,
KUAP warnings can be triggered from the VMX usercopy path under memory
stress workloads.

KUAP requires that no subfunctions are called once userspace access has
been enabled. The existing VMX copy implementation violates this
requirement by invoking enter_vmx_usercopy() from the assembly path after
userspace access has already been enabled. If preemption occurs
in this window, the AMR state may not be preserved correctly,
leading to unexpected userspace access state and resulting in
KUAP warnings.

Fix this by restructuring the VMX usercopy flow so that VMX selection
and VMX state management are centralized in raw_copy_tofrom_user(),
which is invoked by the raw_copy_{to,from,in}_user() wrappers.

The new flow is:

  - raw_copy_{to,from,in}_user() calls raw_copy_tofrom_user()
  - raw_copy_tofrom_user() decides whether to use the VMX path
    based on size and CPU capability
  - Call enter_vmx_usercopy() before enabling userspace access
  - Enable userspace access as per the copy direction
    and perform the VMX copy
  - Disable userspace access as per the copy direction
  - Call exit_vmx_usercopy()
  - Fall back to the base copy routine if the VMX copy faults

With this change, the VMX assembly routines no longer perform VMX state
management or call helper functions; they only implement the
copy operations.
The previous feature-section based VMX selection inside
__copy_tofrom_user_power7() is removed, and a dedicated
__copy_tofrom_user_power7_vmx() entry point is introduced.

This ensures correct KUAP ordering, avoids subfunction calls
while KUAP is unlocked, and eliminates the warnings while preserving
the VMX fast path.

Fixes: de78a9c42a ("powerpc: Add a framework for Kernel Userspace Access Protection")
Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Closes: https://lore.kernel.org/all/20260109064917.777587-2-sshegde@linux.ibm.com/
Suggested-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Co-developed-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Sayali Patil <sayalip@linux.ibm.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260304122201.153049-1-sayalip@linux.ibm.com
2026-03-12 11:03:47 +05:30
Thomas Weißschuh
9c89d8bd64 powerpc/vdso: Explicitly include asm/cputable.h and asm/feature-fixups.h
The usage of ASM_FTR_IFCLR(CPU_TR_ARCH_31) requires asm/cputable.h and
asm/feature-fixups.h. Currently these headers are included transitively,
but that transitive inclusion is about to go away.

Explicitly include the headers.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://patch.msgid.link/20260227-vdso-header-cleanups-v2-5-35d60acf7410@linutronix.de
2026-03-11 15:12:36 +01:00
Thomas Weißschuh
3852dd5abc powerpc/vdso/gettimeofday: Explicitly include vdso/time32.h
The usage of 'struct old_timespec32' requires vdso/time32.h. Currently
this header is included transitively, but that transitive inclusion is
about to go away.

Explicitly include the header.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://patch.msgid.link/20260227-vdso-header-cleanups-v2-4-35d60acf7410@linutronix.de
2026-03-11 15:12:31 +01:00
Christophe Leroy (CS GROUP)
0ee95a1d45 powerpc/uaccess: Fix inline assembly for clang build on PPC32
Test robot reports the following error with clang-16.0.6:

   In file included from kernel/rseq.c:75:
   include/linux/rseq_entry.h:141:3: error: invalid operand for instruction
                   unsafe_get_user(offset, &ucs->post_commit_offset, efault);
                   ^
   include/linux/uaccess.h:608:2: note: expanded from macro 'unsafe_get_user'
           arch_unsafe_get_user(x, ptr, local_label);      \
           ^
   arch/powerpc/include/asm/uaccess.h:518:2: note: expanded from macro 'arch_unsafe_get_user'
           __get_user_size_goto(__gu_val, __gu_addr, sizeof(*(p)), e); \
           ^
   arch/powerpc/include/asm/uaccess.h:284:2: note: expanded from macro '__get_user_size_goto'
           __get_user_size_allowed(x, ptr, size, __gus_retval);    \
           ^
   arch/powerpc/include/asm/uaccess.h:275:10: note: expanded from macro '__get_user_size_allowed'
           case 8: __get_user_asm2(x, (u64 __user *)ptr, retval);  break;  \
                   ^
   arch/powerpc/include/asm/uaccess.h:258:4: note: expanded from macro '__get_user_asm2'
                   "       li %1+1,0\n"                    \
                    ^
   <inline asm>:7:5: note: instantiated into assembly here
           li 31+1,0
              ^
   1 error generated.

On PPC32, for 64 bits vars a pair of registers is used. Usually the
lower register in the pair is the high part and the higher register is
the low part. GCC uses r3/r4 ... r11/r12 ... r14/r15 ... r30/r31

In older kernel code inline assembly was using %1 and %1+1 to represent
64 bits values. However here it looks like clang uses r31 as high part,
allthough r32 doesn't exist hence the error.

Allthoug %1+1 should work, most places now use %L1 instead of %1+1, so
let's do the same here.

With that change, the build doesn't fail anymore and a disassembly shows
clang uses r17/r18 and r31/r14 pair when GCC would have used r16/r17 and
r30/r31:

	Disassembly of section .fixup:

	00000000 <.fixup>:
	   0:	38 a0 ff f2 	li      r5,-14
	   4:	3a 20 00 00 	li      r17,0
	   8:	3a 40 00 00 	li      r18,0
	   c:	48 00 00 00 	b       c <.fixup+0xc>
				c: R_PPC_REL24	.text+0xbc
	  10:	38 a0 ff f2 	li      r5,-14
	  14:	3b e0 00 00 	li      r31,0
	  18:	39 c0 00 00 	li      r14,0
	  1c:	48 00 00 00 	b       1c <.fixup+0x1c>
				1c: R_PPC_REL24	.text+0x144

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202602021825.otcItxGi-lkp@intel.com/
Fixes: c20beffeec ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()")
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/8ca3a657a650e497a96bfe7acde2f637dadab344.1770103646.git.chleroy@kernel.org
2026-03-04 11:06:02 +05:30
Christophe Leroy
b9e7e3ea60 powerpc/e500: Always use 64 bits PTE
Today there are two PTE formats for e500:
- The 64 bits format, used
 - On 64 bits kernel
 - On 32 bits kernel with 64 bits physical addresses
 - On 32 bits kernel with support of huge pages
- The 32 bits format, used in other cases

Maintaining two PTE formats means unnecessary maintenance burden
because every change needs to be implemented and tested for both
formats.

Remove the 32 bits PTE format. The memory usage increase due to
larger PTEs is minimal (approx. 0,1% of memory).

This also means that from now on huge pages are supported also
with 32 bits physical addresses.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/04a658209ea78dcc0f3dbde6b2c29cf1939adfe9.1767721208.git.chleroy@kernel.org
2026-03-04 11:05:06 +05:30
Linus Torvalds
4cff5c05e0 mm.git review status for linus..mm-stable
Everything:
 
 Total patches:       325
 Reviews/patch:       1.39
 Reviewed rate:       72%
 
 Excluding DAMON:
 
 Total patches:       262
 Reviews/patch:       1.63
 Reviewed rate:       82%
 
 Excluding DAMON and zram:
 
 Total patches:       248
 Reviews/patch:       1.72
 Reviewed rate:       86%
 
 - The 14 patch series "powerpc/64s: do not re-activate batched TLB
   flush" from Alexander Gordeev makes arch_{enter|leave}_lazy_mmu_mode()
   nest properly.
 
   It adds a generic enter/leave layer and switches architectures to use
   it.  Various hacks were removed in the process.
 
 - The 7 patch series "zram: introduce compressed data writeback" from
   Richard Chang and Sergey Senozhatsky implements data compression for
   zram writeback.
 
 - The 8 patch series "mm: folio_zero_user: clear page ranges" from David
   Hildenbrand adds clearing of contiguous page ranges for hugepages.
   Large improvements during demand faulting are demonstrated.
 
 - The 2 patch series "memcg cleanups" from Chen Ridong tideis up some
   memcg code.
 
 - The 12 patch series "mm/damon: introduce {,max_}nr_snapshots and
   tracepoint for damos stats" from SeongJae Park improves DAMOS stat's
   provided information, deterministic control, and readability.
 
 - The 3 patch series "selftests/mm: hugetlb cgroup charging: robustness
   fixes" from Li Wang fixes a few issues in the hugetlb cgroup charging
   selftests.
 
 - The 5 patch series "Fix va_high_addr_switch.sh test failure - again"
   from Chunyu Hu addresses several issues in the va_high_addr_switch test.
 
 - The 5 patch series "mm/damon/tests/core-kunit: extend existing test
   scenarios" from Shu Anzai improves the KUnit test coverage for DAMON.
 
 - The 2 patch series "mm/khugepaged: fix dirty page handling for
   MADV_COLLAPSE" from Shivank Garg fixes a glitch in khugepaged which was
   causing madvise(MADV_COLLAPSE) to transiently return -EAGAIN.
 
 - The 29 patch series "arch, mm: consolidate hugetlb early reservation"
   from Mike Rapoport reworks and consolidates a pile of straggly code
   related to reservation of hugetlb memory from bootmem and creation of
   CMA areas for hugetlb.
 
 - The 9 patch series "mm: clean up anon_vma implementation" from Lorenzo
   Stoakes cleans up the anon_vma implementation in various ways.
 
 - The 3 patch series "tweaks for __alloc_pages_slowpath()" from
   Vlastimil Babka does a little streamlining of the page allocator's
   slowpath code.
 
 - The 8 patch series "memcg: separate private and public ID namespaces"
   from Shakeel Butt cleans up the memcg ID code and prevents the
   internal-only private IDs from being exposed to userspace.
 
 - The 6 patch series "mm: hugetlb: allocate frozen gigantic folio" from
   Kefeng Wang cleans up the allocation of frozen folios and avoids some
   atomic refcount operations.
 
 - The 11 patch series "mm/damon: advance DAMOS-based LRU sorting" from
   SeongJae Park improves DAMOS's movement of memory betewwn the active and
   inactive LRUs and adds auto-tuning of the ratio-based quotas and of
   monitoring intervals.
 
 - The 18 patch series "Support page table check on PowerPC" from Andrew
   Donnellan makes CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc.
 
 - The 3 patch series "nodemask: align nodes_and{,not} with underlying
   bitmap ops" from Yury Norov makes nodes_and() and nodes_andnot()
   propagate the return values from the underlying bit operations, enabling
   some cleanup in calling code.
 
 - The 5 patch series "mm/damon: hide kdamond and kdamond_lock from API
   callers" from SeongJae Park cleans up some DAMON internal interfaces.
 
 - The 4 patch series "mm/khugepaged: cleanups and scan limit fix" from
   Shivank Garg does some cleanup work in khupaged and fixes a scan limit
   accounting issue.
 
 - The 24 patch series "mm: balloon infrastructure cleanups" from David
   Hildenbrand goes to town on the balloon infrastructure and its page
   migration function.  Mainly cleanups, also some locking simplification.
 
 - The 2 patch series "mm/vmscan: add tracepoint and reason for
   kswapd_failures reset" from Jiayuan Chen adds additional tracepoints to
   the page reclaim code.
 
 - The 3 patch series "Replace wq users and add WQ_PERCPU to
   alloc_workqueue() users" from Marco Crivellari is part of Marco's
   kernel-wide migration from the legacy workqueue APIs over to the
   preferred unbound workqueues.
 
 - The 9 patch series "Various mm kselftests improvements/fixes" from
   Kevin Brodsky provides various unrelated improvements/fixes for the mm
   kselftests.
 
 - The 5 patch series "mm: accelerate gigantic folio allocation" from
   Kefeng Wang greatly speeds up gigantic folio allocation, mainly by
   avoiding unnecessary work in pfn_range_valid_contig().
 
 - The 5 patch series "selftests/damon: improve leak detection and wss
   estimation reliability" from SeongJae Park improves the reliability of
   two of the DAMON selftests.
 
 - The 8 patch series "mm/damon: cleanup kdamond, damon_call(), damos
   filter and DAMON_MIN_REGION" from SeongJae Park does some cleanup work
   in the core DAMON code.
 
 - The 8 patch series "Docs/mm/damon: update intro, modules, maintainer
   profile, and misc" from SeongJae Park performs maintenance work on the
   DAMON documentation.
 
 - The 10 patch series "mm: add and use vma_assert_stabilised() helper"
   from Lorenzo Stoakes refactors and cleans up the core VMA code.  The
   main aim here is to be able to use the mmap write lock's lockdep state
   to perform various assertions regarding the locking which the VMA code
   requires.
 
 - The 19 patch series "mm, swap: swap table phase II: unify swapin use"
   from Kairui Song removes some old swap code (swap cache bypassing and
   swap synchronization) which wasn't working very well.  Various other
   cleanups and simplifications were made.  The end result is a 20% speedup
   in one benchmark.
 
 - The 8 patch series "enable PT_RECLAIM on more 64-bit architectures"
   from Qi Zheng makes PT_RECLAIM available on 64-bit alpha, loongarch,
   mips, parisc, um,  Various cleanups were performed along the way.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaY1HfAAKCRDdBJ7gKXxA
 jqhZAP9H8ZlKKqCEgnr6U5XXmJ63Ep2FDQpl8p35yr9yVuU9+gEAgfyWiJ43l1fP
 rT0yjsUW3KQFBi/SEA3R6aYarmoIBgI=
 =+HLt
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "powerpc/64s: do not re-activate batched TLB flush" makes
   arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev)

   It adds a generic enter/leave layer and switches architectures to use
   it. Various hacks were removed in the process.

 - "zram: introduce compressed data writeback" implements data
   compression for zram writeback (Richard Chang and Sergey Senozhatsky)

 - "mm: folio_zero_user: clear page ranges" adds clearing of contiguous
   page ranges for hugepages. Large improvements during demand faulting
   are demonstrated (David Hildenbrand)

 - "memcg cleanups" tidies up some memcg code (Chen Ridong)

 - "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos
   stats" improves DAMOS stat's provided information, deterministic
   control, and readability (SeongJae Park)

 - "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few
   issues in the hugetlb cgroup charging selftests (Li Wang)

 - "Fix va_high_addr_switch.sh test failure - again" addresses several
   issues in the va_high_addr_switch test (Chunyu Hu)

 - "mm/damon/tests/core-kunit: extend existing test scenarios" improves
   the KUnit test coverage for DAMON (Shu Anzai)

 - "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a
   glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to
   transiently return -EAGAIN (Shivank Garg)

 - "arch, mm: consolidate hugetlb early reservation" reworks and
   consolidates a pile of straggly code related to reservation of
   hugetlb memory from bootmem and creation of CMA areas for hugetlb
   (Mike Rapoport)

 - "mm: clean up anon_vma implementation" cleans up the anon_vma
   implementation in various ways (Lorenzo Stoakes)

 - "tweaks for __alloc_pages_slowpath()" does a little streamlining of
   the page allocator's slowpath code (Vlastimil Babka)

 - "memcg: separate private and public ID namespaces" cleans up the
   memcg ID code and prevents the internal-only private IDs from being
   exposed to userspace (Shakeel Butt)

 - "mm: hugetlb: allocate frozen gigantic folio" cleans up the
   allocation of frozen folios and avoids some atomic refcount
   operations (Kefeng Wang)

 - "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement
   of memory betewwn the active and inactive LRUs and adds auto-tuning
   of the ratio-based quotas and of monitoring intervals (SeongJae Park)

 - "Support page table check on PowerPC" makes
   CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan)

 - "nodemask: align nodes_and{,not} with underlying bitmap ops" makes
   nodes_and() and nodes_andnot() propagate the return values from the
   underlying bit operations, enabling some cleanup in calling code
   (Yury Norov)

 - "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up
   some DAMON internal interfaces (SeongJae Park)

 - "mm/khugepaged: cleanups and scan limit fix" does some cleanup work
   in khupaged and fixes a scan limit accounting issue (Shivank Garg)

 - "mm: balloon infrastructure cleanups" goes to town on the balloon
   infrastructure and its page migration function. Mainly cleanups, also
   some locking simplification (David Hildenbrand)

 - "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds
   additional tracepoints to the page reclaim code (Jiayuan Chen)

 - "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is
   part of Marco's kernel-wide migration from the legacy workqueue APIs
   over to the preferred unbound workqueues (Marco Crivellari)

 - "Various mm kselftests improvements/fixes" provides various unrelated
   improvements/fixes for the mm kselftests (Kevin Brodsky)

 - "mm: accelerate gigantic folio allocation" greatly speeds up gigantic
   folio allocation, mainly by avoiding unnecessary work in
   pfn_range_valid_contig() (Kefeng Wang)

 - "selftests/damon: improve leak detection and wss estimation
   reliability" improves the reliability of two of the DAMON selftests
   (SeongJae Park)

 - "mm/damon: cleanup kdamond, damon_call(), damos filter and
   DAMON_MIN_REGION" does some cleanup work in the core DAMON code
   (SeongJae Park)

 - "Docs/mm/damon: update intro, modules, maintainer profile, and misc"
   performs maintenance work on the DAMON documentation (SeongJae Park)

 - "mm: add and use vma_assert_stabilised() helper" refactors and cleans
   up the core VMA code. The main aim here is to be able to use the mmap
   write lock's lockdep state to perform various assertions regarding
   the locking which the VMA code requires (Lorenzo Stoakes)

 - "mm, swap: swap table phase II: unify swapin use" removes some old
   swap code (swap cache bypassing and swap synchronization) which
   wasn't working very well. Various other cleanups and simplifications
   were made. The end result is a 20% speedup in one benchmark (Kairui
   Song)

 - "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM
   available on 64-bit alpha, loongarch, mips, parisc, and um. Various
   cleanups were performed along the way (Qi Zheng)

* tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits)
  mm/memory: handle non-split locks correctly in zap_empty_pte_table()
  mm: move pte table reclaim code to memory.c
  mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
  mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
  um: mm: enable MMU_GATHER_RCU_TABLE_FREE
  parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
  mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
  LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
  alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
  mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h
  mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles
  zsmalloc: make common caches global
  mm: add SPDX id lines to some mm source files
  mm/zswap: use %pe to print error pointers
  mm/vmscan: use %pe to print error pointers
  mm/readahead: fix typo in comment
  mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
  mm: refactor vma_map_pages to use vm_insert_pages
  mm/damon: unify address range representation with damon_addr_range
  mm/cma: replace snprintf with strscpy in cma_new_area
  ...
2026-02-12 11:32:37 -08:00
Linus Torvalds
192c015940 powerpc updates for 7.0
- Implement masked user access
  - Add support for internal only per-CPU instructions and inline the bpf_get_smp_processor_id() and bpf_get_current_task()
  - Fix pSeries MSI-X allocation failure when quota is exceeded
  - Fix recursive pci_lock_rescan_remove locking in EEH event handling
  - Support tailcalls with subprogs & BPF exceptions on 64bit
  - Extend "trusted" keys to support the PowerVM Key Wrapping Module (PKWM)
 
 Thanks to: Abhishek Dubey, Christophe Leroy, Gaurav Batra, Guangshuo Li, Jarkko
 Sakkinen, Mahesh Salgaonkar, Mimi Zohar, Miquel Sabaté Solà, Nam Cao, Narayana
 Murty N, Nayna Jain, Nilay Shroff, Puranjay Mohan, Saket Kumar Bhaskar, Sourabh
 Jain, Srish Srinivasan, Venkat Rao Bagalkote,
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmmL7S0ACgkQpnEsdPSH
 ZJR2xA/9G+tZp9+2TidbLSyPT5o063uC5H5+j5QcvvHWi/ImGUNtixlDm5HcZLCR
 yKywE1piKYBU8HoISjzAt0+JCVd3ZjJ8chTpKgCHLXPRSBTgdR1MG+SXQysDJSWb
 yA4pwDikmwoLlfi+pf500F0nX2DRCRdU2Yi28ZFeaF/khJ7ebwj41QJ7LjN22+Q1
 G8Kq543obZluzSoVvfG4xUK4ByWER+Zdd2F6699iMP68yw5PJ8PPc0SUGt8nuD4i
 FUs0Lw7XV7i/K3+zm/ZgH5+Cvn7wOIcMNkXgFlxJXkFit97KXUDijifYPoXQyKLL
 ksD7SPFdV0++Sc+3mWcgW4j+hQZC0Pn864unmh8C6ug3SagQ+3pE1JYWKwCmoyXd
 49ROH0y+npArJ4NAc79eweunhafGcRYTSG+zV7swQvpRocMujEqa4CDz4uk1ll5W
 1yAac08AN6PnfcXl2VMrcDboziTlQVFcnNQbK/ieYMC7KpgA+udw1hd2rOWNZCPd
 u0byXxR1ak5YaAEuyMztd/39hrExx8306Jtkh5FIRZKWGAO+3np5bi3vxk11rDni
 c9BGh2JIMtuPUGys3wcFPGMRTKwF2bDFW/pB+5hMHeLUdlkni9WGCX8eLe2klYsy
 T7fBVb4d99IVrJGYv3J1lwELgjrgXvv35XOaUiyJyZhcbng15cc=
 =zJoL
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates for 7.0

 - Implement masked user access

 - Add bpf support for internal only per-CPU instructions and inline the
   bpf_get_smp_processor_id() and bpf_get_current_task() functions

 - Fix pSeries MSI-X allocation failure when quota is exceeded

 - Fix recursive pci_lock_rescan_remove locking in EEH event handling

 - Support tailcalls with subprogs & BPF exceptions on 64bit

 - Extend "trusted" keys to support the PowerVM Key Wrapping Module
   (PKWM)

Thanks to Abhishek Dubey, Christophe Leroy, Gaurav Batra, Guangshuo Li,
Jarkko Sakkinen, Mahesh Salgaonkar, Mimi Zohar, Miquel Sabaté Solà, Nam
Cao, Narayana Murty N, Nayna Jain, Nilay Shroff, Puranjay Mohan, Saket
Kumar Bhaskar, Sourabh Jain, Srish Srinivasan, and Venkat Rao Bagalkote.

* tag 'powerpc-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (27 commits)
  powerpc/pseries: plpks: export plpks_wrapping_is_supported
  docs: trusted-encryped: add PKWM as a new trust source
  keys/trusted_keys: establish PKWM as a trusted source
  pseries/plpks: add HCALLs for PowerVM Key Wrapping Module
  pseries/plpks: expose PowerVM wrapping features via the sysfs
  powerpc/pseries: move the PLPKS config inside its own sysfs directory
  pseries/plpks: fix kernel-doc comment inconsistencies
  powerpc/smp: Add check for kcalloc() failure in parse_thread_groups()
  powerpc: kgdb: Remove OUTBUFMAX constant
  powerpc64/bpf: Additional NVR handling for bpf_throw
  powerpc64/bpf: Support exceptions
  powerpc64/bpf: Add arch_bpf_stack_walk() for BPF JIT
  powerpc64/bpf: Avoid tailcall restore from trampoline
  powerpc64/bpf: Support tailcalls with subprogs
  powerpc64/bpf: Moving tail_call_cnt to bottom of frame
  powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling
  powerpc/pseries: Fix MSI-X allocation failure when quota is exceeded
  powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory
  powerpc64/bpf: Inline bpf_get_smp_processor_id() and bpf_get_current_task/_btf()
  powerpc64/bpf: Support internal-only MOV instruction to resolve per-CPU addrs
  ...
2026-02-10 21:46:12 -08:00
Linus Torvalds
57cb845067 - A nice cleanup to the paravirt code containing a unification of the paravirt
clock interface, taming the include hell by splitting the pv_ops structure
   and removing of a bunch of obsolete code. Work by Juergen Gross.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmmLKHAACgkQEsHwGGHe
 VUrURg//Ucf+3EAIkLCmFkH0WwYmQl2JjRYww8bPAw3iJMIVxy4dMnaBbsUiAtUp
 kYza+pgEtvyAwwd8RIEs85c9VhZn0DKoaWV8goBH3zFH6YvIRiLwb0w2QvjkF+70
 FNU+4zlvt/I3FD+tWNElAgVtkFL3Gmzm44qyLLsPtlYaJ71xFl2XB7V+TlqXMHzE
 m8BMenP9/CrbTlBBdNJGzAkAbWi1uAP+IydvuFNolH/F2lqVM2z5Ta3gUWWCIk/q
 jWrPLDZCHr2WlBZNUGamKVVH9NEh+7YNwBAGUrSNYGZFoaFjqeX6lN3djzS+wXIj
 0nDoW35jN0QNKz239MdXZDf1mfpb6ZQd/iOhFjo4dAvbm+J8WPAMr98ac8wR3Dyb
 2LF/BxkoKWRabxQApXSCrLPXEuqT6Qc1+lDA0bNHg51zBoqP5vRNVZRwArnzGB+O
 LxDKx+o4VYOf+UCaB6oQHjylbSgFvIedZ9p822hBe3QG9act8indRE8LWip7Utld
 peoJGgvlQ0xtClh6FjVHpvmVfAvk7Zki5ywj2GwmB/TZ0yywuGStAjE3UqY168/M
 gb7MSajh+HHZNj1/2+b/se4CUYlAgIPDQ+SwHJPm5TqyopvnOVi/2XWmjbx8I5jT
 jS0nxaxD+SbESSZ6IMAsppnAAxAYbvRHGIS+6mtNCXVkaV1pMbA=
 =AeFt
 -----END PGP SIGNATURE-----

Merge tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 paravirt updates from Borislav Petkov:

 - A nice cleanup to the paravirt code containing a unification of the
   paravirt clock interface, taming the include hell by splitting the
   pv_ops structure and removing of a bunch of obsolete code (Juergen
   Gross)

* tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/paravirt: Use XOR r32,r32 to clear register in pv_vcpu_is_preempted()
  x86/paravirt: Remove trailing semicolons from alternative asm templates
  x86/pvlocks: Move paravirt spinlock functions into own header
  x86/paravirt: Specify pv_ops array in paravirt macros
  x86/paravirt: Allow pv-calls outside paravirt.h
  objtool: Allow multiple pv_ops arrays
  x86/xen: Drop xen_mmu_ops
  x86/xen: Drop xen_cpu_ops
  x86/xen: Drop xen_irq_ops
  x86/paravirt: Move pv_native_*() prototypes to paravirt.c
  x86/paravirt: Introduce new paravirt-base.h header
  x86/paravirt: Move paravirt_sched_clock() related code into tsc.c
  x86/paravirt: Use common code for paravirt_steal_clock()
  riscv/paravirt: Use common code for paravirt_steal_clock()
  loongarch/paravirt: Use common code for paravirt_steal_clock()
  arm64/paravirt: Use common code for paravirt_steal_clock()
  arm/paravirt: Use common code for paravirt_steal_clock()
  sched: Move clock related paravirt code to kernel/sched
  paravirt: Remove asm/paravirt_api_clock.h
  x86/paravirt: Move thunk macros to paravirt_types.h
  ...
2026-02-10 19:01:45 -08:00
Qi Zheng
086498aed3 mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
For architectures that define __HAVE_ARCH_TLB_REMOVE_TABLE, the page
tables at the pmd/pud level are generally not of struct ptdesc type, and
do not have pt_rcu_head member, thus these architectures cannot support
PT_RECLAIM.

In preparation for enabling PT_RECLAIM on more architectures, convert
__HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config,
so that we can make conditional judgments in Kconfig.

Link: https://lkml.kernel.org/r/5ebfa3d4b56e63c6906bda5eccaa9f7194d3a86b.1769515122.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Tested-by: Andreas Larsson <andreas@gaisler.com>	[sparc, UP&SMP]
Acked-by: Andreas Larsson <andreas@gaisler.com>		[sparc]
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-06 15:47:19 -08:00
Srish Srinivasan
133aa79e21 pseries/plpks: add HCALLs for PowerVM Key Wrapping Module
The hypervisor generated wrapping key is an AES-GCM-256 symmetric key which
is stored in a non-volatile, secure, and encrypted storage called the Power
LPAR Platform KeyStore. It has policy based protections that prevent it
from being read out or exposed to the user.

Implement H_PKS_GEN_KEY, H_PKS_WRAP_OBJECT, and H_PKS_UNWRAP_OBJECT HCALLs
to enable using the PowerVM Key Wrapping Module (PKWM) as a new trust
source for trusted keys. Disallow H_PKS_READ_OBJECT, H_PKS_SIGNED_UPDATE,
and H_PKS_WRITE_OBJECT for objects with the 'wrapping key' policy set.
Capture the availability status for the H_PKS_WRAP_OBJECT interface.

Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260127145228.48320-5-ssrish@linux.ibm.com
2026-01-30 09:27:26 +05:30
Srish Srinivasan
447eb1d5ef pseries/plpks: expose PowerVM wrapping features via the sysfs
Starting with Power11, PowerVM supports a new feature called "Key Wrapping"
that protects user secrets by wrapping them using a hypervisor generated
wrapping key. The status of this feature can be read by the
H_PKS_GET_CONFIG HCALL.

Expose the Power LPAR Platform KeyStore (PLPKS) wrapping features config
via the sysfs file /sys/firmware/plpks/config/wrapping_features.

Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260127145228.48320-4-ssrish@linux.ibm.com
2026-01-30 09:27:26 +05:30
Srish Srinivasan
40850c909f powerpc/pseries: move the PLPKS config inside its own sysfs directory
The /sys/firmware/secvar/config directory represents Power LPAR Platform
KeyStore (PLPKS) configuration properties such as max_object_size, signed_
update_algorithms, supported_policies, total_size, used_space, and version.
These attributes describe the PLPKS, and not the secure boot variables
(secvars).

Create /sys/firmware/plpks directory and move the PLPKS config inside this
directory. For backwards compatibility, create a soft link from the secvar
sysfs directory to this config and emit a warning stating that the older
sysfs path has been deprecated. Separate out the plpks specific
documentation from secvar.

Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260127145228.48320-3-ssrish@linux.ibm.com
2026-01-30 09:27:26 +05:30
Srish Srinivasan
88372ad4b0 pseries/plpks: fix kernel-doc comment inconsistencies
Fix issues with comments for all the applicable functions to be
consistent with kernel-doc format. Move them before the function
definition as opposed to the function prototype.

Signed-off-by: Srish Srinivasan <ssrish@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260127145228.48320-2-ssrish@linux.ibm.com
2026-01-30 09:27:26 +05:30
Miquel Sabaté Solà
4acc549e80 powerpc: kgdb: Remove OUTBUFMAX constant
This constant was introduced in commit 17ce452f7e ("kgdb, powerpc:
arch specific powerpc kgdb support"), but it is no longer used anywhere
in the source tree.

Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250915141808.146695-1-mikisabate@gmail.com
2026-01-29 09:06:01 +05:30
Rohan McLure
641d47d4c9 powerpc/mm: support page table check
On creation and clearing of a page table mapping, instrument such calls by
invoking page_table_check_pte_set and page_table_check_pte_clear
respectively.  These calls serve as a sanity check against illegal
mappings.

Enable ARCH_SUPPORTS_PAGE_TABLE_CHECK on powerpc, except when HUGETLB_PAGE
is enabled (powerpc has some weirdness in how it implements
set_huge_pte_at(), which may require some further work).

See also:

riscv support in commit 3fee229a8e ("riscv/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
arm64 in commit 42b2547137 ("arm64/mm: enable
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
x86_64 in commit d283d422c6 ("x86: mm: add x86_64 support for page table
check")

[ajd@linux.ibm.com: rebase, add additional instrumentation, misc fixes]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-12-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:36 -08:00
Rohan McLure
2360f523a4 powerpc/mm: use set_pte_at_unchecked() for internal usages
In the new set_ptes() API, set_pte_at() (a special case of set_ptes()) is
intended to be instrumented by the page table check facility.  There are
however several other routines that constitute the API for setting page
table entries, including set_pmd_at() among others.  Such routines are
themselves implemented in terms of set_ptes_at().

A future patch providing support for page table checking on powerpc must
take care to avoid duplicate calls to page_table_check_p{te,md,ud}_set(). 
Allow for assignment of pte entries without instrumentation through the
set_pte_at_unchecked() routine introduced in this patch.

Cause API-facing routines that call set_pte_at() to instead call
set_pte_at_unchecked(), which will remain uninstrumented by page table
check.  set_ptes() is itself implemented by calls to __set_pte_at(), so
this eliminates redundant code.

[ajd@linux.ibm.com: don't change to unchecked for early boot/kernel mappings]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-11-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:36 -08:00
Rohan McLure
2f5e576598 powerpc/mm: implement *_user_accessible_page() for ptes
Page table checking depends on architectures providing an implementation
of p{te,md,ud}_user_accessible_page.  With refactorisations made on
powerpc/mm, the pte_access_permitted() and similar methods verify whether
a userland page is accessible with the required permissions.

Since page table checking is the only user of
p{te,md,ud}_user_accessible_page(), implement these for all platforms,
using some of the same preliminary checks taken by pte_access_permitted()
on that platform.

Since commit 8e9bd41e4c ("powerpc/nohash: Replace pte_user() by
pte_read()") pte_user() is no longer required to be present on all
platforms as it may be equivalent to or implied by pte_read().  Hence
implementations of pte_user_accessible_page() are specialised.

[ajd@linux.ibm.com: rebase and clean up]
Link: https://lkml.kernel.org/r/20251219-pgtable_check_v18rebase-v18-10-755bc151a50b@linux.ibm.com
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Guo Weikang <guoweikang.kernel@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:35 -08:00
Mike Rapoport (Microsoft)
9fac145b6d mm, arch: consolidate hugetlb CMA reservation
Every architecture that supports hugetlb_cma command line parameter
reserves CMA areas for hugetlb during setup_arch().

This obfuscates the ordering of hugetlb CMA initialization with respect to
the rest initialization of the core MM.

Introduce arch_hugetlb_cma_order() callback to allow architectures report
the desired order-per-bit of CMA areas and provide a week implementation
of arch_hugetlb_cma_order() for architectures that don't support hugetlb
with CMA.

Use this callback in hugetlb_cma_reserve() instead if passing the order as
parameter and call hugetlb_cma_reserve() from mm_core_init_early() rather
than have it spread over architecture specific code.

Link: https://lkml.kernel.org/r/20260111082105.290734-28-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:19 -08:00
Mike Rapoport (Microsoft)
4267739cab arch, mm: consolidate initialization of SPARSE memory model
Every architecture calls sparse_init() during setup_arch() although the
data structures created by sparse_init() are not used until the
initialization of the core MM.

Beside the code duplication, calling sparse_init() from architecture
specific code causes ordering differences of vmemmap and HVO
initialization on different architectures.

Move the call to sparse_init() from architecture specific code to
free_area_init() to ensure that vmemmap and HVO initialization order is
always the same.

Link: https://lkml.kernel.org/r/20260111082105.290734-25-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 20:02:18 -08:00
David Hildenbrand
8e38607aa4 treewide: provide a generic clear_user_page() variant
Patch series "mm: folio_zero_user: clear page ranges", v11.

This series adds clearing of contiguous page ranges for hugepages.

The series improves on the current discontiguous clearing approach in two
ways:

  - clear pages in a contiguous fashion.
  - use batched clearing via clear_pages() wherever exposed.

The first is useful because it allows us to make much better use of
hardware prefetchers.

The second, enables advertising the real extent to the processor.  Where
specific instructions support it (ex.  string instructions on x86; "mops"
on arm64 etc), a processor can optimize based on this because, instead of
seeing a sequence of 8-byte stores, or a sequence of 4KB pages, it sees a
larger unit being operated on.

For instance, AMD Zen uarchs (for extents larger than LLC-size) switch to
a mode where they start eliding cacheline allocation.  This is helpful not
just because it results in higher bandwidth, but also because now the
cache is not evicting useful cachelines and replacing them with zeroes.

Demand faulting a 64GB region shows performance improvement:

 $ perf bench mem mmap -p $pg-sz -f demand -s 64GB -l 5

                       baseline              +series
                   (GBps +- %stdev)      (GBps +- %stdev)

   pg-sz=2MB       11.76 +- 1.10%        25.34 +- 1.18% [*]   +115.47%  	preempt=*

   pg-sz=1GB       24.85 +- 2.41%        39.22 +- 2.32%       + 57.82%  	preempt=none|voluntary
   pg-sz=1GB         (similar)           52.73 +- 0.20% [#]   +112.19%  	preempt=full|lazy

 [*] This improvement is because switching to sequential clearing
  allows the hardware prefetchers to do a much better job.

 [#] For pg-sz=1GB a large part of the improvement is because of the
  cacheline elision mentioned above. preempt=full|lazy improves upon
  that because, not needing explicit invocations of cond_resched() to
  ensure reasonable preemption latency, it can clear the full extent
  as a single unit. In comparison the maximum extent used for
  preempt=none|voluntary is PROCESS_PAGES_NON_PREEMPT_BATCH (32MB).

  When provided the full extent the processor forgoes allocating
  cachelines on this path almost entirely.

  (The hope is that eventually, in the fullness of time, the lazy
   preemption model will be able to do the same job that none or
   voluntary models are used for, allowing us to do away with
   cond_resched().)

Raghavendra also tested previous version of the series on AMD Genoa and
sees similar improvement [1] with preempt=lazy.

  $ perf bench mem map -p $page-size -f populate -s 64GB -l 10

                    base               patched              change
   pg-sz=2MB       12.731939 GB/sec    26.304263 GB/sec     106.6%
   pg-sz=1GB       26.232423 GB/sec    61.174836 GB/sec     133.2%


This patch (of 8):

Let's drop all variants that effectively map to clear_page() and provide
it in a generic variant instead.

We'll use the macro clear_user_page to indicate whether an architecture
provides it's own variant.

Also, clear_user_page() is only called from the generic variant of
clear_user_highpage(), so define it only if the architecture does not
provide a clear_user_highpage().  And, for simplicity define it in
linux/highmem.h.

Note that for parisc, clear_page() and clear_user_page() map to
clear_page_asm(), so we can just get rid of the custom clear_user_page()
implementation.  There is a clear_user_page_asm() function on parisc, that
seems to be unused.  Not sure what's up with that.

Link: https://lkml.kernel.org/r/20260107072009.1615991-1-ankur.a.arora@oracle.com
Link: https://lkml.kernel.org/r/20260107072009.1615991-2-ankur.a.arora@oracle.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:39 -08:00
Kevin Brodsky
313a05a15a powerpc/mm: replace batch->active with is_lazy_mmu_mode_active()
A per-CPU batch struct is activated when entering lazy MMU mode; its
lifetime is the same as the lazy MMU section (it is deactivated when
leaving the mode).  Preemption is disabled in that interval to ensure that
the per-CPU reference remains valid.

The generic lazy_mmu layer now tracks whether a task is in lazy MMU mode. 
We can therefore use the generic helper is_lazy_mmu_mode_active() to tell
whether a batch struct is active instead of tracking it explicitly.

Link: https://lkml.kernel.org/r/20251215150323.2218608-12-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Juegren Gross <jgross@suse.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:34 -08:00
Kevin Brodsky
7303ecbfe4 mm: introduce CONFIG_ARCH_HAS_LAZY_MMU_MODE
Architectures currently opt in for implementing lazy_mmu helpers by
defining __HAVE_ARCH_ENTER_LAZY_MMU_MODE.

In preparation for introducing a generic lazy_mmu layer that will require
storage in task_struct, let's switch to a cleaner approach: instead of
defining a macro, select a CONFIG option.

This patch introduces CONFIG_ARCH_HAS_LAZY_MMU_MODE and has each arch
select it when it implements lazy_mmu helpers. 
__HAVE_ARCH_ENTER_LAZY_MMU_MODE is removed and <linux/pgtable.h> relies on
the new CONFIG instead.

On x86, lazy_mmu helpers are only implemented if PARAVIRT_XXL is selected.
This creates some complications in arch/x86/boot/, because a few files
manually undefine PARAVIRT* options.  As a result <asm/paravirt.h> does
not define the lazy_mmu helpers, but this breaks the build as
<linux/pgtable.h> only defines them if !CONFIG_ARCH_HAS_LAZY_MMU_MODE. 
There does not seem to be a clean way out of this - let's just undefine
that new CONFIG too.

Link: https://lkml.kernel.org/r/20251215150323.2218608-7-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Acked-by: Andreas Larsson <andreas@gaisler.com>	[sparc]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Juegren Gross <jgross@suse.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20 19:24:33 -08:00