Commit Graph

1198 Commits

Author SHA1 Message Date
Linus Torvalds
cf950766e9 drm fixes for 7.1-rc1
atomic:
 - raise the vblank timeout to avoid it on virtual drivers
 - fix colorop duplication
 
 bridge:
 - stm_lvds: state check fix
 - dw-mipi-dsi: bridge reference leak fix
 
 panel:
 - visionx-rm69299: init fix
 
 dma-fence:
 - fix sparse warning
 
 dma-buf:
 - UAF fix
 
 panthor:
 - mapping fix
 
 arcgpu:
 - device_node reference leak fix
 
 nouveau:
 - memory leak in error path fix
 - overflow in reloc path for old hw fix
 
 hv:
 - Kconfig fix
 
 v3d:
 - infinite loop fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmnrCR4ACgkQDHTzWXnE
 hr7HEw/9EynaVUxBlI+B+kKakkvt1fpRfaSwaWyawcqaS1XpBjwuUR/aUS44fHzE
 Z1ru6DACZPwKqRTTLm1urE7qjb3T1HKDgSV9LbLJCe3MOZ1HMmyfitoRgWvP/nXU
 bpwT7K79g0g87HMwVzyS13jqSeXmAbuuqF8cgyUsYUU7Jvl3tEd9FfOhwo4BG/4j
 4EDlj9SDrd6/3SbnKzVfrt6xl/ikpNxoYJlMVW3NIDzfZ8aKGdf/jYgKb3ezluDB
 2UAiZdFRiCQd0DunQvCBwnY5/dpxXIajWtAW2f6FWEqIS5mtXkVG6PPzpLRNEmkp
 wetjtnisqwkwSxFxz0pgEII0PsKoxKrQPBXz7zXH6bK6e3i4LQ09mt1Ufc3w9am3
 f29IW2AWYZzB1tOOmMt34mvDm49xXWt/7Q6hHAorQ9ABtb6V3SzsDKfDElEF2ULF
 g1WMCN9uh5ErWkEc1rqIjQ3TPI9EISfUG+hvvr75PA59EHE7a4tQGf45DNWPeC8+
 rO5m8aDKgif7exkoiA8fc8zq4shiH7/tGbJsnZP1GTgyQTBtT4Aik/rVT0zifDpY
 yQ0gwybnd9I7lXMVlhFPaq83HMsr38e8HyViubNmzA8zbd58cPRqBygPqIW6trWr
 6/kzoNFhjgICJsOSi8KkQAtT1zRnakiR1I3GfVos+1DyhC9yPIo=
 =WXJa
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2026-04-24' of https://gitlab.freedesktop.org/drm/kernel

Pull more drm fixes from Dave Airlie:
 "These are the regular fixes that have built up over last couple of
  weeks, all pretty minor and spread all over.

  atomic:
   - raise the vblank timeout to avoid it on virtual drivers
   - fix colorop duplication

  bridge:
   - stm_lvds: state check fix
   - dw-mipi-dsi: bridge reference leak fix

  panel:
   - visionx-rm69299: init fix

  dma-fence:
   - fix sparse warning

  dma-buf:
   - UAF fix

  panthor:
   - mapping fix

  arcgpu:
   - device_node reference leak fix

  nouveau:
   - memory leak in error path fix
   - overflow in reloc path for old hw fix

  hv:
   - Kconfig fix

  v3d:
   - infinite loop fix"

* tag 'drm-fixes-2026-04-24' of https://gitlab.freedesktop.org/drm/kernel:
  drm/nouveau: fix u32 overflow in pushbuf reloc bounds check
  MAINTAINERS: split hisilicon maintenance and add Yongbang Shi for hibmc-drm matainers
  drm/v3d: Reject empty multisync extension to prevent infinite loop
  drm/panel: visionox-rm69299: Make use of prepare_prev_first
  drm/drm_atomic: duplicate colorop states if plane color pipeline in use
  drm/nouveau: fix nvkm_device leak on aperture removal failure
  hv: Select CONFIG_SYSFB only for CONFIG_HYPERV_VMBUS
  dma-fence: Silence sparse warning in dma_fence_describe
  drm/bridge: dw-mipi-dsi: Fix bridge leak when host attach fails
  drm/arcpgu: fix device node leak
  drm/panthor: Fix outdated function documentation
  drm/panthor: Extend VM locked region for remap case to be a superset
  dma-buf: fix UAF in dma_buf_put() tracepoint
  drm/bridge: stm_lvds: Do not fail atomic_check on disabled connector
  drm/atomic: Increase timeout in drm_atomic_helper_wait_for_vblanks()
2026-04-24 11:44:52 -07:00
Dave Airlie
56d0a0b38f This week in drm-misc-fixes, we have:
- A patch to raise the vblank timeout to avoid it on virtual drivers
 - a state check fix for stm_lvds
 - a use-after-free fix for dma-buf
 - a mapping fix for panthor
 - a device_node reference leak fix for arcgpu
 - a bridge reference leak fix for dw-mipi-dsi
 - a sparse warning fix for dma-fence
 - a kconfig fix for hv
 - a memory leak fix for nouveau
 - a fix to duplicate colorop when duplicating states
 - a panel initialisation order fix for visionox-rm69299
 - a fix to prevent an infinite loop for v3d
 - an overflow fix for nouveau
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCaemw5QAKCRAnX84Zoj2+
 dt7GAYCkTEouOPCjZdw+4v7sGvRMT4gVXA1Eqj8wQysdLwLgEMxUPRetn2PIjlJl
 4fGe9AUBgNm6OsKxq8939+l8EDxAIUxl4X4noFDbIgebazcK3QmkHDjlhucu5SoA
 Y7Z3hWfwdw==
 =XpOi
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2026-04-23' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

This week in drm-misc-fixes, we have:
- A patch to raise the vblank timeout to avoid it on virtual drivers
- a state check fix for stm_lvds
- a use-after-free fix for dma-buf
- a mapping fix for panthor
- a device_node reference leak fix for arcgpu
- a bridge reference leak fix for dw-mipi-dsi
- a sparse warning fix for dma-fence
- a kconfig fix for hv
- a memory leak fix for nouveau
- a fix to duplicate colorop when duplicating states
- a panel initialisation order fix for visionox-rm69299
- a fix to prevent an infinite loop for v3d
- an overflow fix for nouveau

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20260423-realistic-eager-reindeer-4dacf7@houat
2026-04-24 13:56:54 +10:00
Linus Torvalds
8fd12b03c7 hyperv-next for v7.1
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmnobFATHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXhafB/wMvf8yu4FgapbSRIlPboeW/ONDyuHd
 k0y1a0IFMQLspWfCxK8+snEcHT3g9xzG3ksqcnac0SPgzqxQ4dlK/c7Xr+5EBXuf
 TEt/bHrt6KT5BUpb1k/XxcdoObCsvZd3vfqR020OsHijw1Ni9PrdqeZxk56/vFJs
 nvgyKvsrAEyfALOH1Vgwg0gNWpqJBj1KcT3Kl1o4p5lwQzVpUREZii6RyvgXT/pu
 mckN63FrEPOpDaJllHmCPcfFSzqNi+wFQcFxm35w9rDQsVdnMsRWoJbcXNYW5QGM
 +KOZ1/tzw4Z3queW78hcxsH6sXiElLDsJgtDohBhxbVvUXy+xHrJCOm5
 =4Ke3
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20260421' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull Hyper-V updates from Wei Liu:

 - Fix cross-compilation for hv tools (Aditya Garg)

 - Fix vmemmap_shift exceeding MAX_FOLIO_ORDER in mshv_vtl (Naman Jain)

 - Limit channel interrupt scan to relid high water mark (Michael
   Kelley)

 - Export hv_vmbus_exists() and use it in pci-hyperv (Dexuan Cui)

 - Fix cleanup and shutdown issues for MSHV (Jork Loeser)

 - Introduce more tracing support for MSHV (Stanislav Kinsburskii)

* tag 'hyperv-next-signed-20260421' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Skip LP/VP creation on kexec
  x86/hyperv: move stimer cleanup to hv_machine_shutdown()
  Drivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowing
  mshv: Add tracepoint for GPA intercept handling
  mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDER
  tools: hv: Fix cross-compilation
  Drivers: hv: vmbus: Export hv_vmbus_exists() and use it in pci-hyperv
  mshv: Introduce tracing support
  Drivers: hv: vmbus: Limit channel interrupt scan to relid high water mark
2026-04-22 09:50:46 -07:00
Jork Loeser
5170a82e89 x86/hyperv: Skip LP/VP creation on kexec
After a kexec the logical processors and virtual processors already
exist in the hypervisor because they were created by the previous
kernel. Attempting to add them again causes either a BUG_ON or
corrupted VP state leading to MCEs in the new kernel.

Add hv_lp_exists() to probe whether an LP is already present by
calling HVCALL_GET_LOGICAL_PROCESSOR_RUN_TIME. When it succeeds the
LP exists and we skip the add-LP and create-VP loops entirely.

Also add hv_call_notify_all_processors_started() which informs the
hypervisor that all processors are online. This is required after
adding LPs (fresh boot) and is a no-op on kexec since we skip that
path.

Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Co-developed-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>
Signed-off-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>
Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-22 06:23:25 +00:00
Jork Loeser
f7ce370b52 x86/hyperv: move stimer cleanup to hv_machine_shutdown()
Move hv_stimer_global_cleanup() from vmbus's hv_kexec_handler() to
hv_machine_shutdown() in the platform code. This ensures stimer cleanup
happens before the vmbus unload, which is required for root partition
kexec to work correctly.

Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-22 06:23:25 +00:00
Jork Loeser
3c42b33433 Drivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowing
vmbus_alloc_synic_and_connect() declares a local 'int
hyperv_cpuhp_online' that shadows the file-scope global of the same
name. The cpuhp state returned by cpuhp_setup_state() is stored in
the local, leaving the global at 0 (CPUHP_OFFLINE). When
hv_kexec_handler() or hv_machine_shutdown() later call
cpuhp_remove_state(hyperv_cpuhp_online) they pass 0, which hits the
BUG_ON in __cpuhp_remove_state_cpuslocked().

Remove the local declaration so the cpuhp state is stored in the
file-scope global where hv_kexec_handler() and hv_machine_shutdown()
expect it.

Fixes: 2647c96649 ("Drivers: hv: Support establishing the confidential VMBus connection")
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-22 06:23:25 +00:00
Stanislav Kinsburskii
cfc42685e5 mshv: Add tracepoint for GPA intercept handling
Provide visibility into GPA intercept operations for debugging and
performance analysis of Microsoft Hypervisor guest memory management.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-22 06:23:25 +00:00
Linus Torvalds
334fbe734e mm.git review status for linus..mm-stable
Everything:
 
 Total patches:       368
 Reviews/patch:       1.56
 Reviewed rate:       74%
 
 Excluding DAMON:
 
 Total patches:       316
 Reviews/patch:       1.77
 Reviewed rate:       81%
 
 Excluding DAMON and zram:
 
 Total patches:       306
 Reviews/patch:       1.81
 Reviewed rate:       82%
 
 Excluding DAMON, zram and maple_tree:
 
 Total patches:       276
 Reviews/patch:       2.01
 Reviewed rate:       91%
 
 Significant patch series in this merge:
 
 - The 30 patch series "maple_tree: Replace big node with maple copy"
   from Liam Howlett is mainly prepararatory work for ongoing development
   but it does reduce stack usage and is an improvement.
 
 - The 12 patch series "mm, swap: swap table phase III: remove swap_map"
   from Kairui Song offers memory savings by removing the static swap_map.
   It also yields some CPU savings and implements several cleanups.
 
 - The 2 patch series "mm: memfd_luo: preserve file seals" from Pratyush
   Yadav adds file seal preservation to LUO's memfd code.
 
 - The 2 patch series "mm: zswap: add per-memcg stat for incompressible
   pages" from Jiayuan Chen adds additional userspace stats reportng to
   zswap.
 
 - The 4 patch series "arch, mm: consolidate empty_zero_page" from Mike
   Rapoport implements some cleanups for our handling of ZERO_PAGE() and
   zero_pfn.
 
 - The 2 patch series "mm/kmemleak: Improve scan_should_stop()
   implementation" from Zhongqiu Han provides an robustness improvement and
   some cleanups in the kmemleak code.
 
 - The 4 patch series "Improve khugepaged scan logic" from Vernon Yang
   "improves the khugepaged scan logic and reduces CPU consumption by
   prioritizing scanning tasks that access memory frequently".
 
 - The 2 patch series "Make KHO Stateless" from Jason Miu simplifies
   Kexec Handover by "transitioning KHO from an xarray-based metadata
   tracking system with serialization to a radix tree data structure that
   can be passed directly to the next kernel"
 
 - The 3 patch series "mm: vmscan: add PID and cgroup ID to vmscan
   tracepoints" from Thomas Ballasi and Steven Rostedt enhances vmscan's
   tracepointing.
 
 - The 5 patch series "mm: arch/shstk: Common shadow stack mapping helper
   and VM_NOHUGEPAGE" from Catalin Marinas is a cleanup for the shadow
   stack code: remove per-arch code in favour of a generic implementation.
 
 - The 2 patch series "Fix KASAN support for KHO restored vmalloc
   regions" from Pasha Tatashin fixes a WARN() which can be emitted the KHO
   restores a vmalloc area.
 
 - The 4 patch series "mm: Remove stray references to pagevec" from Tal
   Zussman provides several cleanups, mainly udpating references to "struct
   pagevec", which became folio_batch three years ago.
 
 - The 17 patch series "mm: Eliminate fake head pages from vmemmap
   optimization" from Kiryl Shutsemau simplifies the HugeTLB vmemmap
   optimization (HVO) by changing how tail pages encode their relationship
   to the head page.
 
 - The 2 patch series "mm/damon/core: improve DAMOS quota efficiency for
   core layer filters" from SeongJae Park improves two problematic
   behaviors of DAMOS that makes it less efficient when core layer filters
   are used.
 
 - The 3 patch series "mm/damon: strictly respect min_nr_regions" from
   SeongJae Park improves DAMON usability by extending the treatment of the
   min_nr_regions user-settable parameter.
 
 - The 3 patch series "mm/page_alloc: pcp locking cleanup" from Vlastimil
   Babka is a proper fix for a previously hotfixed SMP=n issue.  Code
   simplifications and cleanups ennsed.
 
 - The 16 patch series "mm: cleanups around unmapping / zapping" from
   David Hildenbrand implements "a bunch of cleanups around unmapping and
   zapping.  Mostly simplifications, code movements, documentation and
   renaming of zapping functions".
 
 - The 6 patch series "support batched checking of the young flag for
   MGLRU" from Baolin Wang supports batched checking of the young flag for
   MGLRU.  It's part cleanups; one benchmark shows large performance
   benefits for arm64.
 
 - The 5 patch series "memcg: obj stock and slab stat caching cleanups"
   from Johannes Weiner provides memcg cleanup and robustness improvements.
 
 - The 5 patch series "Allow order zero pages in page reporting" from
   Yuvraj Sakshith enhances page_reporting's free page reporting - it is
   presently and undesirably order-0 pages when reporting free memory.
 
 - The 6 patch series "mm: vma flag tweaks" from Lorenzo Stoakes is
   cleanup work following from the recent conversion of the VMA flags to a
   bitmap.
 
 - The 10 patch series "mm/damon: add optional debugging-purpose sanity
   checks" from SeongJae Park adds some more developer-facing debug checks
   into DAMON core.
 
 - The 2 patch series "mm/damon: test and document power-of-2
   min_region_sz requirement" from SeongJae Park adds an additional DAMON
   kunit test and makes some adjustments to the addr_unit parameter
   handling.
 
 - The 3 patch series "mm/damon/core: make passed_sample_intervals
   comparisons overflow-safe" from SeongJae Park fixes a hard-to-hit time
   overflow issue in DAMON core.
 
 - The 7 patch series "mm/damon: improve/fixup/update ratio calculation,
   test and documentation" from SeongJae Park is a "batch of misc/minor
   improvements and fixups" for DAMON.
 
 - The 4 patch series "mm: move vma_(kernel|mmu)_pagesize() out of
   hugetlb.c" from David Hildenbrand fixes a possible issue with dax-device
   when CONFIG_HUGETLB=n.  Some code movement was required.
 
 - The 6 patch series "zram: recompression cleanups and tweaks" from
   Sergey Senozhatsky provides "a somewhat random mix of fixups,
   recompression cleanups and improvements" in the zram code.
 
 - The 11 patch series "mm/damon: support multiple goal-based quota
   tuning algorithms" from SeongJae Park extend DAMOS quotas goal
   auto-tuning to support multiple tuning algorithms that users can select.
 
 - The 4 patch series "mm: thp: reduce unnecessary
   start_stop_khugepaged()" from Breno Leitao fixes the khugpaged sysfs
   handling so we no longer spam the logs with reams of junk when
   starting/stopping khugepaged.
 
 - The 3 patch series "mm: improve map count checks" from Lorenzo Stoakes
   provides some cleanups and slight fixes in the mremap, mmap and vma
   code.
 
 - The 5 patch series "mm/damon: support addr_unit on default monitoring
   targets for modules" from SeongJae Park extends the use of DAMON core's
   addr_unit tunable.
 
 - The 5 patch series "mm: khugepaged cleanups and mTHP prerequisites"
   from Nico Pache provides cleanups in the khugepaged and is a base for
   Nico's planned khugepaged mTHP support.
 
 - The 15 patch series "mm: memory hot(un)plug and SPARSEMEM cleanups"
   from David Hildenbrand implements code movement and cleanups in the
   memhotplug and sparsemem code.
 
 - The 2 patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and
   cleanup CONFIG_MIGRATION" from David Hildenbrand rationalizes some
   memhotplug Kconfig support.
 
 - The 6 patch series "change young flag check functions to return bool"
   from Baolin Wang is "a cleanup patchset to change all young flag check
   functions to return bool".
 
 - The 3 patch series "mm/damon/sysfs: fix memory leak and NULL
   dereference issues" from Josh Law and SeongJae Park fixes a few
   potential DAMON bugs.
 
 - The 25 patch series "mm/vma: convert vm_flags_t to vma_flags_t in vma
   code" from "converts a lot of the existing use of the legacy vm_flags_t
   data type to the new vma_flags_t type which replaces it".  Mainly in the
   vma code.
 
 - The 21 patch series "mm: expand mmap_prepare functionality and usage"
   from Lorenzo Stoakes "expands the mmap_prepare functionality, which is
   intended to replace the deprecated f_op->mmap hook which has been the
   source of bugs and security issues for some time".  Cleanups,
   documentation, extension of mmap_prepare into filesystem drivers.
 
 - The 13 patch series "mm/huge_memory: refactor zap_huge_pmd()" from
   Lorenzo Stoakes simplifies and cleans up zap_huge_pmd().  Additional
   cleanups around vm_normal_folio_pmd() and the softleaf functionality are
   performed.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad3HDQAKCRDdBJ7gKXxA
 jrUQAPwNhPk5nPSxnyxjAeQtOBHqgCdnICeEismLajPKd9aYRgEA0s2XAu3tSUYi
 GrBnWImHG3s4ePQxVcPCegWTsOUrXgQ=
 =1Q7o
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - "maple_tree: Replace big node with maple copy" (Liam Howlett)

   Mainly prepararatory work for ongoing development but it does reduce
   stack usage and is an improvement.

 - "mm, swap: swap table phase III: remove swap_map" (Kairui Song)

   Offers memory savings by removing the static swap_map. It also yields
   some CPU savings and implements several cleanups.

 - "mm: memfd_luo: preserve file seals" (Pratyush Yadav)

   File seal preservation to LUO's memfd code

 - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan
   Chen)

   Additional userspace stats reportng to zswap

 - "arch, mm: consolidate empty_zero_page" (Mike Rapoport)

   Some cleanups for our handling of ZERO_PAGE() and zero_pfn

 - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu
   Han)

   A robustness improvement and some cleanups in the kmemleak code

 - "Improve khugepaged scan logic" (Vernon Yang)

   Improve khugepaged scan logic and reduce CPU consumption by
   prioritizing scanning tasks that access memory frequently

 - "Make KHO Stateless" (Jason Miu)

   Simplify Kexec Handover by transitioning KHO from an xarray-based
   metadata tracking system with serialization to a radix tree data
   structure that can be passed directly to the next kernel

 - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
   Ballasi and Steven Rostedt)

   Enhance vmscan's tracepointing

 - "mm: arch/shstk: Common shadow stack mapping helper and
   VM_NOHUGEPAGE" (Catalin Marinas)

   Cleanup for the shadow stack code: remove per-arch code in favour of
   a generic implementation

 - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)

   Fix a WARN() which can be emitted the KHO restores a vmalloc area

 - "mm: Remove stray references to pagevec" (Tal Zussman)

   Several cleanups, mainly udpating references to "struct pagevec",
   which became folio_batch three years ago

 - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
   Shutsemau)

   Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail
   pages encode their relationship to the head page

 - "mm/damon/core: improve DAMOS quota efficiency for core layer
   filters" (SeongJae Park)

   Improve two problematic behaviors of DAMOS that makes it less
   efficient when core layer filters are used

 - "mm/damon: strictly respect min_nr_regions" (SeongJae Park)

   Improve DAMON usability by extending the treatment of the
   min_nr_regions user-settable parameter

 - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)

   The proper fix for a previously hotfixed SMP=n issue. Code
   simplifications and cleanups ensued

 - "mm: cleanups around unmapping / zapping" (David Hildenbrand)

   A bunch of cleanups around unmapping and zapping. Mostly
   simplifications, code movements, documentation and renaming of
   zapping functions

 - "support batched checking of the young flag for MGLRU" (Baolin Wang)

   Batched checking of the young flag for MGLRU. It's part cleanups; one
   benchmark shows large performance benefits for arm64

 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)

   memcg cleanup and robustness improvements

 - "Allow order zero pages in page reporting" (Yuvraj Sakshith)

   Enhance free page reporting - it is presently and undesirably order-0
   pages when reporting free memory.

 - "mm: vma flag tweaks" (Lorenzo Stoakes)

   Cleanup work following from the recent conversion of the VMA flags to
   a bitmap

 - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae
   Park)

   Add some more developer-facing debug checks into DAMON core

 - "mm/damon: test and document power-of-2 min_region_sz requirement"
   (SeongJae Park)

   An additional DAMON kunit test and makes some adjustments to the
   addr_unit parameter handling

 - "mm/damon/core: make passed_sample_intervals comparisons
   overflow-safe" (SeongJae Park)

   Fix a hard-to-hit time overflow issue in DAMON core

 - "mm/damon: improve/fixup/update ratio calculation, test and
   documentation" (SeongJae Park)

   A batch of misc/minor improvements and fixups for DAMON

 - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
   Hildenbrand)

   Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code
   movement was required.

 - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)

   A somewhat random mix of fixups, recompression cleanups and
   improvements in the zram code

 - "mm/damon: support multiple goal-based quota tuning algorithms"
   (SeongJae Park)

   Extend DAMOS quotas goal auto-tuning to support multiple tuning
   algorithms that users can select

 - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)

   Fix the khugpaged sysfs handling so we no longer spam the logs with
   reams of junk when starting/stopping khugepaged

 - "mm: improve map count checks" (Lorenzo Stoakes)

   Provide some cleanups and slight fixes in the mremap, mmap and vma
   code

 - "mm/damon: support addr_unit on default monitoring targets for
   modules" (SeongJae Park)

   Extend the use of DAMON core's addr_unit tunable

 - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)

   Cleanups to khugepaged and is a base for Nico's planned khugepaged
   mTHP support

 - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)

   Code movement and cleanups in the memhotplug and sparsemem code

 - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
   CONFIG_MIGRATION" (David Hildenbrand)

   Rationalize some memhotplug Kconfig support

 - "change young flag check functions to return bool" (Baolin Wang)

   Cleanups to change all young flag check functions to return bool

 - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
   Law and SeongJae Park)

   Fix a few potential DAMON bugs

 - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
   Stoakes)

   Convert a lot of the existing use of the legacy vm_flags_t data type
   to the new vma_flags_t type which replaces it. Mainly in the vma
   code.

 - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)

   Expand the mmap_prepare functionality, which is intended to replace
   the deprecated f_op->mmap hook which has been the source of bugs and
   security issues for some time. Cleanups, documentation, extension of
   mmap_prepare into filesystem drivers

 - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)

   Simplify and clean up zap_huge_pmd(). Additional cleanups around
   vm_normal_folio_pmd() and the softleaf functionality are performed.

* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm: fix deferred split queue races during migration
  mm/khugepaged: fix issue with tracking lock
  mm/huge_memory: add and use has_deposited_pgtable()
  mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
  mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
  mm/huge_memory: separate out the folio part of zap_huge_pmd()
  mm/huge_memory: use mm instead of tlb->mm
  mm/huge_memory: remove unnecessary sanity checks
  mm/huge_memory: deduplicate zap deposited table call
  mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
  mm/huge_memory: add a common exit path to zap_huge_pmd()
  mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
  mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
  mm/huge: avoid big else branch in zap_huge_pmd()
  mm/huge_memory: simplify vma_is_specal_huge()
  mm: on remap assert that input range within the proposed VMA
  mm: add mmap_action_map_kernel_pages[_full]()
  uio: replace deprecated mmap hook with mmap_prepare in uio_info
  drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
  mm: allow handling of stacked mmap_prepare hooks in more drivers
  ...
2026-04-15 12:59:16 -07:00
Thomas Zimmermann
d33db956c9 hv: Select CONFIG_SYSFB only for CONFIG_HYPERV_VMBUS
Hyperv's sysfb access only exists in the VMBUS support. Therefore
only select CONFIG_SYSFB for CONFIG_HYPERV_VMBUS. Avoids sysfb code
on systems that don't need it.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 96959283a5 ("Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests")
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Saurabh Sengar <ssengar@linux.microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-hyperv@vger.kernel.org
Cc: <stable@vger.kernel.org> # v6.16+
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Link: https://patch.msgid.link/20260402092305.208728-2-tzimmermann@suse.de
2026-04-15 15:37:09 +02:00
Linus Torvalds
db23954eea Update for the core interrupt subsystem:
- Invoke add_interrupt_randomness() in handle_percpu_devid_irq() and
      cleanup the workaround in the Hyper-V driver, which would now invoke
      it twice on ARM64. Removing it from the driver requires to add it to
      the x86 system vector entry point.
 
    - Remove the pointles cpu_read_lock() around reading CPU possible mask,
      which is read only after init.
 
    - Add documentation for the interaction between device tree bindings and
      the interrupt type defines in irq.h.
 
    - Delete stale defines in the matrix allocator and the equivalent in
      loongarch.
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmnbt64QHHRnbHhAa2Vy
 bmVsLm9yZwAKCRCmGPVMDXSYoY/xD/9hdmaaSXX/JySPatJPgkAhekR8cl/brK9t
 K6qhMltg/XoUhmU1y0XSfuNDJwEOa4qvW2DbwfnzknIDr0AXvL39S9oNn+1o9I0x
 BjIbWwHTAsuLVyjQesqPWwyQtZ8HJCIM+3Ju2rUYz4jYuY8q15GYsbh6QN0pYDNG
 F54/OpuNV42/UhX/O01Gxw930GMGnsMkV6ou9dquap23U4FdlIsoY+zU/b09VEU2
 MmJZPGwryPpzhSLbCdOGuWA9oM6wd/FoBFYYU31W5OSXm4B0cRs+weE31WMogYKM
 lqudBuyNAhCIuZ06sdSvBPRswdvkuTUJovrJwG2r+rV6h953ExujNn+/ZxqraPg7
 iPK70Kwp/O2uyMFhHp/6r1u4bylK/AL7q6hace4cZFLqB/Htx5BW+hh/H7Cz6Yan
 H3cyBz9XIE7K5BI3lotKnlWVFwkrHgwYXUzHHrMq/UPdQKog1IPh/E29JekqtR+p
 RS9QzSF+Gs45I7oMP+7P8o5jAZYuuW+cODnXLlQkuz/bJff3QeftFhOKZkzbFk5x
 AFT0DREttxVCGcUCQaQYrWhFshp63+eDXgKGQT1YULleFUC1f9KH2W+6KlqjDwFR
 rvrUgmpMxkH2JJz1owct6vJ6wyffFNQtPeCTmq+7GM87HNm6Ji6AaRzQjRnhVGCt
 mh9VbBTnFw==
 =FvJG
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core irq updates from Thomas Gleixner:

 - Invoke add_interrupt_randomness() in handle_percpu_devid_irq() and
   cleanup the workaround in the Hyper-V driver, which would now invoke
   it twice on ARM64. Removing it from the driver requires to add it to
   the x86 system vector entry point

 - Remove the pointles cpu_read_lock() around reading CPU possible mask,
   which is read only after init

 - Add documentation for the interaction between device tree bindings
   and the interrupt type defines in irq.h

 - Delete stale defines in the matrix allocator and the equivalent in
   loongarch

* tag 'irq-core-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Drivers: hv: Move add_interrupt_randomness() to hypervisor callback sysvec
  genirq/chip: Invoke add_interrupt_randomness() in handle_percpu_devid_irq()
  genirq/affinity: Remove cpus_read_lock() while reading cpu_possible_mask
  genirq/matrix, LoongArch: Delete IRQ_MATRIX_BITS leftovers
  genirq: Document interaction between <linux/irq.h> and DT binding defines
2026-04-14 10:02:41 -07:00
Naman Jain
404cd6bffe mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDER
When registering VTL0 memory via MSHV_ADD_VTL0_MEMORY, the kernel
computes pgmap->vmemmap_shift as the number of trailing zeros in the
OR of start_pfn and last_pfn, intending to use the largest compound
page order both endpoints are aligned to.

However, this value is not clamped to MAX_FOLIO_ORDER, so a
sufficiently aligned range (e.g. physical range
[0x800000000000, 0x800080000000), corresponding to start_pfn=0x800000000
with 35 trailing zeros) can produce a shift larger than what
memremap_pages() accepts, triggering a WARN and returning -EINVAL:

  WARNING: ... memremap_pages+0x512/0x650
  requested folio size unsupported

The MAX_FOLIO_ORDER check was added by
commit 646b67d575 ("mm/memremap: reject unreasonable folio/compound
page sizes in memremap_pages()").

Fix this by clamping vmemmap_shift to MAX_FOLIO_ORDER so we always
request the largest order the kernel supports, in those cases, rather
than an out-of-range value.

Also fix the error path to propagate the actual error code from
devm_memremap_pages() instead of hard-coding -EFAULT, which was
masking the real -EINVAL return.

Fixes: 7bfe3b8ea6 ("Drivers: hv: Introduce mshv_vtl driver")
Cc: stable@vger.kernel.org
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-14 04:44:31 +00:00
Dexuan Cui
0d5acba633 Drivers: hv: vmbus: Export hv_vmbus_exists() and use it in pci-hyperv
With commit f84b21da36 ("PCI: hv: Don't load the driver for baremetal root partition"),
the bare metal Linux root partition won't use the pci-hyperv driver, but
when a Linux VM runs on the Linux root partition, pci-hyperv's module_init
function init_hv_pci_drv() can still run, e.g. in the case of
CONFIG_PCI_HYPERV=y, even if the VMBus driver is not used in such a VM
(i.e. the hv_vmbus driver's init function returns -ENODEV due to
vmbus_root_device being NULL).

In such a Linux VM, init_hv_pci_drv() runs with a side effect: the 3
hvpci_block_ops callbacks are set to functions that depend on hv_vmbus.

Later, when the MLX driver in such a VM invokes the callbacks, e.g. in
drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c:
mlx5_hv_register_invalidate(), hvpci_block_ops.reg_blk_invalidate() is
hv_register_block_invalidate() rather than a NULL function pointer, and
hv_register_block_invalidate() assumes that it can find a struct
hv_pcibus_device from pdev->bus->sysdata, which is false in such a VM.

Consequently, hv_register_block_invalidate() -> get_pcichild_wslot() ->
spin_lock_irqsave() may hang since it can be accessing an invalid
spinlock pointer.

Fix the issue by exporting hv_vmbus_exists() and using it in pci-hyperv:

    hv_root_partition() is true and hv_nested is false ==>
	hv_vmbus_exists() is false.

    hv_root_partition() is true and hv_nested is true ==>
	hv_vmbus_exists() is true.

    hv_root_partition() is false ==> hv_vmbus_exists() is true.

While at it, rename vmbus_exists() to hv_vmbus_exists() to follow the
convention that all public functions have the hv_ prefix; also change
the return value's type from int to bool to make the code more readable;
also move the two pr_info() calls.

Reported-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-14 04:42:44 +00:00
Stanislav Kinsburskii
80acc80ea2 mshv: Introduce tracing support
Introduces various trace events and use them in the corresponding places
in the driver.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-14 04:42:02 +00:00
Michael Kelley
1c80dd81ca Drivers: hv: vmbus: Limit channel interrupt scan to relid high water mark
When checking for VMBus channel interrupts, current code always scans the
full SynIC receive interrupt bit array to get the relid of the
interrupting channels. The array has HV_EVENT_FLAGS_COUNT (2048) bits.
But VMs rarely have more than 100 channels, and the relid is typically
a small integer that is densely assigned by the Hyper-V host. It's
wasteful to scan 2048 bits when it is highly unlikely that anything will
be found past bit 100. The waste is double with Confidential VMBus because
there are two receive interrupt arrays that must be scanned: one for the
hypervisor SynIC and one for the paravisor SynIC.

Improve the scanning by tracking the largest relid that has been offered
by the Hyper-V host. Then when checking for VMBus channel interrupts, only
scan up to this high water mark.

When channels are rescinded, it's not worth the complexity to recalculate
the high water mark. Hyper-V tends to reuse the rescinded relids for any
new channels that are subsequently added, and the performance benefit of
exactly tracking the high water mark would be minimal.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Roman Kisel <vdso@mailbox.org>
Reviewed-by: Roman Kisel <vdso@mailbox.org>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-14 04:42:02 +00:00
Lorenzo Stoakes (Oracle)
f98cb7ca4a drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
The f_op->mmap interface is deprecated, so update the vmbus driver to use
its successor, mmap_prepare.

This updates all callbacks which referenced the function pointer
hv_mmap_ring_buffer to instead reference hv_mmap_prepare_ring_buffer,
utilising the newly introduced compat_set_desc_from_vma() and
__compat_vma_mmap() to be able to implement this change.

The UIO HV generic driver is the only user of hv_create_ring_sysfs(),
which is the only function which references
vmbus_channel->mmap_prepare_ring_buffer which, in turn, is the only
external interface to hv_mmap_prepare_ring_buffer.

This patch therefore updates this caller to use mmap_prepare instead,
which also previously used vm_iomap_memory(), so this change replaces it
with its mmap_prepare equivalent, mmap_action_simple_ioremap().

[akpm@linux-foundation.org: restore struct vmbus_channel comment, per Michael Kelley]
Link: https://lkml.kernel.org/r/05467cb62267d750e5c770147517d4df0246cda6.1774045440.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:44 -07:00
Yuvraj Sakshith
fd4bf4f287 hv_balloon: set unspecified page reporting order
Explicitly mention page reporting order to be set to default value using
PAGE_REPORTING_ORDER_UNSPECIFIED fallback value.

Link: https://lkml.kernel.org/r/20260303113032.3008371-4-yuvraj.sakshith@oss.qualcomm.com
Signed-off-by: Yuvraj Sakshith <yuvraj.sakshith@oss.qualcomm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 13:53:17 -07:00
Michael Kelley
e8be82c2d7 Drivers: hv: Move add_interrupt_randomness() to hypervisor callback sysvec
The Hyper-V ISRs, for normal guests and when running in the hypervisor root
patition, are calling add_interrupt_randomness() as a primary source of
entropy. The call is currently in the ISRs as a common place to handle both
x86/x64 and arm64.

On x86/x64, hypervisor interrupts come through a custom sysvec entry, and
do not go through a generic interrupt handler.

On arm64, hypervisor interrupts come through an emulated GICv3. GICv3 uses
the generic handler handle_percpu_devid_irq(), which does not do
add_interrupt_randomness() -- unlike its counterpart
handle_percpu_irq().

But handle_percpu_devid_irq() is now updated to do the
add_interrupt_randomness(). So add_interrupt_randomness() is now needed
only in Hyper-V's x86/x64 custom sysvec path.

Move add_interrupt_randomness() from the Hyper-V ISRs into the Hyper-V
x86/x64 custom sysvec path, matching the existing STIMER0 sysvec path.

With this change, add_interrupt_randomness() is no longer called from any
device drivers, which is appropriate.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Link: https://patch.msgid.link/20260402202400.1707-3-mhklkml@zohomail.com
2026-04-04 21:01:56 +02:00
Stanislav Kinsburskii
16cbec2489 mshv: Fix infinite fault loop on permission-denied GPA intercepts
Prevent infinite fault loops when guests access memory regions without
proper permissions. Currently, mshv_handle_gpa_intercept() attempts to
remap pages for all faults on movable memory regions, regardless of
whether the access type is permitted. When a guest writes to a read-only
region, the remap succeeds but the region remains read-only, causing
immediate re-fault and spinning the vCPU indefinitely.

Validate intercept access type against region permissions before
attempting remaps. Reject writes to non-writable regions and executes to
non-executable regions early, returning false to let the VMM handle the
intercept appropriately.

This also closes a potential DoS vector where malicious guests could
intentionally trigger these fault loops to consume host resources.

Fixes: b9a66cd5cc ("mshv: Add support for movable memory regions")
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-04 05:25:53 +00:00
Stanislav Kinsburskii
c0e296f257 mshv: Fix error handling in mshv_region_pin
The current error handling has two issues:

First, pin_user_pages_fast() can return a short pin count (less than
requested but greater than zero) when it cannot pin all requested pages.
This is treated as success, leading to partially pinned regions being
used, which causes memory corruption.

Second, when an error occurs mid-loop, already pinned pages from the
current batch are not properly accounted for before calling
mshv_region_invalidate_pages(), causing a page reference leak.

Treat short pins as errors and fix partial batch accounting before
cleanup.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-03-18 16:18:49 +00:00
Stanislav Kinsburskii
6922db2504 mshv: Fix use-after-free in mshv_map_user_memory error path
In the error path of mshv_map_user_memory(), calling vfree() directly on
the region leaves the MMU notifier registered. When userspace later unmaps
the memory, the notifier fires and accesses the freed region, causing a
use-after-free and potential kernel panic.

Replace vfree() with mshv_partition_put() to properly unregister
the MMU notifier before freeing the region.

Fixes: b9a66cd5cc ("mshv: Add support for movable memory regions")
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-03-13 21:11:18 +00:00
Mukesh R
0fc773b0e4 mshv: pass struct mshv_user_mem_region by reference
For unstated reasons, function mshv_partition_ioctl_set_memory passes
struct mshv_user_mem_region by value instead of by reference. Change
it to pass by reference.

Signed-off-by: Mukesh R <mrathor@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-03-12 04:32:21 +00:00
Wei Liu
edd20cb693 Revert "mshv: expose the scrub partition hypercall"
This reverts commit 36d6cbb621.

Calling this as a passthrough hypercall leaves the VM in an inconsistent
state. Revert before it is released.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-03-11 16:54:24 +00:00
Anirudh Rayabharam (Microsoft)
622d68772d mshv: add arm64 support for doorbell & intercept SINTs
On x86, the HYPERVISOR_CALLBACK_VECTOR is used to receive synthetic
interrupts (SINTs) from the hypervisor for doorbells and intercepts.
There is no such vector reserved for arm64.

On arm64, the hypervisor exposes a synthetic register that can be read
to find the INTID that should be used for SINTs. This INTID is in the
PPI range.

To better unify the code paths, introduce mshv_sint_vector_init() that
either reads the synthetic register and obtains the INTID (arm64) or
just uses HYPERVISOR_CALLBACK_VECTOR as the interrupt vector (x86).

Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-25 19:09:49 +00:00
Anirudh Rayabharam (Microsoft)
5a674ef871 mshv: refactor synic init and cleanup
Rename mshv_synic_init() to mshv_synic_cpu_init() and
mshv_synic_cleanup() to mshv_synic_cpu_exit() to better reflect that
these functions handle per-cpu synic setup and teardown.

Use mshv_synic_init/cleanup() to perform init/cleanup that is not per-cpu.
Move all the synic related setup from mshv_parent_partition_init.

Move the reboot notifier to mshv_synic.c because it currently only
operates on the synic cpuhp state.

Move out synic_pages from the global mshv_root since its use is now
completely local to mshv_synic.c.

This is in preparation for adding more stuff to mshv_synic_init().

No functional change.

Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-25 19:09:24 +00:00
Kees Cook
189f164e57 Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses
Conversion performed via this Coccinelle script:

  // SPDX-License-Identifier: GPL-2.0-only
  // Options: --include-headers-for-types --all-includes --include-headers --keep-comments
  virtual patch

  @gfp depends on patch && !(file in "tools") && !(file in "samples")@
  identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
 		    kzalloc_obj,kzalloc_objs,kzalloc_flex,
		    kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
		    kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
  @@

  	ALLOC(...
  -		, GFP_KERNEL
  	)

  $ make coccicheck MODE=patch COCCI=gfp.cocci

Build and boot tested x86_64 with Fedora 42's GCC and Clang:

Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-22 08:26:33 -08:00
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
d31558c077 hyperv-next for v7.0
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmmWuQwTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXnnHB/41Jji+y8FHe2SqpQhUOqHb6NDEr3GX
 YpAybhz2IsBHVhbCQn789UiIcSr0UDR7wnVLAmXe+5eY/jRwNggIO3tFqLYn92pK
 KSTNafgNbLxh3iKBxRsUy0b3JutjD2LytkpFj2KVbBsZfmRxCZmKIV/4V18rV+fA
 uemvoqLwU7emEWkhZ24suHMHPVpv6xKs9O6gOrQ4+zXR0g//eMLDqb17uj8h+8sM
 ZsPsMYeuOihXlvGeBRjbnWYjA1ODWGDvwR9VT+VU4+HWht/KSr15EGeXZdV2eZUt
 e/8swbqOS94a2ZjOgStzVkcPqAF88t9zZ+gvYElTDzLlHjqbrZdpeDDt
 =A7tT
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20260218' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull Hyper-V updates from Wei Liu:

 - Debugfs support for MSHV statistics (Nuno Das Neves)

 - Support for the integrated scheduler (Stanislav Kinsburskii)

 - Various fixes for MSHV memory management and hypervisor status
   handling (Stanislav Kinsburskii)

 - Expose more capabilities and flags for MSHV partition management
   (Anatol Belski, Muminul Islam, Magnus Kulke)

 - Miscellaneous fixes to improve code quality and stability (Carlos
   López, Ethan Nelson-Moore, Li RongQing, Michael Kelley, Mukesh
   Rathor, Purna Pavan Chandra Aekkaladevi, Stanislav Kinsburskii, Uros
   Bizjak)

 - PREEMPT_RT fixes for vmbus interrupts (Jan Kiszka)

* tag 'hyperv-next-signed-20260218' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (34 commits)
  mshv: Handle insufficient root memory hypervisor statuses
  mshv: Handle insufficient contiguous memory hypervisor status
  mshv: Introduce hv_deposit_memory helper functions
  mshv: Introduce hv_result_needs_memory() helper function
  mshv: Add SMT_ENABLED_GUEST partition creation flag
  mshv: Add nested virtualization creation flag
  Drivers: hv: vmbus: Simplify allocation of vmbus_evt
  mshv: expose the scrub partition hypercall
  mshv: Add support for integrated scheduler
  mshv: Use try_cmpxchg() instead of cmpxchg()
  x86/hyperv: Fix error pointer dereference
  x86/hyperv: Reserve 3 interrupt vectors used exclusively by MSHV
  Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT
  x86/hyperv: Remove ASM_CALL_CONSTRAINT with VMMCALL insn
  x86/hyperv: Use savesegment() instead of inline asm() to save segment registers
  mshv: fix SRCU protection in irqfd resampler ack handler
  mshv: make field names descriptive in a header struct
  x86/hyperv: Update comment in hyperv_cleanup()
  mshv: clear eventfd counter on irqfd shutdown
  x86/hyperv: Use memremap()/memunmap() instead of ioremap_cache()/iounmap()
  ...
2026-02-20 08:48:31 -08:00
Stanislav Kinsburskii
158ebb578c mshv: Handle insufficient root memory hypervisor statuses
When creating guest partition objects, the hypervisor may fail to
allocate root partition pages and return an insufficient memory status.
In this case, deposit memory using the root partition ID instead.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Reviewed-by: Mukesh R <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-19 06:42:11 +00:00
Stanislav Kinsburskii
cf82dd5ea9 mshv: Handle insufficient contiguous memory hypervisor status
The HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY status indicates that the
hypervisor lacks sufficient contiguous memory for its internal allocations.

When this status is encountered, allocate and deposit
HV_MAX_CONTIGUOUS_ALLOCATION_PAGES contiguous pages to the hypervisor.
HV_MAX_CONTIGUOUS_ALLOCATION_PAGES is defined in the hypervisor headers, a
deposit of this size will always satisfy the hypervisor's requirements.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Reviewed-by: Mukesh R <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-19 06:39:16 +00:00
Stanislav Kinsburskii
ede54383e6 mshv: Introduce hv_deposit_memory helper functions
Introduce hv_deposit_memory_node() and hv_deposit_memory() helper
functions to handle memory deposit with proper error handling.

The new hv_deposit_memory_node() function takes the hypervisor status
as a parameter and validates it before depositing pages. It checks for
HV_STATUS_INSUFFICIENT_MEMORY specifically and returns an error for
unexpected status codes.

This is a precursor patch to new out-of-memory error codes support.
No functional changes intended.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Reviewed-by: Mukesh R <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-19 06:38:48 +00:00
Stanislav Kinsburskii
7db44aa173 mshv: Introduce hv_result_needs_memory() helper function
Replace direct comparisons of hv_result(status) against
HV_STATUS_INSUFFICIENT_MEMORY with a new hv_result_needs_memory() helper
function.

This improves code readability and provides a consistent and extendable
interface for checking out-of-memory conditions in hypercall results.

No functional changes intended.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Reviewed-by: Mukesh R <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-19 06:36:58 +00:00
Anatol Belski
8927a108a7 mshv: Add SMT_ENABLED_GUEST partition creation flag
Add support for HV_PARTITION_CREATION_FLAG_SMT_ENABLED_GUEST
to allow userspace VMMs to enable SMT for guest partitions.

Expose this via new MSHV_PT_BIT_SMT_ENABLED_GUEST flag in the UAPI.

Without this flag, the hypervisor schedules guest VPs incorrectly,
causing SMT unusable.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18 23:54:37 +00:00
Muminul Islam
a284dbc96a mshv: Add nested virtualization creation flag
Introduce HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE to
indicate support for nested virtualization during partition creation.

This enables clearer configuration and capability checks for nested
virtualization scenarios.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18 23:53:22 +00:00
Michael Kelley
30d25a8fc0 Drivers: hv: vmbus: Simplify allocation of vmbus_evt
The per-cpu variable vmbus_evt is currently dynamically allocated. It's
only 8 bytes, so just allocate it statically to simplify and save a few
lines of code.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18 23:40:45 +00:00
Magnus Kulke
36d6cbb621 mshv: expose the scrub partition hypercall
This hypercall needs to be exposed for VMMs to soft-reboot guests. It
will reset APIC and synthetic interrupt controller state, among others.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18 23:32:17 +00:00
Stanislav Kinsburskii
4bef6b28ba mshv: Add support for integrated scheduler
Query the hypervisor for integrated scheduler support and use it if
configured.

Microsoft Hypervisor originally provided two schedulers: root and core. The
root scheduler allows the root partition to schedule guest vCPUs across
physical cores, supporting both time slicing and CPU affinity (e.g., via
cgroups). In contrast, the core scheduler delegates vCPU-to-physical-core
scheduling entirely to the hypervisor.

Direct virtualization introduces a new privileged guest partition type - L1
Virtual Host (L1VH) — which can create child partitions from its own
resources. These child partitions are effectively siblings, scheduled by
the hypervisor's core scheduler. This prevents the L1VH parent from setting
affinity or time slicing for its own processes or guest VPs. While cgroups,
CFS, and cpuset controllers can still be used, their effectiveness is
unpredictable, as the core scheduler swaps vCPUs according to its own logic
(typically round-robin across all allocated physical CPUs). As a result,
the system may appear to "steal" time from the L1VH and its children.

To address this, Microsoft Hypervisor introduces the integrated scheduler.
This allows an L1VH partition to schedule its own vCPUs and those of its
guests across its "physical" cores, effectively emulating root scheduler
behavior within the L1VH, while retaining core scheduler behavior for the
rest of the system.

The integrated scheduler is controlled by the root partition and gated by
the vmm_enable_integrated_scheduler capability bit. If set, the hypervisor
supports the integrated scheduler. The L1VH partition must then check if it
is enabled by querying the corresponding extended partition property. If
this property is true, the L1VH partition must use the root scheduler
logic; otherwise, it must use the core scheduler. This requirement makes
reading VMM capabilities in L1VH partition a requirement too.

Signed-off-by: Andreea Pintilie <anpintil@microsoft.com>
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18 23:28:10 +00:00
Uros Bizjak
0597696017 mshv: Use try_cmpxchg() instead of cmpxchg()
Use !try_cmpxchg() instead of cmpxchg (*ptr, old, new) != old.
x86 CMPXCHG instruction returns success in ZF flag, so this
change saves a compare after CMPXCHG.

The generated assembly code improves from e.g.:

     415:	48 8b 44 24 30       	mov    0x30(%rsp),%rax
     41a:	48 8b 54 24 38       	mov    0x38(%rsp),%rdx
     41f:	f0 49 0f b1 91 a8 02 	lock cmpxchg %rdx,0x2a8(%r9)
     426:	00 00
     428:	48 3b 44 24 30       	cmp    0x30(%rsp),%rax
     42d:	0f 84 09 ff ff ff    	je     33c <...>

to:

     415:	48 8b 44 24 30       	mov    0x30(%rsp),%rax
     41a:	48 8b 54 24 38       	mov    0x38(%rsp),%rdx
     41f:	f0 49 0f b1 91 a8 02 	lock cmpxchg %rdx,0x2a8(%r9)
     426:	00 00
     428:	0f 84 0e ff ff ff    	je     33c <...>

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18 23:27:20 +00:00
Jan Kiszka
f8e6343b7a Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT
Resolves the following lockdep report when booting PREEMPT_RT on Hyper-V
with related guest support enabled:

[    1.127941] hv_vmbus: registering driver hyperv_drm

[    1.132518] =============================
[    1.132519] [ BUG: Invalid wait context ]
[    1.132521] 6.19.0-rc8+ #9 Not tainted
[    1.132524] -----------------------------
[    1.132525] swapper/0/0 is trying to lock:
[    1.132526] ffff8b9381bb3c90 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0xc4/0x2b0
[    1.132543] other info that might help us debug this:
[    1.132544] context-{2:2}
[    1.132545] 1 lock held by swapper/0/0:
[    1.132547]  #0: ffffffffa010c4c0 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0x31/0x2b0
[    1.132557] stack backtrace:
[    1.132560] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.19.0-rc8+ #9 PREEMPT_{RT,(lazy)}
[    1.132565] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
[    1.132567] Call Trace:
[    1.132570]  <IRQ>
[    1.132573]  dump_stack_lvl+0x6e/0xa0
[    1.132581]  __lock_acquire+0xee0/0x21b0
[    1.132592]  lock_acquire+0xd5/0x2d0
[    1.132598]  ? vmbus_chan_sched+0xc4/0x2b0
[    1.132606]  ? lock_acquire+0xd5/0x2d0
[    1.132613]  ? vmbus_chan_sched+0x31/0x2b0
[    1.132619]  rt_spin_lock+0x3f/0x1f0
[    1.132623]  ? vmbus_chan_sched+0xc4/0x2b0
[    1.132629]  ? vmbus_chan_sched+0x31/0x2b0
[    1.132634]  vmbus_chan_sched+0xc4/0x2b0
[    1.132641]  vmbus_isr+0x2c/0x150
[    1.132648]  __sysvec_hyperv_callback+0x5f/0xa0
[    1.132654]  sysvec_hyperv_callback+0x88/0xb0
[    1.132658]  </IRQ>
[    1.132659]  <TASK>
[    1.132660]  asm_sysvec_hyperv_callback+0x1a/0x20

As code paths that handle vmbus IRQs use sleepy locks under PREEMPT_RT,
the vmbus_isr execution needs to be moved into thread context. Open-
coding this allows to skip the IPI that irq_work would additionally
bring and which we do not need, being an IRQ, never an NMI.

This affects both x86 and arm64, therefore hook into the common driver
logic.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Florian Bezdeka <florian.bezdeka@siemens.com>
Tested-by: Florian Bezdeka <florian.bezdeka@siemens.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18 07:18:37 +00:00
Prasanna Kumar T S M
d65f2978f3 drivers: hv: vmbus_drv: Remove reference to hpyerv_fb
Remove hyperv_fb reference as the driver is removed.

Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-14 11:07:12 +01:00
Linus Torvalds
0c61526621 EFI updates for v7.0
- Quirk the broken EFI framebuffer geometry on the Valve Steam Deck
 
 - Capture the EDID information of the primary display also on non-x86
   EFI systems when booting via the EFI stub.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCaYoTAgAKCRAwbglWLn0t
 XFe8AQDJe2GSNfzWgqoTqgT6tcH7lFG2SjdpIb+jHSmvgHckbAD/cUaY8YnhdYkm
 nz6URLJN/2NHuaDq1mUL8CwJwIot4wk=
 =41tn
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:

 - Quirk the broken EFI framebuffer geometry on the Valve Steam Deck

 - Capture the EDID information of the primary display also on non-x86
   EFI systems when booting via the EFI stub.

* tag 'efi-next-for-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: Support EDID information
  sysfb: Move edid_info into sysfb_primary_display
  sysfb: Pass sysfb_primary_display to devices
  sysfb: Replace screen_info with sysfb_primary_display
  sysfb: Add struct sysfb_display_info
  efi: sysfb_efi: Reduce number of references to global screen_info
  efi: earlycon: Reduce number of references to global screen_info
  efi: sysfb_efi: Fix efidrmfb and simpledrmfb on Valve Steam Deck
  efi: sysfb_efi: Convert swap width and height quirk to a callback
  efi: sysfb_efi: Fix lfb_linelength calculation when applying quirks
  efi: sysfb_efi: Replace open coded swap with the macro
2026-02-09 20:49:19 -08:00
Li RongQing
2e7577cd5d mshv: fix SRCU protection in irqfd resampler ack handler
Replace hlist_for_each_entry_rcu() with hlist_for_each_entry_srcu()
in mshv_irqfd_resampler_ack() to correctly handle SRCU-protected
linked list traversal.

The function uses SRCU (sleepable RCU) synchronization via
partition->pt_irq_srcu, but was incorrectly using the RCU variant
for list iteration. This could lead to race conditions when the
list is modified concurrently.

Also add srcu_read_lock_held() assertion as required by
hlist_for_each_entry_srcu() to ensure we're in the proper
read-side critical section.

Fixes: 621191d709 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-06 07:04:58 +00:00
Mukesh R
51515bfc29 mshv: make field names descriptive in a header struct
When struct fields use very common names like "pages" or "type", it makes
it difficult to find uses of these fields with tools like grep, cscope,
etc when the struct is in a header file included in many places. Add
prefix mreg_ to some fields in struct mshv_mem_region to make it easier
to find them.

There is no functional change.

Signed-off-by: Mukesh R <mrathor@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-06 07:03:35 +00:00
Carlos López
2b4246153e mshv: clear eventfd counter on irqfd shutdown
While unhooking from the irqfd waitqueue, clear the internal eventfd
counter by using eventfd_ctx_remove_wait_queue() instead of
remove_wait_queue(), preventing potential spurious interrupts. This
removes the need to store a pointer into the workqueue, as the eventfd
already keeps track of it.

This mimicks what other similar subsystems do on their equivalent paths
with their irqfds (KVM, Xen, ACRN support, etc).

Signed-off-by: Carlos López <clopez@suse.de>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04 06:36:19 +00:00
Michael Kelley
0236e75df4 Drivers: hv: Use memremap()/memunmap() instead of ioremap_cache()/iounmap()
When running with a paravisor or in the root partition, the SynIC event and
message pages are provided by the paravisor or hypervisor respectively,
instead of being allocated by Linux. The provided pages are normal memory,
but are outside of the physical address space seen by Linux. As such they
cannot be accessed via the kernel's direct map, and must be explicitly
mapped to a kernel virtual address.

Current code uses ioremap_cache() and iounmap() to map and unmap the pages.
These functions are for use on I/O address space that may not behave as
normal memory, so they generate or expect addresses with the __iomem
attribute. For normal memory, the preferred functions are memremap() and
memunmap(), which operate similarly but without __iomem.

At the time of the original work on CoCo VMs on Hyper-V, memremap() did not
support creating a decrypted mapping, so ioremap_cache() was used instead,
since I/O address space is always mapped decrypted. memremap() has since
been enhanced to allow decrypted mappings, so replace ioremap_cache() with
memremap() when mapping the event and message pages. Similarly, replace
iounmap() with memunmap(). As a side benefit, the replacement cleans up
'sparse' warnings about __iomem mismatches.

The replacement is done to use the correct functions as long-term goodness
and to clean up the sparse warnings. No runtime bugs are fixed.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601170445.JtZQwndW-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512150359.fMdmbddk-lkp@intel.com/
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04 06:29:12 +00:00
Nuno Das Neves
ff225ba9ad mshv: Add debugfs to view hypervisor statistics
Introduce a debugfs interface to expose root and child partition stats
when running with mshv_root.

Create a debugfs directory "mshv" containing 'stats' files organized by
type and id. A stats file contains a number of counters depending on
its type. e.g. an excerpt from a VP stats file:

TotalRunTime                  : 1997602722
HypervisorRunTime             : 649671371
RemoteNodeRunTime             : 0
NormalizedRunTime             : 1997602721
IdealCpu                      : 0
HypercallsCount               : 1708169
HypercallsTime                : 111914774
PageInvalidationsCount        : 0
PageInvalidationsTime         : 0

On a root partition with some active child partitions, the entire
directory structure may look like:

mshv/
  stats             # hypervisor stats
  lp/               # logical processors
    0/              # LP id
      stats         # LP 0 stats
    1/
    2/
    3/
  partition/        # partition stats
    1/              # root partition id
      stats         # root partition stats
      vp/           # root virtual processors
        0/          # root VP id
          stats     # root VP 0 stats
        1/
        2/
        3/
    42/             # child partition id
      stats         # child partition stats
      vp/           # child VPs
        0/          # child VP id
          stats     # child VP 0 stats
        1/
    43/
    55/

On L1VH, some stats are not present as it does not own the hardware
like the root partition does:
- The hypervisor and lp stats are not present
- L1VH's partition directory is named "self" because it can't get its
  own id
- Some of L1VH's partition and VP stats fields are not populated, because
  it can't map its own HV_STATS_AREA_PARENT page.

Co-developed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Co-developed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Co-developed-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
Co-developed-by: Jinank Jain <jinankjain@microsoft.com>
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04 06:17:05 +00:00
Nuno Das Neves
c23271b636 mshv: Add data for printing stats page counters
Introduce mshv_debugfs_counters.c, containing static data
corresponding to HV_*_COUNTER enums in the hypervisor source.
Defining the enum members as an array instead makes more sense,
since it will be iterated over to print counter information to
debugfs.

Include hypervisor, logical processor, partition, and virtual
processor counters.

Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04 06:17:05 +00:00
Nuno Das Neves
df40f32c87 mshv: Update hv_stats_page definitions
hv_stats_page belongs in hvhdk.h, move it there.

It does not require a union to access the data for different counters,
just use a single u64 array for simplicity and to match the Windows
definitions.

While at it, correct the ARM64 value for VpRootDispatchThreadBlocked.

Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04 06:17:05 +00:00
Stanislav Kinsburskii
c527c7aee2 mshv: Always map child vp stats pages regardless of scheduler type
Currently vp->vp_stats_pages is only used by the root scheduler for fast
interrupt injection.

Soon, vp_stats_pages will also be needed for exposing child VP stats to
userspace via debugfs. Mapping the pages a second time to a different
address causes an error on L1VH.

Remove the scheduler requirement and always map the vp stats pages.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04 06:17:05 +00:00