Commit Graph

5110 Commits

Author SHA1 Message Date
Breno Leitao
5cbb61bf41 arm64/fpsimd: ptrace: zero target's fpsimd_state, not the tracer's
sve_set_common() is the backend for PTRACE_SETREGSET(NT_ARM_SVE) and
PTRACE_SETREGSET(NT_ARM_SSVE). Every write in the function operates on
the tracee (target) - except a single memset that uses current instead,
zeroing the tracer's saved V0-V31 / FPSR / FPCR shadow on every ptrace
SETREGSET call.

The memset is meant to give the tracee a defined zero register image
before the user-supplied payload is copied in (for partial writes,
header-only writes, and FPSIMD<->SVE format switches). Aiming it at
current both denies the tracee that clean slate and silently corrupts
the tracer.

The corruption of the tracer's saved FPSIMD state is not always
observable. Where the tracer's state is live on a CPU, this may be
reused without loading the corrupted state from memory, and will
eventually be written back over the corrupted state. Where the tracer's
state is saved in SVE_PT_REGS_SVE format, only the FPSR and FPCR are
clobbered, and the effective copy of the vectors is in the task's
sve_state.

Reproducible on an arm64 kernel with SVE: a single-threaded tracer that
loads a known pattern into V0-V31, issues PTRACE_SETREGSET(NT_ARM_SVE)
on a child, and reads V0-V31 back observes them all zeroed within tens
of thousands of iterations when a sibling thread keeps stealing the
FPSIMD CPU binding.

Fixes: 316283f276 ("arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE")
Cc: <stable@vger.kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-05-06 12:11:49 +01:00
Kevin Brodsky
030e8a40ff arm64: signal: Preserve POR_EL0 if poe_context is missing
Commit 2e8a1acea8 ("arm64: signal: Improve POR_EL0 handling to
avoid uaccess failures") delayed the write to POR_EL0 in
rt_sigreturn to avoid spurious uaccess failures. This change however
relies on the poe_context frame record being present: on a system
supporting POE, calling sigreturn without a poe_context record now
results in writing arbitrary data from the kernel stack into POR_EL0.

Fix this by adding a __valid_fields member to struct
user_access_state, and zeroing the struct on allocation.
restore_poe_context() then indicates that the por_el0 field is valid
by setting the corresponding bit in __valid_fields, and
restore_user_access_state() only touches POR_EL0 if there is a valid
value to set it to. This is in line with how POR_EL0 was originally
handled; all frame records are currently optional, except
fpsimd_context.

To ensure that __valid_fields is kept in sync, fields (currently
just por_el0) are now accessed via accessors and prefixed with __ to
discourage direct access.

Fixes: 2e8a1acea8 ("arm64: signal: Improve POR_EL0 handling to avoid uaccess failures")
Cc: <stable@vger.kernel.org>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-05-01 17:44:25 +01:00
Wentao Guan
4023b7424e arm64/scs: Fix potential sign extension issue of advance_loc4
The expression (*opcode++ << 24) and exp * code_alignment_factor
may overflow signed int and becomes negative.

Fix this by casting each byte to u64 before shifting. Also fix
the misaligned break statement while we are here.

Example of the result can be seen here:
Link: https://godbolt.org/z/zhY8d3595

It maybe not a real problem, but could be a issue in future.

Fixes: d499e9627d ("arm64/scs: Fix handling of advance_loc4")
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-27 12:16:26 +01:00
Linus Torvalds
13f24586a2 arm64 updates for 7.1 (second round):
Core features:
 
  - Add workaround for C1-Pro erratum 4193714 - early CME (SME unit)
    DVMSync acknowledgement. The fix consists of sending IPIs on TLB
    maintenance to those CPUs running in user space with SME enabled
 
  - Include kernel-hwcap.h in list of generated files (missed in a recent
    commit generating the KERNEL_HWCAP_* macros)
 
 CCA:
 
  - Fix RSI_INCOMPLETE error check in arm-cca-guest
 
 MPAM:
 
  - Fix an unmount->remount problem with the CDP emulation, uninitialised
    variable and checker warnings
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmnmQZcACgkQa9axLQDI
 XvGouA//SXlo7hQyM41rkgRru9oqftrGg0y6nxz4Z089kv50cm3Jlf/nUuti6vah
 BMBLCGXA1iOQrGIVmuvtCxDRrfYZWpfKGuT9A0gmEoMqrGIpWl9gfBQG+uR+YrQX
 4kp5DLqB85WrJIPiy7HUV6GQoCbFuMrRJwxl89IdWZSobaei3SczTmnttwyJtxG5
 /BMitl024TYdiOPNo8bhiML1wIJCaTHvH4IrtCHPyUHEAtsHSMy00y0OrSKBtA/9
 ZHZRpY7Po/jnL7YUs1AfYwsaSXjkvqXN0K1Tdavzm75k6lpJmbM3VsZabG/CEuvK
 PCOGV++is4Y/A+7aQsCwXKeVnY3b6AC4sextytNq0g3GZ7I+Ht9O6nbsp5ZmyXzB
 HRiFxmFS1pSQOMX9f1neKi3vxDMTy1tKPeccTTzL8dNnxTvUBXnoWfPoJh3cpbjm
 Dbhe1kksiEn01WWFacGtkIPDa9c+Bkd2T+8wrsk85Z+u3Z0JPM5PfOn6v3X9YlKl
 K7W8fhvlDL1wP+iyWcMT5zdo+xzHY4ZxuyWbi9a4RhKc6lFHVVG2mpUuPwSsh2ma
 NnxkDouriuoADHBir89U71N483HSnNfSjhlVSFYD2LFCre5KOZM4KYZ2vwWb8Sy4
 79q+BlVRUTQ5O6XjePoSPjUW4APPNviHJsF4E4IiqHkd9O5lMZU=
 =LNY2
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull more arm64 updates from Catalin Marinas:
 "The main 'feature' is a workaround for C1-Pro erratum 4193714
  requiring IPIs during TLB maintenance if a process is running in user
  space with SME enabled.

  The hardware acknowledges the DVMSync messages before completing
  in-flight SME accesses, with security implications. The workaround
  makes use of the mm_cpumask() to track the cores that need
  interrupting (arm64 hasn't used this mask before).

  The rest are fixes for MPAM, CCA and generated header that turned up
  during the merging window or shortly before.

  Summary:

  Core features:

   - Add workaround for C1-Pro erratum 4193714 - early CME (SME unit)
     DVMSync acknowledgement. The fix consists of sending IPIs on TLB
     maintenance to those CPUs running in user space with SME enabled

   - Include kernel-hwcap.h in list of generated files (missed in a
     recent commit generating the KERNEL_HWCAP_* macros)

  CCA:

   - Fix RSI_INCOMPLETE error check in arm-cca-guest

  MPAM:

   - Fix an unmount->remount problem with the CDP emulation,
     uninitialised variable and checker warnings"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm_mpam: resctrl: Make resctrl_mon_ctx_waiters static
  arm_mpam: resctrl: Fix the check for no monitor components found
  arm_mpam: resctrl: Fix MBA CDP alloc_capable handling on unmount
  virt: arm-cca-guest: fix error check for RSI_INCOMPLETE
  arm64/hwcap: Include kernel-hwcap.h in list of generated files
  arm64: errata: Work around early CME DVMSync acknowledgement
  arm64: cputype: Add C1-Pro definitions
  arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()
  arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance
2026-04-20 16:46:22 -07:00
Catalin Marinas
858fbd7248 Merge branch 'for-next/c1-pro-erratum-4193714' into for-next/core
* for-next/c1-pro-erratum-4193714:
  : Work around C1-Pro erratum 4193714 (CVE-2026-0995)
  arm64: errata: Work around early CME DVMSync acknowledgement
  arm64: cputype: Add C1-Pro definitions
  arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()
  arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance
2026-04-20 13:12:35 +01:00
Linus Torvalds
87768582a4 dma-mapping updates for Linux 7.0:
- added support for batched cache sync, what improves performance of
   dma_map/unmap_sg() operations on ARM64 architecture (Barry Song)
 
 - introduced DMA_ATTR_CC_SHARED attribute for explicitly shared memory
   used in confidential computing (Jiri Pirko)
 
 - refactored spaghetti-like code in drivers/of/of_reserved_mem.c and its
   clients (Marek Szyprowski, shared branch with device-tree updates to
   avoid merge conflicts)
 
 - prepared Contiguous Memory Allocator related code for making dma-buf
   drivers modularized (Maxime Ripard)
 
 - added support for benchmarking dma_map_sg() calls to tools/dma utility
   (Qinxin Xia)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaeCbdQAKCRCJp1EFxbsS
 RHbWAQCt70dzrU0lu0omTR1HdDP4GTYfuM6nZR91e8/itGN1+QD/XH4I/0wuybzk
 v5uxbIC6lR3abQRc3YNRXfi+i5j26A4=
 =Oee2
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping updates from Marek Szyprowski:

 - added support for batched cache sync, what improves performance of
   dma_map/unmap_sg() operations on ARM64 architecture (Barry Song)

 - introduced DMA_ATTR_CC_SHARED attribute for explicitly shared memory
   used in confidential computing (Jiri Pirko)

 - refactored spaghetti-like code in drivers/of/of_reserved_mem.c and
   its clients (Marek Szyprowski, shared branch with device-tree updates
   to avoid merge conflicts)

 - prepared Contiguous Memory Allocator related code for making dma-buf
   drivers modularized (Maxime Ripard)

 - added support for benchmarking dma_map_sg() calls to tools/dma
   utility (Qinxin Xia)

* tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: (24 commits)
  dma-buf: heaps: system: document system_cc_shared heap
  dma-buf: heaps: system: add system_cc_shared heap for explicitly shared memory
  dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory
  mm: cma: Export cma_alloc(), cma_release() and cma_get_name()
  dma: contiguous: Export dev_get_cma_area()
  dma: contiguous: Make dma_contiguous_default_area static
  dma: contiguous: Make dev_get_cma_area() a proper function
  dma: contiguous: Turn heap registration logic around
  of: reserved_mem: rework fdt_init_reserved_mem_node()
  of: reserved_mem: clarify fdt_scan_reserved_mem*() functions
  of: reserved_mem: rearrange code a bit
  of: reserved_mem: replace CMA quirks by generic methods
  of: reserved_mem: switch to ops based OF_DECLARE()
  of: reserved_mem: use -ENODEV instead of -ENOENT
  of: reserved_mem: remove fdt node from the structure
  dma-mapping: fix false kernel-doc comment marker
  dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg
  dma-mapping: Separate DMA sync issuing and completion waiting
  arm64: Provide dcache_inval_poc_nosync helper
  arm64: Provide dcache_clean_poc_nosync helper
  ...
2026-04-17 11:12:42 -07:00
Linus Torvalds
01f492e181 Arm:
- Add support for tracing in the standalone EL2 hypervisor code, which
   should help both debugging and performance analysis.  This uses the
   new infrastructure for 'remote' trace buffers that can be exposed
   by non-kernel entities such as firmware, and which came through the
   tracing tree.
 
 - Add support for GICv5 Per Processor Interrupts (PPIs), as the starting
   point for supporting the new GIC architecture in KVM.
 
 - Finally add support for pKVM protected guests, where pages are unmapped
   from the host as they are faulted into the guest and can be shared back
   from the guest using pKVM hypercalls.  Protected guests are created
   using a new machine type identifier.  As the elusive guestmem has not
   yet delivered on its promises, anonymous memory is also supported.
 
   This is only a first step towards full isolation from the host; for
   example, the CPU register state and DMA accesses are not yet isolated.
   Because this does not really yet bring fully what it promises, it is
   hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and
   also triggers TAINT_USER when a VM is created.  Caveat emptor.
 
 - Rework the dreaded user_mem_abort() function to make it more
   maintainable, reducing the amount of state being exposed to the
   various helpers and rendering a substantial amount of state immutable.
 
 - Expand the Stage-2 page table dumper to support NV shadow page tables
   on a per-VM basis.
 
 - Tidy up the pKVM PSCI proxy code to be slightly less hard to follow.
 
 - Fix both SPE and TRBE in non-VHE configurations so that they do not
   generate spurious, out of context table walks that ultimately lead
   to very bad HW lockups.
 
 - A small set of patches fixing the Stage-2 MMU freeing in error cases.
 
 - Tighten-up accepted SMC immediate value to be only #0 for host
   SMCCC calls.
 
 - The usual cleanups and other selftest churn.
 
 LoongArch:
 
 - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel().
 
 - Add DMSINTC irqchip in kernel support.
 
 RISC-V:
 
 - Fix steal time shared memory alignment checks
 
 - Fix vector context allocation leak
 
 - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi()
 
 - Fix double-free of sdata in kvm_pmu_clear_snapshot_area()
 
 - Fix integer overflow in kvm_pmu_validate_counter_mask()
 
 - Fix shift-out-of-bounds in make_xfence_request()
 
 - Fix lost write protection on huge pages during dirty logging
 
 - Split huge pages during fault handling for dirty logging
 
 - Skip CSR restore if VCPU is reloaded on the same core
 
 - Implement kvm_arch_has_default_irqchip() for KVM selftests
 
 - Factored-out ISA checks into separate sources
 
 - Added hideleg to struct kvm_vcpu_config
 
 - Factored-out VCPU config into separate sources
 
 - Support configuration of per-VM HGATP mode from KVM user space
 
 s390:
 
 - Support for ESA (31-bit) guests inside nested hypervisors.
 
 - Remove restriction on memslot alignment, which is not needed anymore with
   the new gmap code.
 
 - Fix LPSW/E to update the bear (which of course is the breaking event
   address register).
 
 x86:
 
 - Shut up various UBSAN warnings on reading module parameter before they
   were initialized.
 
 - Don't zero-allocate page tables that are used for splitting hugepages in
   the TDP MMU, as KVM is guaranteed to set all SPTEs in the page table and
   thus write all bytes.
 
 - As an optimization, bail early when trying to unsync 4KiB mappings if the
   target gfn can just be mapped with a 2MiB hugepage.
 
 x86 generic:
 
 - Copy single-chunk MMIO write values into struct kvm_vcpu (more precisely
   struct kvm_mmio_fragment) to fix use-after-free stack bugs where KVM
   would dereference stack pointer after an exit to userspace.
 
 - Clean up and comment the emulated MMIO code to try to make it easier to
   maintain (not necessarily "easy", but "easier").
 
 - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not *all* of VMX
   and SVM enabling) as it is needed for trusted I/O.
 
 - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions
 
 - Immediately fail the build if a required #define is missing in one of
   KVM's headers that is included multiple times.
 
 - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected
   exception, mostly to prevent syzkaller from abusing the uAPI to
   trigger WARNs, but also because it can help prevent userspace from
   unintentionally crashing the VM.
 
 - Exempt SMM from CPUID faulting on Intel, as per the spec.
 
 - Misc hardening and cleanup changes.
 
 x86 (AMD):
 
 - Fix and optimize IRQ window inhibit handling for AVIC; make it per-vCPU
   so that KVM doesn't prematurely re-enable AVIC if multiple
   vCPUs have to-be-injected IRQs.
 
 - Clean up and optimize the OSVW handling, avoiding a bug in which KVM would
   overwrite state when enabling virtualization on multiple CPUs in parallel.
   This should not be a problem because OSVW should usually be the same for
   all CPUs.
 
 - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a
   "too large" size based purely on user input.
 
 - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION.
 
 - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as
   doing so for an SNP guest will crash the host due to an RMP violation
   page fault.
 
 - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries
   are required to hold kvm->lock, and enforce it by lockdep.  Fix various
   bugs where sev_guest() was not ensured to be stable for the whole
   duration of a function or ioctl.
 
 - Convert a pile of kvm->lock SEV code to guard().
 
 - Play nicer with userspace that does not enable KVM_CAP_EXCEPTION_PAYLOAD,
   for which KVM needs to set CR2 and DR6 as a response to ioctls such as
   KVM_GET_VCPU_EVENTS (even if the payload would end up in EXITINFO2
   rather than CR2, for example).  Only set CR2 and DR6 when consumption of
   the payload is imminent, but on the other hand force delivery of the
   payload in all paths where userspace retrieves CR2 or DR6.
 
 - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT instead
   of vmcb02->save.cr2.  The value is out of sync after a save/restore
   or after a #PF is injected into L2.
 
 - Fix a class of nSVM bugs where some fields written by the CPU are not
   synchronized from vmcb02 to cached vmcb12 after VMRUN, and so are not
   up-to-date when saved by KVM_GET_NESTED_STATE.
 
 - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE and
   KVM_SET_{S}REGS could cause vmcb02 to be incorrectly initialized after
   save+restore.
 
 - Add a variety of missing nSVM consistency checks.
 
 - Fix several bugs where KVM failed to correctly update VMCB fields on
   nested #VMEXIT.
 
 - Fix several bugs where KVM failed to correctly synthesize #UD or #GP for
   SVM-related instructions.
 
 - Add support for save+restore of virtualized LBRs (on SVM).
 
 - Refactor various helpers and macros to improve clarity and (hopefully)
   make the code easier to maintain.
 
 - Aggressively sanitize fields when copying from vmcb12, to guard against
   unintentionally allowing L1 to utilize yet-to-be-defined features.
 
 - Fix several bugs where KVM botched rAX legality checks when emulating SVM
   instructions.  There are remaining issues in that KVM doesn't handle size
   prefix overrides for 64-bit guests.
 
 - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails instead of
   somewhat arbitrarily synthesizing #GP (i.e. don't double down on AMD's
   architectural but sketchy behavior of generating #GP for "unsupported"
   addresses).
 
 - Cache all used vmcb12 fields to further harden against TOCTOU bugs.
 
 x86 (Intel):
 
 - Drop obsolete branch hint prefixes from the VMX instruction macros.
 
 - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a
   register input when appropriate.
 
 - Code cleanups.
 
 guest_memfd:
 
 - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't support
   reclaim, the memory is unevictable, and there is no storage to write
   back to.
 
 LoongArch selftests:
 
 - Add KVM PMU test cases
 
 s390 selftests:
 
 - Enable more memory selftests.
 
 x86 selftests:
 
 - Add support for Hygon CPUs in KVM selftests.
 
 - Fix a bug in the MSR test where it would get false failures on AMD/Hygon
   CPUs with exactly one of RDPID or RDTSCP.
 
 - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test for a
   bug where the kernel would attempt to collapse guest_memfd folios against
   KVM's will.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmnftRQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPAzwf+NKO4Ktv+7A22ImN0SBl0nlUuulsz
 vTcw3+hxdRoIw83GdNS+hG5js0wrpMDnbv3t4+VliDNBSSxrBzcSWX2wpilW0Xtw
 qGo1MWhs2lKPy1NlaRVOwPS6j7uF3AR0TQ1iQLGMedQuCU9WpiKJxyhNXJdbLrt3
 8EgFzsvtEsv+jKNRUNDf9+d0j4gZsFyIe+Brhianbw+u3/UCiUClLCdsKPc4+5ZX
 08otYXytacGNIf/5Ev1vT4pHkHL0yqKXAtX7LEtaS3+0KrPuLjV4slemivzE9vf5
 Evafm5AhA4wpaNMb1ZerhY3T94lsMaJpWxotjR//0Q7C9B59pCQnXCm8mg==
 =CcE0
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "Arm:

   - Add support for tracing in the standalone EL2 hypervisor code,
     which should help both debugging and performance analysis. This
     uses the new infrastructure for 'remote' trace buffers that can be
     exposed by non-kernel entities such as firmware, and which came
     through the tracing tree

   - Add support for GICv5 Per Processor Interrupts (PPIs), as the
     starting point for supporting the new GIC architecture in KVM

   - Finally add support for pKVM protected guests, where pages are
     unmapped from the host as they are faulted into the guest and can
     be shared back from the guest using pKVM hypercalls. Protected
     guests are created using a new machine type identifier. As the
     elusive guestmem has not yet delivered on its promises, anonymous
     memory is also supported

     This is only a first step towards full isolation from the host; for
     example, the CPU register state and DMA accesses are not yet
     isolated. Because this does not really yet bring fully what it
     promises, it is hidden behind CONFIG_ARM_PKVM_GUEST +
     'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is
     created. Caveat emptor

   - Rework the dreaded user_mem_abort() function to make it more
     maintainable, reducing the amount of state being exposed to the
     various helpers and rendering a substantial amount of state
     immutable

   - Expand the Stage-2 page table dumper to support NV shadow page
     tables on a per-VM basis

   - Tidy up the pKVM PSCI proxy code to be slightly less hard to
     follow

   - Fix both SPE and TRBE in non-VHE configurations so that they do not
     generate spurious, out of context table walks that ultimately lead
     to very bad HW lockups

   - A small set of patches fixing the Stage-2 MMU freeing in error
     cases

   - Tighten-up accepted SMC immediate value to be only #0 for host
     SMCCC calls

   - The usual cleanups and other selftest churn

  LoongArch:

   - Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel()

   - Add DMSINTC irqchip in kernel support

  RISC-V:

   - Fix steal time shared memory alignment checks

   - Fix vector context allocation leak

   - Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi()

   - Fix double-free of sdata in kvm_pmu_clear_snapshot_area()

   - Fix integer overflow in kvm_pmu_validate_counter_mask()

   - Fix shift-out-of-bounds in make_xfence_request()

   - Fix lost write protection on huge pages during dirty logging

   - Split huge pages during fault handling for dirty logging

   - Skip CSR restore if VCPU is reloaded on the same core

   - Implement kvm_arch_has_default_irqchip() for KVM selftests

   - Factored-out ISA checks into separate sources

   - Added hideleg to struct kvm_vcpu_config

   - Factored-out VCPU config into separate sources

   - Support configuration of per-VM HGATP mode from KVM user space

  s390:

   - Support for ESA (31-bit) guests inside nested hypervisors

   - Remove restriction on memslot alignment, which is not needed
     anymore with the new gmap code

   - Fix LPSW/E to update the bear (which of course is the breaking
     event address register)

  x86:

   - Shut up various UBSAN warnings on reading module parameter before
     they were initialized

   - Don't zero-allocate page tables that are used for splitting
     hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in
     the page table and thus write all bytes

   - As an optimization, bail early when trying to unsync 4KiB mappings
     if the target gfn can just be mapped with a 2MiB hugepage

  x86 generic:

   - Copy single-chunk MMIO write values into struct kvm_vcpu (more
     precisely struct kvm_mmio_fragment) to fix use-after-free stack
     bugs where KVM would dereference stack pointer after an exit to
     userspace

   - Clean up and comment the emulated MMIO code to try to make it
     easier to maintain (not necessarily "easy", but "easier")

   - Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not *all* of
     VMX and SVM enabling) as it is needed for trusted I/O

   - Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions

   - Immediately fail the build if a required #define is missing in one
     of KVM's headers that is included multiple times

   - Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected
     exception, mostly to prevent syzkaller from abusing the uAPI to
     trigger WARNs, but also because it can help prevent userspace from
     unintentionally crashing the VM

   - Exempt SMM from CPUID faulting on Intel, as per the spec

   - Misc hardening and cleanup changes

  x86 (AMD):

   - Fix and optimize IRQ window inhibit handling for AVIC; make it
     per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple
     vCPUs have to-be-injected IRQs

   - Clean up and optimize the OSVW handling, avoiding a bug in which
     KVM would overwrite state when enabling virtualization on multiple
     CPUs in parallel. This should not be a problem because OSVW should
     usually be the same for all CPUs

   - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains
     about a "too large" size based purely on user input

   - Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION

   - Disallow synchronizing a VMSA of an already-launched/encrypted
     vCPU, as doing so for an SNP guest will crash the host due to an
     RMP violation page fault

   - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped
     queries are required to hold kvm->lock, and enforce it by lockdep.
     Fix various bugs where sev_guest() was not ensured to be stable for
     the whole duration of a function or ioctl

   - Convert a pile of kvm->lock SEV code to guard()

   - Play nicer with userspace that does not enable
     KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6
     as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the
     payload would end up in EXITINFO2 rather than CR2, for example).
     Only set CR2 and DR6 when consumption of the payload is imminent,
     but on the other hand force delivery of the payload in all paths
     where userspace retrieves CR2 or DR6

   - Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT
     instead of vmcb02->save.cr2. The value is out of sync after a
     save/restore or after a #PF is injected into L2

   - Fix a class of nSVM bugs where some fields written by the CPU are
     not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so
     are not up-to-date when saved by KVM_GET_NESTED_STATE

   - Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE
     and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly
     initialized after save+restore

   - Add a variety of missing nSVM consistency checks

   - Fix several bugs where KVM failed to correctly update VMCB fields
     on nested #VMEXIT

   - Fix several bugs where KVM failed to correctly synthesize #UD or
     #GP for SVM-related instructions

   - Add support for save+restore of virtualized LBRs (on SVM)

   - Refactor various helpers and macros to improve clarity and
     (hopefully) make the code easier to maintain

   - Aggressively sanitize fields when copying from vmcb12, to guard
     against unintentionally allowing L1 to utilize yet-to-be-defined
     features

   - Fix several bugs where KVM botched rAX legality checks when
     emulating SVM instructions. There are remaining issues in that KVM
     doesn't handle size prefix overrides for 64-bit guests

   - Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails
     instead of somewhat arbitrarily synthesizing #GP (i.e. don't double
     down on AMD's architectural but sketchy behavior of generating #GP
     for "unsupported" addresses)

   - Cache all used vmcb12 fields to further harden against TOCTOU bugs

  x86 (Intel):

   - Drop obsolete branch hint prefixes from the VMX instruction macros

   - Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a
     register input when appropriate

   - Code cleanups

  guest_memfd:

   - Don't mark guest_memfd folios as accessed, as guest_memfd doesn't
     support reclaim, the memory is unevictable, and there is no storage
     to write back to

  LoongArch selftests:

   - Add KVM PMU test cases

  s390 selftests:

   - Enable more memory selftests

  x86 selftests:

   - Add support for Hygon CPUs in KVM selftests

   - Fix a bug in the MSR test where it would get false failures on
     AMD/Hygon CPUs with exactly one of RDPID or RDTSCP

   - Add an MADV_COLLAPSE testcase for guest_memfd as a regression test
     for a bug where the kernel would attempt to collapse guest_memfd
     folios against KVM's will"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits)
  KVM: x86: use inlines instead of macros for is_sev_*guest
  x86/virt: Treat SVM as unsupported when running as an SEV+ guest
  KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails
  KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper
  KVM: SEV: use mutex guard in snp_handle_guest_req()
  KVM: SEV: use mutex guard in sev_mem_enc_unregister_region()
  KVM: SEV: use mutex guard in sev_mem_enc_ioctl()
  KVM: SEV: use mutex guard in snp_launch_update()
  KVM: SEV: Assert that kvm->lock is held when querying SEV+ support
  KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe"
  KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y
  KVM: SEV: WARN on unhandled VM type when initializing VM
  KVM: LoongArch: selftests: Add PMU overflow interrupt test
  KVM: LoongArch: selftests: Add basic PMU event counting test
  KVM: LoongArch: selftests: Add cpucfg read/write helpers
  LoongArch: KVM: Add DMSINTC inject msi to vCPU
  LoongArch: KVM: Add DMSINTC device support
  LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function
  LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch
  LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch
  ...
2026-04-17 07:18:03 -07:00
Linus Torvalds
440d6635b2 mm.git review status for linus..mm-nonmm-stable
Total patches:       126
 Reviews/patch:       0.92
 Reviewed rate:       76%
 
 - The 2 patch series "pid: make sub-init creation retryable" from Oleg
   Nesterov increases the robustness of our creation of init in a new
   namespace.  By clearing away some historical cruft which is no longer
   needed.  Also some documentation fixups are provided.
 
 - The 2 patch series "selftests/fchmodat2: Error handling and general"
   from Mark Brown has a fixup and a cleanup for the fchmodat2() syscall
   selftest.
 
 - The 3 patch series "lib: polynomial: Move to math/ and clean up" from
   Andy Shevchenko does as advertised.
 
 - The 3 patch series "hung_task: Provide runtime reset interface for
   hung task detector" from Aaron Tomlin gives administrators the ability
   to zero out /proc/sys/kernel/hung_task_detect_count.
 
 - The 2 patch series "tools/getdelays: use the static UAPI headers from
   tools/include/uapi" from Thomas Weißschuh teaches getdelays to use the
   in-kernel UAPI headers rather than the system-provided ones.
 
 - The 5 patch series "watchdog/hardlockup: Improvements to hardlockup"
   from Mayank Rungta provides several cleanups and fixups to the
   hardlockup detector code and its documentation.
 
 - The 2 patch series "lib/bch: fix undefined behavior from signed
   left-shifts" from Josh Law provides a couple of small/theoretical fixes
   in the bch code.
 
 - The 2 patch series "ocfs2/dlm: fix two bugs in dlm_match_regions()"
   from Junrui Luo does what is claims.
 
 - The 27 patch series "cleanup the RAID5 XOR library" from Christoph
   Hellwig is a quite far-reaching cleanup to this code.  I can't do better
   than to quote Christoph:
 
     The XOR library used for the RAID5 parity is a bit of a mess right
     now.  The main file sits in crypto/ despite not being cryptography and
     not using the crypto API, with the generic implementations sitting in
     include/asm-generic and the arch implementations sitting in an asm/
     header in theory.  The latter doesn't work for many cases, so
     architectures often build the code directly into the core kernel, or
     create another module for the architecture code.
 
     Change this to a single module in lib/ that also contains the
     architecture optimizations, similar to the library work Eric Biggers
     has done for the CRC and crypto libraries later.  After that it
     changes to better calling conventions that allow for smarter
     architecture implementations (although none is contained here yet),
     and uses static_call to avoid indirection function call overhead.
 
 - The 2 patch series "lib/list_sort: Clean up list_sort() scheduling
   workarounds" from Kuan-Wei Chiu cleans up this library code by removing
   a hacky thing which was added for UBIFS, which UBIFS doesn't actually
   need.
 
 - The 5 patch series "Fix bugs in extract_iter_to_sg()" from Christian
   Ehrhardt fixes a few bugs in the scatterlist code, adds in-kernel tests
   for the now-fixed bugs and fixes a leak in the test itself.
 
 - The 3 patch series "kdump: Enable LUKS-encrypted dump target support
   in ARM64 and PowerPC" from Coiby Xu eenables support of the
   LUKS-encrypted device dump target on arm64 and powerpc.
 
 - The 4 patch series "ocfs2: consolidate extent list validation into
   block read callbacks" from Joseph Qi addresses ocfs2's validation of
   extent list fields - cleanup, simplification, robustness.  (Kernel test
   robot loves mounting corrupted fs images!)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCad90rQAKCRDdBJ7gKXxA
 jl7rAQD4/Rq7ZSSnEv6FS4gOwc3MgTdWcZZaXkqL1KiWyYhRwAEA+cVCO344+AKb
 znBOjet/hUr+/kBwyViifiC8LHzchwM=
 =Nfnf
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "pid: make sub-init creation retryable" (Oleg Nesterov)

   Make creation of init in a new namespace more robust by clearing away
   some historical cruft which is no longer needed. Also some
   documentation fixups

 - "selftests/fchmodat2: Error handling and general" (Mark Brown)

   Fix and a cleanup for the fchmodat2() syscall selftest

 - "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko)

 - "hung_task: Provide runtime reset interface for hung task detector"
   (Aaron Tomlin)

   Give administrators the ability to zero out
   /proc/sys/kernel/hung_task_detect_count

 - "tools/getdelays: use the static UAPI headers from
   tools/include/uapi" (Thomas Weißschuh)

   Teach getdelays to use the in-kernel UAPI headers rather than the
   system-provided ones

 - "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta)

   Several cleanups and fixups to the hardlockup detector code and its
   documentation

 - "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law)

   A couple of small/theoretical fixes in the bch code

 - "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo)

 - "cleanup the RAID5 XOR library" (Christoph Hellwig)

   A quite far-reaching cleanup to this code. I can't do better than to
   quote Christoph:

     "The XOR library used for the RAID5 parity is a bit of a mess right
      now. The main file sits in crypto/ despite not being cryptography
      and not using the crypto API, with the generic implementations
      sitting in include/asm-generic and the arch implementations
      sitting in an asm/ header in theory. The latter doesn't work for
      many cases, so architectures often build the code directly into
      the core kernel, or create another module for the architecture
      code.

      Change this to a single module in lib/ that also contains the
      architecture optimizations, similar to the library work Eric
      Biggers has done for the CRC and crypto libraries later. After
      that it changes to better calling conventions that allow for
      smarter architecture implementations (although none is contained
      here yet), and uses static_call to avoid indirection function call
      overhead"

 - "lib/list_sort: Clean up list_sort() scheduling workarounds"
   (Kuan-Wei Chiu)

   Clean up this library code by removing a hacky thing which was added
   for UBIFS, which UBIFS doesn't actually need

 - "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt)

   Fix a few bugs in the scatterlist code, add in-kernel tests for the
   now-fixed bugs and fix a leak in the test itself

 - "kdump: Enable LUKS-encrypted dump target support in ARM64 and
   PowerPC" (Coiby Xu)

   Enable support of the LUKS-encrypted device dump target on arm64 and
   powerpc

 - "ocfs2: consolidate extent list validation into block read callbacks"
   (Joseph Qi)

   Cleanup, simplify, and make more robust ocfs2's validation of extent
   list fields (Kernel test robot loves mounting corrupted fs images!)

* tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits)
  ocfs2: validate group add input before caching
  ocfs2: validate bg_bits during freefrag scan
  ocfs2: fix listxattr handling when the buffer is full
  doc: watchdog: fix typos etc
  update Sean's email address
  ocfs2: use get_random_u32() where appropriate
  ocfs2: split transactions in dio completion to avoid credit exhaustion
  ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path()
  ocfs2: validate extent block list fields during block read
  ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec()
  ocfs2: validate dx_root extent list fields during block read
  ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY
  ocfs2: handle invalid dinode in ocfs2_group_extend
  .get_maintainer.ignore: add Askar
  ocfs2: validate bg_list extent bounds in discontig groups
  checkpatch: exclude forward declarations of const structs
  tools/accounting: handle truncated taskstats netlink messages
  taskstats: set version in TGID exit notifications
  ocfs2/heartbeat: fix slot mapping rollback leaks on error paths
  arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel
  ...
2026-04-16 20:11:56 -07:00
Linus Torvalds
c43267e679 arm64 updates for 7.1:
Core features:
 
  - Add support for FEAT_LSUI, allowing futex atomic operations without
    toggling Privileged Access Never (PAN)
 
  - Further refactor the arm64 exception handling code towards the
    generic entry infrastructure
 
  - Optimise __READ_ONCE() with CONFIG_LTO=y and allow alias analysis
    through it
 
 Memory management:
 
  - Refactor the arm64 TLB invalidation API and implementation for better
    control over barrier placement and level-hinted invalidation
 
  - Enable batched TLB flushes during memory hot-unplug
 
  - Fix rodata=full block mapping support for realm guests (when
    BBML2_NOABORT is available)
 
 Perf and PMU:
 
  - Add support for a whole bunch of system PMUs featured in NVIDIA's
    Tegra410 SoC (cspmu extensions for the fabric and PCIe, new drivers
    for CPU/C2C memory latency PMUs)
 
  - Clean up iomem resource handling in the Arm CMN driver
 
  - Fix signedness handling of AA64DFR0.{PMUVer,PerfMon}
 
 MPAM (Memory Partitioning And Monitoring):
 
  - Add architecture context-switch and hiding of the feature from KVM
 
  - Add interface to allow MPAM to be exposed to user-space using resctrl
 
  - Add errata workaround for some existing platforms
 
  - Add documentation for using MPAM and what shape of platforms can use
    resctrl
 
 Miscellaneous:
 
  - Check DAIF (and PMR, where relevant) at task-switch time
 
  - Skip TFSR_EL1 checks and barriers in synchronous MTE tag check mode
    (only relevant to asynchronous or asymmetric tag check modes)
 
  - Remove a duplicate allocation in the kexec code
 
  - Remove redundant save/restore of SCS SP on entry to/from EL0
 
  - Generate the KERNEL_HWCAP_ definitions from the arm64 hwcap
    descriptions
 
  - Add kselftest coverage for cmpbr_sigill()
 
  - Update sysreg definitions
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmnc8DEACgkQa9axLQDI
 XvFauRAAhc1cIgoRpgtdZd7+3/g457teDPYA3L/CjJzI28aesIpV/ECrEw2GL4xs
 HrQfijF4oyCDbBwh0sAascO/H7RoyOranlbuc+fVJ6Bj6gP9STzR4GmscsWkAMSJ
 vA3Jd1DREdDBO2sjw+hGhht84nRlcfY1FyORJP+1JaFH4oWTWsRNeOZIiI3BhxR8
 EtFP9E8r2Esxi/FmZb/47m7kYCEH+XsrzQvBQNLVCH899QX2Hn0kAY70ndq2ZiQl
 n+zLAe7FBFwKzUVmlgWuhjrWMmK+1TthK/XQuOtxg13dHmX+vE/j+A+dOqRWSfHY
 ktNcWaf6m4+TWKVeVTe4E1cnSuwTQTm4VQKd9zaeQxiZYyYJhCQjXuEZg3vDmDbq
 F6D3MpTaJHRRWp0rEurxnSBlmQPCBE2IxEBdSrjd/WJ6T9e1oYwWiSJSS7bGCgGr
 dd/XLsOY7Um5n4ooIFEZc1de6VO6/VTKjmxnBMgU+Sa1REbLpD438IX/6CjzG5qM
 l5Ulke/c6/a/faeVCEpZpD8JuvNOzo9RISDPrNg1KKAL+OSU+9tgmVjIFPhDDB0w
 zNTqT7YJIhxlJxnUGWDk8YNsTjT3OzyquY9UT1tBTBqC0k13J2i2ev30toUez7xj
 2aV+9qMpunbLtwYhXNun1hBFiYrCxpX7I8ha0hXiXL0CywVOPTI=
 =CnVn
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "The biggest changes are MPAM enablement in drivers/resctrl and new PMU
  support under drivers/perf.

  On the core side, FEAT_LSUI lets futex atomic operations with EL0
  permissions, avoiding PAN toggling.

  The rest is mostly TLB invalidation refactoring, further generic entry
  work, sysreg updates and a few fixes.

  Core features:

   - Add support for FEAT_LSUI, allowing futex atomic operations without
     toggling Privileged Access Never (PAN)

   - Further refactor the arm64 exception handling code towards the
     generic entry infrastructure

   - Optimise __READ_ONCE() with CONFIG_LTO=y and allow alias analysis
     through it

  Memory management:

   - Refactor the arm64 TLB invalidation API and implementation for
     better control over barrier placement and level-hinted invalidation

   - Enable batched TLB flushes during memory hot-unplug

   - Fix rodata=full block mapping support for realm guests (when
     BBML2_NOABORT is available)

  Perf and PMU:

   - Add support for a whole bunch of system PMUs featured in NVIDIA's
     Tegra410 SoC (cspmu extensions for the fabric and PCIe, new drivers
     for CPU/C2C memory latency PMUs)

   - Clean up iomem resource handling in the Arm CMN driver

   - Fix signedness handling of AA64DFR0.{PMUVer,PerfMon}

  MPAM (Memory Partitioning And Monitoring):

   - Add architecture context-switch and hiding of the feature from KVM

   - Add interface to allow MPAM to be exposed to user-space using
     resctrl

   - Add errata workaround for some existing platforms

   - Add documentation for using MPAM and what shape of platforms can
     use resctrl

  Miscellaneous:

   - Check DAIF (and PMR, where relevant) at task-switch time

   - Skip TFSR_EL1 checks and barriers in synchronous MTE tag check mode
     (only relevant to asynchronous or asymmetric tag check modes)

   - Remove a duplicate allocation in the kexec code

   - Remove redundant save/restore of SCS SP on entry to/from EL0

   - Generate the KERNEL_HWCAP_ definitions from the arm64 hwcap
     descriptions

   - Add kselftest coverage for cmpbr_sigill()

   - Update sysreg definitions"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (109 commits)
  arm64: rsi: use linear-map alias for realm config buffer
  arm64: Kconfig: fix duplicate word in CMDLINE help text
  arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode
  arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12
  arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps
  arm64: kexec: Remove duplicate allocation for trans_pgd
  ACPI: AGDI: fix missing newline in error message
  arm64: Check DAIF (and PMR) at task-switch time
  arm64: entry: Use split preemption logic
  arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
  arm64: entry: Consistently prefix arm64-specific wrappers
  arm64: entry: Don't preempt with SError or Debug masked
  entry: Split preemption from irqentry_exit_to_kernel_mode()
  entry: Split kernel mode logic from irqentry_{enter,exit}()
  entry: Move irqentry_enter() prototype later
  entry: Remove local_irq_{enable,disable}_exit_to_user()
  ...
2026-04-14 16:48:56 -07:00
Linus Torvalds
5d0d362330 Kbuild/Kconfig updates for 7.1
Kbuild changes
 ==============
 
   * tools/build: Reject unexpected values for LLVM=
 
   * kbuild: uapi: remove usage of toolchain headers
 
   * kbuild: Switch from '-fms-extensions' to '-fms-anonymous-structs'
     when available (currently: clang >= 23.0.0)
 
   * kbuild: Reduce the number of compiler-generated suffixes for clang
     thin-lto build
 
   * kbuild: reduce output spam ("GEN Makefile") when building out of tree
 
   * check-uapi: improve portability for testing headers
 
   * uapi: also test UAPI headers against C++ compilers
 
   * kbuild: vdso_install: drop build ID architecture allow-list
 
   * checksyscalls: only run when necessary
 
   * Documentation: kbuild: Update the debug information notes in
     reproducible-builds.rst
 
   * kconfig: forbid multiple entries with the same symbol in a choice
 
   * kbuild: expand inlining hints with -fdiagnostics-show-inlining-chain
 
 Kconfig changes
 ===============
 
   * kconfig: Error out on duplicated kconfig inclusion
 
 Cc: Alexander Coffin <alex@cyberialabs.net>
 Cc: Ard Biesheuvel <ardb@kernel.org>
 Cc: Arnd Bergmann <arnd@arndb.de>
 Cc: Bill Wendling <morbo@google.com>
 Cc: David Howells <dhowells@redhat.com>
 Cc: Dodji Seketeli <dodji@seketeli.org>
 Cc: H. Peter Anvin <hpa@zytor.com>
 Cc: Helge Deller <deller@gmx.de>
 Cc: John Moon <john@jmoon.dev>
 Cc: Jonathan Corbet <corbet@lwn.net>
 Cc: Josh Poimboeuf <jpoimboe@kernel.org>
 Cc: Justin Stitt <justinstitt@google.com>
 Cc: Kees Cook <kees@kernel.org>
 Cc: Masahiro Yamada <masahiroy@kernel.org>
 Cc: Nathan Chancellor <nathan@kernel.org>
 Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
 Cc: Shuah Khan <skhan@linuxfoundation.org>
 Cc: Song Liu <song@kernel.org>
 Cc: Thomas Weißschuh <linux@weissschuh.net>
 Cc: Yonghong Song <yonghong.song@linux.dev>
 Cc: kernel-team@fb.com
 Cc: linux-arm-kernel@lists.infradead.org
 Cc: linux-efi@vger.kernel.org
 Cc: linux-hexagon@vger.kernel.org
 Cc: linux-kbuild@vger.kernel.org
 Cc: linux-kernel@vger.kernel.org
 Cc: linux-parisc@vger.kernel.org
 Cc: linux-s390@vger.kernel.org
 Cc: linuxppc-dev@lists.ozlabs.org
 Cc: llvm@lists.linux.dev
 Cc: loongarch@lists.linux.dev
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEh0E3p4c3JKeBvsLGB1IKcBYmEmkFAmnatXEACgkQB1IKcBYm
 Eml6ww/9Hja/CTBoF+ZgMXN/9VcQhzNonPXIp8IGarX3+LCPh8RfUEywaOLnvR/U
 fE6FEIcwDw0M5drS0hEH7t1Xowc6AhDX05lKBj3aGBgn6JqGGQFAfnysQd5z0cwW
 Y/8+bMm+Y2XQ/xZNa0J92+3evPO04U7+2kCSVD051ZhRdmK4n290u4YsTgoKs7Fm
 1SBIr+tsFa1zMOG6r+J4uCLxXNnujQ5XcejnlmdBM0o19f9kttvVkYKuBVdXPHf4
 JaTLti22Td8SklDKMmkSRg+Ul/Wh2x8D8tP98VQAJe5B3f4Uk6YAu1BMrbQaX5Rk
 5SsGbhBEeOTDc4qCaS8DS+FJQU6T9W9cf/9+tBY510fXxAIonz5cPB06q5xeJWCd
 IkVB3KpmaVxo2B54Cy4b/fvd1J3VMkmFjBQWMNwkq6cnCG1ZK/b6Jmvh9BQSNctl
 IYJxWKBjlddrMuvZEMI0CewVq4GmarTLiOpweghDg8OYqya4E6PfOUGnaWMrWT5c
 2E8ZMnQSb68yFUaXK+Sy+Pw2Nig/VvxCUxHdaarHi/RmGeoN5dMGfjj/gGZvZrHt
 NUGt6qe+X62P0ZAUR8p+GpRcU3+p3uLhCyO7dkwqgLVZTnaXy5XtUQ/uyh2G60hv
 eJlFfrn8QXplvzrxcSTJya6PunoIhuWh2BfKhf0RDymJTPyMbBc=
 =+wTC
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux

Pull Kbuild/Kconfig updates from Nicolas Schier:
 "Kbuild:
   - reject unexpected values for LLVM=
   - uapi: remove usage of toolchain headers
   - switch from '-fms-extensions' to '-fms-anonymous-structs' when
     available (currently: clang >= 23.0.0)
   - reduce the number of compiler-generated suffixes for clang thin-lto
     build
   - reduce output spam ("GEN Makefile") when building out of tree
   - improve portability for testing headers
   - also test UAPI headers against C++ compilers
   - drop build ID architecture allow-list in vdso_install
   - only run checksyscalls when necessary
   - update the debug information notes in reproducible-builds.rst
   - expand inlining hints with -fdiagnostics-show-inlining-chain

  Kconfig:
   - forbid multiple entries with the same symbol in a choice
   - error out on duplicated kconfig inclusion"

* tag 'kbuild-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (35 commits)
  kbuild: expand inlining hints with -fdiagnostics-show-inlining-chain
  kconfig: forbid multiple entries with the same symbol in a choice
  Documentation: kbuild: Update the debug information notes in reproducible-builds.rst
  checksyscalls: move instance functionality into generic code
  checksyscalls: only run when necessary
  checksyscalls: fail on all intermediate errors
  checksyscalls: move path to reference table to a variable
  kbuild: vdso_install: drop build ID architecture allow-list
  kbuild: vdso_install: gracefully handle images without build ID
  kbuild: vdso_install: hide readelf warnings
  kbuild: vdso_install: split out the readelf invocation
  kbuild: uapi: also test UAPI headers against C++ compilers
  kbuild: uapi: provide a C++ compatible dummy definition of NULL
  kbuild: uapi: handle UML in architecture-specific exclusion lists
  kbuild: uapi: move all include path flags together
  kbuild: uapi: move some compiler arguments out of the command definition
  check-uapi: use dummy libc includes
  check-uapi: honor ${CROSS_COMPILE} setting
  check-uapi: link into shared objects
  kbuild: reduce output spam when building out of tree
  ...
2026-04-14 09:18:40 -07:00
Linus Torvalds
2e31b16101 ACPI support updates for 7.1-rc1
- Update maintainers information regarding ACPICA (Rafael Wysocki)
 
  - Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy() (Kees
    Cook)
 
  - Trigger an ordered system power off after encountering a fatal error
    operator in AML (Armin Wolf)
 
  - Enable ACPI FPDT parsing on LoongArch (Xi Ruoyao)
 
  - Remove the temporary stop-gap acpi_pptt_cache_v1_full structure from
    the ACPI PPTT parser (Ben Horgan)
 
  - Add support for exposing ACPI FPDT subtables FBPT and S3PT (Nate
    DeSimone)
 
  - Address multiple assorted issues and clean up the code in the ACPI
    processor idle driver (Huisong Li)
 
  - Replace strlcat() in the ACPI processor idle drive with a better
    alternative (Andy Shevchenko)
 
  - Rearrange and clean up acpi_processor_errata_piix4() (Rafael Wysocki)
 
  - Move reference performance to capabilities and fix an uninitialized
    variable in the ACPI CPPC library (Pengjie Zhang)
 
  - Add support for the Performance Limited Register to the ACPI CPPC
    library (Sumit Gupta)
 
  - Add cppc_get_perf() API to read performance controls, extend
    cppc_set_epp_perf() for FFH/SystemMemory, and make the ACPI CPPC
    library warn on missing mandatory DESIRED_PERF register (Sumit Gupta)
 
  - Modify the cpufreq CPPC driver to update MIN_PERF/MAX_PERF in target
    callbacks to allow it to control performance bounds via standard
    scaling_min_freq and scaling_max_freq sysfs attributes and add sysfs
    documentation for the Performance Limited Register to it (Sumit Gupta)
 
  - Add ACPI support to the platform device interface in the CMOS RTC
    driver, make the ACPI core device enumeration code create a platform
    device for the CMOS RTC, and drop CMOS RTC PNP device support (Rafael
    Wysocki)
 
  - Consolidate the x86-specific CMOS RTC handling with the ACPI TAD
    driver and clean up the CMOS RTC ACPI address space handler (Rafael
    Wysocki)
 
  - Enable ACPI alarm in the CMOS RTC driver if advertised in ACPI FADT
    and allow that driver to work without a dedicated IRQ if the ACPI
    alarm is used (Rafael Wysocki)
 
  - Clean up the ACPI TAD driver in various ways and add an RTC class
    device interface, including both the RTC setting/reading and alarm
    timer support, to it (Rafael Wysocki)
 
  - Clean up the ACPI AC and ACPI PAD (processor aggregator device)
    drivers (Rafael Wysocki)
 
  - Rework checking for duplicate video bus devices and consolidate
    pnp.bus_id workarounds handling in the ACPI video bus driver (Rafael
    Wysocki)
 
  - Update the ACPI core device drivers to stop setting acpi_device_name()
    unnecessarily (Rafael Wysocki)
 
  - Rearrange code using acpi_device_class() in the ACPI core device
    drivers and update them to stop setting acpi_device_class()
    unnecessarily (Rafael Wysocki)
 
  - Define ACPI_AC_CLASS in one place (Rafael Wysocki)
 
  - Convert the ni903x_wdt watchdog driver and the xen ACPI PAD driver to
    bind to platform devices instead of ACPI devices (Rafael Wysocki)
 
  - Add devm_ghes_register_vendor_record_notifier(), use it in the PCI
    hisi driver, and Add NVIDIA vendor CPER record handler (Kai-Heng
    Feng)
 
  - Consolidate the interface for obtaining a CPU UID from ACPI across
    architectures and use it to address incorrect PCI TPH Steering Tag
    on ARM64 resulting from the invalid assumption that the ACPI
    Processor UID would always be the same as the corresponding logical
    CPU ID in Linux (Chengwen Feng)
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmnY/bcSHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO1chAH/1cGRzh9lSgQ3ZdzIIA5rpRtwKC+CTNz
 iNDvQ97W73B2N+WYzMaloOh+ZVA1Vdqc+8921aH6HI+v7wtg/ZV3h/hU7TagHNY/
 bRFDYaeRXVj4aBXNfoVdn7G5UU9j/kIDcV25I2ubOBqZaO6T5p8p1BK0j0vEj+sG
 yR7XwpEhr2OUQwlIFGKskJwFaH57QJXPEY8wf+o+lMEx/7o/JQRJzKFwsYu01ZZV
 kQy9Ee08P/rsNJwU2ibmZu5P3JMnhategAT8VAMBvkfLScv2sKX+1Vz19NGXzm71
 ARaT7y8MSPNb7SAvWmNZ/rVYrYIL+D3a76Gd7MOGrbVWEn6oXIbCIhY=
 =6vEK
 -----END PGP SIGNATURE-----

Merge tag 'acpi-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI support updates from Rafael Wysocki:
 "These include an update of the CMOS RTC driver and the related ACPI
  and x86 code that, among other things, switches it over to using the
  platform device interface for device binding on x86 instead of the PNP
  device driver interface (which allows the code in question to be
  simplified quite a bit), a major update of the ACPI Time and Alarm
  Device (TAD) driver adding an RTC class device interface to it, and
  updates of core ACPI drivers that remove some unnecessary and not
  really useful code from them.

  Apart from that, two drivers are converted to using the platform
  driver interface for device binding instead of the ACPI driver one,
  which is slated for removal, support for the Performance Limited
  register is added to the ACPI CPPC library and there are some
  janitorial updates of it and the related cpufreq CPPC driver, the ACPI
  processor driver is fixed and cleaned up, and NVIDIA vendor CPER
  record handler is added to the APEI GHES code.

  Also, the interface for obtaining a CPU UID from ACPI is consolidated
  across architectures and used for fixing a problem with the PCI TPH
  Steering Tag on ARM64, there are two updates related to ACPICA, a
  minor ACPI OS Services Layer (OSL) update, and a few assorted updates
  related to ACPI tables parsing.

  Specifics:

   - Update maintainers information regarding ACPICA (Rafael Wysocki)

   - Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy()
     (Kees Cook)

   - Trigger an ordered system power off after encountering a fatal
     error operator in AML (Armin Wolf)

   - Enable ACPI FPDT parsing on LoongArch (Xi Ruoyao)

   - Remove the temporary stop-gap acpi_pptt_cache_v1_full structure
     from the ACPI PPTT parser (Ben Horgan)

   - Add support for exposing ACPI FPDT subtables FBPT and S3PT (Nate
     DeSimone)

   - Address multiple assorted issues and clean up the code in the ACPI
     processor idle driver (Huisong Li)

   - Replace strlcat() in the ACPI processor idle drive with a better
     alternative (Andy Shevchenko)

   - Rearrange and clean up acpi_processor_errata_piix4() (Rafael
     Wysocki)

   - Move reference performance to capabilities and fix an uninitialized
     variable in the ACPI CPPC library (Pengjie Zhang)

   - Add support for the Performance Limited Register to the ACPI CPPC
     library (Sumit Gupta)

   - Add cppc_get_perf() API to read performance controls, extend
     cppc_set_epp_perf() for FFH/SystemMemory, and make the ACPI CPPC
     library warn on missing mandatory DESIRED_PERF register (Sumit
     Gupta)

   - Modify the cpufreq CPPC driver to update MIN_PERF/MAX_PERF in
     target callbacks to allow it to control performance bounds via
     standard scaling_min_freq and scaling_max_freq sysfs attributes and
     add sysfs documentation for the Performance Limited Register to it
     (Sumit Gupta)

   - Add ACPI support to the platform device interface in the CMOS RTC
     driver, make the ACPI core device enumeration code create a
     platform device for the CMOS RTC, and drop CMOS RTC PNP device
     support (Rafael Wysocki)

   - Consolidate the x86-specific CMOS RTC handling with the ACPI TAD
     driver and clean up the CMOS RTC ACPI address space handler (Rafael
     Wysocki)

   - Enable ACPI alarm in the CMOS RTC driver if advertised in ACPI FADT
     and allow that driver to work without a dedicated IRQ if the ACPI
     alarm is used (Rafael Wysocki)

   - Clean up the ACPI TAD driver in various ways and add an RTC class
     device interface, including both the RTC setting/reading and alarm
     timer support, to it (Rafael Wysocki)

   - Clean up the ACPI AC and ACPI PAD (processor aggregator device)
     drivers (Rafael Wysocki)

   - Rework checking for duplicate video bus devices and consolidate
     pnp.bus_id workarounds handling in the ACPI video bus driver
     (Rafael Wysocki)

   - Update the ACPI core device drivers to stop setting
     acpi_device_name() unnecessarily (Rafael Wysocki)

   - Rearrange code using acpi_device_class() in the ACPI core device
     drivers and update them to stop setting acpi_device_class()
     unnecessarily (Rafael Wysocki)

   - Define ACPI_AC_CLASS in one place (Rafael Wysocki)

   - Convert the ni903x_wdt watchdog driver and the xen ACPI PAD driver
     to bind to platform devices instead of ACPI devices (Rafael
     Wysocki)

   - Add devm_ghes_register_vendor_record_notifier(), use it in the PCI
     hisi driver, and Add NVIDIA vendor CPER record handler (Kai-Heng
     Feng)

   - Consolidate the interface for obtaining a CPU UID from ACPI across
     architectures and use it to address incorrect PCI TPH Steering Tag
     on ARM64 resulting from the invalid assumption that the ACPI
     Processor UID would always be the same as the corresponding logical
     CPU ID in Linux (Chengwen Feng)"

* tag 'acpi-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (73 commits)
  ACPICA: Update maintainers information
  watchdog: ni903x_wdt: Convert to a platform driver
  ACPI: PAD: xen: Convert to a platform driver
  ACPI: processor: idle: Reset cpuidle on C-state list changes
  cpuidle: Extract and export no-lock variants of cpuidle_unregister_device()
  PCI/TPH: Pass ACPI Processor UID to Cache Locality _DSM
  ACPI: PPTT: Use acpi_get_cpu_uid() and remove get_acpi_id_for_cpu()
  perf: arm_cspmu: Switch to acpi_get_cpu_uid() from get_acpi_id_for_cpu()
  ACPI: Centralize acpi_get_cpu_uid() declaration in include/linux/acpi.h
  x86/acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
  RISC-V: ACPI: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
  LoongArch: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
  arm64: acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
  ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler
  PCI: hisi: Use devm_ghes_register_vendor_record_notifier()
  ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier()
  ACPI: tables: Enable FPDT on LoongArch
  ACPI: processor: idle: Fix NULL pointer dereference in hotplug path
  ACPI: processor: idle: Reset power_setup_done flag on initialization failure
  ACPI: TAD: Add alarm support to the RTC class device interface
  ...
2026-04-13 19:25:07 -07:00
Linus Torvalds
d568788baa hardening updates for v7.1-rc1
- randomize_kstack: Improve implementation across arches (Ryan Roberts)
 
 - lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test
 
 - refcount: Remove unused __signed_wrap function annotations
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCad16PwAKCRA2KwveOeQk
 u7crAP4qz8gXCjes76KsZm/YQS8PtOG5JroAVu5Oa4ohw0RfaQD+K/XLow1plcNF
 4Bi8zSuv2ifcLysh9qEAbx5+wcHijgo=
 =woB3
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:

 - randomize_kstack: Improve implementation across arches (Ryan Roberts)

 - lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test

 - refcount: Remove unused __signed_wrap function annotations

* tag 'hardening-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test
  refcount: Remove unused __signed_wrap function annotations
  randomize_kstack: Unify random source across arches
  randomize_kstack: Maintain kstack_offset per task
2026-04-13 17:52:29 -07:00
Linus Torvalds
ef3da345cc vfs-7.1-rc1.misc
Please consider pulling these changes from the signed vfs-7.1-rc1.misc tag.
 
 Thanks!
 Christian
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCadjZCwAKCRCRxhvAZXjc
 ohhBAQCAmQMlMRAXAgUZFYMTZpeQlcujP5rv+/vT2Tf/xS76YwD/dRDaw1FH294+
 qtk/Z1NjleNixzE2sld1K9J32NxeyAc=
 =+g9q
 -----END PGP SIGNATURE-----

Merge tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:
   - coredump: add tracepoint for coredump events
   - fs: hide file and bfile caches behind runtime const machinery

  Fixes:
   - fix architecture-specific compat_ftruncate64 implementations
   - dcache: Limit the minimal number of bucket to two
   - fs/omfs: reject s_sys_blocksize smaller than OMFS_DIR_START
   - fs/mbcache: cancel shrink work before destroying the cache
   - dcache: permit dynamic_dname()s up to NAME_MAX

  Cleanups:
   - remove or unexport unused fs_context infrastructure
   - trivial ->setattr cleanups
   - selftests/filesystems: Assume that TIOCGPTPEER is defined
   - writeback: fix kernel-doc function name mismatch for wb_put_many()
   - autofs: replace manual symlink buffer allocation in autofs_dir_symlink
   - init/initramfs.c: trivial fix: FSM -> Finite-state machine
   - fs: remove stale and duplicate forward declarations
   - readdir: Introduce dirent_size()
   - fs: Replace user_access_{begin/end} by scoped user access
   - kernel: acct: fix duplicate word in comment
   - fs: write a better comment in step_into() concerning .mnt assignment
   - fs: attr: fix comment formatting and spelling issues"

* tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
  dcache: permit dynamic_dname()s up to NAME_MAX
  fs: attr: fix comment formatting and spelling issues
  fs: hide file and bfile caches behind runtime const machinery
  fs: write a better comment in step_into() concerning .mnt assignment
  proc: rename proc_notify_change to proc_setattr
  proc: rename proc_setattr to proc_nochmod_setattr
  affs: rename affs_notify_change to affs_setattr
  adfs: rename adfs_notify_change to adfs_setattr
  hfs: update comments on hfs_inode_setattr
  kernel: acct: fix duplicate word in comment
  fs: Replace user_access_{begin/end} by scoped user access
  readdir: Introduce dirent_size()
  coredump: add tracepoint for coredump events
  fs: remove do_sys_truncate
  fs: pass on FTRUNCATE_* flags to do_truncate
  fs: fix archiecture-specific compat_ftruncate64
  fs: remove stale and duplicate forward declarations
  init/initramfs.c: trivial fix: FSM -> Finite-state machine
  autofs: replace manual symlink buffer allocation in autofs_dir_symlink
  fs/mbcache: cancel shrink work before destroying the cache
  ...
2026-04-13 14:20:11 -07:00
Paolo Bonzini
e74c3a8891 KVM/arm64 updates for 7.1
* New features:
 
 - Add support for tracing in the standalone EL2 hypervisor code,
   which should help both debugging and performance analysis.
   This comes with a full infrastructure for 'remote' trace buffers
   that can be exposed by non-kernel entities such as firmware.
 
 - Add support for GICv5 Per Processor Interrupts (PPIs), as the
   starting point for supporting the new GIC architecture in KVM.
 
 - Finally add support for pKVM protected guests, with anonymous
   memory being used as a backing store. About time!
 
 * Improvements and bug fixes:
 
 - Rework the dreaded user_mem_abort() function to make it more
   maintainable, reducing the amount of state being exposed to
   the various helpers and rendering a substantial amount of
   state immutable.
 
 - Expand the Stage-2 page table dumper to support NV shadow
   page tables on a per-VM basis.
 
 - Tidy up the pKVM PSCI proxy code to be slightly less hard
   to follow.
 
 - Fix both SPE and TRBE in non-VHE configurations so that they
   do not generate spurious, out of context table walks that
   ultimately lead to very bad HW lockups.
 
 - A small set of patches fixing the Stage-2 MMU freeing in error
   cases.
 
 - Tighten-up accepted SMC immediate value to be only #0 for host
   SMCCC calls.
 
 - The usual cleanups and other selftest churn.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmnWdswACgkQI9DQutE9
 ekNYvBAAxj5Zmsx8sJ2CYDTJc2w4XkEjSgDugA+J/s0TMgrzExeBlWCstdhVTncy
 68nwOjQl3TotnIrt7q36kko9u7IdD0pHNrk34NtlggLjHfB61n9SNcAA6j4F6zJa
 GFkHpJSrSnZuUPqapkDnlyhuPkgTIAkEUk2Am9siksSfY4HvRyHZJm2FTdxsdIBn
 NN9wvQqw2wefTXOQ8gS+oHbPVp1cPbwrF2a3EhzXXv/6W3mUBstXgsijgo07UzCp
 W6vHCv2wqHbHdf67z3Q3hL+VXlVH6oHlyW99/swqISvqRkH/iSB90+oUojnMRrSm
 yB6Wmhh8jboCaajWMJhG+veZw+7GMXU4nOrGd1rbnY8cwRl/TQ5YibhRm7DIdvjO
 xeUluTLJ0NdweQUwE2k4OlgKOuGang3E2p0clmkUO4SstA48MdqR/kpST6guIlWw
 U5syuNaaaiuwP5QOi9qZmMCNmQ3ZfnZG3nseJFdoyGjhVhf5jyQyv4Du9vGZQFF/
 Zkg7yTqC4OWiC+3GkW9YYAySM1MyetivLtd47PGzHPTdtaZziWhNvQ0y+8QjQ+R+
 CJNvyS/DvsT7epSya4sLgMP1ZAlih9xkz5sQ6k8NJLBYYXi0v33qwqditErgLLyj
 S4Ci4WNhHHWIusvCVM7JUBkH0AElpmi506f7F6iHoFLlkYR4t9U=
 =/SuQ
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 7.1

* New features:

- Add support for tracing in the standalone EL2 hypervisor code,
  which should help both debugging and performance analysis.
  This comes with a full infrastructure for 'remote' trace buffers
  that can be exposed by non-kernel entities such as firmware.

- Add support for GICv5 Per Processor Interrupts (PPIs), as the
  starting point for supporting the new GIC architecture in KVM.

- Finally add support for pKVM protected guests, with anonymous
  memory being used as a backing store. About time!

* Improvements and bug fixes:

- Rework the dreaded user_mem_abort() function to make it more
  maintainable, reducing the amount of state being exposed to
  the various helpers and rendering a substantial amount of
  state immutable.

- Expand the Stage-2 page table dumper to support NV shadow
  page tables on a per-VM basis.

- Tidy up the pKVM PSCI proxy code to be slightly less hard
  to follow.

- Fix both SPE and TRBE in non-VHE configurations so that they
  do not generate spurious, out of context table walks that
  ultimately lead to very bad HW lockups.

- A small set of patches fixing the Stage-2 MMU freeing in error
  cases.

- Tighten-up accepted SMC immediate value to be only #0 for host
  SMCCC calls.

- The usual cleanups and other selftest churn.
2026-04-13 11:49:54 +02:00
Catalin Marinas
0baba94a97 arm64: errata: Work around early CME DVMSync acknowledgement
C1-Pro acknowledges DVMSync messages before completing the SME/CME
memory accesses. Work around this by issuing an IPI to the affected CPUs
if they are running in EL0 with SME enabled.

Note that we avoid the local DSB in the IPI handler as the kernel runs
with SCTLR_EL1.IESB=1. This is sufficient to complete SME memory
accesses at EL0 on taking an exception to EL1. On the return to user
path, no barrier is necessary either. See the comment in
sme_set_active() and the more detailed explanation in the link below.

To avoid a potential IPI flood from malicious applications (e.g.
madvise(MADV_PAGEOUT) in a tight loop), track where a process is active
via mm_cpumask() and only interrupt those CPUs.

Link: https://lore.kernel.org/r/ablEXwhfKyJW1i7l@J2N7QTR9R3
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Reviewed-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-10 19:46:14 +01:00
Catalin Marinas
d9fb08ba94 arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()
The mm structure will be used for workarounds that need limiting to
specific tasks.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-10 19:46:14 +01:00
Catalin Marinas
480a9e57cc Merge branches 'for-next/misc', 'for-next/tlbflush', 'for-next/ttbr-macros-cleanup', 'for-next/kselftest', 'for-next/feat_lsui', 'for-next/mpam', 'for-next/hotplug-batched-tlbi', 'for-next/bbml2-fixes', 'for-next/sysreg', 'for-next/generic-entry' and 'for-next/acpi', remote-tracking branches 'arm64/for-next/perf' and 'arm64/for-next/read-once' into for-next/core
* arm64/for-next/perf:
  : Perf updates
  perf/arm-cmn: Fix resource_size_t printk specifier in arm_cmn_init_dtc()
  perf/arm-cmn: Fix incorrect error check for devm_ioremap()
  perf: add NVIDIA Tegra410 C2C PMU
  perf: add NVIDIA Tegra410 CPU Memory Latency PMU
  perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU
  perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU
  perf/arm_cspmu: Add arm_cspmu_acpi_dev_get
  perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU
  perf/arm_cspmu: nvidia: Rename doc to Tegra241
  perf/arm-cmn: Stop claiming entire iomem region
  arm64: cpufeature: Use pmuv3_implemented() function
  arm64: cpufeature: Make PMUVer and PerfMon unsigned
  KVM: arm64: Read PMUVer as unsigned

* arm64/for-next/read-once:
  : Fixes for __READ_ONCE() with CONFIG_LTO=y
  arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y
  arm64: Optimize __READ_ONCE() with CONFIG_LTO=y

* for-next/misc:
  : Miscellaneous cleanups/fixes
  arm64: rsi: use linear-map alias for realm config buffer
  arm64: Kconfig: fix duplicate word in CMDLINE help text
  arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode
  arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps
  arm64: kexec: Remove duplicate allocation for trans_pgd
  arm64: mm: Use generic enum pgtable_level
  arm64: scs: Remove redundant save/restore of SCS SP on entry to/from EL0
  arm64: remove ARCH_INLINE_*

* for-next/tlbflush:
  : Refactor the arm64 TLB invalidation API and implementation
  arm64: mm: __ptep_set_access_flags must hint correct TTL
  arm64: mm: Provide level hint for flush_tlb_page()
  arm64: mm: Wrap flush_tlb_page() around __do_flush_tlb_range()
  arm64: mm: More flags for __flush_tlb_range()
  arm64: mm: Refactor __flush_tlb_range() to take flags
  arm64: mm: Refactor flush_tlb_page() to use __tlbi_level_asid()
  arm64: mm: Simplify __flush_tlb_range_limit_excess()
  arm64: mm: Simplify __TLBI_RANGE_NUM() macro
  arm64: mm: Re-implement the __flush_tlb_range_op macro in C
  arm64: mm: Inline __TLBI_VADDR_RANGE() into __tlbi_range()
  arm64: mm: Push __TLBI_VADDR() into __tlbi_level()
  arm64: mm: Implicitly invalidate user ASID based on TLBI operation
  arm64: mm: Introduce a C wrapper for by-range TLB invalidation
  arm64: mm: Re-implement the __tlbi_level macro as a C function

* for-next/ttbr-macros-cleanup:
  : Cleanups of the TTBR1_* macros
  arm64/mm: Directly use TTBRx_EL1_CnP
  arm64/mm: Directly use TTBRx_EL1_ASID_MASK
  arm64/mm: Describe TTBR1_BADDR_4852_OFFSET

* for-next/kselftest:
  : arm64 kselftest updates
  selftests/arm64: Implement cmpbr_sigill() to hwcap test

* for-next/feat_lsui:
  : Futex support using FEAT_LSUI instructions to avoid toggling PAN
  arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present
  arm64: Kconfig: Add support for LSUI
  KVM: arm64: Use CAST instruction for swapping guest descriptor
  arm64: futex: Support futex with FEAT_LSUI
  arm64: futex: Refactor futex atomic operation
  KVM: arm64: kselftest: set_id_regs: Add test for FEAT_LSUI
  KVM: arm64: Expose FEAT_LSUI to guests
  arm64: cpufeature: Add FEAT_LSUI

* for-next/mpam: (40 commits)
  : Expose MPAM to user-space via resctrl:
  :  - Add architecture context-switch and hiding of the feature from KVM.
  :  - Add interface to allow MPAM to be exposed to user-space using resctrl.
  :  - Add errata workaoround for some existing platforms.
  :  - Add documentation for using MPAM and what shape of platforms can use resctrl
  arm64: mpam: Add initial MPAM documentation
  arm_mpam: Quirk CMN-650's CSU NRDY behaviour
  arm_mpam: Add workaround for T241-MPAM-6
  arm_mpam: Add workaround for T241-MPAM-4
  arm_mpam: Add workaround for T241-MPAM-1
  arm_mpam: Add quirk framework
  arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl
  arm64: mpam: Select ARCH_HAS_CPU_RESCTRL
  arm_mpam: resctrl: Add empty definitions for assorted resctrl functions
  arm_mpam: resctrl: Update the rmid reallocation limit
  arm_mpam: resctrl: Add resctrl_arch_rmid_read()
  arm_mpam: resctrl: Allow resctrl to allocate monitors
  arm_mpam: resctrl: Add support for csu counters
  arm_mpam: resctrl: Add monitor initialisation and domain boilerplate
  arm_mpam: resctrl: Add kunit test for control format conversions
  arm_mpam: resctrl: Add support for 'MB' resource
  arm_mpam: resctrl: Wait for cacheinfo to be ready
  arm_mpam: resctrl: Add rmid index helpers
  arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats
  arm_mpam: resctrl: Hide CDP emulation behind CONFIG_EXPERT
  ...

* for-next/hotplug-batched-tlbi:
  : arm64/mm: Enable batched TLB flush in unmap_hotplug_range()
  arm64/mm: Reject memory removal that splits a kernel leaf mapping
  arm64/mm: Enable batched TLB flush in unmap_hotplug_range()

* for-next/bbml2-fixes:
  : Fixes for realm guest and BBML2_NOABORT
  arm64: mm: Remove pmd_sect() and pud_sect()
  arm64: mm: Handle invalid large leaf mappings correctly
  arm64: mm: Fix rodata=full block mapping support for realm guests

* for-next/sysreg:
  : arm64 sysreg updates
  arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12
  arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06

* for-next/generic-entry:
  : More arm64 refactoring towards using the generic entry code
  arm64: Check DAIF (and PMR) at task-switch time
  arm64: entry: Use split preemption logic
  arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
  arm64: entry: Consistently prefix arm64-specific wrappers
  arm64: entry: Don't preempt with SError or Debug masked
  entry: Split preemption from irqentry_exit_to_kernel_mode()
  entry: Split kernel mode logic from irqentry_{enter,exit}()
  entry: Move irqentry_enter() prototype later
  entry: Remove local_irq_{enable,disable}_exit_to_user()
  entry: Fix stale comment for irqentry_enter()

* for-next/acpi:
  : arm64 ACPI updates
  ACPI: AGDI: fix missing newline in error message
2026-04-10 14:22:24 +01:00
Aneesh Kumar K.V (Arm)
34e563947c arm64: rsi: use linear-map alias for realm config buffer
rsi_get_realm_config() passes its argument to virt_to_phys(), but
&config is a kernel image address and not a linear-map alias.
On arm64 this triggers the below warning:

 virt_to_phys used for non-linear address: (____ptrval____) (config+0x0/0x1000)
 WARNING: arch/arm64/mm/physaddr.c:15 at __virt_to_phys+0x50/0x70, CPU#0: swapper/0
 Modules linked in:
 .....
 Hardware name: linux,dummy-virt (DT)
 pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : __virt_to_phys+0x50/0x70
 lr : __virt_to_phys+0x4c/0x70
 .....
 ......
 Call trace:
  __virt_to_phys+0x50/0x70 (P)
  arm64_rsi_init+0xa0/0x1b8
  setup_arch+0x13c/0x1a0
  start_kernel+0x68/0x398
  __primary_switched+0x88/0x90

Pass lm_alias(&config) instead so the RSI call uses the linear-map
alias of the same buffer and avoids the boot-time warning.

Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-10 11:52:04 +01:00
Muhammad Usama Anjum
249bf97331 arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode
With KASAN_HW_TAGS (MTE) in synchronous mode, tag check faults are
reported as immediate Data Abort exceptions. The TFSR_EL1.TF1 bit is
never set since faults never go through the asynchronous path.
Therefore, reading TFSR_EL1 and executing data and instruction barriers
on kernel entry, exit, context switch and suspend is unnecessary
overhead.

As with the check_mte_async_tcf and clear_mte_async_tcf paths for
TFSRE0_EL1, extend the same optimisation to kernel entry/exit, context
switch and suspend.

All mte kselftests pass. The kunit before and after the patch show same
results.

A selection of test_vmalloc benchmarks running on a arm64 machine.
v6.19 is the baseline. (>0 is faster, <0 is slower, (R)/(I) =
statistically significant Regression/Improvement). Based on significance
and ignoring the noise, the benchmarks improved.

* 77 result classes were considered, with 9 wins, 0 losses and 68 ties

Results of fastpath [1] on v6.19 vs this patch:

+----------------------------+----------------------------------------------------------+------------+
| Benchmark                  | Result Class                                             |   barriers |
+============================+==========================================================+============+
| micromm/fork               | fork: p:1, d:10 (seconds)                                |  (I) 2.75% |
|                            | fork: p:512, d:10 (seconds)                              |      0.96% |
+----------------------------+----------------------------------------------------------+------------+
| micromm/munmap             | munmap: p:1, d:10 (seconds)                              |     -1.78% |
|                            | munmap: p:512, d:10 (seconds)                            |      5.02% |
+----------------------------+----------------------------------------------------------+------------+
| micromm/vmalloc            | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |     -0.56% |
|                            | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |      0.70% |
|                            | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |      1.18% |
|                            | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |     -5.01% |
|                            | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |     13.81% |
|                            | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |      6.51% |
|                            | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |     32.87% |
|                            | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         |      4.17% |
|                            | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         |      8.40% |
|                            | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         |     -0.48% |
|                            | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         |     -0.74% |
|                            | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |      0.53% |
|                            | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |     -2.81% |
|                            | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |     -2.06% |
|                            | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     |     -0.56% |
|                            | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |     -0.41% |
|                            | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |      0.89% |
|                            | random_size_alloc_test: p:1, h:0, l:500000 (usec)        |      1.71% |
|                            | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |      0.83% |
+----------------------------+----------------------------------------------------------+------------+
| schbench/thread-contention | -m 16 -t 1 -r 10 -s 1000, avg_rps (req/sec)              |      0.05% |
|                            | -m 16 -t 1 -r 10 -s 1000, req_latency_p99 (usec)         |      0.60% |
|                            | -m 16 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
|                            | -m 16 -t 4 -r 10 -s 1000, avg_rps (req/sec)              |     -0.34% |
|                            | -m 16 -t 4 -r 10 -s 1000, req_latency_p99 (usec)         |     -0.58% |
|                            | -m 16 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      9.09% |
|                            | -m 16 -t 16 -r 10 -s 1000, avg_rps (req/sec)             |     -0.74% |
|                            | -m 16 -t 16 -r 10 -s 1000, req_latency_p99 (usec)        |     -1.40% |
|                            | -m 16 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      0.00% |
|                            | -m 16 -t 64 -r 10 -s 1000, avg_rps (req/sec)             |     -0.78% |
|                            | -m 16 -t 64 -r 10 -s 1000, req_latency_p99 (usec)        |     -0.11% |
|                            | -m 16 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      0.11% |
|                            | -m 16 -t 256 -r 10 -s 1000, avg_rps (req/sec)            |      2.64% |
|                            | -m 16 -t 256 -r 10 -s 1000, req_latency_p99 (usec)       |      3.15% |
|                            | -m 16 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec)    |     17.54% |
|                            | -m 32 -t 1 -r 10 -s 1000, avg_rps (req/sec)              |     -1.22% |
|                            | -m 32 -t 1 -r 10 -s 1000, req_latency_p99 (usec)         |      0.85% |
|                            | -m 32 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
|                            | -m 32 -t 4 -r 10 -s 1000, avg_rps (req/sec)              |     -0.34% |
|                            | -m 32 -t 4 -r 10 -s 1000, req_latency_p99 (usec)         |      1.05% |
|                            | -m 32 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
|                            | -m 32 -t 16 -r 10 -s 1000, avg_rps (req/sec)             |     -0.41% |
|                            | -m 32 -t 16 -r 10 -s 1000, req_latency_p99 (usec)        |      0.58% |
|                            | -m 32 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      2.13% |
|                            | -m 32 -t 64 -r 10 -s 1000, avg_rps (req/sec)             |      0.67% |
|                            | -m 32 -t 64 -r 10 -s 1000, req_latency_p99 (usec)        |      2.07% |
|                            | -m 32 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec)     |     -1.28% |
|                            | -m 32 -t 256 -r 10 -s 1000, avg_rps (req/sec)            |      1.01% |
|                            | -m 32 -t 256 -r 10 -s 1000, req_latency_p99 (usec)       |      0.69% |
|                            | -m 32 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec)    |     13.12% |
|                            | -m 64 -t 1 -r 10 -s 1000, avg_rps (req/sec)              |     -0.25% |
|                            | -m 64 -t 1 -r 10 -s 1000, req_latency_p99 (usec)         |     -0.48% |
|                            | -m 64 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec)      |     10.53% |
|                            | -m 64 -t 4 -r 10 -s 1000, avg_rps (req/sec)              |     -0.06% |
|                            | -m 64 -t 4 -r 10 -s 1000, req_latency_p99 (usec)         |      0.00% |
|                            | -m 64 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
|                            | -m 64 -t 16 -r 10 -s 1000, avg_rps (req/sec)             |     -0.36% |
|                            | -m 64 -t 16 -r 10 -s 1000, req_latency_p99 (usec)        |      0.52% |
|                            | -m 64 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      0.11% |
|                            | -m 64 -t 64 -r 10 -s 1000, avg_rps (req/sec)             |      0.52% |
|                            | -m 64 -t 64 -r 10 -s 1000, req_latency_p99 (usec)        |      3.53% |
|                            | -m 64 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec)     |     -0.10% |
|                            | -m 64 -t 256 -r 10 -s 1000, avg_rps (req/sec)            |      2.53% |
|                            | -m 64 -t 256 -r 10 -s 1000, req_latency_p99 (usec)       |      1.82% |
|                            | -m 64 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec)    |     -5.80% |
+----------------------------+----------------------------------------------------------+------------+
| syscall/getpid             | mean (ns)                                                | (I) 15.98% |
|                            | p99 (ns)                                                 | (I) 11.11% |
|                            | p99.9 (ns)                                               | (I) 16.13% |
+----------------------------+----------------------------------------------------------+------------+
| syscall/getppid            | mean (ns)                                                | (I) 14.82% |
|                            | p99 (ns)                                                 | (I) 17.86% |
|                            | p99.9 (ns)                                               |  (I) 9.09% |
+----------------------------+----------------------------------------------------------+------------+
| syscall/invalid            | mean (ns)                                                | (I) 17.78% |
|                            | p99 (ns)                                                 | (I) 11.11% |
|                            | p99.9 (ns)                                               |     13.33% |
+----------------------------+----------------------------------------------------------+------------+

[1] https://gitlab.arm.com/tooling/fastpath

Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-09 18:56:22 +01:00
Wang Wensheng
ee020bf6f1 arm64: kexec: Remove duplicate allocation for trans_pgd
trans_pgd would be allocated in trans_pgd_create_copy(), so remove the
duplicate allocation before calling trans_pgd_create_copy().

Fixes: 3744b5280e ("arm64: kexec: install a copy of the linear-map")
Signed-off-by: Wang Wensheng <wsw9603@163.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-08 17:49:08 +01:00
Mark Rutland
8d13386c76 arm64: Check DAIF (and PMR) at task-switch time
When __switch_to() switches from a 'prev' task to a 'next' task, various
pieces of CPU state are expected to have specific values, such that
these do not need to be saved/restored. If any of these hold an
unexpected value when switching away from the prev task, they could lead
to surprising behaviour in the context of the next task, and it would be
difficult to determine where they were configured to their unexpected
value.

Add some checks for DAIF and PMR at task-switch time so that we can
detect such issues.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-08 17:40:04 +01:00
Mark Rutland
ae654112ea arm64: entry: Use split preemption logic
The generic irqentry code now provides
irqentry_exit_to_kernel_mode_preempt() and
irqentry_exit_to_kernel_mode_after_preempt(), which can be used
where architectures have different state requirements for involuntary
preemption and exception return, as is the case on arm64.

Use the new functions on arm64, aligning our exit to kernel mode logic
with the style of our exit to user mode logic. This removes the need for
the recently-added bodge in arch_irqentry_exit_need_resched(), and
allows preemption to occur when returning from any exception taken from
kernel mode, which is nicer for RT.

In an ideal world, we'd remove arch_irqentry_exit_need_resched(), and
fold the conditionality directly into the architecture-specific entry
code. That way all the logic necessary to avoid preempting from a
pseudo-NMI could be constrained specifically to the EL1 IRQ/FIQ paths,
avoiding redundant work for other exceptions, and making the flow a bit
clearer. At present it looks like that would require a larger
refactoring (e.g. for the PREEMPT_DYNAMIC logic), and so I've left that
as-is for now.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-08 17:40:04 +01:00
Mark Rutland
a07b7b2142 arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode()
The generic irqentry code now provides irqentry_enter_from_kernel_mode()
and irqentry_exit_to_kernel_mode(), which can be used when an exception
is known to be taken from kernel mode. These can be inlined into
architecture-specific entry code, and avoid redundant work to test
whether the exception was taken from user mode.

Use these in arm64_enter_from_kernel_mode() and
arm64_exit_to_kernel_mode(), which are only used for exceptions known to
be taken from kernel mode. This will remove a small amount of redundant
work, and will permit further changes to arm64_exit_to_kernel_mode() in
subsequent patches.

There should be no funcitonal change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-08 17:40:04 +01:00
Mark Rutland
6879ef1302 arm64: entry: Consistently prefix arm64-specific wrappers
For historical reasons, arm64's entry code has arm64-specific functions
named enter_from_kernel_mode() and exit_to_kernel_mode(), which are
wrappers for similarly-named functions from the generic irqentry code.
Other arm64-specific wrappers have an 'arm64_' prefix to clearly
distinguish them from their generic counterparts, e.g.
arm64_enter_from_user_mode() and arm64_exit_to_user_mode().

For consistency and clarity, add an 'arm64_' prefix to these functions.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-08 17:40:04 +01:00
Marc Zyngier
d77f4792db Merge branch kvm-arm64/vgic-fixes-7.1 into kvmarm-master/next
* kvm-arm64/vgic-fixes-7.1:
  : .
  : FIrst pass at fixing a number of vgic-v5 bugs that were found
  : after the merge of the initial series.
  : .
  KVM: arm64: Advertise ID_AA64PFR2_EL1.GCIE
  KVM: arm64: vgic-v5: Fold PPI state for all exposed PPIs
  KVM: arm64: set_id_regs: Allow GICv3 support to be set at runtime
  KVM: arm64: Don't advertises GICv3 in ID_PFR1_EL1 if AArch32 isn't supported
  KVM: arm64: Correctly plumb ID_AA64PFR2_EL1 into pkvm idreg handling
  KVM: arm64: Move GICv5 timer PPI validation into timer_irqs_are_valid()
  KVM: arm64: Remove evaluation of timer state in kvm_cpu_has_pending_timer()
  KVM: arm64: Kill arch_timer_context::direct field
  KVM: arm64: vgic-v5: Correctly set dist->ready once initialised
  KVM: arm64: vgic-v5: Make the effective priority mask a strict limit
  KVM: arm64: vgic-v5: Cast vgic_apr to u32 to avoid undefined behaviours
  KVM: arm64: vgic-v5: Transfer edge pending state to ICH_PPI_PENDRx_EL2
  KVM: arm64: vgic-v5: Hold config_lock while finalizing GICv5 PPIs
  KVM: arm64: Account for RESx bits in __compute_fgt()
  KVM: arm64: Fix writeable mask for ID_AA64PFR2_EL1
  arm64: Fix field references for ICH_PPI_DVIR[01]_EL2
  KVM: arm64: Don't skip per-vcpu NV initialisation
  KVM: arm64: vgic: Don't reset cpuif/redist addresses at finalize time

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-08 12:26:00 +01:00
Marc Zyngier
b693940e81 Merge branch kvm-arm64/pkvm-psci into kvmarm-master/next
* kvm-arm64/pkvm-psci:
  : .
  : Cleanup of the pKVM PSCI relay CPU entry code, making it slightly
  : easier to follow, should someone have to wade into these waters
  : ever again.
  : .
  KVM: arm64: Remove extra ISBs when using msr_hcr_el2
  KVM: arm64: pkvm: Use direct function pointers for cpu_{on,resume}
  KVM: arm64: pkvm: Turn __kvm_hyp_init_cpu into an inner label
  KVM: arm64: pkvm: Simplify BTI handling on CPU boot
  KVM: arm64: pkvm: Move error handling to the end of kvm_hyp_cpu_entry

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-08 12:23:24 +01:00
Marc Zyngier
2de32a25a3 Merge branch kvm-arm64/hyp-tracing into kvmarm-master/next
* kvm-arm64/hyp-tracing: (40 commits)
  : .
  : EL2 tracing support, adding both 'remote' ring-buffer
  : infrastructure and the tracing itself, courtesy of
  : Vincent Donnefort. From the cover letter:
  :
  : "The growing set of features supported by the hypervisor in protected
  : mode necessitates debugging and profiling tools. Tracefs is the
  : ideal candidate for this task:
  :
  :   * It is simple to use and to script.
  :
  :   * It is supported by various tools, from the trace-cmd CLI to the
  :     Android web-based perfetto.
  :
  :   * The ring-buffer, where are stored trace events consists of linked
  :     pages, making it an ideal structure for sharing between kernel and
  :     hypervisor.
  :
  : This series first introduces a new generic way of creating remote events and
  : remote buffers. Then it adds support to the pKVM hypervisor."
  : .
  tracing: selftests: Extend hotplug testing for trace remotes
  tracing: Non-consuming read for trace remotes with an offline CPU
  tracing: Adjust cmd_check_undefined to show unexpected undefined symbols
  tracing: Restore accidentally removed SPDX tag
  KVM: arm64: avoid unused-variable warning
  tracing: Generate undef symbols allowlist for simple_ring_buffer
  KVM: arm64: tracing: add ftrace dependency
  tracing: add more symbols to whitelist
  tracing: Update undefined symbols allow list for simple_ring_buffer
  KVM: arm64: Fix out-of-tree build for nVHE/pKVM tracing
  tracing: selftests: Add hypervisor trace remote tests
  KVM: arm64: Add selftest event support to nVHE/pKVM hyp
  KVM: arm64: Add hyp_enter/hyp_exit events to nVHE/pKVM hyp
  KVM: arm64: Add event support to the nVHE/pKVM hyp and trace remote
  KVM: arm64: Add trace reset to the nVHE/pKVM hyp
  KVM: arm64: Sync boot clock with the nVHE/pKVM hyp
  KVM: arm64: Add trace remote for the nVHE/pKVM hyp
  KVM: arm64: Add tracing capability for the nVHE/pKVM hyp
  KVM: arm64: Support unaligned fixmap in the pKVM hyp
  KVM: arm64: Initialise hyp_nr_cpus for nVHE hyp
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-08 12:21:51 +01:00
Chengwen Feng
7cd5f5659a arm64: acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
As a step towards unifying the interface for retrieving ACPI CPU UID
across architectures, introduce a new function acpi_get_cpu_uid() for
arm64. While at it, add input validation to make the code more robust.

Reimplement get_cpu_for_acpi_id() based on acpi_get_cpu_uid() for
consistency, and move its implementation next to the new function for
code coherence.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://patch.msgid.link/20260401081640.26875-2-fengchengwen@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-04-06 16:55:15 +02:00
Marc Zyngier
7e629348df KVM: arm64: Advertise ID_AA64PFR2_EL1.GCIE
As we are missing ID_AA64PFR2_EL1.GCIE from the kernel feature set,
userspace cannot write ID_AA64PFR2_EL1 with GCIE set, even if we are
on a GICv5 host.

Add the required field description.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://patch.msgid.link/20260401170017.369529-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-05 08:15:35 +01:00
Linus Torvalds
441c63ff42 arm64 fix for v7.0
- Implement a basic static call trampoline to fix CFI failures with the
   generic implementation.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmnPh0AQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNHRVB/97IOb/LZAq2yguGy6rMptm3tCdCsUmgPkh
 aPBeI4BE1JXofRcyM1oaavM/wC6M3ASb8JJbg5Ceta3wXwPfjzR2F9+6OEzipXzC
 nQzm0Da5GvwiHOY6GGhOgUy91+JJB1g7402ALIRjCiaadDBTLgys/YzDFUGC4+8N
 QKToOJykO4sCUR4lpYpuJvd1NQv1VkJo4ZgtlWvanHo9ovkTXOuCJsCTBv6EHMo6
 nJg9iSZOMj3L20VSmnY5fa0MpCNCXH8cfYtbmHBYBxI3e3sKYI8A2j0H22FP4oIH
 2+tkIg5TxQsmejf9u9V1JES2/0712SmG/hS0y1BsQtYzVuDp7pBZ
 =qSXb
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Will Deacon:

 - Implement a basic static call trampoline to fix CFI failures with the
   generic implementation

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Use static call trampolines when kCFI is enabled
2026-04-03 08:47:13 -07:00
Coiby Xu
e3a84be1ec arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel
CONFIG_CRASH_DM_CRYPT has been introduced to support LUKS-encrypted
device dump target by addressing two challenges [1],
 - Kdump kernel may not be able to decrypt the LUKS partition. For some
   machines, a system administrator may not have a chance to enter the
   password to decrypt the device in kdump initramfs after the 1st kernel
   crashes

 - LUKS2 by default use the memory-hard Argon2 key derivation function
   which is quite memory-consuming compared to the limited memory reserved
   for kdump.

To also enable this feature for ARM64 and PowerPC, the missing piece is to
let the kdump kernel know where to find the dm-crypt keys which are
randomly stored in memory reserved for kdump.  Introduce a new device tree
property dmcryptkeys [2] as similar to elfcorehdr to pass the memory
address of the stored info of dm-crypt keys to the kdump kernel.  Since
this property is only needed by the kdump kernel, it won't be exposed to
userspace.

Link: https://lkml.kernel.org/r/20260225060347.718905-4-coxu@redhat.com
Link: https://lore.kernel.org/all/20250502011246.99238-1-coxu@redhat.com/ [1]
Link: https://github.com/devicetree-org/dt-schema/pull/181 [2]
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Arnaud Lefebvre <arnaud.lefebvre@clever-cloud.com>
Cc: Baoquan he <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Thomas Staudt <tstaudt@de.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-02 23:36:24 -07:00
Ard Biesheuvel
54ac9ff8f1 arm64: Use static call trampolines when kCFI is enabled
Implement arm64 support for the 'unoptimized' static call variety, which
routes all calls through a trampoline that performs a tail call to the
chosen function, and wire it up for use when kCFI is enabled. This works
around an issue with kCFI and generic static calls, where the prototypes
of default handlers such as __static_call_nop() and __static_call_ret0()
don't match the expected prototype of the call site, resulting in kCFI
false positives [0].

Since static call targets may be located in modules loaded out of direct
branching range, this needs an ADRP/LDR pair to load the branch target
into R16 and a branch-to-register (BR) instruction to perform an
indirect call.

Unlike on x86, there is no pressing need on arm64 to avoid indirect
calls at all cost, but hiding it from the compiler as is done here does
have some benefits:
- the literal is located in .rodata, which gives us the same robustness
  advantage that code patching does;
- no D-cache pollution from fetching hash values from .text sections.

From an execution speed PoV, this is unlikely to make any difference at
all.

Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will McVicker <willmcvicker@google.com>
Reported-by: Carlos Llamas <cmllamas@google.com>
Closes: https://lore.kernel.org/all/20260311225822.1565895-1-cmllamas@google.com/ [0]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-04-01 15:29:59 +01:00
Yeoreum Yun
e223258ed8 arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present
The purpose of supporting LSUI is to eliminate PAN toggling. CPUs that
support LSUI are unlikely to support a 32-bit runtime. Rather than
emulating the SWP instruction using LSUI instructions in order to remove
PAN toggling, simply disable SWP emulation.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
[catalin.marinas@arm.com: some tweaks to the in-code comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-03-27 17:29:10 +00:00
Ben Horgan
37fe0f984d arm64: mpam: Initialise and context switch the MPAMSM_EL1 register
The MPAMSM_EL1 sets the MPAM labels, PMG and PARTID, for loads and stores
generated by a shared SMCU. Disable the traps so the kernel can use it and
set it to the same configuration as the per-EL cpu MPAM configuration.

If an SMCU is not shared with other cpus then it is implementation
defined whether the configuration from MPAMSM_EL1 is used or that from
the appropriate MPAMy_ELx. As we set the same, PMG_D and PARTID_D,
configuration for MPAM0_EL1, MPAM1_EL1 and MPAMSM_EL1 the resulting
configuration is the same regardless.

The range of valid configurations for the PARTID and PMG in MPAMSM_EL1 is
not currently specified in Arm Architectural Reference Manual but the
architect has confirmed that it is intended to be the same as that for the
cpu configuration in the MPAMy_ELx registers.

Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Tested-by: Jesse Chick <jessechick@os.amperecomputing.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
2026-03-27 15:29:02 +00:00
James Morse
735dad9999 arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs
The MPAM system registers will be lost if the CPU is reset during PSCI's
CPU_SUSPEND.

Add a PM notifier to restore them.

mpam_thread_switch(current) can't be used as this won't make any changes if
the in-memory copy says the register already has the correct value. In
reality the system register is UNKNOWN out of reset.

Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Tested-by: Jesse Chick <jessechick@os.amperecomputing.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Co-developed-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
2026-03-27 15:28:54 +00:00
James Morse
831a7f1672 arm64: mpam: Advertise the CPUs MPAM limits to the driver
Requesters need to populate the MPAM fields for any traffic they send on
the interconnect. For the CPUs these values are taken from the
corresponding MPAMy_ELx register. Each requester may have a limit on the
largest PARTID or PMG value that can be used. The MPAM driver has to
determine the system-wide minimum supported PARTID and PMG values.

To do this, the driver needs to be told what each requestor's limit is.

CPUs are special, but this infrastructure is also needed for the SMMU and
GIC ITS. Call the helper to tell the MPAM driver what the CPUs can do.

The return value can be ignored by the arch code as it runs well before the
MPAM driver starts probing.

Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Tested-by: Jesse Chick <jessechick@os.amperecomputing.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Co-developed-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
[ morse: requestor->requester as argued by ispell ]
Signed-off-by: James Morse <james.morse@arm.com>
2026-03-27 15:28:42 +00:00
James Morse
87b78a5d70 arm64: mpam: Re-initialise MPAM regs when CPU comes online
Now that the MPAM system registers are expected to have values that change,
reprogram them based on the previous value when a CPU is brought online.

Previously MPAM's 'default PARTID' of 0 was always used for MPAM in
kernel-space as this is the PARTID that hardware guarantees to
reset. Because there are a limited number of PARTID, this value is exposed
to user-space, meaning resctrl changes to the resctrl default group would
also affect kernel threads.  Instead, use the task's PARTID value for
kernel work on behalf of user-space too. The default of 0 is kept for both
user-space and kernel-space when MPAM is not enabled.

Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Tested-by: Jesse Chick <jessechick@os.amperecomputing.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Co-developed-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
2026-03-27 15:28:25 +00:00
James Morse
8e06d04ff1 arm64: mpam: Context switch the MPAM registers
MPAM allows traffic in the SoC to be labeled by the OS, these labels are
used to apply policy in caches and bandwidth regulators, and to monitor
traffic in the SoC. The label is made up of a PARTID and PMG value. The x86
equivalent calls these CLOSID and RMID, but they don't map precisely.

MPAM has two CPU system registers that is used to hold the PARTID and PMG
values that traffic generated at each exception level will use. These can
be set per-task by the resctrl file system. (resctrl is the defacto
interface for controlling this stuff).

Add a helper to switch this.

struct task_struct's separate CLOSID and RMID fields are insufficient to
implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID) and
PMG (sort of like the RMID) separately. On x86, the rmid is an independent
number, so a race that writes a mismatched closid and rmid into hardware is
benign. On arm64, the pmg bits extend the partid.
(i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).  In
this case, mismatching the values will 'dirty' a pmg value that resctrl
believes is clean, and is not tracking with its 'limbo' code.

To avoid this, the partid and pmg are always read and written as a
pair. This requires a new u64 field. In struct task_struct there are two
u32, rmid and closid for the x86 case, but as we can't use them here do
something else. Add this new field, mpam_partid_pmg, to struct thread_info
to avoid adding more architecture specific code to struct task_struct.
Always use READ_ONCE()/WRITE_ONCE() when accessing this field.

Resctrl allows a per-cpu 'default' value to be set, this overrides the
values when scheduling a task in the default control-group, which has
PARTID 0. The way 'code data prioritisation' gets emulated means the
register value for the default group needs to be a variable.

The current system register value is kept in a per-cpu variable to avoid
writing to the system register if the value isn't going to change.  Writes
to this register may reset the hardware state for regulating bandwidth.

Finally, there is no reason to context switch these registers unless there
is a driver changing the values in struct task_struct. Hide the whole thing
behind a static key. This also allows the driver to disable MPAM in
response to errors reported by hardware. Move the existing static key to
belong to the arch code, as in the future the MPAM driver may become a
loadable module.

All this should depend on whether there is an MPAM driver, hide it behind
CONFIG_ARM64_MPAM.

Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Tested-by: Jesse Chick <jessechick@os.amperecomputing.com>
CC: Amit Singh Tomar <amitsinght@marvell.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Co-developed-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
2026-03-27 15:27:59 +00:00
Yeoreum Yun
7181f718cb arm64: cpufeature: Add FEAT_LSUI
Since Armv9.6, FEAT_LSUI introduces atomic instructions that allow
privileged code to access user memory without clearing the PSTATE.PAN
bit. Add CPU feature detection for FEAT_LSUI.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
[catalin.marinas@arm.com: Remove commit log references to SW_PAN]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-03-26 18:19:07 +00:00
Ryan Roberts
a96ef5848c randomize_kstack: Unify random source across arches
Previously different architectures were using random sources of
differing strength and cost to decide the random kstack offset. A number
of architectures (loongarch, powerpc, s390, x86) were using their
timestamp counter, at whatever the frequency happened to be. Other
arches (arm64, riscv) were using entropy from the crng via
get_random_u16().

There have been concerns that in some cases the timestamp counters may
be too weak, because they can be easily guessed or influenced by user
space. And get_random_u16() has been shown to be too costly for the
level of protection kstack offset randomization provides.

So let's use a common, architecture-agnostic source of entropy; a
per-cpu prng, seeded at boot-time from the crng. This has a few
benefits:

  - We can remove choose_random_kstack_offset(); That was only there to
    try to make the timestamp counter value a bit harder to influence
    from user space [*].

  - The architecture code is simplified. All it has to do now is call
    add_random_kstack_offset() in the syscall path.

  - The strength of the randomness can be reasoned about independently
    of the architecture.

  - Arches previously using get_random_u16() now have much faster
    syscall paths, see below results.

[*] Additionally, this gets rid of some redundant work on s390 and x86.
Before this patch, those architectures called
choose_random_kstack_offset() under arch_exit_to_user_mode_prepare(),
which is also called for exception returns to userspace which were *not*
syscalls (e.g. regular interrupts). Getting rid of
choose_random_kstack_offset() avoids a small amount of redundant work
for the non-syscall cases.

In some configurations, add_random_kstack_offset() will now call
instrumentable code, so for a couple of arches, I have moved the call a
bit later to the first point where instrumentation is allowed. This
doesn't impact the efficacy of the mechanism.

There have been some claims that a prng may be less strong than the
timestamp counter if not regularly reseeded. But the prng has a period
of about 2^113. So as long as the prng state remains secret, it should
not be possible to guess. If the prng state can be accessed, we have
bigger problems.

Additionally, we are only consuming 6 bits to randomize the stack, so
there are only 64 possible random offsets. I assert that it would be
trivial for an attacker to brute force by repeating their attack and
waiting for the random stack offset to be the desired one. The prng
approach seems entirely proportional to this level of protection.

Performance data are provided below. The baseline is v6.18 with rndstack
on for each respective arch. (I)/(R) indicate statistically significant
improvement/regression. arm64 platform is AWS Graviton3 (m7g.metal).
x86_64 platform is AWS Sapphire Rapids (m7i.24xlarge):

+-----------------+--------------+---------------+---------------+
| Benchmark       | Result Class |  per-cpu-prng |  per-cpu-prng |
|                 |              | arm64 (metal) |   x86_64 (VM) |
+=================+==============+===============+===============+
| syscall/getpid  | mean (ns)    |    (I) -9.50% |   (I) -17.65% |
|                 | p99 (ns)     |   (I) -59.24% |   (I) -24.41% |
|                 | p99.9 (ns)   |   (I) -59.52% |   (I) -28.52% |
+-----------------+--------------+---------------+---------------+
| syscall/getppid | mean (ns)    |    (I) -9.52% |   (I) -19.24% |
|                 | p99 (ns)     |   (I) -59.25% |   (I) -25.03% |
|                 | p99.9 (ns)   |   (I) -59.50% |   (I) -28.17% |
+-----------------+--------------+---------------+---------------+
| syscall/invalid | mean (ns)    |   (I) -10.31% |   (I) -18.56% |
|                 | p99 (ns)     |   (I) -60.79% |   (I) -20.06% |
|                 | p99.9 (ns)   |   (I) -61.04% |   (I) -25.04% |
+-----------------+--------------+---------------+---------------+

I tested an earlier version of this change on x86 bare metal and it
showed a smaller but still significant improvement. The bare metal
system wasn't available this time around so testing was done in a VM
instance. I'm guessing the cost of rdtsc is higher for VMs.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://patch.msgid.link/20260303150840.3789438-3-ryan.roberts@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
2026-03-24 21:12:03 -07:00
James Clark
15ed3fa23c arm64: cpufeature: Use pmuv3_implemented() function
Other places that are doing this version comparison are already using
pmuv3_implemented(), so might as well use it here too for consistency.

Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Colton Lewis <coltonlewis@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-03-24 12:33:49 +00:00
James Clark
d1dcc20bcc arm64: cpufeature: Make PMUVer and PerfMon unsigned
On the host, this change doesn't make a difference because the fields
are defined as FTR_EXACT. However, KVM allows userspace to set these
fields for a guest and overrides the type to be FTR_LOWER_SAFE. And
while KVM used to do an unsigned comparison to validate that the new
value is lower than what the hardware provides, since the linked commit
it uses the generic sanitization framework which does a signed
comparison.

Fix it by defining these fields as unsigned. In theory, without this
fix, userspace could set a higher PMU version than the hardware supports
by providing any value with the top bit set.

Fixes: c118cead07 ("KVM: arm64: Use generic sanitisation for ID_(AA64)DFR0_EL1")
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Colton Lewis <coltonlewis@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-03-24 12:33:49 +00:00
Christoph Hellwig
e43dce8a0b
fs: fix archiecture-specific compat_ftruncate64
The "small" argument to do_sys_ftruncate indicates if > 32-bit size
should be reject, but all the arch-specific compat ftruncate64
implementations get this wrong.  Merge do_sys_ftruncate and
ksys_ftruncate, replace the integer as boolean small flag with a
descriptive one about LFS semantics, and use it correctly in the
architecture-specific ftruncate64 implementations.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Fixes: 3dd681d944 ("arm64: 32-bit (compat) applications support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260323070205.2939118-2-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-23 12:41:57 +01:00
Marc Zyngier
54a3cc1454 KVM: arm64: Remove extra ISBs when using msr_hcr_el2
The msr_hcr_el2 macro is slightly awkward, as it provides an ISB
when CONFIG_AMPERE_ERRATUM_AC04_CPU_23 is present, and none
otherwise. Note that this this option is 'default y', meaning that
it is likely to be selected.

Most instances of msr_hcr_el2 are also immediately followed by an ISB,
meaning that in most cases, you end-up with two back-to-back ISBs.
This isn't a big deal, but once you have seen that, you can't unsee it.

Rework the msr_hcr_el2 macro to always provide the ISB, and drop
the superfluous ISBs everywhere else.

Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260321212419.2803972-6-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-23 11:03:53 +00:00
Linus Torvalds
165160265e arm64 fixes for -rc5
- Fix DWARF parsing for SCS/PAC patching to work with very large modules
   (such as the amdgpu driver).
 
 - Fixes to the mpam resctrl driver.
 
 - Fix broken handling of 52-bit physical addresses when sharing memory
   from within a realm.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmm9KmsQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNMBvB/4xtXt77u6Bx6vG3b9LZVU7XxIZN+svvIWs
 S1unNTqcLPjNhkqC7kJeTgOjbUOJ6jtCm3NQRg66fUDXOknwHp8d1yjoNI+eS6Ki
 hhRWtWZm+vGNb0YAJTNAATuNSmvn0qx3KMlHEQKnKUsAdzuVTTxwln0GjASLcP7H
 gMl0h46/CtvTRoSlBzTd5ObR8bcQeD1tRBHlXaCZI4i0rF9d3Aur3n1Vz7DfOUP9
 YzHjNZIdWd/6+hIqVAzrhiJE3kxLRv46OXh71Q2YKWe48/USCUskueGLK3c27Gs1
 o6xsc9ZlItVRTO6J1rFCN2No2Pigmdqkqu1moeZCb37R79ilVb/i
 =Wv4u
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "There's a small crop of fixes for the MPAM resctrl driver, a fix for
  SCS/PAC patching with the AMDGPU driver and a page-table fix for
  realms running with 52-bit physical addresses:

   - Fix DWARF parsing for SCS/PAC patching to work with very large
     modules (such as the amdgpu driver)

   - Fixes to the mpam resctrl driver

   - Fix broken handling of 52-bit physical addresses when sharing
     memory from within a realm"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: realm: Fix PTE_NS_SHARED for 52bit PA support
  arm_mpam: Force __iomem casts
  arm_mpam: Disable preemption when making accesses to fake MSC in kunit test
  arm_mpam: Fix null pointer dereference when restoring bandwidth counters
  arm64/scs: Fix handling of advance_loc4
2026-03-20 09:23:01 -07:00
Suzuki K Poulose
8c6e9b60f5 arm64: realm: Fix PTE_NS_SHARED for 52bit PA support
With LPA/LPA2, the top bits of the PFN (Bits[51:48]) end up in the lower bits
of the PTE. So, simply creating a mask of the "top IPA bit" doesn't work well
for these configurations to set the "top" bit at the output of Stage1
translation.

Fix this by using the __phys_to_pte_val() to do the right thing for all
configurations.

Tested using, kvmtool, placing the memory at a higher address (-m <size>@<Addr>).

 e.g:
 # lkvm run --realm -c 4 -m 512M@@128T -k Image --console serial

 sh-5.0# dmesg | grep "LPA2\|RSI"
[    0.000000] RME: Using RSI version 1.0
[    0.000000] CPU features: detected: 52-bit Virtual Addressing (LPA2)
[    0.777354] CPU features: detected: 52-bit Virtual Addressing for KVM (LPA2)

Fixes: 3993069549 ("arm64: realm: Query IPA size from the RMM")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-03-19 12:46:05 +00:00
Linus Torvalds
11e8c7e947 ARM:
- Correctly handle deeactivation of interrupts that were activated from
   LRs.  Since EOIcount only denotes deactivation of interrupts that
   are not present in an LR, start EOIcount deactivation walk *after*
   the last irq that made it into an LR.
 
 - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when
   pKVM is already enabled -- not only thhis isn't possible (pKVM
   will reject the call), but it is also useless: this can only
   happen for a CPU that has already booted once, and the capability
   will not change.
 
 - Fix a couple of low-severity bugs in our S2 fault handling path,
   affecting the recently introduced LS64 handling and the even more
   esoteric handling of hwpoison in a nested context
 
 - Address yet another syzkaller finding in the vgic initialisation,
   where we would end-up destroying an uninitialised vgic with nasty
   consequences
 
 - Address an annoying case of pKVM failing to boot when some of the
   memblock regions that the host is faulting in are not page-aligned
 
 - Inject some sanity in the NV stage-2 walker by checking the limits
   against the advertised PA size, and correctly report the resulting
   faults
 
 PPC:
 
 - Fix a PPC e500 build error due to a long-standing wart that was exposed by
   the recent conversion to kmalloc_obj(); rip out all the ugliness that
   led to the wart.
 
 RISC-V:
 
 - Prevent speculative out-of-bounds access using array_index_nospec()
   in APLIC interrupt handling, ONE_REG regiser access, AIA CSR access,
   float register access, and PMU counter access
 
 - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(),
   kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr()
 
 - Fix potential null pointer dereference in kvm_riscv_vcpu_aia_rmw_topei()
 
 - Fix off-by-one array access in SBI PMU
 
 - Skip THP support check during dirty logging
 
 - Fix error code returned for Smstateen and Ssaia ONE_REG interface
 
 - Check host Ssaia extension when creating AIA irqchip
 
 x86:
 
 - Fix cases where CPUID mitigation features were incorrectly marked as
   available whenever the kernel used scattered feature words for them.
 
 - Validate _all_ GVAs, rather than just the first GVA, when processing
   a range of GVAs for Hyper-V's TLB flush hypercalls.
 
 - Fix a brown paper bug in add_atomic_switch_msr().
 
 - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list,
   to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu.
 
 - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel local
   APIC (and AVIC is enabled at the module level).
 
 - Update CR8 write interception when AVIC is (de)activated, to fix a bug
   where the guest can run in perpetuity with the CR8 intercept enabled.
 
 - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to allow
   L1 hypervisors to set FREEZE_IN_SMM.  This reverts (by default) an
   unintentional tightening of userspace ABI in 6.17, and provides some
   amount of backwards compatibility with hypervisors who want to freeze
   PMCs on VM-Entry.
 
 - Validate the VMCS/VMCB on return to a nested guest from SMM, because
   either userspace or the guest could stash invalid values in memory
   and trigger the processor's consistency checks.
 
 Generic:
 
 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
   unnecessary and confusing, triggered compiler warnings due to
   -Wflex-array-member-not-at-end.
 
 - Document that vcpu->mutex is take outside of kvm->slots_lock and
   kvm->slots_arch_lock, which is intentional and desirable despite being
   rather unintuitive.
 
 Selftests:
 
 - Increase the maximum number of NUMA nodes in the guest_memfd selftest to
   64 (from 8).
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmmy6n8UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNX7ggAhWoCG+AE6P3yrp6Mi+nRYpeRGC3q
 q2IiZCn0UoCg6q3c2kgn7b/N2zLJs0Q8FZRCEp2Je+2uvptpmdp/BMEfiIU3n2/a
 61z+Dydbpyc+kUmhJzUJ+aotq5FnMNmAAmqSKoc19GhAx2OQhQmBP/JOZ0P/eqLE
 Is0qNBgr/Zms2ib3GFf/JT+urysL2mX47qe92HTzq1T9EEG0KleID0Jz8vYQI8Fr
 I5N9+lTxagQDi8ytwOM85Cn8K7wh+CQIgzmciHcVErpAvAWkrEjrPlQltpEz2C5B
 aWEcRgw46utEaAiwPQGJRW6TeoKUG0pUR3v6T90nBkjjJ1npm6gPVE6TBA==
 =7nQ9
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Quite a large pull request, partly due to skipping last week and
  therefore having material from ~all submaintainers in this one. About
  a fourth of it is a new selftest, and a couple more changes are large
  in number of files touched (fixing a -Wflex-array-member-not-at-end
  compiler warning) or lines changed (reformatting of a table in the API
  documentation, thanks rST).

  But who am I kidding---it's a lot of commits and there are a lot of
  bugs being fixed here, some of them on the nastier side like the
  RISC-V ones.

  ARM:

   - Correctly handle deactivation of interrupts that were activated
     from LRs. Since EOIcount only denotes deactivation of interrupts
     that are not present in an LR, start EOIcount deactivation walk
     *after* the last irq that made it into an LR

   - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM
     is already enabled -- not only thhis isn't possible (pKVM will
     reject the call), but it is also useless: this can only happen for
     a CPU that has already booted once, and the capability will not
     change

   - Fix a couple of low-severity bugs in our S2 fault handling path,
     affecting the recently introduced LS64 handling and the even more
     esoteric handling of hwpoison in a nested context

   - Address yet another syzkaller finding in the vgic initialisation,
     where we would end-up destroying an uninitialised vgic with nasty
     consequences

   - Address an annoying case of pKVM failing to boot when some of the
     memblock regions that the host is faulting in are not page-aligned

   - Inject some sanity in the NV stage-2 walker by checking the limits
     against the advertised PA size, and correctly report the resulting
     faults

  PPC:

   - Fix a PPC e500 build error due to a long-standing wart that was
     exposed by the recent conversion to kmalloc_obj(); rip out all the
     ugliness that led to the wart

  RISC-V:

   - Prevent speculative out-of-bounds access using array_index_nospec()
     in APLIC interrupt handling, ONE_REG regiser access, AIA CSR
     access, float register access, and PMU counter access

   - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(),
     kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr()

   - Fix potential null pointer dereference in
     kvm_riscv_vcpu_aia_rmw_topei()

   - Fix off-by-one array access in SBI PMU

   - Skip THP support check during dirty logging

   - Fix error code returned for Smstateen and Ssaia ONE_REG interface

   - Check host Ssaia extension when creating AIA irqchip

  x86:

   - Fix cases where CPUID mitigation features were incorrectly marked
     as available whenever the kernel used scattered feature words for
     them

   - Validate _all_ GVAs, rather than just the first GVA, when
     processing a range of GVAs for Hyper-V's TLB flush hypercalls

   - Fix a brown paper bug in add_atomic_switch_msr()

   - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list,
     to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu

   - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel
     local APIC (and AVIC is enabled at the module level)

   - Update CR8 write interception when AVIC is (de)activated, to fix a
     bug where the guest can run in perpetuity with the CR8 intercept
     enabled

   - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to
     allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by
     default) an unintentional tightening of userspace ABI in 6.17, and
     provides some amount of backwards compatibility with hypervisors
     who want to freeze PMCs on VM-Entry

   - Validate the VMCS/VMCB on return to a nested guest from SMM,
     because either userspace or the guest could stash invalid values in
     memory and trigger the processor's consistency checks

  Generic:

   - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from
     being unnecessary and confusing, triggered compiler warnings due to
     -Wflex-array-member-not-at-end

   - Document that vcpu->mutex is take outside of kvm->slots_lock and
     kvm->slots_arch_lock, which is intentional and desirable despite
     being rather unintuitive

  Selftests:

   - Increase the maximum number of NUMA nodes in the guest_memfd
     selftest to 64 (from 8)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
  KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8
  Documentation: kvm: fix formatting of the quirks table
  KVM: x86: clarify leave_smm() return value
  selftests: kvm: add a test that VMX validates controls on RSM
  selftests: kvm: extract common functionality out of smm_test.c
  KVM: SVM: check validity of VMCB controls when returning from SMM
  KVM: VMX: check validity of VMCS controls when returning from SMM
  KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
  KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
  KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM
  KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers()
  KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr()
  KVM: x86: hyper-v: Validate all GVAs during PV TLB flush
  KVM: x86: synthesize CPUID bits only if CPU capability is set
  KVM: PPC: e500: Rip out "struct tlbe_ref"
  KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type
  KVM: selftests: Increase 'maxnode' for guest_memfd tests
  KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug
  KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail
  KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2()
  ...
2026-03-15 12:22:10 -07:00
Anshuman Khandual
be6e9dee0e arm64/mm: Directly use TTBRx_EL1_CnP
Replace all TTBR_CNP_BIT macro instances with TTBRx_EL1_CNP_BIT which
is a standard field from tools sysreg format. Drop the now redundant
custom macro TTBR_CNP_BIT. No functional change.

Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: kvmarm@lists.linux.dev
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-03-14 16:12:27 +00:00
Anshuman Khandual
d989010bbe arm64/mm: Directly use TTBRx_EL1_ASID_MASK
Replace all TTBR_ASID_MASK macro instances with TTBRx_EL1_ASID_MASK which
is a standard field mask from tools sysreg format. Drop the now redundant
custom macro TTBR_ASID_MASK. No functional change.

Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: kvmarm@lists.linux.dev
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-03-14 16:12:27 +00:00
Barry Song
2c92eff008 arm64: Provide dcache_by_myline_op_nosync helper
dcache_by_myline_op ensures completion of the data cache operations for a
region, while dcache_by_myline_op_nosync only issues them without waiting.
This enables deferred synchronization so completion for multiple regions
can be handled together later.

Cc: Leon Romanovsky <leon@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
Signed-off-by: Barry Song <baohua@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260228221216.59886-1-21cnbao@gmail.com
2026-03-13 23:46:32 +01:00